TechHui

Hawaiʻi's Technology Community

AppEngine Datastore Overview (based on Java SDK)



In this post, I'd like to share some of my development experience with AppEngine's Datastore, the data storage facility built on top of BigTable. As you might already know, it differs from a conventional relational database (RDBMS) in that it deals with Entities and Keys as opposed to tables and rows. It might not seem obvious at first, but this style of object-oriented persistence is radically different from its relational counterpart. Here I would like to point out some of the more notable features that I encountered during six months of development.


High-level API (provided by DataNucleus)

AppEngine supports two high-level mechanisms for persistence, JDO and JPA. Both mechanisms facilitate object manipulation and provide storage mappings. For example, here is a way to save an object using JDO:


DBAccount dba = new DBAccount("user");
PersistenceManager pm = PMFactory.getPersistenceManager();
try {
    pm.makePersistent(dba);  // stores the new account
} finally {
    pm.close();  // always release the PersistenceManager
}

This approach also supports object graphs. When persisting an object graph, it is recommended to do so within a transaction, which makes sure the whole object graph is persisted atomically. Without a transaction, you might end up with orphan objects in your datastore. Here is an example of a transactional write using object nesting:


DBAccount dba = new DBAccount("user");
dba.addProject(new DBProject("My Project"));
Transaction tx = pm.currentTransaction();
try {
    tx.begin();
    pm.makePersistent(dba);  // persists the account together with its nested project
    tx.commit();
} finally {
    if (tx.isActive()) tx.rollback();  // commit did not complete, so undo partial work
    pm.close();
}

For a more in-depth overview, check out the DataNucleus website. Also, if you are curious about other forms of SaaS persistence and how your application data can be ported to other platforms, take a look at the DataNucleus product diagram.


Low-level API (Provided by AppEngine Datastore)

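AppEngine also provides a low-level API to manipulate data. It is mostly based on three concepts: DatastoreService (the entry point for get, put, delete and query calls), Entity (a set of named properties stored under a Key) and Key (the unique identifier of an Entity).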

To tie all these together, here is a simple way to fetch an Entity:

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Key key = KeyFactory.createKey(DBAccount.class.getSimpleName(), customerEmail);
Entity entity = ds.get(key);  // throws EntityNotFoundException if no such Entity exists
Object name = entity.getProperty("name");

Notice how we get a handle to DatastoreService, create a Key using the simple class name and customer email, and use it to fetch an Entity object.

Keys and Entities

As demonstrated in the example above, the low-level API provides a Key generation facility. Understanding how to properly use Keys and Entities is fundamental to operating the datastore correctly.

Keys mostly require three things:
  • Ancestor Key
  • Entity Kind
  • Key Name
Ancestor Key is only used by child Entities.
Entity Kind names the type of an Entity and is logically related to the simple Java class name.
Key Name is the last bit of information used to identify a particular Entity. For top-level Entities, Key Name must be unique among all Entities of a given Kind. For child Entities, Key Name in combination with Ancestor Key must be unique among all Entities of a given Kind.


To demonstrate, here is how to create Keys for both top-level and child Entities:

Key accountKey = KeyFactory.createKey(DBAccount.class.getSimpleName(), customerEmail);
Key projectKey = KeyFactory.createKey(accountKey, DBProject.class.getSimpleName(), projectId);

Notice how accountKey uses only two pieces of information. (It is assumed that customerEmail is unique across all accounts.) Also notice how projectKey uses accountKey, the simple class name, and projectId. Given accountKey and projectId, it should be logically possible to find a single Entity of the DBProject Kind.


Key Builder

When dealing with multi-level object trees, Key creation can get quite laborious. The KeyFactory.Builder utility class assists with the process. Here is what using it looks like:


KeyFactory.Builder kb = new KeyFactory.Builder(DBAccount.class.getSimpleName(), customerEmail);
kb.addChild(DBProject.class.getSimpleName(), projectId);
Key projectKey = kb.getKey();

Printing projectKey should give output somewhat like this:


DBAccount('customer@email.com')/DBProject('someUniqueProjectId')

which gives a bit of an insight into how keys are actually represented :)
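To make that representation a bit more concrete, here is a toy model in plain Java. It is only a sketch: KeyPath is a hypothetical stand-in I made up for illustration, not the SDK's Key class, and it only mimics the idea that a key is a chain of (Kind, Key Name) pairs from the top-level ancestor down to the Entity itself.

```java
import java.util.Objects;

/** Toy stand-in for a datastore key: (parent, kind, name). NOT the SDK's Key class. */
final class KeyPath {
    private final KeyPath parent; // null for top-level Entities
    private final String kind;
    private final String name;

    KeyPath(KeyPath parent, String kind, String name) {
        this.parent = parent;
        this.kind = Objects.requireNonNull(kind);
        this.name = Objects.requireNonNull(name);
    }

    /** Builds a child key whose ancestor is this key. */
    KeyPath child(String childKind, String childName) {
        return new KeyPath(this, childKind, childName);
    }

    /** Two keys are equal only if the whole ancestor chain matches. */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeyPath)) return false;
        KeyPath other = (KeyPath) o;
        return kind.equals(other.kind)
                && name.equals(other.name)
                && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, name);
    }

    /** Renders the path in the same shape as the output shown above. */
    @Override
    public String toString() {
        String self = kind + "('" + name + "')";
        return parent == null ? self : parent + "/" + self;
    }
}
```

With this model, new KeyPath(null, "DBAccount", "customer@email.com").child("DBProject", "someUniqueProjectId") renders as the path shown above, and two projects with the same Key Name under different accounts compare as different keys, which is the uniqueness rule described earlier.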

1000 get, 500 put/delete limit

One of the most notable rules of AppEngine is that there is a limit on the number of Entities manipulated by a single Datastore call. I believe 'get' is limited to 1000 and 'put/delete' to 500. Here is how one can delete 500 or fewer Entities using the low-level API:

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query(DBProject.class.getSimpleName());
query.setKeysOnly();  // fetch keys only, not full Entities
Set<Key> keys = new HashSet<Key>();
for (Entity entity : ds.prepare(query).asList(FetchOptions.Builder.withLimit(500))) {
    keys.add(entity.getKey());
}
ds.delete(keys);  // one batch call for all collected keys

Notice query.setKeysOnly(), which speeds up the query by fetching keys only. Also notice that ds.delete(keys) takes a collection of keys, which is faster than deleting Entities one by one. (Here is a link to a clever example of how to get around the 1000/500 limit)
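If you already hold more keys than a single call accepts, one simple approach (not necessarily the one from the linked example) is to cut the collection into batches and issue one call per batch. Here is a minimal sketch in plain Java. BatchSplitter and split are names I made up for illustration, and the batch size of 500 in the usage note is just the limit assumed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts a list into consecutive batches of bounded size. */
final class BatchSplitter {
    private BatchSplitter() {}

    /** Returns batches of at most batchSize elements, preserving the original order. */
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch returned by BatchSplitter.split(allKeys, 500) could then be passed to its own ds.delete(batch) call, so no single call exceeds the limit.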


Annotated Data Models

DataNucleus provides an API to 'annotate' classes so that they can be 'enhanced' with datastore-related code. An enhancer tool comes with the AppEngine Eclipse plugin and is executed after classes are compiled into bytecode. Here is what a possible declaration looks like using Groovy and javax.jdo.annotations.*:


@PersistenceCapable(identityType = IdentityType.APPLICATION)
class DBAccount implements Serializable {
    @PrimaryKey
    Key key;

    @Persistent
    String email;

    @Persistent
    Integer age;

    @Persistent
    Set<DBProject> projects = new HashSet<DBProject>();
}

@PersistenceCapable(identityType = IdentityType.APPLICATION)
class DBProject implements Serializable {
    @PrimaryKey
    Key key;

    @Persistent
    String projectId;

    @Persistent
    Date createdOn;

    //some more data here..
}

I like using Groovy because it provides all getters and setters automatically, which facilitates a clean data model design :)

Object Access by Key

A particular DBProject instance can be accessed in a number of ways. First, you might want to retrieve a DBAccount along with all of its projects:

DBAccount dba = pm.getObjectById(DBAccount.class, customerEmail);
dba.getProjects();

Note that this particular example uses lazy fetch, so the second line 'touches' the projects in order for them to be fetched. There is a way to annotate a class to specify a fetch mechanism explicitly.
Another way would be to fetch the desired DBProject instance directly, like this:

Key projectKey = KeyFactory.createKey(accountKey, "DBProject", projectId);
DBProject dbp = pm.getObjectById(DBProject.class, projectKey);

Note that if you just want to change some attributes of this object and persist the result, you can simply do the following:

DBProject dbp = new DBProject("some new data here");
dbp.setKey(projectKey);
pm.makePersistent(dbp);

If a DBProject already exists with the given key, it will be updated with the new data. Otherwise a new instance is created.

Object Access by Query

In order to get a list of objects that satisfy a given requirement, one can issue a Query. Both DataNucleus and the low-level API provide a way to query objects. DataNucleus supports a variety of mechanisms, including JDOQL, SQL, and JPQL.

Here is an example:

Query q = pm.newQuery(DBProject.class, "createdOn > monthAgo");
q.declareParameters("java.util.Date monthAgo");  // binds the name used in the filter
List<DBProject> list = (List<DBProject>) q.execute(monthAgo);

This should return a list of all projects that were created less than a month ago.
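The snippet above takes monthAgo as given. Here is a minimal sketch of one way to compute such a value with the standard library. QueryDates and monthBefore are hypothetical names used only for illustration, and "a month" is assumed to mean one calendar month in the JVM's default time zone.

```java
import java.util.Calendar;
import java.util.Date;

/** Hypothetical helper for building the Date parameter of the query above. */
final class QueryDates {
    private QueryDates() {}

    /** Returns the instant one calendar month before the given reference date. */
    static Date monthBefore(Date reference) {
        Calendar cal = Calendar.getInstance(); // default time zone and locale
        cal.setTime(reference);
        cal.add(Calendar.MONTH, -1);           // calendar arithmetic, not a fixed 30 days
        return cal.getTime();
    }
}
```

The result of QueryDates.monthBefore(new Date()) can then be passed to q.execute(...) as the monthAgo parameter.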

Briefly about Indexes

Datastore also provides indexing capabilities. Some indexes are provided automatically while others must be explicitly configured.
A query will not return any results unless the properties in question are indexed one way or the other.


Here is an example of an index:

Note that autoGenerate specifies whether the manual index configuration is considered along with the automatically generated index configuration. The ancestor attribute is set to true if the index supports a query that filters Entities by the Entity group parent, false otherwise.


Misc Tips

There is a lot more detail to using AppEngine's Datastore to manage data, most of which can be found here. It's also quite useful to have the DataNucleus documentation handy, which provides even more fine-grained detail.


One thing to remember is that AppEngine implements only a subset of the DataNucleus API. It is important to go over the unsupported features before making big architectural decisions.


Summary

Overall, AppEngine's datastore is a fun and simple way to persist data, but it takes a bit of getting used to. Certain concepts and limitations might seem alien at first, until one acquires a deeper understanding of the underlying technologies and best practices, which takes a bit of time and effort :)

Thanks for reading and Aloha!

Konstantin
Ikayzo - Design • Build • Localize | Web • Desktop • Mobile


Resources:
(Key image courtesy of Freeiconsweb)


Comment by Russell Castagnaro on February 5, 2010 at 5:26pm
Thanks! Nice intro
