Hi,
I have a custom index over a large amount of content that works by 
computing a 32-bit hash for each section of text. Document ids are stored 
against these hashes, and a lookup hashes the input text and retrieves the 
matching ids. Currently I use Node.js to serve the index and Hadoop to 
generate it. However, generating the index is expensive in terms of 
processing, and serving it at decent performance requires an SSD. The 
scale of the index is as follows:

Up to 4.5 billion keys
An average of 8 document ids per key, delta-encoded and then 
variable-integer (varint) encoded (see the sketch after this list)
A lookup retrieves the values for about 3,500 keys on average
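
For context, the value blob for each key is built roughly like this (a 
simplified sketch, not my production code; the function names are just 
illustrative):

def encode_varint(n):
    # LEB128-style varint: 7 bits per byte, high bit means "more".
    out = bytearray()
    while True:
        b = n & 0x7f
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_ids(doc_ids):
    # Sort the ids, delta-encode, then varint-encode each delta.
    out = bytearray()
    prev = 0
    for doc_id in sorted(doc_ids):
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)

def decode_ids(blob):
    # Inverse: read the varints back, then undo the deltas.
    ids, cur, n, shift = [], 0, 0, 0
    for byte in bytearray(blob):
        n |= (byte & 0x7f) << shift
        shift += 7
        if not byte & 0x80:
            cur += n
            ids.append(cur)
            n = shift = 0
    return ids

With ~8 ids per key the blobs come out at only a few bytes each, so it is 
the per-entity metadata rather than the payload that dominates storage.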

Having read the datastore docs, it seems like this could be a workable 
schema:

from google.appengine.ext import db

class Index(db.Model):
    # The 32-bit hash of a section of text.
    hash = db.IntegerProperty(required=True)
    # Delta- then varint-encoded document ids; BlobProperty is
    # unindexed, so the payload itself adds no index overhead.
    values = db.BlobProperty(required=True)
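
On the read path: with the hash stored as an indexed property, each lookup 
is a query, so 3,500 of them per request seems expensive. If the hash were 
instead encoded into the entity key_name, lookups could become batched 
gets. A rough sketch of what I have in mind (the 'h%08x' key format and 
the 1,000-key chunk size are my assumptions, not something from the docs):

def lookup(hashes):
    # Batched gets by key_name instead of 3,500 separate queries.
    # Chunked because batch gets are capped in size.
    results = {}
    for i in range(0, len(hashes), 1000):
        chunk = hashes[i:i + 1000]
        names = ['h%08x' % h for h in chunk]
        for h, entity in zip(chunk, Index.get_by_key_name(names)):
            if entity is not None:
                results[h] = decode_ids(entity.values)
    return results

Entities would then be written as Index(key_name='h%08x' % h, hash=h, 
values=encode_ids(ids)), and the hash property could even be dropped or 
marked indexed=False to save index writes.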

I would be grateful for any advice or tips on how this might perform on 
App Engine in terms of query performance, cost, and minimizing 
metadata/index overhead. It sounds like 4.5 billion entities times the 
per-entity metadata could be the killer: even if key and built-in index 
metadata came to only ~100 bytes per entity, that would be ~450 GB before 
the payload.

Cheers,
Donovan

