Just curious, what is this huge database about? Can we check it online? -- [email protected] http://supercobrablogger.blogspot.com/
On Sat, Dec 18, 2010 at 7:20 AM, Donovan Hide <[email protected]> wrote:
> Hi,
>
> I have a custom index of a large amount of content that works by creating a
> 32-bit hash for sections of text. Each document id is stored against this
> hash, and lookups involve hashing the input and retrieving the matching ids.
> Currently I use node.js to serve the index and Hadoop to generate it.
> However, this is an expensive operation in terms of processing and requires
> an SSD drive for decent serving performance. The scale of the index is as
> follows:
>
> - Up to 4.5 billion keys
> - An average of 8 document ids per key, delta-encoded and then
>   variable-integer encoded
> - Lookups on average involve retrieving values for 3500 keys
>
> Having read the datastore docs, it seems like this could be a possible
> schema:
>
>     from google.appengine.ext import db
>
>     class Index(db.Model):
>         hash = db.IntegerProperty(required=True)
>         values = db.BlobProperty(required=True)
>
> I would be grateful if anyone could give me some advice or tips on how this
> might perform on App Engine in terms of query performance, cost, and
> minimizing metadata/index overhead. It sounds like 4.5 billion * metadata
> storage could be the killer.
>
> Cheers,
> Donovan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
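For anyone curious how the "delta-encoded and then variable integer encoded" values blob in Donovan's schema might work, here is a minimal sketch. This is not his actual code; the function names (`pack_doc_ids`, `unpack_doc_ids`, etc.) are made up for illustration, and it assumes the document ids for a key are stored sorted:

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte; the high bit is
    set on every byte except the last (the standard varint scheme)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varints(data):
    """Decode a concatenated sequence of varints back into integers."""
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            values.append(n)
            n, shift = 0, 0
    return values

def pack_doc_ids(doc_ids):
    """Delta-encode a list of document ids, then varint-encode the
    deltas into one blob suitable for storing as a BlobProperty."""
    doc_ids = sorted(doc_ids)
    deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return b"".join(encode_varint(d) for d in deltas)

def unpack_doc_ids(blob):
    """Reverse pack_doc_ids: varint-decode, then cumulative-sum deltas."""
    doc_ids, total = [], 0
    for delta in decode_varints(blob):
        total += delta
        doc_ids.append(total)
    return doc_ids
```

Because deltas between neighboring ids are small, most of them fit in a single byte, which is presumably why the average of 8 ids per key stays cheap compared to storing eight raw 32- or 64-bit integers.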
