Hi, it's not public yet, but the purpose of the application is the exact comparison of large corpora.
Cheers,
Donovan.

On 18 December 2010 13:56, supercobra <[email protected]> wrote:
> Just curious, what is this huge database about? Can we check it online?
>
> --
> [email protected]
> http://supercobrablogger.blogspot.com/
>
> On Sat, Dec 18, 2010 at 7:20 AM, Donovan Hide <[email protected]> wrote:
>> Hi,
>> I have a custom index of a large amount of content that works by creating a
>> 32 bit hash for sections of text. Each document id is stored against this
>> hash and lookups involve hashing the input and retrieving the matching ids.
>> Currently I use node.js to serve the index and hadoop to generate it.
>> However this is an expensive operation in terms of processing and requires
>> an SSD drive for decent serving performance. The scale of the index is as
>> follows:
>> Up to 4.5 billion keys
>> An average of 8 document ids per key, delta-encoded and then variable
>> integer encoded.
>> Lookups on average involve retrieving values for 3500 keys
>> Having read the datastore docs it seems like this could be a possible
>> schema:
>> from google.appengine.ext import db
>> class Index(db.Model):
>>     hash=db.IntegerProperty(required=True)
>>     values=db.BlobProperty(required=True)
>> I would be grateful if anyone could give me some advice or tips on how this
>> might perform on AppEngine in terms of query performance, cost and
>> minimizing metadata/index overhead. It sounds like 4.5 billion*metadata
>> storage could be the killer.
>> Cheers,
>> Donovan
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
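
For anyone following the thread: the original post says each key's document ids are "delta-encoded and then variable integer encoded" before being stored as a blob. Here is a minimal sketch of what that could look like, assuming sorted, non-negative integer ids and a little-endian base-128 varint. The names encode_ids and decode_ids are made up for illustration and the actual byte layout of the index is not given in the thread, so treat this as one plausible reading, not the real format.

```python
def encode_ids(doc_ids):
    """Delta-encode sorted ids, then write each gap as a base-128 varint."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(set(doc_ids)):  # assumption: ids are non-negative ints
        gap = doc_id - prev              # store the difference, not the id itself
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)                  # final byte, continuation bit clear
    return bytes(out)


def decode_ids(blob):
    """Inverse of encode_ids: read varint gaps and rebuild the absolute ids."""
    ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in blob:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:                  # more bytes follow for this gap
            shift += 7
        else:                            # gap complete: add it to the running id
            current += gap
            ids.append(current)
            gap = 0
            shift = 0
    return ids


if __name__ == "__main__":
    blob = encode_ids([3, 300, 301])
    print(len(blob), decode_ids(blob))
```

With small gaps most ids fit in one or two bytes, which is presumably why the blob per key stays compact at around 8 ids per key. A value like this would be what goes into the values property of the proposed Index model.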
