Hi,

It's not public yet, but the purpose of the application is exact
comparison of large corpora.

Cheers,
Donovan.

On 18 December 2010 13:56, supercobra <[email protected]> wrote:
> Just curious, what is this huge database about? Can we check it online?
>
> -- [email protected]
> http://supercobrablogger.blogspot.com/
>
> On Sat, Dec 18, 2010 at 7:20 AM, Donovan Hide <[email protected]> wrote:
>> Hi,
>> I have a custom index of a large amount of content that works by creating a
>> 32-bit hash for each section of text. Each document id is stored against
>> the hash of its sections, and a lookup hashes the input and retrieves the
>> matching ids. Currently I use node.js to serve the index and Hadoop to
>> generate it. However, this is expensive in terms of processing and requires
>> an SSD for decent serving performance. The scale of the index is as
>> follows:
>> - Up to 4.5 billion keys
>> - An average of 8 document ids per key, delta-encoded and then
>>   variable-length integer (varint) encoded
>> - Lookups retrieve values for 3,500 keys on average
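The hash-then-lookup scheme and the delta + varint encoding described above can be sketched in plain Python 3. This is a minimal illustration under stated assumptions, not the poster's implementation: CRC32 (`zlib.crc32`) stands in for the unspecified 32-bit hash, the word-trigram `shingles` function is a hypothetical way of splitting text into sections, and the varint uses the common LEB128-style layout (7 data bits per byte, high bit meaning "more bytes follow").

```python
import zlib
from collections import defaultdict


def shingles(text, n=3):
    """Hypothetical sectioning: overlapping n-word windows of the text."""
    words = text.split()
    # Texts shorter than n words become a single section.
    return [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))]


def section_hash(section):
    """Hash a section to an unsigned 32-bit integer (CRC32 is an assumption)."""
    return zlib.crc32(section.encode("utf-8")) & 0xFFFFFFFF


def encode_ids(doc_ids):
    """Delta-encode sorted non-negative ids, then varint-encode each delta."""
    out = bytearray()
    previous = 0
    for doc_id in sorted(doc_ids):
        delta = doc_id - previous
        previous = doc_id
        # Emit 7 bits per byte; a set high bit means another byte follows.
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_ids(data):
    """Read the varints back and undo the delta encoding."""
    ids = []
    current = value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating
        else:
            current += value  # varint complete: add the delta
            ids.append(current)
            value = shift = 0
    return ids


def build_index(documents, sections):
    """Map each section hash to the encoded ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for section in sections(text):
            postings[section_hash(section)].add(doc_id)
    return {h: encode_ids(ids) for h, ids in postings.items()}


def lookup(index, text, sections):
    """Hash every section of the input and return the ids stored under each hash."""
    results = {}
    for section in sections(text):
        h = section_hash(section)
        if h in index:
            results[h] = decode_ids(index[h])
    return results


docs = {1: "the quick brown fox jumps", 7: "a quick brown fox sleeps"}
index = build_index(docs, shingles)
print(lookup(index, "quick brown fox", shingles))
```

The example prints a dict mapping the CRC32 of "quick brown fox" to `[1, 7]`. Sorting before delta-encoding keeps every delta non-negative, and ids that sit close together produce small deltas that fit in a single varint byte.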
>> Having read the datastore docs, I think this could be a possible
>> schema:
>> from google.appengine.ext import db
>> class Index(db.Model):
>>     hash=db.IntegerProperty(required=True)
>>     values=db.BlobProperty(required=True)
>> I would be grateful for any advice or tips on how this might perform on
>> App Engine in terms of query performance, cost and minimizing
>> metadata/index overhead. It sounds like the per-entity metadata storage
>> for 4.5 billion entities could be the killer.
>> Cheers,
>> Donovan
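To see why per-entity metadata could dominate at this scale, a back-of-envelope calculation helps. The sketch assumes one entity per hash key, as in the schema above; `estimate_storage`, `bytes_per_encoded_id` and `overhead_per_entity` are hypothetical names, and the 2 and 100 bytes used in the example are placeholders to be replaced by measured values, not App Engine figures.

```python
def estimate_storage(num_keys, ids_per_key, bytes_per_encoded_id,
                     overhead_per_entity, hash_bytes=4):
    """Return (payload_bytes, overhead_bytes) for one entity per hash key.

    bytes_per_encoded_id and overhead_per_entity are placeholders to be
    measured; they are not documented App Engine figures.
    """
    # Raw data: the 4-byte (32-bit) hash plus the varint-encoded id deltas.
    payload_bytes = num_keys * (hash_bytes + ids_per_key * bytes_per_encoded_id)
    # Fixed per-entity cost (e.g. key, property names, index rows),
    # which grows linearly with the number of keys.
    overhead_bytes = num_keys * overhead_per_entity
    return payload_bytes, overhead_bytes


GB = 10 ** 9
payload, overhead = estimate_storage(4_500_000_000, 8, 2, 100)
print(payload / GB, overhead / GB)
```

With these placeholder inputs it prints `90.0 450.0` (GB). Whatever the real per-entity overhead turns out to be, it is multiplied by 4.5 billion, so even a modest fixed cost can outweigh the encoded ids themselves.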
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>>
>
