Steve Loughran wrote:
aakash shah wrote:
We can assume that this record has only one key->value mapping. The value
will be updated every minute. Currently we have 1 million of these
(key->value) pairs, but I have to make sure that we can scale up to
10 million of these (key->value) pairs.
Every 10 minutes I will be updating all of these values using their
keys. This is the reason I cannot go for a database as a solution.
I wouldn't be so quick to dismiss a database. All your big telcos run
their mobile phone systems on databases, where the big issue is having
enough memory for the DB to stay in memory; some dedicated databases
(e.g. TimesTen) are designed to have bounded latency on lookup so you
can predict how long operations will take.
That said, if you are only doing atomic updates of a single record,
there's less need for the advanced features. Assuming more than one
machine, some kind of distributed hash table may work.
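One common way to spread keys over a pool of machines is consistent hashing, done on the client side. The sketch below is a minimal illustration, not any particular library's API: the `HashRing` class and the server names are hypothetical. Memcached client libraries usually ship their own variant (often called ketama), and in practice you would use that.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: maps each key to one of several node names."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5, read as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:  # ignore the (very unlikely) collision
                continue
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring has no nodes")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])  # hypothetical server names
print(ring.node_for("user:42"))
```

The design choice that matters here: adding or removing a server only remaps the keys that server owned, so the pool can grow toward 10 million keys without reshuffling everything.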
I was thinking about going with a memcache pool. In the meantime I
heard about Hadoop and wanted to get advice from this mailing list
regarding a memcache pool vs. Hadoop for this specific problem.
It's not an area Hadoop deals with at all.
The record size sounds too small for HDFS, unless the records are in
turn grouped into something optimal for the block size. For records that
size, I would also consider a) writing them out again instead of doing
updates, b) testing for physical (disk) bottlenecks.
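Option (a) can look like this: instead of updating records in place, stream the whole set out to a new file in one sequential pass and swap it in. A minimal sketch, assuming string keys and values that contain no tabs or newlines; the file layout and the `rewrite_snapshot`/`load_snapshot` names are hypothetical. The temporary file lives in the same directory so that `os.replace` is a single rename on the same filesystem.

```python
import os
import tempfile


def rewrite_snapshot(path, records):
    """Write every record to a fresh file, then swap it in for the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".snapshot-")
    try:
        # Large buffer: one long sequential write instead of many small seeks.
        with os.fdopen(fd, "w", encoding="utf-8", buffering=1 << 20) as out:
            for key, value in records.items():
                out.write(f"{key}\t{value}\n")  # assumes no tabs/newlines in data
            out.flush()
            os.fsync(out.fileno())              # make sure the bytes hit the disk
        os.replace(tmp_path, path)              # readers see old or new, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_snapshot(path):
    """Read a snapshot written by rewrite_snapshot back into a dict."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            records[key] = value
    return records
```

Sequential rewrites like this tend to be kinder to disks than scattered small updates, which is also what (b) would measure.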
Also, there's MemcacheDB as an alternative for persistent hashing:
http://memcachedb.org/benchmark.html
Bill