Nathan Sobo <ns...@pivotallabs.com> wrote: ns> Is a key-value store actually inappropriate for this problem?
No. One way to do it is to use a single KV key to store multiple addresses worth of info. Pick a relatively big number, 50K subscribers/key, though it may vary. Use a key naming scheme so that you can pre-calculate all keys for a given list, e.g. bucket = list-subscribers, key = name + range index #, or perhaps list name + start-of-hash-range + end-of-hash-range. How do you know the range index # or start & ends of range? One method would be hashing, MD5 or SHA1 or whatever. If you store all addresses for a list with a fixed number of hash hunks, e.g. 100, then each hash hunk will have roughly 20K entries for a 2M subscriber list. To find all subscribers, fetch 100 known keys. If you want to keep addresses in sorted order, it's more work but also doable. A naive plan is to make your hash function F(addr) = first letter of 'addr'. Keys get clumpy that way, but only slightly more creativity can get around it. To find a particular subscriber, hash that subscriber's address and fetch 1 key. You're also getting a lot of uninteresting data with that key's value, but if event is uncommon, it's not a problem. (If that event actually is common, consider moving the commonly-queried data elsewhere. Or duplicate that info in another smaller key somewhere else.) Similar logic for list maintenance events (add subscriber, delete subscriber). -Scott _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com