Incidentally, this is also how flickr handles writes - when you upload a photo it gets written to wherever your other photos go. When someone tags it or adds it to a group, it gets copied into that group.
Unless, of course, it's all changed since the last time I looked for information about how flickr actually works. -- Jeremiah Peschka Sent with Sparrow On Wednesday, February 9, 2011 at 10:09 AM, Alexander Sicular wrote: > The only way this is functional is if you implement a uniformly random > hash function so that you know which key any given address will hash > to. Separately, churn will eat you up if you constantly need to take > addys out of your keys. Also, as mentioned elsewhere, Riak links won't > work at these numbers. > > Check out more tech slides/blogs on how twitter does this. Basically > double/reverse/reciprocal look up lists with recipient notification. > When @aplusk tweets twitter does like 4million (or however many > followers he has) writes. I recommend @rk's qcon sf 2010 talk on nosql > at twitter. > > Best, alexander > > On 2011-02-09, Scott Lystig Fritchie <fritc...@snookles.com> wrote: > > Nathan Sobo <ns...@pivotallabs.com> wrote: > > > > ns> Is a key-value store actually inappropriate for this problem? > > > > No. One way to do it is to use a single KV key to store multiple > > addresses worth of info. Pick a relatively big number, 50K > > subscribers/key, though it may vary. Use a key naming scheme so that > > you can pre-calculate all keys for a given list, e.g. bucket = > > list-subscribers, key = name + range index #, or perhaps list name > > + start-of-hash-range + end-of-hash-range. > > > > How do you know the range index # or start & ends of range? One method > > would be hashing, MD5 or SHA1 or whatever. If you store all addresses > > for a list with a fixed number of hash hunks, e.g. 100, then each hash > > hunk will have roughly 20K entries for a 2M subscriber list. To find > > all subscribers, fetch 100 known keys. > > > > If you want to keep addresses in sorted order, it's more work but also > > doable. A naive plan is to make your hash function F(addr) = first > > letter of 'addr'. Keys get clumpy that way, but only slightly more > > creativity can get around it. > > > > To find a particular subscriber, hash that subscriber's address and > > fetch 1 key. You're also getting a lot of uninteresting data with that > > key's value, but if event is uncommon, it's not a problem. (If that > > event actually is common, consider moving the commonly-queried data > > elsewhere. Or duplicate that info in another smaller key somewhere > > else.) Similar logic for list maintenance events (add subscriber, > > delete subscriber). > > > > -Scott > > > > _______________________________________________ > > riak-users mailing list > > riak-users@lists.basho.com > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > -- > Sent from my mobile device > > _______________________________________________ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com