On 2010-03-15 00:57, Toby DiPasquale wrote:
> I'm actually just trying to build a little URL shortener to use as a
> demo for an upcoming presentation I'm doing on Cassandra. The counter
> is to be used as the short key for a new URL submitted to the system:
> increment the counter and then use the Base-62 encoding of that
> counter value as the key for that new URL. That's why I'm able to skip
> some values, but can't have any two clients reading the same value.
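The Base-62 step in the quoted message (turning an integer counter value into a short key) can be sketched roughly as below. The alphabet order (digits, then lowercase, then uppercase) and the function names `base62_encode` / `base62_decode` are assumptions for illustration; any fixed 62-character alphabet works as long as encoding and decoding agree.

```python
import string

# Assumed alphabet: 10 digits + 26 lowercase + 26 uppercase = 62 symbols.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative counter value as a Base-62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)   # peel off the least significant digit
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Invert base62_encode: turn a short key back into the counter value."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(base62_encode(125))  # 125 = 2*62 + 1 -> "21"
```

Because the encoding is just positional notation in base 62, each counter value maps to exactly one key, so uniqueness of the key reduces to uniqueness of the counter value.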
OK, so then counters shouldn't be managed in Cassandra at all. Each server should have a prefix and its own counter, maintained entirely locally. The "prefix" could be a character, or as few bits as you need to uniquely identify the machines in your cluster.

When a server gets a URL to shorten and increments its counter, it inserts the <prefix + counter, URL> item into Cassandra. I'm guessing this would be a row with the key <prefix + counter> and a single column in the row storing the URL. Any server translating a short URL to the full one queries Cassandra with the combined <prefix + counter> key. It's irrelevant whether the server that generated the original mapping is still online, or even still exists.

My suggested approach avoids a slow read in Cassandra when storing each URL and doesn't introduce any new single point of failure (SPOF). You lose a tiny bit of data density (each short key also carries the server prefix), but I don't think it's significant.

If this were for a production system, I would use memcached as a write-through cache to avoid reading from Cassandra. You could even use memcached to maintain the counters. You'd have to be careful to avoid re-using values on memcached restart or clearing, but there are many ways to do that.

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]
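The per-server <prefix + counter> scheme from the reply can be sketched as follows, assuming a one-character prefix per server and the Base-62 counter encoding from the original question. `ShortKeyGenerator`, `shorten`, `expand`, and the `start` parameter are hypothetical names; `start` stands in for however a server restores its local counter after a restart so values are not reused. A plain dict stands in for the Cassandra row (key <prefix + counter>, one column holding the URL); no real Cassandra client API is shown.

```python
import string
import threading

# Assumed Base-62 alphabet: digits, lowercase, uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative counter value in Base-62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


class ShortKeyGenerator:
    """Hypothetical per-server key source: fixed server prefix + local counter."""

    def __init__(self, prefix: str, start: int = 0) -> None:
        # A fixed-length (one-character) prefix keeps <prefix + counter>
        # unambiguous: first char = server, the rest = that server's counter.
        if len(prefix) != 1:
            raise ValueError("prefix must be exactly one character")
        self.prefix = prefix
        self._counter = start
        self._lock = threading.Lock()  # one server may handle concurrent requests

    def next_key(self) -> str:
        # Purely local increment: no read from Cassandra, no shared counter.
        with self._lock:
            n = self._counter
            self._counter += 1
        return self.prefix + base62_encode(n)


# Stand-in for Cassandra: row key <prefix + counter> -> URL column value.
store: dict = {}


def shorten(gen: ShortKeyGenerator, url: str) -> str:
    key = gen.next_key()
    store[key] = url  # the write a real server would send to Cassandra
    return key


def expand(key: str):
    # Any server can resolve a key; the generating server need not be online.
    return store.get(key)


a = ShortKeyGenerator("a")
b = ShortKeyGenerator("b")
k1 = shorten(a, "http://example.com/one")  # "a0"
k2 = shorten(b, "http://example.com/two")  # "b0"
print(expand(k1))  # http://example.com/one
```

Two servers can issue keys at the same time without coordinating, because their prefixes differ; the cost is one extra character per key, which is the small loss of density the reply mentions.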