On 2010-03-15 00:57, Toby DiPasquale wrote:
> I'm actually just trying to build a little URL shortener to use as a
> demo for an upcoming presentation I'm doing on Cassandra. The counter
> is to be used as the short key for a new URL submitted to the system:
> increment the counter and then use the Base-62 encoding of that
> counter value as the key for that new URL. That's why I'm able to skip
> some values, but can't have any two clients reading the same value.
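The Base-62 step in the question can be sketched as follows. The alphabet order (digits, then uppercase, then lowercase) is an assumption. Any fixed 62-symbol ordering works, as long as every server uses the same one. The function names are hypothetical.

```python
import string

# Assumed alphabet order: 0-9, A-Z, a-z (62 symbols in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer counter as a Base-62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode: turn a Base-62 string back into an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

For example, `base62_encode(125)` gives `"21"`, because 125 = 2 * 62 + 1.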

OK, so then counters shouldn't be managed in Cassandra at all. Each
server should have a prefix and its own counter, maintained entirely
locally. The "prefix" could be a character or as few bits as you need to
uniquely identify machines in your cluster.
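Here is a minimal sketch of one server's key generator. It assumes the one-character prefix option, which allows up to 62 servers, and the same Base-62 alphabet on every machine. `ShortKeyGenerator` and `next_key` are hypothetical names. The prefix must be fixed-length: with variable-length prefixes, "a" + "b1" and "ab" + "1" would produce the same key. The lock only guards against concurrent requests within one process. The counter lives in memory here, so a real server would also need to persist it to avoid handing out old values again after a restart.

```python
import string
import threading

# Assumed alphabet order: 0-9, A-Z, a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class ShortKeyGenerator:
    """Hands out keys of the form <prefix + base62(counter)> for one server.

    Nothing is read from or written to Cassandra to allocate a key.
    """

    def __init__(self, prefix: str, start: int = 0):
        # A single fixed-length prefix character keeps keys unambiguous.
        if len(prefix) != 1 or prefix not in ALPHABET:
            raise ValueError("prefix must be a single Base-62 character")
        self.prefix = prefix
        self._counter = start
        self._lock = threading.Lock()

    def next_key(self) -> str:
        # The lock stops two threads on this server from taking the same value.
        with self._lock:
            value = self._counter
            self._counter += 1
        return self.prefix + base62_encode(value)
```

With `a = ShortKeyGenerator("a")` and `b = ShortKeyGenerator("b")`, two calls to `a.next_key()` return `"a0"` and then `"a1"`, while `b.next_key()` independently returns `"b0"`.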

When a server gets a URL to shorten and increments its counter, it
inserts the <prefix + counter, URL> item into Cassandra. I'm guessing
this would be a row with the key <prefix + counter> and a single
column in that row storing the URL.

Any server translating a short URL to the full one queries Cassandra
with the combined <prefix + counter> key. It doesn't matter whether the
server that generated the original mapping is still online, or even
still exists.
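The write and read paths described above can be sketched like this. A plain dict stands in for the Cassandra column family, mapping each row key to its columns. It is not a Cassandra client API, and `ShortenerServer`, `shorten`, `resolve`, and the column name `"url"` are hypothetical. The sketch is single-threaded and assumes every server has a distinct, fixed-length prefix.

```python
import itertools
import string

# Assumed alphabet order: 0-9, A-Z, a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class ShortenerServer:
    """One server: a local counter plus a handle on the shared store."""

    def __init__(self, prefix: str, rows: dict):
        self.prefix = prefix
        self._counter = itertools.count()  # local only, never read from the store
        self.rows = rows                   # stand-in for the Cassandra column family

    def shorten(self, url: str) -> str:
        # Increment the local counter; no read from the store is needed.
        key = self.prefix + base62_encode(next(self._counter))
        # One row per short key, with a single column holding the URL.
        self.rows[key] = {"url": url}
        return key


def resolve(rows: dict, short_key: str):
    """Any server can run this; it needs only the shared store, not the creator."""
    row = rows.get(short_key)
    return None if row is None else row["url"]
```

For example, after `key = ShortenerServer("a", rows).shorten(url)`, the generating server object can be discarded entirely, and `resolve(rows, key)` still returns `url`.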

My suggested approach avoids a slow read from Cassandra when storing each
URL and doesn't introduce any new single point of failure (SPOF). You
lose a tiny bit of key density, since the prefix makes each key slightly
longer, but I don't think it's significant.

If this were for a production system, I would use memcached as a
write-through cache to avoid reading from Cassandra. You could even use
memcached to maintain the counters. You'd have to be careful to avoid
reusing values when memcached restarts or is flushed, but there are many
ways to do that.
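Below is a hedged sketch of the write-through cache and one possible way to avoid reusing counter values. This approach reserves counter values in blocks and durably records the end of each block before using it. Plain dicts stand in for memcached, Cassandra, and whatever durable storage holds the high-water mark. None of this is either system's client API. The class names and `block_size` are hypothetical. After a restart, values left in an unfinished block are skipped rather than reused. That fits the original requirement, since skipping values is fine but reading the same one twice is not.

```python
class WriteThroughCache:
    """Writes go to the backing store and the cache; reads try the cache first."""

    def __init__(self, cache: dict, store: dict):
        self.cache = cache  # stand-in for memcached
        self.store = store  # stand-in for Cassandra

    def put(self, key: str, url: str) -> None:
        self.store[key] = url  # durable write first
        self.cache[key] = url  # then the cache, so later reads skip the store

    def get(self, key: str):
        url = self.cache.get(key)
        if url is None:
            # Cache miss (eviction, flush, or restart): fall back to the store.
            url = self.store.get(key)
            if url is not None:
                self.cache[key] = url  # repopulate the cache
        return url


class BlockReservingCounter:
    """Counter that never reuses values, even if in-memory state is lost.

    Before handing out values from a new block, it records the block's end
    in durable storage. A fresh instance resumes from that recorded end.
    """

    def __init__(self, durable: dict, name: str, block_size: int = 1000):
        self.durable = durable  # stand-in for durable storage
        self.name = name
        self.block_size = block_size
        self._next = durable.get(name, 0)
        self._limit = self._next  # nothing reserved yet in this process

    def next_value(self) -> int:
        if self._next >= self._limit:
            self._limit = self._next + self.block_size
            self.durable[self.name] = self._limit  # reserve before use
        value = self._next
        self._next += 1
        return value
```

A larger `block_size` means fewer durable writes but more skipped values after each restart.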

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]
