We're using Cassandra as the back end for a homegrown session management system. That system was originally built back in 2005 using BerkeleyDB/Java and a data distribution system that used UDP multicast. Maintenance was becoming increasingly painful.
I wrote a prototype replacement service using Cassandra 0.6 but decided to wait for official TTL support in 0.7 before switching over. The new system has been running in production for a little over a week.

My main issue is that Cassandra is using far more disk space than I expected. The vast bulk of that space seems to be taken up by *Index.db files. I'm hoping that the 10-day GCGraceSeconds interval, which kicks in on Friday, will help me there.

Most of our apps that use this service generate their own session keys, I assume by hashing and salting a user ID and/or calling something like java.util.UUID.randomUUID().

My schema is currently very simple: there's a single CF containing a (binary) payload column and a column that indicates whether or not the data has been compressed. We have a few rogue apps that store humongous XML documents in the session, and compression helps to deal with that. That's also why memcached wasn't going to work in our scenario.

On Tue, Feb 1, 2011 at 12:18 PM, Kallin Nagelberg <kallin.nagelb...@gmail.com> wrote:
> Hey,
> I am currently investigating Cassandra for storing what are
> effectively web sessions. Our production environment has about 10 high
> end servers behind a load balancer, and we'd like to add distributed
> session support. My main concerns are performance, consistency, and
> the ability to create unique session keys. The last thing we would
> want is users picking up each other's sessions. After spending a few
> days investigating Cassandra I'm thinking of creating a single
> keyspace with a single super-column-family. The scf would store a few
> standard columns, and a supercolumn of arbitrary session attributes,
> like:
>
> 0s809sdf8s908sf90s: {
>   prop1: x,
>   created : timestamp,
>   lastAccessed: timestamp,
>   prop2: y,
>   arbirtraryProperties : {
>     someRandomProperty1:xxyyzz,
>     someRandomProperty2:xxyyzz,
>     someRandomProperty3:xxyyzz
>   }
> }
>
> Does this sound like a reasonable use case? We are on a tight timeline
> and I'm currently on the fence about getting something up and running
> like this on a tight timeline.
>
> Thanks,
> -Kal
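
To make the key generation and the payload/compressed-flag layout from my reply a bit more concrete, here is a minimal Java sketch. It is an illustration, not the code running in production. The class name SessionSketch, the StoredSession holder, the 4096-byte threshold, and the choice of SHA-256 and GZIP are all assumptions made for this example, and the Cassandra read/write calls are left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; none of these names come from the real service. */
final class SessionSketch {

    /** Assumed cutoff: payloads at or below this size are stored as-is. */
    static final int COMPRESS_THRESHOLD_BYTES = 4096;

    /** Mirrors the two columns: a binary payload and a compressed flag. */
    static final class StoredSession {
        final byte[] payload;
        final boolean compressed;

        StoredSession(byte[] payload, boolean compressed) {
            this.payload = payload;
            this.compressed = compressed;
        }
    }

    /** Random session key from java.util.UUID (36-character string form). */
    static String randomKey() {
        return UUID.randomUUID().toString();
    }

    /** Alternative key: hex-encoded SHA-256 of salt + user ID (assumed scheme). */
    static String hashedKey(String userId, String salt) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest((salt + userId).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** GZIP the payload only when it exceeds the threshold, and record which path was taken. */
    static StoredSession encode(byte[] raw) throws IOException {
        if (raw.length <= COMPRESS_THRESHOLD_BYTES) {
            return new StoredSession(raw, false);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return new StoredSession(buf.toByteArray(), true);
    }

    /** Reverse of encode: decompress only if the flag says the payload was compressed. */
    static byte[] decode(StoredSession stored) throws IOException {
        if (!stored.compressed) {
            return stored.payload;
        }
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(stored.payload))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Note that a salted hash of a user ID is deterministic, so the same user and salt always produce the same key, whereas randomUUID() gives a fresh key per call. Storing the flag next to the payload means a reader never has to guess whether to decompress.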