We're using Cassandra as the back end for a home-grown session
management system. That system was originally built back in 2005 using
BerkeleyDB/Java and a data distribution system that used UDP multicast.
Maintenance was becoming increasingly painful.

I wrote a prototype replacement service using Cassandra 0.6 but
decided to wait for the availability of official TTL support in 0.7
before switching over.

The new system has been running in production for a little over a
week. My main issue is that Cassandra is using far more disk space
than I expected. The vast bulk of that space seems to be taken up by
*Index.db files. I'm hoping that once the 10-day GCGraceSeconds
interval has passed on Friday, compaction will be able to drop expired
data and free some of that space.

Most of our apps that use this service generate their own session
keys, I assume by hashing and salting a user ID and/or calling
something like java.util.UUID.randomUUID().
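
A minimal sketch of the two key-generation approaches mentioned above,
for illustration only. The class and method names are hypothetical, and
the choice of SHA-256 with a 16-byte random salt is an assumption, not
something the apps are known to use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical helper; not part of the session service itself.
public class SessionKeys {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Random key: a type 4 UUID rendered as a 36-character string. */
    public static String randomKey() {
        return UUID.randomUUID().toString();
    }

    /** Fresh 16-byte salt (the length is an assumption for this sketch). */
    public static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    /** Hex-encoded SHA-256 of salt followed by the user ID. */
    public static String saltedKey(String userId, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            md.update(userId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Either way the key is just an opaque string as far as the session
service is concerned; the random salt is what keeps two sessions for
the same user from colliding.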

My schema is currently very simple -- there's a single CF containing a
(binary) payload column and a column that indicates whether or not the
data has been compressed. We have a few rogue apps that store
humongous XML documents in the session and compression helps to deal
with that. That's also why memcached wasn't going to work in our
scenario.
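
To make the payload-plus-flag idea concrete, here is a rough sketch of
how a client might decide whether to compress before writing, and how
it would undo that on read. It deliberately leaves out the Cassandra
client calls. The class name, the field names, the 4 KB threshold, and
the use of gzip are all assumptions made for the example, not a
description of the actual service.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical in-memory view of one row: a payload column and a flag column.
public class SessionRecord {
    // Assumed cutoff: payloads smaller than this are stored as-is.
    public static final int COMPRESS_THRESHOLD = 4096;

    public final byte[] payload;
    public final boolean compressed;

    public SessionRecord(byte[] payload, boolean compressed) {
        this.payload = payload;
        this.compressed = compressed;
    }

    /** Gzip large payloads, but only keep the result if it is smaller. */
    public static SessionRecord encode(byte[] raw) {
        if (raw.length < COMPRESS_THRESHOLD) {
            return new SessionRecord(raw, false);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] packed = buf.toByteArray();
        return packed.length < raw.length
                ? new SessionRecord(packed, true)
                : new SessionRecord(raw, false);
    }

    /** Return the original bytes, decompressing if the flag is set. */
    public byte[] decode() {
        if (!compressed) {
            return payload;
        }
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(payload))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing the flag next to the payload means readers never have to guess
the encoding, and small sessions skip the compression overhead
entirely.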



On Tue, Feb 1, 2011 at 12:18 PM, Kallin Nagelberg
<kallin.nagelb...@gmail.com> wrote:
> Hey,
> I am currently investigating Cassandra for storing what are
> effectively web sessions. Our production environment has about 10 high
> end servers behind a load balancer, and we'd like to add distributed
> session support. My main concerns are performance, consistency, and
> the ability to create unique session keys. The last thing we would
> want is users picking up each others sessions. After spending a few
> days investigating Cassandra I'm thinking of creating a single
> keyspace with a single super-column-family. The scf would store a few
> standard columns, and a supercolumn of arbitrary session attributes,
> like:
>
> 0s809sdf8s908sf90s: {
> prop1: x,
> created : timestamp,
> lastAccessed: timestamp,
> prop2: y,
> arbirtraryProperties : {
>             someRandomProperty1:xxyyzz,
>             someRandomProperty2:xxyyzz,
>             someRandomProperty3:xxyyzz
> }
>
> Does this sound like a reasonable use case? We are on a tight timeline
> and I'm currently on the fence about getting something up and running
> like this on a tight timeline.
>
> Thanks,
> -Kal
>
