I was able to successfully merge this patch into the 0.6 branch a few weeks ago by doing the following:
- Downloading the patch
- Checking out the trunk of Cassandra from GitHub
- Rolling back (checking out) the git repo to the same date that the patch was submitted to Jira
- Applying the patch
- Committing to Git
- Merging forward to the 0.6 branch
- Resolving one or two minor conflicts

R

On Wed, Mar 31, 2010 at 2:46 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Sounds like you want to follow
> https://issues.apache.org/jira/browse/CASSANDRA-699. There is a patch
> there but I wouldn't recommend merging it if Java scares you. :)
>
> On Wed, Mar 31, 2010 at 1:39 PM, Mike Gallamore
> <mike.e.gallam...@googlemail.com> wrote:
> > Hello everyone,
> >
> > I saw a thread on the incubator user chat that started a few months ago:
> > http://www.mail-archive.com/cassandra-u...@incubator.apache.org/msg02047.html
> > It looks like this is the new official user mailing list so I'll add my
> > thoughts/question here.
> >
> > Is there any way to set a TTL on data stored in Cassandra? Deleting old
> > SSTables isn't enough for my needs. I need the data to go away after a fixed
> > period of time. Here is what I'm trying to do and my reasoning why I think
> > Cassandra and not something like Flare/Memcache meets my need:
> >
> > I'm building a reputation system. We get lots of data at my work (in the
> > 10's of GB of reputation data a day). The trick is that old data is not
> > useful as a sender's IP address might have changed, they might have had a bot
> > on their system and now have removed it, etc. So I need to be able to keep
> > data for a fixed period of time and then afterwards it isn't needed/ideally
> > would be GC'd out.
> >
> > We want to do one thing if we either never heard of the individual or at
> > least not since the expiry time, and another thing based on the reputation
> > data that is stored in Cassandra if it is current.
> > So ideally a Cassandra call for a key for someone whose reputation is
> > expired would return nothing and we'd reply with our default reputation
> > for that individual. There really is no point using network bandwidth to
> > return all the fields associated with that key only to look at a timestamp
> > and end up ignoring it anyway. Similarly the latency of requesting first
> > the timestamp and then the data in two separate requests is prohibitive.
> >
> > Why Cassandra:
> >
> > Our data is complex and is hard to handle completely in a key/value sense.
> > In the past we were doing this and just encoding the complex structure
> > inside of JSON but this isn't ideal. It is very nice algorithmically to be
> > able to say: give me this column, or update this element of this hash, etc.,
> > rather than having to pull the old version, decode, modify, re-encode and
> > push back to a cache based system.
> > Our data is large (in the low TB's at the moment, but expected to grow to
> > 50-100TB of live data)
> > Need quick response for both searches and writes: typically for each thing
> > we track we get a request for the reputation, the message gets processed and
> > then we get feedback back from the recipient. So reads and writes are
> > symmetric.
> > High request rate: millions per hour
> > Hundreds of millions of unique reputations (this is why crawling through the
> > data with a script purging old data doesn't make sense)
> > Availability/load balancing a must. Data needs to be replicated; a disk copy
> > is useful so if we have a power outage we don't lose the system.
> > It would be interesting to keep a local subset of our data at customers'
> > sites and have them "replicate up" their data rather than send their
> > feedback in a different manner that then has to be processed and pumped into
> > our datastore (hopefully this is possible with Cassandra with some creative
> > choices of how the data is hashed between nodes)
> >
> > Does the capability to set an expiry time exist? If not, are there any plans
> > to add it? My Java experience is very limited (I'm accessing Cassandra via
> > Thrift/Perl) so it isn't something I'd be able to jump in and run with
> > myself.
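
For anyone following along, the read behaviour asked for in the quoted message (an expired key returns nothing, and the caller falls back to a default reputation) can be sketched roughly as below. This is only an in-memory illustration in Python, not Cassandra's API and not the CASSANDRA-699 patch; the names `ExpiringStore`, `put`, `get`, `lookup_reputation` and `DEFAULT_REPUTATION` are all made up for the example.

```python
import time

# Hypothetical default returned when a sender is unknown or expired.
DEFAULT_REPUTATION = {"score": 0}


class ExpiringStore:
    """In-memory stand-in for a store with a per-key TTL (assumption, not Cassandra)."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock so expiry can be tested
        self._rows = {}       # key -> (expires_at, columns)

    def put(self, key, columns, ttl_seconds):
        # Record the absolute expiry time alongside the data.
        self._rows[key] = (self._clock() + ttl_seconds, dict(columns))

    def get(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        expires_at, columns = entry
        if self._clock() >= expires_at:
            # Expired: drop the row and behave as if it never existed,
            # so no columns travel back to the caller.
            del self._rows[key]
            return None
        return columns


def lookup_reputation(store, sender):
    """Return current reputation columns, or the default if none are live."""
    columns = store.get(sender)
    return columns if columns is not None else DEFAULT_REPUTATION
```

The point of doing the check inside `get` is that the caller makes a single request and never has to fetch a timestamp first or receive columns it will ignore.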