Re: expiring data out of Cassandra/time to live

Ryan Daum Wed, 31 Mar 2010 13:54:05 -0700

On that topic, what exactly is keeping this feature out of the official
releases?


On Wed, Mar 31, 2010 at 3:43 PM, Daniel Kluesing <d...@bluekai.com> wrote:

>  We also applied this patch to the 0.6 branch and have been running it for
> a bit over a week. Works well, would love to see it get into trunk/0.7
> proper.
>
>
>
> *From:* Ryan Daum [mailto:r...@thimbleware.com]
> *Sent:* Wednesday, March 31, 2010 11:49 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: expiring data out of Cassandra/time to live
>
>
>
> I was able to successfully merge this patch into the 0.6 branch a few weeks
> ago by doing the following:
>
>
>
>    - Downloading the patch
>    - Checking out the trunk of Cassandra from github
>    - Rolling back (checking out) the git repo to the same date that the
>    patch was submitted to Jira
>    - Applying the patch
>    - Committing to Git
>    - Merging forward to the 0.6 branch
>    - Resolving one or two minor conflicts.
>
>
>
> R
>
>
>
> On Wed, Mar 31, 2010 at 2:46 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Sounds like you want to follow
> https://issues.apache.org/jira/browse/CASSANDRA-699.  There is a patch
> there but I wouldn't recommend merging it if Java scares you. :)
>
>
> On Wed, Mar 31, 2010 at 1:39 PM, Mike Gallamore
> <mike.e.gallam...@googlemail.com> wrote:
> > Hello everyone,
> >
> > I saw a thread on the incubator user chat that started a few months ago:
> > http://www.mail-archive.com/cassandra-u...@incubator.apache.org/msg02047.html
> > It looks like this is the new official user mailing list, so I'll add my
> > thoughts/question here.
> >
> > Is there any way to set a TTL on data stored in Cassandra? Deleting old
> > SSTables isn't enough for my needs. I need the data to go away after a
> > fixed period of time. Here is what I'm trying to do, and my reasoning for
> > why I think Cassandra, and not something like Flare/Memcache, meets my
> > needs:
> >
> > I'm building a reputation system. We get lots of data at my work (in the
> > tens of GB of reputation data a day). The trick is that old data is not
> > useful, as a sender's IP address might have changed, they might have had a
> > bot on their system and now have removed it, etc. So I need to be able to
> > keep data for a fixed period of time, after which it isn't needed and
> > ideally would be GC'd out.
> >
> > We want to do one thing if we have either never heard of the individual,
> > or at least not since the expiry time, and another thing based on the
> > reputation data stored in Cassandra if it is current. So ideally a
> > Cassandra call for the key of someone whose reputation has expired would
> > return nothing, and we'd reply with our default reputation for that
> > individual. There really is no point using network bandwidth to return
> > all the fields associated with that key only to look at a timestamp and
> > end up ignoring it anyway. Similarly, the latency of requesting first the
> > timestamp and then the data in two separate requests is prohibitive.
> >
> > Why Cassandra:
> >
> >    - Our data is complex and hard to handle completely in a key/value
> >    sense. In the past we did this by encoding the complex structure
> >    inside JSON, but this isn't ideal. It is very nice algorithmically to
> >    be able to say "give me this column" or "update this element of this
> >    hash", rather than having to pull the old version, decode, modify,
> >    re-encode, and push back to a cache-based system.
> >    - Our data is large (in the low TBs at the moment, but expected to
> >    grow to 50-100 TB of live data).
> >    - We need quick responses for both searches and writes: typically, for
> >    each thing we track, we get a request for the reputation, the message
> >    gets processed, and then we get feedback from the recipient. So reads
> >    and writes are symmetric.
> >    - High request rate: millions per hour.
> >    - Hundreds of millions of unique reputations (this is why crawling
> >    through the data with a script purging old data doesn't make sense).
> >    - Availability/load balancing is a must. Data needs to be replicated,
> >    and a disk copy is useful so that if we have a power outage we don't
> >    lose the system.
> >    - It would be interesting to keep a local subset of our data at
> >    customers' sites and have them "replicate up" their data rather than
> >    send their feedback in a different manner that then has to be
> >    processed and pumped into our datastore (hopefully this is possible
> >    with Cassandra with some creative choices of how the data is hashed
> >    between nodes).
> >
> > Does the capability to set an expiry time exist? If not, are there any
> > plans to add it? My Java experience is very limited (I'm accessing
> > Cassandra via Thrift/Perl), so it isn't something I'd be able to jump in
> > and run with myself.
> >
>
>
>
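
For anyone following the thread, the read behaviour Mike describes (an expired key returns nothing, and the caller falls back to a default reputation) can be sketched in a few lines of Python. This is only a toy in-memory illustration of the semantics, not the CASSANDRA-699 patch and not Cassandra's API. The names ExpiringStore, lookup_reputation and DEFAULT_REPUTATION are hypothetical, and the injectable clock is an assumption made so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Toy in-memory store (hypothetical): each row carries an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (expiry timestamp in seconds, columns)
        self._rows: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def put(self, key: str, columns: Dict[str, str], ttl_seconds: float) -> None:
        """Store a row that stops being visible ttl_seconds from now."""
        self._rows[key] = (self._clock() + ttl_seconds, dict(columns))

    def get(self, key: str) -> Optional[Dict[str, str]]:
        """Return the row, or None if it is missing or has expired."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        expires_at, columns = entry
        if self._clock() >= expires_at:
            # Expired: drop it lazily and behave as if it was never stored.
            del self._rows[key]
            return None
        return dict(columns)


# Hypothetical default used when nothing current is known about a sender.
DEFAULT_REPUTATION = {"score": "neutral"}


def lookup_reputation(store: ExpiringStore, sender: str) -> Dict[str, str]:
    """One read: current data if present, otherwise the default."""
    row = store.get(sender)
    return row if row is not None else dict(DEFAULT_REPUTATION)
```

The point of the sketch is that the expiry check happens inside the store, so the caller makes a single request and never pulls back columns only to discard them after inspecting a timestamp.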
