Yeah, there are no JOINs in Cassandra as of now. What if you dumped the column family (CF) in question to JSON once a month, and wrote out each record in the JSON data whose timestamp falls in the period you want to archive?
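A minimal sketch of the filtering step, assuming the dump has already been parsed into a list of records of the form `{"key": ..., "columns": [[name, value, timestamp_micros], ...]}`. That layout is an assumption made for illustration; the JSON actually produced by the export tool depends on the Cassandra version and should be checked before relying on this. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def month_bounds_micros(year, month):
    """Return the [start, end) bounds of a month in microseconds since the epoch (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()) * 1_000_000, int(end.timestamp()) * 1_000_000


def filter_month(records, year, month):
    """Keep only the columns whose write timestamp falls inside the given month.

    ASSUMPTION: each record looks like
    {"key": ..., "columns": [[name, value, timestamp_micros], ...]}.
    """
    start, end = month_bounds_micros(year, month)
    archived = []
    for record in records:
        columns = [c for c in record["columns"] if start <= c[2] < end]
        if columns:  # skip rows with nothing written in that month
            archived.append({"key": record["key"], "columns": columns})
    return archived


if __name__ == "__main__":
    start, _ = month_bounds_micros(2012, 12)
    sample = [
        {"key": "row1", "columns": [["c1", "v1", start], ["c2", "v2", start - 1]]},
    ]
    # Only the column written inside December 2012 is kept.
    print(json.dumps(filter_month(sample, 2012, 12), indent=2))
```

The filtered list can then be written to one JSON file per month, which is the unit that would be archived or loaded back in.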
You could then bulk load each "month" back in if you had to restore. That doesn't help with deletes, though, and I would advise against large mass delete operations each month; they tend to lead to a very unhappy cluster.

On Dec 18, 2012, at 9:23 AM, "stephen.m.thomp...@wellsfargo.com" <stephen.m.thomp...@wellsfargo.com> wrote:

Michael - That is one approach I have considered, but that also makes querying the system particularly onerous since every column family would require its own query – I don’t think there is any good way to “join” those, right?

Chris – that is an interesting concept, but as Viktor and Keith note, it seems to have problems. Could we do this simply by mass deletes? For example, if I created a column which was just YYYY/MM, then during our maintenance we could spool off the records that match the month we are archiving, then do a bulk delete by that key. We would need to have a secondary index for that, I would assume.

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: Tuesday, December 18, 2012 11:15 AM
To: user@cassandra.apache.org
Subject: Re: Partition maintenance

You could make a column family for each period of time and then drop the column family when you want to destroy it. Before you drop it, you could use the sstabletojson converter and write the JSON files out to tape. That might make your life difficult, however, if you need an input split for map reduce that spans time periods, because you would be limited to working on one column family at a time.

On Dec 18, 2012, at 8:09 AM, "stephen.m.thomp...@wellsfargo.com" <stephen.m.thomp...@wellsfargo.com> wrote:

Hi folks.
Still working through the details of building out a Cassandra solution and I have an interesting requirement that I’m not sure how to implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by month, and each month the data that are now 18 months old are archived to tape/cold storage and then the partition for that month is dropped. Is there a way to do something similar with Cassandra without destroying our overall performance?

Thanks in advance,
Steve

----------------------------------
Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
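The YYYY/MM column proposed in the thread, combined with the 18-month retention from the original question, comes down to a small piece of date arithmetic. The sketch below is only an illustration of that arithmetic; the function names and the `retention_months` parameter are hypothetical, and it says nothing about how the delete or the secondary index lookup itself would be performed.

```python
from datetime import date


def month_bucket(d):
    """Format a date as the YYYY/MM value stored with each record."""
    return f"{d.year:04d}/{d.month:02d}"


def bucket_to_archive(today, retention_months=18):
    """Return the YYYY/MM bucket that is `retention_months` older than today's month."""
    # Count whole months since year 0, step back, then convert back to year and month.
    index = today.year * 12 + (today.month - 1) - retention_months
    year, month_zero_based = divmod(index, 12)
    return f"{year:04d}/{month_zero_based + 1:02d}"


if __name__ == "__main__":
    run_date = date(2012, 12, 18)
    print(month_bucket(run_date))       # bucket written on new records
    print(bucket_to_archive(run_date))  # bucket due for archiving
```

Working in whole months avoids the day-of-month edge cases that come up when subtracting a fixed number of days.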