Re: Partition maintenance

Michael Kjellman Tue, 18 Dec 2012 09:37:38 -0800

Yeah. No JOINs as of now in Cassandra.

What if you dumped the CF in question to JSON once a month and wrote out each 
record from the JSON data whose timestamp falls in the period you are 
interested in archiving?


You could then bulk load each "month" back in if you had to restore.
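
A minimal sketch of that filtering step, under assumptions that are not from 
this thread: the dump is taken to be a JSON list of rows, each with a "key" and 
a "columns" list of [name, value, timestamp] triples, and timestamps are taken 
to be microseconds since the epoch. The names month_bounds_us and select_month 
are hypothetical. Check the shape against what your Cassandra version actually 
emits before relying on it.

```python
from datetime import datetime, timezone


def month_bounds_us(year, month):
    """Return the [start, end) range of a calendar month (UTC), in microseconds."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Roll over to January of the next year when month is December.
    end = datetime(year + (month == 12), month % 12 + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()) * 1_000_000, int(end.timestamp()) * 1_000_000


def select_month(rows, year, month):
    """Keep only the columns whose timestamp falls inside the given month.

    ASSUMPTION: each row looks like
    {"key": ..., "columns": [[name, value, timestamp_us], ...]}.
    Rows with no matching columns are dropped from the result.
    """
    lo, hi = month_bounds_us(year, month)
    selected = []
    for row in rows:
        cols = [c for c in row["columns"] if lo <= c[2] < hi]
        if cols:
            selected.append({"key": row["key"], "columns": cols})
    return selected
```

The result of select_month can be written to one file per month with 
json.dump, which gives you a per-month archive to load back later.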

That doesn't help with deletes, though, and I would advise against large mass 
delete operations each month; they tend to lead to a very unhappy cluster.

On Dec 18, 2012, at 9:23 AM, [email protected] wrote:

Michael - That is one approach I have considered, but that also makes querying 
the system particularly onerous since every column family would require its own 
query – I don’t think there is any good way to “join” those, right?

Chris – that is an interesting concept, but as Viktor and Keith note, it seems 
to have problems.

Could we do this simply by mass deletes?  For example, if I created a column 
which was just YYYY/MM, then during our maintenance we could spool off records 
that match the month we are archiving, then do a bulk delete by that key.  We 
would need to have a secondary index for that, I would assume.
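
A minimal sketch of how such a YYYY/MM value could be computed, both for new 
records and for the month that has just reached the 18-month mark. The function 
names are hypothetical, and this only builds the bucket string; it does not 
talk to Cassandra.

```python
from datetime import date


def month_bucket(d):
    """Format a date as the YYYY/MM bucket value stored alongside each record."""
    return f"{d.year:04d}/{d.month:02d}"


def bucket_months_ago(today, months=18):
    """Return the YYYY/MM bucket for the month that is `months` months before today."""
    # Count months since year 0 so that subtraction handles year boundaries.
    idx = today.year * 12 + (today.month - 1) - months
    return f"{idx // 12:04d}/{idx % 12 + 1:02d}"
```

For example, bucket_months_ago(date(2012, 12, 18)) gives "2011/06", the month 
that would be spooled off and deleted in a December 2012 maintenance run.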


From: Michael Kjellman [mailto:[email protected]]
Sent: Tuesday, December 18, 2012 11:15 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Partition maintenance

You could make a column family for each period of time and then drop the column 
family when you want to destroy it. Before you drop it, you could use the 
sstable2json converter and write the JSON files out to tape.

That might make your life difficult, however, if you need a MapReduce input 
split that spans time periods, because you would be limited to working on one 
column family at a time.
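
A minimal sketch of picking which per-month column families are due to be 
archived and dropped. The naming scheme events_YYYY_MM and the function name 
expired_cfs are assumptions for illustration only; the code works on a plain 
list of names and does not drop anything itself.

```python
import re

# ASSUMPTION: one column family per month, named like "events_2011_06".
CF_NAME = re.compile(r"^events_(\d{4})_(\d{2})$")


def expired_cfs(cf_names, today_year, today_month, keep_months=18):
    """Return the names of monthly column families at or past the retention age."""
    cutoff = today_year * 12 + (today_month - 1) - keep_months
    expired = []
    for name in cf_names:
        m = CF_NAME.match(name)
        # Names that do not follow the monthly scheme are never selected.
        if m and int(m.group(1)) * 12 + int(m.group(2)) - 1 <= cutoff:
            expired.append(name)
    return sorted(expired)
```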

On Dec 18, 2012, at 8:09 AM, [email protected] wrote:

Hi folks.  Still working through the details of building out a Cassandra 
solution and I have an interesting requirement that I’m not sure how to 
implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by 
month, and each month the data that are now 18 months old are archived to 
tape/cold storage and then the partition for that month is dropped.  Is there a 
way to do something similar with Cassandra without destroying our overall 
performance?

Thanks in advance,
Steve

----------------------------------
Join Barracuda Networks in the fight against hunger.
To learn how you can help in your community, please visit: 
http://on.fb.me/UAdL4f
  

