I believe Aegisthus is open sourced.

Mohammed

From: Jan [mailto:cne...@yahoo.com]
Sent: Monday, January 26, 2015 11:20 AM
To: user@cassandra.apache.org
Subject: Re: Controlling the MAX SIZE of sstables after compaction

Parth et al.,

The folks at Netflix seem to have built a solution for your problem:

The Netflix Tech Blog: Aegisthus - A Bulk Data Pipeline out of Cassandra
By Charles Smith and Jeff Magnusson
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html

You may want to chase Jeff Magnusson and check whether the solution is open sourced. Please report back to this forum if you get an answer to the problem.

Hope this helps.

Jan
C* Architect

On Monday, January 26, 2015 11:25 AM, Robert Coli <rc...@eventbrite.com> wrote:

On Sun, Jan 25, 2015 at 10:40 PM, Parth Setya <setya.pa...@gmail.com> wrote:
> 1. Is there a way to configure the size of sstables created after compaction?

No, won't fix: https://issues.apache.org/jira/browse/CASSANDRA-4897

You could use the "sstablesplit" utility on your One Big SSTable to split it into files of your preferred size.

> 2. Is there a better approach to generate the report?

The major compaction isn't too bad, but something that understands SSTables as an input format would be preferable to sstable2json.

> 3. What are the flaws with this approach?

sstable2json is slow and transforms your data to JSON.

=Rob