Hi Parth,
I’ll take your questions in order:

1. Have a look at the compaction subproperties for STCS: http://datastax.com/documentation/cql/3.1/cql/cql_reference/compactSubprop.html

2. Why not talk to Cassandra when generating the report? It will be waaay faster (and easier!): Cassandra will use bloom filters, handle shadowed (overwritten) columns, and handle tombstones for you, not to mention the fact that it uses sstables that are hot in the OS file cache.

3. See 2) above. Also, your approach requires you to implement handling of shadowed columns as well as tombstone handling, which could be pretty messy.

Cheers,
Jens

--
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

On Mon, Jan 26, 2015 at 7:40 AM, Parth Setya <setya.pa...@gmail.com> wrote:

> Hi
>
> *Setup*
> *3 Node Cluster*
> Api- *Hector*
> CL- *QUORUM*
> RF- *3*
> Compaction Strategy- *Size Tiered Compaction*
>
> *Use Case*
> I have about *320 million rows* (~12 to 15 columns each) worth of data
> stored in Cassandra. In order to generate a report containing ALL that
> data, I do the following:
> 1. Run Compaction
> 2. Take a snapshot of the db
> 3. Run sstable2json on all the *Data.db files
> 4. Read those jsons and write to a csv.
>
> *Problem*:
> The *sstable2json* utility takes about 350-400 hours (~85% of the total
> time), thereby lengthening the process. (I am running sstable2json
> sequentially on all the *Data.db files, but the size of those is
> inconsistent, so making it run concurrently doesn't help either, e.g. one
> file is of size 25 GB while another is 500 MB.)
>
> *My Thought Process:*
> Is there a way to put a cap on the maximum size of the sstables that are
> generated after compaction, such that I have multiple sstables of uniform
> size? Then I can run the sstable2json utility on them concurrently.
>
> *Questions:*
> 1. Is there a way to configure the size of sstables created after
> compaction?
> 2. Is there a better approach to generate the report?
> 3. What are the flaws with this approach?
>
> Best
> Parth
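
Regarding point 2 of the reply (reading through Cassandra rather than sstable2json): one possible way to parallelize a full export is to split the token ring into equal slices and have each worker scan one slice. The sketch below only builds the slices and the query text. It assumes the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63-1) and that the data is reachable through CQL; the keyspace, table, and key names are hypothetical placeholders, and executing the queries with a driver and writing the CSV is left out.

```python
# Minimal sketch: split the full token ring into equal slices so that
# several workers can each export one slice in parallel.
# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(n):
    """Return n contiguous (start, end) pairs covering the whole ring."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    # n + 1 evenly spaced boundaries from MIN_TOKEN to MAX_TOKEN.
    bounds = [MIN_TOKEN + total * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL text for one slice.

    Slices are (start, end], except the first one, which also includes
    its start so that the lowest token is not skipped.
    All identifiers passed in are hypothetical placeholders.
    """
    lower_op = ">=" if start == MIN_TOKEN else ">"
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) {lower_op} {start} "
        f"AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    for start, end in split_token_ranges(4):
        print(range_query("my_keyspace", "my_table", "id", start, end))
```

Each worker would run its query with the driver's paging enabled and append rows to its own CSV file; because the slices are by token rather than by sstable, their sizes should be roughly even regardless of how compaction laid out the files.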