Hi
*Setup*
3-node cluster
API: Hector
CL: QUORUM
RF: 3
Compaction strategy: SizeTieredCompactionStrategy

*Use Case*
I have about *320 million rows* (~12 to 15 columns each) stored in Cassandra. To generate a report containing ALL of that data, I do the following:
1. Run a major compaction
2. Take a snapshot of the database
3. Run sstable2json on all the *-Data.db files
4. Read those JSON dumps and write them out to a CSV

*Problem*
The *sstable2json* step takes about 350-400 hours (~85% of the total time), which dominates the whole process. I run sstable2json sequentially over all the *-Data.db files; their sizes are very uneven (e.g. one file is 25 GB while another is 500 MB), so running them concurrently doesn't help much either, since the largest file still dominates the wall-clock time. (A rough sketch of what I am doing is in the P.S. below.)

*My Thought Process*
Is there a way to cap the maximum size of the sstables generated by compaction, so that I end up with multiple sstables of roughly uniform size? I could then run sstable2json on them concurrently.

*Questions*
1. Is there a way to configure the size of the sstables created after compaction?
2. Is there a better approach to generating the report?
3. What are the flaws in this approach?

Best,
Parth
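P.S. For concreteness, here is a minimal Python sketch of steps 3-4 and of the concurrent variant I tried. The paths, the worker count, and the JSON row layout (the 1.2-era sstable2json output: a list of rows, each with a "key" and a "columns" list of [name, value, timestamp, ...] entries) are placeholder assumptions for illustration, not settings taken from the cluster above.

#!/usr/bin/env python
# Sketch: run sstable2json over every *-Data.db file in a snapshot with a
# bounded worker pool, largest file first, then flatten the dumps to CSV.
# SNAPSHOT_DIR / OUT_DIR are hypothetical paths, not from my actual setup.
import csv
import glob
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

SNAPSHOT_DIR = "/var/lib/cassandra/data/my_ks/my_cf/snapshots/report"
OUT_DIR = "/tmp/report_json"
WORKERS = 4  # one sstable2json process per worker; tune to cores/heap

def dump_one(data_db):
    """Run sstable2json on a single -Data.db file, writing <name>.json."""
    out_path = os.path.join(OUT_DIR, os.path.basename(data_db) + ".json")
    with open(out_path, "w") as out:
        subprocess.check_call(["sstable2json", data_db], stdout=out)
    return out_path

def main():
    os.makedirs(OUT_DIR, exist_ok=True)  # Python 3
    files = glob.glob(os.path.join(SNAPSHOT_DIR, "*-Data.db"))
    # Schedule the biggest sstables first so the 25 GB outlier starts
    # immediately instead of queueing behind the small ones.
    files.sort(key=os.path.getsize, reverse=True)
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        json_paths = list(pool.map(dump_one, files))

    # Step 4: flatten the JSON dumps into one CSV, assuming the row
    # layout described above (hex row key plus [name, value, ts] columns).
    with open("report.csv", "w") as f:
        writer = csv.writer(f)
        for path in json_paths:
            with open(path) as jf:
                for row in json.load(jf):  # loads the whole dump in memory!
                    for col in row["columns"]:
                        writer.writerow([row["key"], col[0], col[1]])

if __name__ == "__main__":
    main()

Even with the pool, the total run time is bounded below by the single 25 GB sstable, which is why I am asking whether compaction can be made to emit sstables of a capped, uniform size.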