What do you mean by 'better file offset caching'? Presumably you mean 'better page cache hit rate'? Out of curiosity, why do you think this? What data are you seeing that makes you think it's better? I am certainly not even close to a virtual memory or page caching expert, but I am pretty sure file size does not matter (assuming file sizes are significantly greater than the page size, which I believe is 4k).
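If you want to measure this rather than guess, a minimal sketch on a Linux node (vmtouch is a third-party utility you would need to install separately, and the sstable path is just a placeholder):

    # confirm the kernel page size (usually 4096 bytes)
    getconf PAGESIZE

    # overall page cache usage on the node
    grep -i cached /proc/meminfo

    # per-file residency: roughly how much of a given sstable is currently in the page cache
    vmtouch /path/to/data/MyKeyspace/MyCF-*-Data.db

The caching granularity is the page either way, so one 80 GB file and ten 8 GB files holding the same hot rows should cache about equally well.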
Perhaps what you are actually seeing is row fragmentation across your SSTables? Easy to check with nodetool cfhistograms (SSTables column); there is an example command at the end of this message.

To answer your question, I know of no tools to split SSTables. If you want to switch compaction strategies, leveled compaction (1.0.x) creates many smaller sstables instead of fewer, bigger ones. Although it is workload dependent, increasing min_compaction_threshold for size tiered compaction is probably a bad idea, since it will increase row fragmentation across SSTables and therefore increase io/seek requirements for reads (particularly for column ranges or non named-column queries). The only reason to do so is to reduce the frequency of compaction (disk io considerations).

Dan

-----Original Message-----
From: Radim Kolar [mailto:h...@sendmail.cz]
Sent: November-17-11 5:02
To: user@cassandra.apache.org
Subject: split large sstable

Is there some simple way to split a large sstable into several smaller ones? I increased min_compaction_threshold (smaller tables seem to get better file offset caching from the OS) and now I need to reshuffle the data into smaller sstables. Running several cluster-wide repairs worked well, but the largest sstable was left untouched. I have an 80 GB sstable and need to split it into roughly 10 GB ones.
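For reference, roughly what the check and the strategy switch above would look like. This is a sketch only: keyspace and column family names are placeholders, and the exact cassandra-cli syntax may differ between 1.0.x releases, so verify against your version's docs.

Check per-read sstable counts (a long tail in the SSTables column means fragmented rows):

    nodetool -h localhost cfhistograms MyKeyspace MyCF

Switch the column family to leveled compaction from cassandra-cli:

    connect localhost/9160;
    use MyKeyspace;
    update column family MyCF with compaction_strategy = 'LeveledCompactionStrategy';

Leveled compaction builds many fixed-size sstables (fairly small by default; the target size can be tuned via the sstable_size_in_mb compaction option), which gets you the many-small-files layout without touching min_compaction_threshold.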