On 17.11.2011 17:42, Dan Hendry wrote:
What do you mean by 'better file offset caching'? Presumably you mean
'better page cache hit rate'?
FS metadata used to find blocks in smaller files is cached better.
Large files use indirect blocks, so more reads are needed to find the
correct block after a seek. For example, if a large file uses 3
indirect levels, you need 3 disk seeks to find the correct block when
none of the indirect blocks are cached.
http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/
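To make the indirect-block cost above concrete, here is a minimal sketch of the classic ext2/UFS-style layout. The 4 KiB block size, 4-byte block pointers and 12 direct pointers are assumptions for illustration, and the function name is made up; extent-based filesystems work differently.

```python
def indirection_level(offset, block_size=4096, pointer_size=4, direct_pointers=12):
    """Return how many indirect blocks must be read to locate the data
    block holding byte `offset` (0 means a direct pointer in the inode)."""
    pointers_per_block = block_size // pointer_size
    block_index = offset // block_size

    # The first few blocks are addressed directly from the inode.
    if block_index < direct_pointers:
        return 0
    block_index -= direct_pointers

    # Each further level addresses pointers_per_block times more blocks.
    capacity = pointers_per_block
    for level in (1, 2, 3):
        if block_index < capacity:
            return level
        block_index -= capacity
        capacity *= pointers_per_block
    raise ValueError("offset beyond triple-indirect range")


if __name__ == "__main__":
    for label, offset in [("16 KiB", 16 << 10), ("1 MiB", 1 << 20),
                          ("1 GiB", 1 << 30), ("5 GiB", 5 << 30)]:
        print(label, "->", indirection_level(offset), "indirect level(s)")
```

Under these assumed parameters, an offset around 1 GiB needs two indirect blocks and an offset past roughly 4 GiB needs three, each one a potential extra disk seek if it is not cached.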
Metadata caching in the OS is far worse than file caching: one "find /"
will effectively wipe out the metadata cache.
If Cassandra could use raw storage, it would eliminate fs overhead and
could be over 100% faster on reads, because fragmentation would be the
exception. There would be no need for an fs design like FAT or UFS,
where the designers expect files to be stored in non-contiguous areas
on disk. Implementing something log-based like
http://logfs.sourceforge.net/ would be enough. Not much cleaning would
be needed because compaction would clean up naturally.
Perhaps what you are actually seeing is row fragmentation across your
SSTables? Easy to check with nodetool cfhistograms (SSTables column).
For me, 1.5% of reads hit 2 sstables and 3% hit 3 sstables. That's pretty
low with the min. compaction threshold set to 5; I will probably set it
to 6.
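For anyone who wants to turn the SSTables column into percentages like the ones above, a small sketch follows. The read counts are made up purely so that they reproduce the 1.5% and 3% figures; they are not real cfhistograms output, and the function name is hypothetical.

```python
def sstable_read_share(counts):
    """Map 'number of sstables touched by a read' -> percent of all reads."""
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts: key = sstables touched, value = number of reads.
    example_counts = {1: 9550, 2: 150, 3: 300}
    for n, pct in sorted(sstable_read_share(example_counts).items()):
        print(f"{n} sstable(s): {pct:.1f}% of reads")
```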
I would really like to see tests with user-defined sizes and file counts
for tiered compaction, because it works best if you do not leave the
largest file alone in its bucket. If your data in Cassandra is not
growing, it can be tuned more finely. I haven't done experiments with
it, but maybe a max sstable size defined per CF would be enough. Let's
say I have 5 GB of data per CF: the ideal setting would be a max sstable
size of slightly less than 1 GB. Then Cassandra would not keep old data
stuck in one 4 GB compacted sstable, waiting for other 4 GB sstables to
be created before compaction removes the old data.
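The "largest file alone in its bucket" problem can be illustrated with a simplified sketch of size-tiered bucketing. This is not Cassandra's actual code: the 0.5x/1.5x similarity bounds, the function names and the sizes (in MB) are assumptions for illustration only.

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets whose members are of similar size."""
    buckets = []  # list of (average_size, [member sizes])
    for size in sorted(sizes):
        for i, (avg, members) in enumerate(buckets):
            # Join an existing bucket if the size is close to its average.
            if avg * bucket_low < size < avg * bucket_high:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            # No similar bucket found, so this sstable starts its own.
            buckets.append((size, [size]))
    return [members for _, members in buckets]


def compactable(buckets, min_threshold):
    """Buckets holding enough sstables to be picked for compaction."""
    return [b for b in buckets if len(b) >= min_threshold]


if __name__ == "__main__":
    # One big compacted sstable plus a few small flushed ones (sizes in MB).
    uncapped = bucket_sstables([4096, 300, 250, 200, 250])
    print(uncapped)                  # the 4096 MB sstable sits alone
    print(compactable(uncapped, 5))  # nothing qualifies at threshold 5

    # Same data volume, but with sstables capped at slightly under 1 GB.
    capped = bucket_sstables([950] * 5)
    print(compactable(capped, 5))    # all five land in one bucket
```

In the uncapped case the 4 GB sstable has no similar-sized peers, so the old data in it is never rewritten. With the cap, all sstables stay in the same size class and can be compacted together.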
To answer your question, I know of no tools to split SSTables. If you want
to switch compaction strategies, levelled compaction (1.0.x) creates many
smaller sstables instead of fewer, bigger ones.
I don't use levelled compaction; it compacts too often. It might get
better if the number and size of files at each level could be tuned.
But I will try switching to levelled compaction and back again; it
might do what I want.