Best regards,
韩竹 (Zhu Han)
坚果铺子 <https://jianguopuzi.com>, the simplest and easiest-to-use cloud storage: file sync, photo sharing, document backup!
On Mon, Nov 21, 2011 at 11:07 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> Pretty sure your argument about indirect blocks making large files
> inefficient only pertains to ext2/3 and not ext4. It seems ext4 replaces the
> 'indirect block' approach with extents
> (http://kernelnewbies.org/Ext4#head-7c5fd53118e8b888345b95cc11756346be4268f4,
> http://en.wikipedia.org/wiki/Ext4#Features).
>
> I was not aware of this difference in the file systems and it seems to be a
> compelling reason ext4 should be chosen (over ext3) for Cassandra - at least
> when using size tiered compaction.
>

An alternative is XFS, which is also extent based.

>
> Dan
>
> -----Original Message-----
> From: Radim Kolar [mailto:h...@sendmail.cz]
> Sent: November-19-11 19:42
> To: user@cassandra.apache.org
> Subject: Re: split large sstable
>
> On 17.11.2011 17:42, Dan Hendry wrote:
> > What do you mean by 'better file offset caching'? Presumably you mean
> > 'better page cache hit rate'?
> The fs metadata used to find blocks in smaller files is cached better.
> Large files use indirect blocks and you need more reads to find the
> correct block during a seek syscall. For example, if a large file uses 3
> indirect levels, you need 3 disk seeks to find the correct block.
>
> http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/
>
> Metadata caching in the OS is far worse than file caching - one "find /"
> will effectively nullify the metadata cache.
>
> If Cassandra could use raw storage, it would eliminate fs overhead and it
> could be over 100% faster on reads because fragmentation would be an
> exception - no need to design an fs like FAT or UFS, where the designers
> expect files to be stored in non-contiguous areas on disk. Implementing
> something log based like http://logfs.sourceforge.net/ would be enough.
> Cleaning would not be much needed because compaction will clean it
> naturally.
>
> > Perhaps what you are actually seeing is row fragmentation across your
> > SSTables? Easy to check with nodetool cfhistograms (SSTables column).
> I have a 1.5% hit rate to 2 sstables and 3% to 3 sstables. That's pretty
> low with min. compaction set to 5; I will probably set it to 6.
>
> I would really like to see tests with user-defined sizes and file counts
> used for tiered compaction, because it works best if you do not leave the
> largest file alone in its bucket. If your data in Cassandra is not growing,
> it can be fine-tuned better. I haven't done experiments with it, but maybe
> a max sstable size defined per CF would be enough. Let's say I have 5 GB of
> data per CF - the ideal setting would be a max sstable size slightly less
> than 1 GB. Cassandra would then not keep old data stuck in one 4 GB compacted
> sstable, waiting for other 4 GB sstables to be created before compaction
> removes the old data.
>
> > To answer your question, I know of no tools to split SSTables. If you want
> > to switch compaction strategies, levelled compaction (1.0.x) creates many
> > smaller sstables instead of fewer, bigger ones.
> I don't use levelled compaction; it compacts too often. It might get
> better if it could be tuned how many and how large files to use at each
> level. But I will try switching to levelled compaction and back again; it
> might do what I want.
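
To put rough numbers on the indirect-block argument quoted above, here is a minimal sketch of how many indirect blocks a classic ext2/ext3-style block map has to walk for a given file offset. It assumes 4 KiB blocks, 4-byte block pointers and 12 direct pointers per inode, and a completely cold metadata cache; the function name `indirection_levels` is just an illustration, not anything from Cassandra or the kernel.

```python
def indirection_levels(offset_bytes, block_size=4096, pointer_size=4,
                       direct_pointers=12):
    """Return how many indirect blocks must be read to map a byte offset
    to its data block in an ext2/ext3-style inode (0 means a direct pointer).

    Assumed layout: `direct_pointers` direct slots, then one single-,
    one double- and one triple-indirect pointer.
    """
    pointers_per_block = block_size // pointer_size
    block_index = offset_bytes // block_size

    # The first few blocks are addressed straight from the inode.
    if block_index < direct_pointers:
        return 0
    block_index -= direct_pointers

    # Each further level multiplies the addressable range by pointers_per_block.
    capacity = pointers_per_block
    for level in (1, 2, 3):
        if block_index < capacity:
            return level
        block_index -= capacity
        capacity *= pointers_per_block

    raise ValueError("offset beyond what triple indirection can address")


if __name__ == "__main__":
    for label, offset in [("40 KiB", 40 * 1024),
                          ("1 MiB", 2**20),
                          ("1 GiB", 2**30),
                          ("5 GiB", 5 * 2**30)]:
        print(label, "->", indirection_levels(offset), "indirect level(s)")
```

Under these assumptions an offset near 1 GiB needs two indirect lookups and an offset near 5 GiB needs three, which matches the "3 indirect levels" example in the quoted message. Extent-based file systems such as ext4 and XFS map large contiguous ranges with a single extent record, so this arithmetic does not apply to them.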