Re: split large sstable

Zhu Han Mon, 21 Nov 2011 07:15:35 -0800

best regards,
韩竹(Zhu Han)

坚果铺子 <https://jianguopuzi.com>, the simplest, easiest-to-use cloud storage
Sync files, share photos, back up documents!




On Mon, Nov 21, 2011 at 11:07 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> Pretty sure your argument about indirect blocks making large files
> inefficient only pertains to ext2/3 and not ext4. It seems ext4 replaces
> the 'indirect block' approach with extents
> (http://kernelnewbies.org/Ext4#head-7c5fd53118e8b888345b95cc11756346be4268f4,
> http://en.wikipedia.org/wiki/Ext4#Features).
>
> I was not aware of this difference in the file systems and it seems to be a
> compelling reason ext4 should be chosen (over ext3) for Cassandra - at least
> when using size tiered compaction.
>

An alternative is XFS, which is also extent based.

>
> Dan
>
> -----Original Message-----
> From: Radim Kolar [mailto:h...@sendmail.cz]
> Sent: November-19-11 19:42
> To: user@cassandra.apache.org
> Subject: Re: split large sstable
>
> On 17.11.2011 17:42, Dan Hendry wrote:
> > What do you mean by ' better file offset caching'? Presumably you mean
> > 'better page cache hit rate'?
> The fs metadata used to find blocks in smaller files is cached better.
> Large files use indirect blocks, so you need more reads to find the
> correct block when seeking. For example, if a large file uses 3 levels of
> indirection, you need 3 disk seeks to find the correct block.
>
> http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/
> Metadata caching in the OS is far worse than file caching - one "find /"
> will effectively nullify the metadata cache.
>
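A minimal sketch of the indirect-block arithmetic described above, assuming a classic ext2/ext3-style inode with 12 direct pointers, 4 KB blocks, and 4-byte block pointers (these defaults and the function name are illustrative assumptions, not taken from the thread). It returns how many indirect blocks sit between the inode and the data block for a given file offset; each one is a potential extra disk read when the metadata is not cached.

```python
def indirection_levels(offset_bytes, block_size=4096, ptr_size=4, direct=12):
    """Number of indirect blocks to traverse to reach the data block
    holding offset_bytes (0 = direct pointer in the inode)."""
    ptrs_per_block = block_size // ptr_size   # 1024 with the defaults
    block = offset_bytes // block_size        # logical block number
    if block < direct:
        return 0
    block -= direct
    # Levels 1, 2, 3 = single, double, triple indirect blocks.
    for level in (1, 2, 3):
        span = ptrs_per_block ** level        # blocks addressable at this level
        if block < span:
            return level
        block -= span
    raise ValueError("offset beyond the maximum file size for this layout")


if __name__ == "__main__":
    for offset in (0, 1 << 20, 1 << 30, 5 * (1 << 30)):
        print(offset, indirection_levels(offset))
```

With these assumed defaults, triple indirection only starts a little past the 4 GiB mark, so it is the tail of a multi-gigabyte sstable that would need all three lookups.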
> If Cassandra could use raw storage, it would eliminate fs overhead and
> could be over 100% faster on reads because fragmentation would be the
> exception - no need for a fs designed like FAT or UFS, where the designers
> expect files to be stored in non-contiguous areas on disk. Implementing
> something log based like http://logfs.sourceforge.net/ would be enough.
> Cleaning would not be needed much because compaction would clean it
> naturally.
>
> > Perhaps what you are actually seeing is row fragmentation across your
> > SSTables? Easy to check with nodetool cfhistograms (SSTables column).
> I have 1.5% of reads hitting 2 sstables and 3% hitting 3 sstables. It's
> pretty low with min. compaction set to 5; I will probably set it to 6.
>
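As a rough illustration of how percentages like these can be derived from the SSTables column of cfhistograms, here is a small sketch. The input format (a mapping from "number of sstables touched by a read" to "number of reads") and the sample counts are hypothetical, chosen only to reproduce the 1.5% and 3% figures quoted above.

```python
def sstable_read_percentages(counts):
    """Convert {sstables_touched: read_count} into percentages of all reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: 100 * c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical histogram: 1000 reads in total.
    sample = {1: 955, 2: 15, 3: 30}
    for sstables, pct in sstable_read_percentages(sample).items():
        print(f"{sstables} sstable(s): {pct}% of reads")
```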
> I would really like to see tests with user-defined sizes and file counts
> used for tiered compaction, because it works best if you do not leave the
> largest file alone in a bucket. If your data in Cassandra is not growing,
> it can be fine-tuned better. I haven't done experiments with it, but maybe
> a max sstable size defined per CF would be enough. Let's say I have 5 GB of
> data per CF - the ideal setting would be a max sstable size slightly less
> than 1 GB. Cassandra would then not keep old data stuck in one 4 GB
> compacted sstable, waiting for other 4 GB sstables to be created before
> compaction removes the old data.
>
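To make the "largest file alone in a bucket" point concrete, here is a simplified sketch of size-tiered bucketing. It is not Cassandra's actual implementation: the rule (an sstable joins a bucket when its size is within 0.5x to 1.5x of the bucket's average), the function names, and the sample sizes in MB are all assumptions for illustration.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group sstable sizes into buckets of similar size (simplified)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Buckets holding at least min_threshold sstables are eligible."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    # One big 4 GB sstable plus a few small ones: the big one sits alone.
    print(bucket_by_size([4096, 250, 260, 240]))
    print(compaction_candidates([4096, 250, 260, 240]))
    # Five sstables capped just under 1 GB land in one bucket together.
    print(compaction_candidates([900] * 5, min_threshold=5))
```

Under these assumptions the 4096 MB sstable never finds a partner until more sstables of similar size appear, while five equal sstables just under 1 GB become eligible together.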
> > To answer your question, I know of no tools to split SSTables. If you want
> > to switch compaction strategies, levelled compaction (1.0.x) creates many
> > smaller sstables instead of fewer, bigger ones.
> I don't use levelled compaction; it compacts too often. It might get
> better if one could tune how many files, and of what size, to use at each
> level. But I will try switching to levelled compaction and back again; it
> might do what I want.
