On Tue, Sep 25, 2012 at 10:36 AM, Віталій Тимчишин <tiv...@gmail.com> wrote:
> See my comments inline
>
> 2012/9/25 Aaron Turner <synfina...@gmail.com>
>>
>> On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин <tiv...@gmail.com> wrote:
>> > Why so? What are the pluses and minuses?
>> > As for me, I am looking at the number of files in the directory.
>> > 700GB / 512MB * 5 (files per SSTable) ≈ 7,000 files, which is fine in my view.
>> > 700GB / 5MB * 5 = 700,000 files, which is too many for a single directory, too much
>> > memory used for SSTable data, and too large a compaction queue (which leads to strange
>> > pauses, I suppose because the compactor is deciding what to compact next), ...
>>
>> Not sure why a lot of files is a problem... modern filesystems deal
>> with that pretty well.
>
> Maybe. And maybe it's not the filesystem but Cassandra. I've seen compaction slow down
> when the compaction queue is too large, and it can get too large if you have a lot of
> SSTables. Note that each SSTable costs both FS metadata (and the FS metadata cache can
> be limited) and Cassandra in-memory data. Anyway, as far as I'm concerned, a performance
> test would be great in this area; otherwise it's all speculation.
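(A quick back-of-the-envelope check of the file counts quoted above. This is only a sketch:
the 700GB data load and the 5-components-per-SSTable figure come from the thread; the helper
name and everything else is illustrative.)

# Rough estimate of on-disk files for a given data load and LCS SSTable size,
# following the arithmetic quoted above. The component count of 5 per SSTable
# (Data, Index, Filter, Statistics, ...) is the figure used in the thread;
# actual counts vary by Cassandra version.

def estimated_files(data_gb, sstable_size_mb, components_per_sstable=5):
    sstables = (data_gb * 1024) / sstable_size_mb
    return int(sstables * components_per_sstable)

for size_mb in (5, 512):  # old 5MB default vs. the 512MB used in the thread
    print(size_mb, "MB ->", estimated_files(700, size_mb), "files")

# Prints roughly 716,800 files for 5MB (the thread's ~700,000 in round numbers)
# and 7,000 files for 512MB.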
Agreed... my thought is that the default is 5MB and the developers' recommendation is not to
stray too far from that. So unless you've done performance benchmarks to prove otherwise,
I'm not sure why you chose a value roughly 100x the default. Also, I notice you're talking
about 700GB/node? That's roughly double the recommended maximum of 300-400GB per node. I
notice a lot of people are trying to push this number, because while disk is relatively
cheap, computers are not.

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
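(For anyone following along, the setting being debated is the sstable_size_in_mb option of
LeveledCompactionStrategy. Below is a minimal sketch of changing it with the DataStax Python
driver and CQL3 map syntax; the contact point, keyspace, and table names are made up, and the
older Cassandra/CQL versions current at the time of this thread spell the option differently.)

from cassandra.cluster import Cluster  # DataStax Python driver

# Hypothetical contact point, keyspace and table -- adjust for your cluster.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Set the target SSTable size for LeveledCompactionStrategy.
# The thread debates 512MB vs. the 5MB default; existing SSTables are not
# rewritten immediately, only as they are compacted again.
session.execute("""
    ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 512}
""")

cluster.shutdown()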