Cool, thanks, that helps. So even if a column family defined in storage-conf is empty, it still has some overhead in Cassandra, and the following rule should apply:
memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.

On Wed, Oct 20, 2010 at 12:53 PM, Aaron Morton <[email protected]> wrote:

> Take a look at the section on JVM Heap size here
> http://wiki.apache.org/cassandra/MemtableThresholds
>
> CFs have a large overhead, Keyspaces have none/little.
>
> In general write performance will be affected by the memtable thresholds
> (also on the link above). Read performance will be affected by the size of
> the cassandra caches and OS file caches. Compaction can slow a node, 0.7
> handles this better via the dynamic snitch.
>
> Start with conservative / default values, then crank things up.
>
> Aaron
>
> On 21 Oct, 2010, at 08:42 AM, CassUser CassUser <[email protected]> wrote:
>
> Thanks for the link.
>
> #2 was not meant to be a trick question, it just came out like that :). What
> I was after is the overhead associated with a large number of keyspaces and
> column families (I didn't mean empty memtables :). If a few keyspaces
> have 20 or so column families with a percentage of rows cached, does this
> affect write performance to other keyspaces in the cluster?
>
> On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <[email protected]> wrote:
>
>> On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <[email protected]> wrote:
>> > Hey,
>> >
>> > As I understand it writes go directly to the commit log. Once a threshold
>> > has been reached the data is shipped to a memtable, and again to an sstable.
>> >
>> > 1. How many memtables are created when a flush happens from a commit log?
>> > One per CF?
>> >
>> > 2. Is there any space associated with an empty memtable?
>> >
>> > 3. When a flush happens from a memtable to an sstable, does this create a
>> > single new sstable?
>> >
>> > 4. Should compaction be turned off during a large data load?
>> >
>> > Thanks.
>> >
>>
>> Take a look at:
>>
>> http://wiki.apache.org/cassandra/MemtableSSTable
>>
>> 1 and 3
>> Memtables flush for three reasons: size, time, and number of
>> operations. There is one memtable per column family. Each memtable
>> flushes individually.
>>
>> 2. Is this a trick question?
>>
>> 4. Should compaction be turned off during a large data load?
>> You can disable compaction during bulk loads. This can help because
>> otherwise the same data might be compacted multiple times. However, if
>> you go too long with compaction turned off you end up with multiple
>> sstables. This can end up in fragmented rows.
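
P.S. To make the heap rule of thumb at the top of this message concrete, here is a rough sketch of the arithmetic. The function name and the sample numbers (64 MB memtable throughput, 10 hot CFs, 512 MB of caches) are made up for illustration and are not recommendations; only the formula itself comes from the thread.

```python
def estimate_heap_mb(memtable_throughput_in_mb, hot_cfs, internal_caches_mb):
    """Rough heap estimate in MB, following the rule quoted in this thread:
    memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches.
    """
    base_mb = 1024  # the "+ 1G" term from the rule
    return memtable_throughput_in_mb * 3 * hot_cfs + base_mb + internal_caches_mb


# Hypothetical inputs, chosen only to show the arithmetic:
# 64 * 3 * 10 = 1920, plus 1024, plus 512
print(estimate_heap_mb(64, 10, 512))
```

The hot-CF count is the term that scales the estimate, which fits Aaron's point that CFs carry the overhead and keyspaces carry little or none.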
