Why use such a large batch size? -ryan
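For illustration, here is a minimal sketch of splitting one very large call into smaller ones. `send_batch` is a hypothetical stand-in for whatever client call submits a group of row mutations, and the batch size of 1000 is an arbitrary assumption, not a tuned value from this thread.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(
    mutations: Sequence[T],
    send_batch: Callable[[Sequence[T]], None],  # hypothetical client call
    batch_size: int = 1000,                     # assumed value, tune as needed
) -> int:
    """Submit `mutations` in several small calls; return the number of calls."""
    calls = 0
    for batch in chunked(mutations, batch_size):
        send_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    sent = []
    # 124360 is the row-mutation count quoted in the original message.
    calls = send_in_batches(list(range(124360)), sent.append, batch_size=1000)
    print(calls, sum(len(b) for b in sent))
```

With smaller calls, each request has far less work to finish before the client-side timeout, and a failed call only needs that one chunk retried.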
On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace <ignace.desim...@nuance.com> wrote:
>
> Hello,
>
> I had a demo application with embedded cassandra version 0.6.x, inserting
> about 120 K row mutations in one call.
>
> In version 0.6.x that usually took about 5 seconds, and I could repeat this
> step adding each time the same amount of data.
>
> Running on a single CPU computer, single hard disk, XP 32 bit OS, 1G memory
>
> I tested this again on CentOS 64 bit OS, 6G memory, different settings of
> memtable_throughput_in_mb and memtable_operations_in_millions.
>
> Also tried version 0.7.3. Also the same behavior.
>
> Now with version 0.7.2 the call returns with a timeout exception even using
> a timeout of 120000 (2 minutes). I see the CPU time going to 100%, a lot of
> disk writing (gigabytes), a lot of log messages about compacting,
> flushing, commitlog, …
>
> Below you can find some information using the nodetool at start of the batch
> mutation and also after 14 minutes. The MutationStage is clearly showing how
> slow the system handles the row mutations.
>
> Attached : Cassandra.yaml with at end the description of my database
> structure using yaml
>
> Attached : log file with cassandra output.
>
> Any idea what I could be doing wrong?
>
> Regards,
>
> Ignace Desimpel
>
> ignace.desim...@nuance.com
>
> At start of the insert (after inserting 124360 row mutations) I get the
> following info from the nodetool :
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 5.49 MB
> Generation No    : 1299502115
> Uptime (seconds) : 1152
> Heap Memory (MB) : 179,84 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40637
> RequestResponseStage              0         0             30
> MutationStage                    32    121679          72149
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0              6
> StreamStage                       0         0              0
> FlushWriter                       0         0              5
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
> After 14 minutes (timeout exception after 2 minutes : see log file) I get :
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 10.31 MB
> Generation No    : 1299502115
> Uptime (seconds) : 2172
> Heap Memory (MB) : 733,82 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40646
> RequestResponseStage              0         0             30
> MutationStage                    32    103310          90526
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0             69
> StreamStage                       0         0              0
> FlushWriter                       0         0             68
> FILEUTILS-DELETE-POOL             0         0             42
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
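A rough mutation rate can be read off the two quoted tpstats snapshots. The sketch below only does arithmetic on the Uptime and MutationStage figures shown above; it assumes the node's reported uptime is the right clock to use and that the rate stays constant, which may not hold while flushing and compaction are running.

```python
# Figures copied from the two nodetool snapshots quoted above.
uptime_start, uptime_end = 1152, 2172            # Uptime (seconds)
completed_start, completed_end = 72149, 90526    # MutationStage Completed
pending_start, pending_end = 121679, 103310      # MutationStage Pending

elapsed = uptime_end - uptime_start              # seconds between snapshots
completed = completed_end - completed_start      # mutations applied meanwhile
drained = pending_start - pending_end            # drop in the pending queue

rate = completed / elapsed                       # mutations per second
# Assumption: the rate stays constant until the queue is empty.
remaining_minutes = pending_end / rate / 60

print(f"elapsed:   {elapsed} s")
print(f"completed: {completed} mutations (pending dropped by {drained})")
print(f"rate:      {rate:.1f} mutations/s")
print(f"estimated time to drain the remaining queue: {remaining_minutes:.0f} min")
```

That works out to roughly 18 mutations per second between the two snapshots, compared with about 120 K mutations in about 5 seconds reported for 0.6.x.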