That is the number of records I need to add for each document, and we would 
like to test with 100K or more documents. That's why we thought 
Cassandra could be a good database system.

At first I did the inserts one by one. Doing them in a batch made the 
system a lot faster, and that worked fine in version 0.6.x.
With your question in mind, I did some more tests (only on Windows XP):
1) Changed the code to insert two sets of about 50K. Same behavior in 0.7.x.
2) Then changed it to store 1000 records at a time. That seems a bit better: the 
rpc timeout is no longer thrown. But the number of Memtable flushes and the 
number of commit logs generated are still large, and the total time to 
write is still more than 10 minutes, although it used to be less than 10 
seconds.
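
The chunked approach in test 2 can be sketched roughly as follows. This is an
illustrative Python sketch, not the code used in the tests: `chunked`,
`insert_in_chunks` and `send_batch` are hypothetical names, and `send_batch`
stands in for whatever client call submits one batch of row mutations. It is
not a Cassandra API.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_in_chunks(
    mutations: Sequence[T],
    send_batch: Callable[[Sequence[T]], None],  # hypothetical: submits one batch
    chunk_size: int = 1000,
) -> int:
    """Submit `mutations` in chunks of `chunk_size`; return the number of calls made."""
    calls = 0
    for chunk in chunked(mutations, chunk_size):
        send_batch(chunk)
        calls += 1
    return calls
```

With the 124360 row mutations mentioned in the original message and a chunk
size of 1000, this makes 125 calls: 124 full chunks and a final one of 360.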


I do not know the Cassandra code, but I do have the system running in 
Eclipse. So if needed I can debug the code, but I would need some input from 
your team.

Ignace


-----Original Message-----
From: Ryan King [mailto:r...@twitter.com] 
Sent: Thursday, March 10, 2011 18:18
To: user@cassandra.apache.org
Cc: Desimpel, Ignace
Subject: Re: FW: Very slow batch insert using version 0.7.2

Why use such a large batch size?

-ryan

On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace
<ignace.desim...@nuance.com> wrote:
>
>
> Hello,
>
> I had a demo application with embedded Cassandra version 0.6.x, inserting
> about 120K row mutations in one call.
>
> In version 0.6.x that usually took about 5 seconds, and I could repeat this
> step adding each time the same amount of data.
>
> Running on a single CPU computer, single hard disk, XP 32 bit OS, 1G memory
>
> I tested this again on CentOS 64 bit OS, 6G memory, different settings of
> memtable_throughput_in_mb and memtable_operations_in_millions.
>
> Also tried version 0.7.3. Also the same behavior.
>
>
>
> Now with version 0.7.2 the call returns with a timeout exception even using
> a timeout of 120000 ms (2 minutes). I see the CPU usage going to 100%, a lot of
> disk writing (gigabytes), and a lot of log messages about compacting,
> flushing, commitlog, ...
>
>
>
> Below you can find some information using the nodetool at start of the batch
> mutation and also after 14 minutes. The MutationStage is clearly showing how
> slow the system handles the row mutations.
>
>
>
> Attached : Cassandra.yaml with at end the description of my database
> structure using yaml
>
> Attached : log file with cassandra output.
>
>
>
> Any idea what I could be doing wrong?
>
>
>
> Regards,
>
>
>
> Ignace Desimpel
>
>
>
> ignace.desim...@nuance.com
>
>
>
> At start of the insert (after inserting 124360 row mutations) I get the
> following info from the nodetool :
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 5.49 MB
> Generation No    : 1299502115
> Uptime (seconds) : 1152
> Heap Memory (MB) : 179,84 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40637
> RequestResponseStage              0         0             30
> MutationStage                    32    121679          72149
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0              6
> StreamStage                       0         0              0
> FlushWriter                       0         0              5
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
>
>
> After 14 minutes (timeout exception after 2 minutes : see log file) I get :
>
>
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 10.31 MB
> Generation No    : 1299502115
> Uptime (seconds) : 2172
> Heap Memory (MB) : 733,82 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name                    Active   Pending      Completed
> ReadStage                         0         0          40646
> RequestResponseStage              0         0             30
> MutationStage                    32    103310          90526
> GossipStage                       0         0              0
> AntiEntropyStage                  0         0              0
> MigrationStage                    0         0              1
> MemtablePostFlusher               0         0             69
> StreamStage                       0         0              0
> FlushWriter                       0         0             68
> FILEUTILS-DELETE-POOL             0         0             42
> MiscStage                         0         0              0
> FlushSorter                       0         0              0
> InternalResponseStage             0         0              0
> HintedHandoff                     0         0              0
>
>
>
>
