Re: insert slowdown with secondary indexes

aaron morton Sun, 12 Jun 2011 12:42:57 -0700

There are several possible issues here. Diagnosing them would require some 
info on how fast you are writing and at what consistency level (CL).


Some thoughts

http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

A 32 bit machine with a 2GB JVM heap is not ideal. 

A single HDD means the commit log and the data are competing for IO resources. 

The old CF index value must be read and removed during the write process. This 
will be competing for IO resources as well. 
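
To illustrate the read-before-write point, here is a toy sketch in Python. It 
is only a model: plain dicts stand in for the column family and its index, the 
class and method names are made up for illustration, and it is not how 
Cassandra is implemented. It shows why every indexed write also costs a read, 
which is consistent with (though it does not prove the cause of) the posted 
stats where Read Count equals Write Count.

```python
# Toy model only: plain dicts stand in for the column family and its index.
# IndexedTable and its methods are hypothetical, not Cassandra internals.

class IndexedTable:
    def __init__(self):
        self.rows = {}    # row key -> indexed column value
        self.index = {}   # indexed value -> set of row keys
        self.reads = 0
        self.writes = 0

    def write(self, key, value):
        # Read-before-write: look up the old value so that its index
        # entry can be removed before the new one is added.
        self.reads += 1
        old = self.rows.get(key)
        if old is not None:
            keys = self.index.get(old, set())
            keys.discard(key)
            if not keys:
                self.index.pop(old, None)
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)
        self.writes += 1


table = IndexedTable()
table.write("row1", "a")
table.write("row1", "b")   # overwrite: the stale "a" index entry is dropped
table.write("row2", "b")
print(table.reads, table.writes)   # 3 3
print(sorted(table.index["b"]))    # ['row1', 'row2']
```

With 9 indexed columns per row, the extra read on each write is the part that 
ends up competing with the commit log and data files on a single disk.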

It sounds like you are creating very small SSTables. Set higher values for 
memtable_throughput. See the cli help or 
http://wiki.apache.org/cassandra/StorageConfiguration
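
As a back-of-the-envelope check on the SSTable size, the posted stats can be 
combined as below. This is a rough sketch that assumes the live space is 
roughly the sum of what the memtable flushes wrote; it ignores compaction and 
overwrites, so treat the result as an order of magnitude only. The function 
name is made up for illustration.

```python
# Figures copied from the posted column family stats.
space_used_live_bytes = 13297907918   # "Space used (live)"
memtable_switch_count = 1036          # "Memtable Switch Count"


def avg_flush_mib(live_bytes, flushes):
    """Average bytes written per memtable flush, in MiB (rough estimate)."""
    return live_bytes / flushes / (1024 * 1024)


estimate = avg_flush_mib(space_used_live_bytes, memtable_switch_count)
print(f"{estimate:.1f} MiB per flush")   # 12.2 MiB per flush
```

That is in the same range as the 11M Data.db file in the listing, which is 
small for a 22GB load.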

>               Key cache hit rate: 3.6248105493445186E-5
Is a very small hit rate. E-5 means move the decimal point 5 places to the 
left, so this is roughly 0.000036. 
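
For anyone unsure of the notation, a short Python check expands the value from 
the stats into fixed-point form. The helper name as_fixed is just for this 
example.

```python
from decimal import Decimal


def as_fixed(text, places=10):
    """Expand a number written in scientific notation to fixed-point text."""
    return f"{Decimal(text):.{places}f}"


hit_rate = "3.6248105493445186E-5"     # copied from the posted stats
print(as_fixed(hit_rate))              # 0.0000362481
print(f"{float(hit_rate):.4%}")        # 0.0036%
```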

>               Write Latency: 0.026 ms.
Shows a reasonable write latency for the current setup. 

>               Read Count: 22211625
>               Write Count: 22211625
Shows there are some reads going on. 

> mycolumnfam.646f6d61696e4964-f-1943-Data.db              11M
Is odd. How are you clearing the data between test runs?

I would:
* look into setting the memtable thresholds, 
* load all the data without secondary indexes, then add them. 
* consider using 64bit machines with more memory and disk. 
* work out what is adding the read load. 

Hope that helps. 

 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11 Jun 2011, at 12:27, jodylandren...@comcast.net wrote:

> Problem: 
> I am attempting to compare a data model of SuperColumn family with a normal 
> Column Family with Secondary Indexes. I did not have insert issues with the 
> SuperColumn family. The problem I am having seems to be inserting into the 
> Column Family with indexes. Seems to be very slow and getting slower. Also, 
> seems like from some previous test, I did not have issue with the normal 
> column family without indexes. About 24hrs after I started the inserts it is 
> taking 7x longer to do the same size insert.  Progressively getting slower 
> and slower.
> 
> Cluster config: 
> I am using cassandra 0.7.6 for a test on a 4 node cluster with replication 
> set at 2. The nodes are 32-bit, quad-core, Linux, 4GB ram, single hard drive. 
> Some settings: 
> MAX_HEAP_SIZE="2000m" 
> HEAP_NEWSIZE="400m" 
> memtable_flush_queue_size: 10 (was 4) 
> Everything else is pretty much default - Random partitioner, etc. 
> 
> 
> What I am seeing: 
> On one machine in particular, it seems to have a bit of IO contention and 
> waits. The other machines don't exhibit this problem. 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> ..
> 1 32  79384 110272  23900 1477272    0    0  3972     4 2927 1659 47  2  0 51
> 3 31  79384 110520  23892 1476788    0    0  4420     0 1723  622 52  2  0 46
> 4 29  79384 111512  23892 1475788    0    0  3876     0 1579  576 53  2  0 44
> 
> the other machines look like
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> 1  0  96120 1032600  13100 581160    0    0    72     8 1598 1325 30  4 65  0
> 1  0  96120 1029252  13108 584224    0    0     0   144  609  155 23  2 75  0
> 1  0  96120 1027012  13108 587308    0    0    68     0 3437 6890 37  6 57  0
> 
> doing an iostat -x on the machine that is bogged down from an io standpoint
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> sda               0.00     0.00  364.00    0.00  8264.00     0.00    22.70   
> 109.95  149.60   2.75 100.00
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          36.86    0.00    1.23   61.92    0.00    0.00
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> sda               0.00     0.00  326.00    0.00  7832.00     0.00    24.02   
> 118.45  180.10   3.07 100.00
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          40.29    0.00    1.47   58.23    0.00    0.00
> 
> 
> ****** Additionally, and very strange to me, I see on one machine about 
> 60000+ files representing the test column family(and growing with the test). 
> This does not seem like it would be normal? I've shown a few with typical 
> sizes(very small).
> mycolumnfam.646f6d61696e4964-f-1943-Data.db              11M
> mycolumnfam.646f6d61696e4964-f-1943-Filter.db            40K
> mycolumnfam.646f6d61696e4964-f-1943-Index.db             1.3K
> mycolumnfam.646f6d61696e4964-f-1943-Statistics.db        4.2K
> mycolumnfam-f-993-Index.db
> mycolumnfam-f-993-Statistics.db
> mycolumnfam-f-994-Data.db
> mycolumnfam-f-994-Filter.db
> etc, etc. repeating
> 
> The test:
> I have a process that has only 2 threads that is attempting to load about 
> 300million rows(22GB of data).  This is using the Hector java client. I am 
> doing batch inserts of 1000 rows at a time. I am inserting the column values 
> as bytes. The column names are strings. The column family has a total of 15 
> columns(each row).  9 of those columns have indexes.
> 
> The column family stats while under test look like the following. I note that 
> the key cache hit rate is very large. I haven't done any reads yet. None of 
> my other families have this.
> Column Family: mycolumnfam
>               SSTable count: 11
>               Space used (live): 13297907918
>               Space used (total): 13385402196
>               Memtable Columns Count: 287238
>               Memtable Data Size: 8778990
>               Memtable Switch Count: 1036
>               Read Count: 22211625
>               Read Latency: 0.347 ms.
>               Write Count: 22211625
>               Write Latency: 0.026 ms.
>               Pending Tasks: 0
>               Key cache capacity: 200000
>               Key cache size: 6086
>               Key cache hit rate: 3.6248105493445186E-5
>               Row cache: disabled
>               Compacted row minimum size: 447
>               Compacted row maximum size: 642
>               Compacted row mean size: 634
> 
> I'm trying to understand why doing the inserts into a column family with 
> indexes seems to jam things up and am wondering if there are any settings 
> that I could tweak to help. It seems that the 4 node cluster should be able 
> to handle 2 threads of data coming at it.  Has anyone had any experience with 
> this number of indexes per column family? Any insight or suggestions would be 
> appreciated.
> 
> Thanks in advance--
