Re: Poor write performance with secondary index

aaron morton Tue, 17 Apr 2012 02:53:49 -0700

Secondary indexes require a read and a write (potentially two) for every 
update. Regular mutations are no-look writes (nothing is read first) and are 
much faster.
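
To make the difference concrete, here is a rough sketch in Python. It is a toy 
in-memory model, not how Cassandra implements index maintenance; the class 
name, the method names and the read/write counters are all made up for 
illustration. It only shows why an indexed update needs a read plus one or two 
extra writes, while a plain mutation does not.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Toy in-memory model (hypothetical); not Cassandra's actual code path."""

    def __init__(self, indexed_column=None):
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(set)   # indexed value -> set of row keys
        self.indexed_column = indexed_column
        self.reads = 0                  # reads needed to apply mutations
        self.writes = 0                 # data writes plus index writes

    def mutate(self, key, column, value):
        if column == self.indexed_column:
            # Read-before-write: the old value is needed to locate the
            # stale index entry.
            self.reads += 1
            old = self.rows.get(key, {}).get(column)
            if old is not None and old != value:
                # Potential second index write: drop the stale entry.
                self.index[old].discard(key)
                self.writes += 1
            # Index write for the new value.
            self.index[value].add(key)
            self.writes += 1
        # The data write itself never needs a read.
        self.rows.setdefault(key, {})[column] = value
        self.writes += 1


plain = ToyColumnFamily()
indexed = ToyColumnFamily(indexed_column="groupId")
for cf in (plain, indexed):
    cf.mutate("url1", "groupId", "g1")  # first insert
    cf.mutate("url1", "groupId", "g2")  # update that changes the indexed value

print(plain.reads, plain.writes)      # 0 2
print(indexed.reads, indexed.writes)  # 2 5
```

In this toy model the same two mutations cost 0 reads and 2 writes without the 
index, and 2 reads and 5 writes with it. The real ratio depends on the data 
and on how often an indexed value actually changes.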


Just like in an RDBMS, it's more efficient to insert the data and then create 
the index than to insert the data with the index present.

An alternative is to create SSTables in the Hadoop jobs and bulk load them into 
the cluster.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:

> Hi,
> 
> I've a 4 node test cluster running Cassandra 1.0.9, 32GB memory, 4x
> 1TB disks. I've two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CF, one with source data and one with secondary index:
> 
> create column family UrlGroup
>    with column_type=Standard
>    and comparator=UTF8Type
>    and default_validation_class=UTF8Type
>    and key_validation_class=UTF8Type
>    and column_metadata=
>    [{
>        column_name: groupId,
>        validation_class: UTF8Type,
>        index_type: KEYS
>    }];
> 
> I'm running a Hadoop MapReduce job, reading the source CF and creating 3
> mutations for each row-key in the UrlGroup CF.
> 
> The MapReduce job runs for 30 minutes. When I remove the secondary index,
> it runs for just 10 minutes. There are 26,273,544 mutations
> total.
> 
> Also with the secondary index, the nodes show very high load 50+ and
> iowait 70%+. Without secondary index the load is ~5 and iowait ~10%.
> 
> What may be the problem?
> 
> Regards,
> Patrik
