Hi Aaron, thanks for the reply. I suspected it might be the
read-and-write that causes the slower updates.

Regards,
P.

On Tue, Apr 17, 2012 at 11:52, aaron morton <aa...@thelastpickle.com> wrote:
> Secondary indexes require a read and a write (potentially two) for every
> update. Regular mutations are no-look writes (they never read existing
> data first) and are much faster.
>
> Just like in an RDBMS, it's more efficient to insert data and then create
> the index than to insert data with the index present.
>
> An alternative is to create SSTables in the Hadoop jobs and bulk load them
> into the cluster.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:
>
> Hi,
>
> I have a 4-node test cluster running Cassandra 1.0.9, with 32 GB of memory
> and 4x 1 TB disks. I have two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CFs: one with the source data and one with a secondary index:
>
> create column family UrlGroup
>    with column_type=Standard
>    and comparator=UTF8Type
>    and default_validation_class=UTF8Type
>    and key_validation_class=UTF8Type
>    and column_metadata=
>    [{
>        column_name: groupId,
>        validation_class: UTF8Type,
>        index_type: KEYS
>    }];
>
> I'm running a Hadoop MapReduce job that reads the source CF and creates 3
> mutations for each row key in the UrlGroup CF.
>
> The MapReduce job runs for 30 minutes. When I remove the secondary index,
> it runs in just 10 minutes. There are 26,273,544 mutations in total.
>
> Also, with the secondary index the nodes show a very high load (50+) and
> iowait of 70%+. Without the secondary index the load is ~5 and iowait ~10%.
>
> What might be the problem?
>
> Regards,
> Patrik
>
>

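A minimal sketch of the "insert first, index afterwards" approach Aaron
suggests, written in the same cassandra-cli syntax as the schema above. It
assumes UrlGroup is first created exactly as in the original schema but
without the index_type line on groupId, the Hadoop job is then run, and only
after the load finishes is the KEYS index added with an update statement. As
far as I know, Cassandra then builds the index from the existing rows in the
background; this has not been verified on 1.0.9 here, so treat it as an
untested sketch rather than a confirmed procedure.

Before running the job (same schema, no index on groupId):

    create column family UrlGroup
       with column_type=Standard
       and comparator=UTF8Type
       and default_validation_class=UTF8Type
       and key_validation_class=UTF8Type
       and column_metadata=
       [{
           column_name: groupId,
           validation_class: UTF8Type
       }];

After the job has finished (add the KEYS index to the loaded data):

    update column family UrlGroup
       with column_metadata=
       [{
           column_name: groupId,
           validation_class: UTF8Type,
           index_type: KEYS
       }];

The trade-off is that queries on groupId cannot use the index until it has
been built, so this only suits bulk loads where the index is not needed
during the load itself.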