Secondary indexes require a read and a write (potentially two) for every update. Regular mutations are no-look writes (nothing is read before writing) and are much faster.
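To make the cost difference concrete, here is a minimal toy model in Python. It is not Cassandra code: the `ToyColumnFamily` class, its counters, and the exact operation counts are assumptions made up for illustration, following the description above (one read, plus one or two extra writes, for each update to an indexed column).

```python
class ToyColumnFamily:
    """Toy in-memory model of a column family (hypothetical, not Cassandra internals)."""

    def __init__(self, indexed_column=None):
        self.rows = {}            # row key -> {column: value}
        self.index = {}           # indexed value -> set of row keys
        self.indexed_column = indexed_column
        self.reads = 0            # count of read-before-write operations
        self.writes = 0           # count of write operations

    def mutate(self, key, column, value):
        if column == self.indexed_column:
            # Indexed update: read the old value first to find the stale index entry.
            self.reads += 1
            old = self.rows.get(key, {}).get(column)
            if old is not None and old != value:
                # Second extra write: remove the stale index entry.
                self.index.get(old, set()).discard(key)
                self.writes += 1
            # First extra write: add the new index entry.
            self.index.setdefault(value, set()).add(key)
            self.writes += 1
        # The regular mutation itself: a write with no read.
        self.rows.setdefault(key, {})[column] = value
        self.writes += 1


plain = ToyColumnFamily()
plain.mutate("row1", "groupId", "a")

indexed = ToyColumnFamily(indexed_column="groupId")
indexed.mutate("row1", "groupId", "a")   # new value: read + index write + data write
indexed.mutate("row1", "groupId", "b")   # overwrite: read + two index writes + data write

print(plain.reads, plain.writes)
print(indexed.reads, indexed.writes)
```

In this sketch the unindexed column family never reads, while the indexed one reads on every update and writes two or three times instead of once. The real ratio on a cluster depends on caching, compaction and disk layout, so treat the counts only as a picture of where the extra I/O comes from.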
Just like in an RDBMS, it's more efficient to insert the data and then create the index than to insert data with the index present. An alternative is to create SSTables in the Hadoop jobs and bulk load them into the cluster.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:

> Hi,
>
> I've a 4 node test cluster running Cassandra 1.0.9, 32GB memory, 4x
> 1TB disks. I've two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CF, one with source data and one with secondary index:
>
> create column family UrlGroup
>   with column_type=Standard
>   and comparator=UTF8Type
>   and default_validation_class=UTF8Type
>   and key_validation_class=UTF8Type
>   and column_metadata=
>   [{
>     column_name: groupId,
>     validation_class: UTF8Type,
>     index_type: KEYS
>   }];
>
> I'm running Hadoop mapreduce job, reading the source CF and creating 3
> mutations for each row-key in the UrlGroup CF.
>
> The mapreduce runs for 30minutes. When I remove the secondary index,
> the mapreduce runs just 10minutes. There are 26,273,544 mutations
> total.
>
> Also with the secondary index, the nodes show very high load 50+ and
> iowait 70%+. Without secondary index the load is ~5 and iowait ~10%.
>
> What may be the problem?
>
> Regards,
> Patrik