Hi Aaron, thanks for the reply. I suspected it might be the read-and-write that causes the slower updates.
Regards,
P.

On Tue, Apr 17, 2012 at 11:52, aaron morton <aa...@thelastpickle.com> wrote:
> Secondary indexes require a read and a write (potentially two) for every
> update. Regular mutations are no look writes and are much faster.
>
> Just like in a RDBMS, it's more efficient to insert data and then create the
> index than to insert data with the index present.
>
> An alternative is to create SSTables in the hadoop jobs and bulk load them
> into the cluster.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:
>
> Hi,
>
> I've a 4 node test cluster running Cassandra 1.0.9, 32GB memory, 4x
> 1TB disks. I've two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CF, one with source data and one with secondary index:
>
> create column family UrlGroup
> with column_type=Standard
> and comparator=UTF8Type
> and default_validation_class=UTF8Type
> and key_validation_class=UTF8Type
> and column_metadata=
> [{
> column_name: groupId,
> validation_class: UTF8Type,
> index_type: KEYS
> }];
>
> I'm running Hadoop mapreduce job, reading the source CF and creating 3
> mutations for each row-key in the UrlGroup CF.
>
> The mapreduce runs for 30minutes. When I remove the secondary index,
> the mapreduce runs just 10minutes. There are 26,273,544 mutations
> total.
>
> Also with the secondary index, the nodes show very high load 50+ and
> iowait 70%+. Without secondary index the load is ~5 and iowait ~10%.
>
> What may be the problem?
>
> Regards,
> Patrik
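
For anyone reading this thread later, here is a rough sketch of why an indexed update costs more than a plain mutation. It is a toy in-memory model, not Cassandra's actual code: the `ToyStore` class, its counters, and the `run` helper are all made up for illustration, and it only counts logical operations (it says nothing about real disk I/O). The assumption is that updating an indexed column means reading the old value first, then writing the new index entry and, if the value changed, removing the stale one.

```python
class ToyStore:
    """Toy in-memory model of a column family; NOT Cassandra's implementation."""

    def __init__(self, indexed_column=None):
        self.rows = {}    # row key -> {column: value}
        self.index = {}   # indexed value -> set of row keys
        self.indexed_column = indexed_column
        self.reads = 0    # logical reads performed
        self.writes = 0   # logical writes performed (data + index)

    def mutate(self, key, column, value):
        if column == self.indexed_column:
            # Assumed behaviour: read the old value so the stale
            # index entry can be found and removed.
            self.reads += 1
            old = self.rows.get(key, {}).get(column)
            if old is not None and old != value:
                self.index[old].discard(key)
                self.writes += 1  # remove stale index entry
            self.index.setdefault(value, set()).add(key)
            self.writes += 1      # add new index entry
        # The data write itself needs no prior read.
        self.rows.setdefault(key, {})[column] = value
        self.writes += 1


def run(store, n_rows=1000):
    # First pass inserts every row, second pass changes the indexed value.
    for i in range(n_rows):
        store.mutate(f"url{i}", "groupId", f"g{i % 10}")
    for i in range(n_rows):
        store.mutate(f"url{i}", "groupId", f"g{(i + 1) % 10}")
    return store.reads, store.writes


plain = run(ToyStore())
indexed = run(ToyStore(indexed_column="groupId"))
print("no index:   reads=%d writes=%d" % plain)
print("with index: reads=%d writes=%d" % indexed)
```

In this model the unindexed store does no reads at all, while the indexed one does one read per mutation plus one or two extra index writes. That matches the direction of what I saw (higher load and iowait with the index), though the real numbers depend on things the toy ignores.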