Index updates require a read before each write: the prior version of the row, if any, has to be looked up so the index can be updated accordingly. That lookup is random I/O.
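To make the read-before-write cost concrete, here is a toy Python sketch. It is not Cassandra's implementation; `ToyIndexedStore` and its methods are hypothetical names made up for illustration, and everything lives in memory, so it only counts the extra lookups rather than measuring real disk I/O. It shows that maintaining the index on every insert forces one lookup per write, while a single pass over the data after loading produces the same index with no per-write lookups.

```python
from collections import defaultdict


class ToyIndexedStore:
    """Toy store with one secondary index (column value -> row keys).

    Hypothetical illustration only; not how Cassandra is implemented.
    """

    def __init__(self):
        self.rows = {}                 # row key -> indexed column value
        self.index = defaultdict(set)  # column value -> set of row keys
        self.reads_before_write = 0    # lookups forced by index maintenance

    def insert_with_index(self, key, value):
        # Read-before-write: fetch the prior value so the stale
        # index entry can be removed before adding the new one.
        self.reads_before_write += 1
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index[value].add(key)

    def insert_without_index(self, key, value):
        # Plain write: no lookup of the previous version is needed.
        self.rows[key] = value

    def build_index(self):
        # Bulk build: one sequential pass over the current rows.
        self.index = defaultdict(set)
        for key, value in self.rows.items():
            self.index[value].add(key)


writes = [("k1", "a"), ("k2", "b"), ("k1", "b")]

incremental = ToyIndexedStore()
for key, value in writes:
    incremental.insert_with_index(key, value)

bulk = ToyIndexedStore()
for key, value in writes:
    bulk.insert_without_index(key, value)
bulk.build_index()
```

Both stores end up with the same index contents; the difference is that `incremental` performed one lookup per write, while `bulk` performed none until the final pass.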
Index creation, on the other hand, is mostly sequential I/O and therefore more efficient. So the classic bulk-load advice applies: ingest the data first, then create the indexes.

On Sun, Jun 5, 2011 at 5:47 PM, Donal Zang <zan...@ihep.ac.cn> wrote:
> I did an insertion test with and without secondary indexes, and found that:
> Without secondary index: ~10864 rows inserted per second
> With secondary index on one column (BytesType): ~1515 rows inserted per
> second
> Is this normal? Why would a secondary index have so much effect?
>
> I noticed that if I build the index using “update column family ...” after I
> inserted all data (90578207 rows), it will finish very quickly.
> I'm not very clear about how the secondary index works, will someone
> explain this?
> Thanks!
> Donal

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com