Index updates require a read before each write: the prior version of
the row, if any, has to be looked up so that the stale index entry can
be removed and the new one added.  That lookup is random I/O.
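
To make the read-before-write step concrete, here is a minimal in-memory
sketch.  It is a toy model, not Cassandra's actual implementation: the
class IndexedTable and its fields are hypothetical names chosen for
illustration, and a dict lookup stands in for what would be a random
disk read on a real node.

```python
from collections import defaultdict


class IndexedTable:
    """Toy model: a primary store plus one secondary index on the column value."""

    def __init__(self):
        self.rows = {}                  # row key -> indexed column value
        self.index = defaultdict(set)   # column value -> set of row keys
        self.reads = 0                  # lookups forced by index maintenance

    def insert(self, key, value):
        # Read-before-write: fetch the prior version (if any) so the
        # stale index entry can be removed before the new one is added.
        self.reads += 1
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index[value].add(key)
```

Every call to insert() pays for one lookup of the old value, whether or
not the row existed before.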

Index creation, on the other hand, is mostly sequential I/O, and hence
much more efficient.

So the classic bulk-load advice applies: ingest the data first, then
create the indexes.
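
As a rough sketch of the load-then-index pattern, again as a toy
in-memory model rather than anything Cassandra actually runs (bulk_load
and build_index are hypothetical names): the writes go in without any
lookup of prior versions, and the index is then built in a single pass
over the data already stored.

```python
from collections import defaultdict


def build_index(rows):
    """Build a secondary index in one scan over already-loaded rows."""
    index = defaultdict(set)
    for key, value in rows.items():
        # No lookup of a prior version is needed: each row appears once,
        # in its final state.
        index[value].add(key)
    return dict(index)


def bulk_load(records):
    """Ingest (key, value) pairs first, then create the index afterwards."""
    rows = {}
    for key, value in records:
        rows[key] = value   # plain write, no index maintenance
    return rows, build_index(rows)
```

Later writes to the same key simply overwrite earlier ones during the
load, so the index only ever sees the final value of each row.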

On Sun, Jun 5, 2011 at 5:47 PM, Donal Zang <zan...@ihep.ac.cn> wrote:
> I did an insertion test with and without secondary indexes, and found that:
> Without secondary index: ~10864 rows inserted per second
> With secondary index on one column (BytesType): ~1515 rows inserted per
> second
> Is this normal? Why would a secondary index have so much effect?
>
> I noticed that if I build the index using “update column family ...” after I
> inserted all the data (90578207 rows), it finishes very quickly.
> I'm not very clear about how the secondary index works. Will someone
> explain this?
> Thanks!
> Donal
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
