Secondary Index on table with a lot of data crashes Cassandra

Tamar Rosen Thu, 25 Apr 2013 01:04:30 -0700

Hi,

We have a reproducible crash, probably caused by running out of memory, but
I don't understand why.


The installation is currently a single node.

We have a column family with approximately 50,000 rows.

In CQL, the CF definition is:

CREATE TABLE users (
  user_name text PRIMARY KEY,
  big_json text,
  status int);

Each big_json value can hold 500 KB or more of data.


There is also a secondary index on the status column.

Status can take various values, but over 90% of all rows have status = 2.


Calling:

Select user_name from users limit 80000;

is pretty fast.


Calling:

Select user_name from users where status = 1;

is slower, even though much less data is returned.


Calling:

Select user_name from users where status = 2;

always crashes.


What are we doing wrong? Can it be that Cassandra is actually trying
to read all the CF data rather than just the keys? (As far as I can
tell, it doesn't need to go to the users CF at all: all the data it
needs is in the index CF.)
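
To show why this suspicion would explain an out-of-memory crash, here is a rough back-of-the-envelope sketch using only the figures above. It assumes the worst case, in which every row matching status = 2 is read in full, including big_json; that is a guess, not confirmed behavior. The 500 KB figure is taken as a lower bound and "over 90%" is taken as exactly 90%.

```python
# Rough estimate of the data touched by "where status = 2" IF each matching
# row is read in full (an unconfirmed assumption, see the text above).

ROWS = 50_000                # approximate row count in the users CF
JSON_BYTES = 500 * 1024      # "500K or more" per big_json, used as a lower bound
STATUS_2_PERCENT = 90        # "over 90%" of rows, used as exactly 90


def bytes_read_if_full_rows(rows, percent, row_bytes):
    """Total bytes read when every matching row is loaded completely."""
    matching_rows = rows * percent // 100
    return matching_rows * row_bytes


total = bytes_read_if_full_rows(ROWS, STATUS_2_PERCENT, JSON_BYTES)
print(f"{total / 1024**3:.1f} GiB")
```

Under these assumptions the query would touch on the order of 20 GiB, far more than a typical single-node heap, whereas the user_name keys alone would be tiny by comparison.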

Also, in our code I run the same query as an Astyanax index query with
pagination, and the behavior is the same.


Please help me:

1. Solve the immediate issue.

2. Understand whether something in this use case indicates that we
are not using Cassandra the way it is meant to be used.


Thanks,


Tamar Rosen

Correlor.com
