Hi all,

I have a Cassandra 2.0.6 cluster, and one of my tables looks like this:
CREATE TABLE word (
  user text,
  word text,
  flag double,
  PRIMARY KEY (user, word)
)

Each "user" has about 10000 "word" rows per node. I need to select all
rows where user='someuser' and word is in a large set of about 1000
values.

The C* documentation recommends against using "select ... in" queries like
this one:

select * from word where user='someuser' and word in ('a','b','aa','ab',...)

So for now I select all rows where user='someuser' and filter them on the
client rather than in C*. I use the Datastax Java Driver and page through
the result set with setFetchSize(1000). Is this the best way? I found that
the load on the system is high because of the large range query. Should I
switch to selecting one row at a time and issuing 1000 queries instead?

For example:
select * from word where user='someuser' and word = 'a';
select * from word where user='someuser' and word = 'b';
select * from word where user='someuser' and word = 'c';
.....

Which method puts less pressure on the Cassandra cluster?
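
For reference, the client-side filtering I do today amounts to roughly the
sketch below. It is simplified: the names WordFilter, WordRow and
filterWanted are made up for this mail, and an in-memory list stands in for
the rows that I actually get by iterating the driver's paged result set, so
the snippet runs without the driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Datastax driver.
class WordFilter {

    // One row of the "word" table: (user, word, flag).
    static final class WordRow {
        final String user;
        final String word;
        final double flag;

        WordRow(String user, String word, double flag) {
            this.user = user;
            this.word = word;
            this.flag = flag;
        }
    }

    // Keeps only the rows whose "word" is in the wanted set.
    // "rows" is assumed to be every row of one user, in the order returned.
    static List<WordRow> filterWanted(Iterable<WordRow> rows, Set<String> wanted) {
        List<WordRow> kept = new ArrayList<>();
        for (WordRow row : rows) {
            // HashSet lookup, so the cost per row does not grow with the set size.
            if (wanted.contains(row.word)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the ~10000 rows fetched for user='someuser'.
        List<WordRow> rows = Arrays.asList(
                new WordRow("someuser", "a", 1.0),
                new WordRow("someuser", "b", 2.0),
                new WordRow("someuser", "c", 3.0));
        // Stand-in for the ~1000 words I am interested in.
        Set<String> wanted = new HashSet<>(Arrays.asList("a", "c"));

        for (WordRow row : filterWanted(rows, wanted)) {
            System.out.println(row.word + " " + row.flag);
        }
    }
}
```

The filtering itself is cheap on the client; my concern is only the cost of
reading all of a user's rows from the cluster to keep about a tenth of them.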

Thanks,
Philo Yang
