One more update. It looks like the driver is generating this CQL statement:

SELECT "test_id", "channel", "ts", "event", "groups" FROM "KEYSPACE"."test" WHERE token("test_id") > ? AND token("test_id") <= ? ALLOW FILTERING;

Best regards,
Nathan

On Fri, Jun 26, 2015 at 8:16 PM Nathan Bijnens <nat...@nathan.gs> wrote:

> Thanks for the suggestion, will take a look.
>
> Our code looks like this:
>
> val rdd = sc.cassandraTable[EventV0](keyspace, "test")
>
> val transformed = rdd.map{e => EventV1(e.testId, e.ts, e.channel, e.groups, e.event)}
> transformed.saveToCassandra(keyspace, "test_v1")
>
> Not sure if this code might translate to limits.
>
> The total data in this table is +/- 2 GB on disk, total data for each node
> is around 290 GB.
>
> On Fri, Jun 26, 2015 at 7:01 PM Nate McCall <n...@thelastpickle.com>
> wrote:
>
>> > We notice incredibly slow reads, 600 MB in an hour, we are using quorum
>> > LOCAL_ONE reads.
>> > The load_one of Cassandra increases from <1 to 60! There is no CPU
>> > wait, only user & nice.
>>
>> Without seeing the code and query, it's hard to tell, but I noticed
>> something similar when we had a client incorrectly using the 'take' method
>> for a result count like so:
>> val resultCount = query.take(count).length
>>
>> 'take' can call limit under the hood. The docs for the latter are
>> interesting:
>> "The limit will be applied for each created Spark partition. In other
>> words, unless the data are fetched from a single Cassandra partition the
>> number of results is unpredictable." [0]
>>
>> Removing that line (it wasn't necessary for the use case) and just relying
>> on a simple 'myRDD.select("my_col").toArray.foreach' got performance back
>> to where it should be. Per the docs, limit (and therefore take) works fine
>> as long as the partition key is used as a predicate in the where clause
>> ("WHERE test_id = somevalue" in your example).
>>
>> [0]
>> https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L92-L101
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
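
To make the quoted remark about limit concrete, here is a minimal sketch in plain Scala collections (no Spark or Cassandra involved). The helper limitPerPartition is hypothetical and is not part of the connector API; it only mimics the documented behaviour that the limit is applied to each Spark partition separately, so the total row count depends on how the data happens to be split.

```scala
// Plain-Scala simulation of a per-partition limit; not connector code.
object PerPartitionLimit {
  // Hypothetical stand-in for the connector's limit:
  // keep at most `n` rows from each partition, then concatenate.
  def limitPerPartition[A](partitions: Seq[Seq[A]], n: Int): Seq[A] =
    partitions.flatMap(_.take(n))

  def main(args: Array[String]): Unit = {
    val rows = (1 to 100).toVector

    // Same 100 rows, split two different ways.
    val asOnePartition   = Seq(rows)
    val asFourPartitions = rows.grouped(25).toSeq

    println(limitPerPartition(asOnePartition, 10).length)   // 10
    println(limitPerPartition(asFourPartitions, 10).length) // 40
  }
}
```

With a single partition the limit behaves as expected; with four partitions the same limit of 10 returns 40 rows, which is the "unpredictable" result count the docs warn about.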