One more update: it looks like the driver is generating this CQL statement:

SELECT "test_id", "channel", "ts", "event", "groups"
FROM "KEYSPACE"."test"
WHERE token("test_id") > ? AND token("test_id") <= ?
ALLOW FILTERING;
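
For context, a statement of this shape is what a full-table scan split by token range looks like: each Spark partition binds one (start, end] pair to the two `?` placeholders. The following is only a minimal sketch of that idea in plain Scala, not the connector's actual splitting logic. The object `TokenRangeScan`, the helper `splitRing`, and the assumption that the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63-1) are made up for illustration.

```scala
object TokenRangeScan {
  // Assumption: Murmur3Partitioner, so tokens span Long.MinValue..Long.MaxValue.
  // Hypothetical helper: cut the token ring into `splits` contiguous ranges.
  def splitRing(splits: Int): Seq[(BigInt, BigInt)] = {
    require(splits > 0, "splits must be positive")
    val min = BigInt(Long.MinValue)
    val max = BigInt(Long.MaxValue)
    val width = (max - min) / splits
    (0 until splits).map { i =>
      val start = min + width * i
      // The last range absorbs the rounding remainder so the ring is fully covered.
      val end = if (i == splits - 1) max else start + width
      (start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    // Each pair would be bound to: token("test_id") > ? AND token("test_id") <= ?
    splitRing(4).foreach { case (start, end) =>
      println(s"""token("test_id") > $start AND token("test_id") <= $end""")
    }
  }
}
```

With 4 splits this yields four contiguous ranges that together cover the whole ring, so every row of the table is read no matter how many splits are used.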

Best regards,
  Nathan

On Fri, Jun 26, 2015 at 8:16 PM Nathan Bijnens <nat...@nathan.gs> wrote:

> Thanks for the suggestion, will take a look.
>
> Our code looks like this:
>
> val rdd = sc.cassandraTable[EventV0](keyspace, "test")
>
> val transformed = rdd.map { e => EventV1(e.testId, e.ts, e.channel, e.groups, e.event) }
> transformed.saveToCassandra(keyspace, "test_v1")
>
> I'm not sure whether this code could translate into a limit.
>
> The total data in this table is roughly 2 GB on disk; total data on each node
> is around 290 GB.
>
> On Fri, Jun 26, 2015 at 7:01 PM Nate McCall <n...@thelastpickle.com>
> wrote:
>
>> > We notice incredibly slow reads, 600mb in an hour, we are using quorum
>> LOCAL_ONE reads.
>> > The load_one of Cassandra increases from <1 to 60! There is no CPU
>> wait, only user & nice.
>>
>> Without seeing the code and query, it's hard to tell, but I noticed
>> something similar when we had a client incorrectly using the 'take' method
>> for a result count like so:
>> val resultCount = query.take(count).length
>>
>> 'take' can call limit under the hood. The docs for the latter are
>> interesting:
>> "The limit will be applied for each created Spark partition. In other
>> words, unless the data are fetched from a single Cassandra partition the
>> number of results is unpredictable." [0]
>>
>> Removing that line (it wasn't necessary for the use case) and just relying
>> on a simple 'myRDD.select("my_col").toArray.foreach' got performance back
>> to where it should be. Per the docs, limit (and therefore take) works fine
>> as long as the partition key is used as a predicate in the where clause
>> ("WHERE test_id = somevalue" in your example).
>>
>> [0]
>> https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L92-L101
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
