> We notice incredibly slow reads, 600mb in an hour, we are using quorum LOCAL_ONE reads.
> The load_one of Cassandra increases from <1 to 60! There is no CPU wait, only user & nice.
Without seeing the code and query it's hard to tell, but I noticed something similar when we had a client incorrectly using the 'take' method for a result count, like so:

  val resultCount = query.take(count).length

'take' can call limit under the hood. The docs for the latter are interesting:

"The limit will be applied for each created Spark partition. In other words, unless the data are fetched from a single Cassandra partition the number of results is unpredictable." [0]

Removing that line (it wasn't necessary for the use case) and just relying on a simple myRDD.select("my_col").toArray.foreach got performance back to where it should be.

Per the docs, limit (and therefore take) works fine as long as the partition key is used as a predicate in the where clause ("WHERE test_id = somevalue" in your example).

[0] https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L92-L101

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
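P.S. In case it helps anyone hitting this later, here is a rough, untested sketch of the before/after. The keyspace, table, column names, connection host, and count value are all made up for illustration, not taken from the actual code, and I'm using collect() in place of the now-deprecated toArray:

  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._  // brings in sc.cassandraTable(...)

  object TakeVsScan {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("take-vs-scan")
        .set("spark.cassandra.connection.host", "127.0.0.1")  // adjust for your cluster
      val sc = new SparkContext(conf)

      val rdd   = sc.cassandraTable("my_ks", "my_table")  // hypothetical keyspace/table
      val count = 1000

      // Before: take() can call limit() under the hood, and that limit is applied
      // per Spark partition, so the count is unpredictable and the scan can get
      // expensive when it spans the whole table.
      val unpredictableCount = rdd.select("my_col").take(count).length

      // OK: with the partition key as a predicate, everything is fetched from a
      // single Cassandra partition and take/limit behave as expected.
      val boundedCount = rdd
        .select("my_col")
        .where("test_id = ?", "somevalue")
        .take(count)
        .length

      // After: drop the take() entirely and just iterate the selected column.
      rdd.select("my_col").collect().foreach(println)

      sc.stop()
    }
  }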