>version
We are on DSE 4.7 (Cassandra 2.1) and Spark 1.2.1.

>cqlsh
select * from site_users returns fast, subsecond; the table has only 3 rows.
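Before the beeline detail below, for comparison this is roughly what the same read looks like done directly through the connector from the dse spark (Scala) shell. A minimal sketch only: the keyspace name "my_ks" is a placeholder, and partitions.length is just to see how many read tasks get generated.

    import com.datastax.spark.connector._

    // Direct RDD read of the same table; "my_ks" is a placeholder keyspace.
    val rows = sc.cassandraTable("my_ks", "site_users")
    println(rows.count())           // expect 3
    println(rows.partitions.length) // number of Spark partitions (= read tasks)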
>Can you show some code how you're doing the reads?
dse beeline
!connect ...
select * from site_users  -- table has 3 rows, several columns in each row.
Spark runs 769 tasks and estimates the input as 800000 TB.

0: jdbc:hive2://dsenode01:10000> select count(*) from site_users;
+------+
| _c0  |
+------+
| 3    |
+------+
1 row selected (41.635 seconds)

>Spark and Cassandra-connector
/usr/share/dse/spark/lib/spark-cassandra-connector-java_2.10-1.2.1.jar
/usr/share/dse/spark/lib/spark-cassandra-connector_2.10-1.2.1.jar

2015-06-17 13:52 GMT+02:00 Yana Kadiyska <yana.kadiy...@gmail.com>:

> Can you show some code how you're doing the reads? Have you successfully
> read other stuff from Cassandra (i.e. do you have a lot of experience with
> this path and this particular table is causing issues, or are you trying
> to figure out the right way to do a read)?
>
> What version of Spark and Cassandra-connector are you using?
> Also, what do you get for "select count(*) from foo" -- is that just as
> bad?
>
> On Wed, Jun 17, 2015 at 4:37 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
>> Hi, can somebody suggest a way to reduce the number of tasks?
>>
>> 2015-06-15 18:26 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:
>>
>>> Hi, I'm running Spark SQL against a Cassandra table. I have 3 C* nodes,
>>> each of them running a Spark worker.
>>> The problem is that Spark runs 769 tasks to read 3 rows: select bar
>>> from foo.
>>> I've tried these properties:
>>>
>>> # try to avoid 769 tasks per dummy "select bar from foo" query
>>> spark.cassandra.input.split.size_in_mb=32mb
>>> spark.cassandra.input.fetch.size_in_rows=1000
>>> spark.cassandra.input.split.size=10000
>>>
>>> but it doesn't help.
>>>
>>> Here are the mean input metrics for the job:
>>> input1 = 8388608.0 TB
>>> input2 = -320 B
>>> input3 = -400 B
>>>
>>> I'm confused by these input figures; there are only 3 rows in the C*
>>> table. I definitely don't have 8388608.0 TB of data :)
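A note on the properties quoted above: on the 1.2 line of the connector the read-tuning knobs are, as far as I can tell, spark.cassandra.input.split.size (approximate rows per Spark partition, a plain number) and spark.cassandra.input.page.row.size (rows fetched per page). split.size_in_mb and fetch.size_in_rows look like names from a later release, so 1.2.1 may be silently ignoring them. A minimal sketch of setting them programmatically; the app name, host, and keyspace are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    // Placeholders: adjust the host and keyspace for your cluster.
    val conf = new SparkConf()
      .setAppName("site-users-read")
      .set("spark.cassandra.connection.host", "dsenode01")
      // Plain row counts, not "32mb"-style strings.
      .set("spark.cassandra.input.split.size", "10000")
      .set("spark.cassandra.input.page.row.size", "1000")

    val sc = new SparkContext(conf)
    val rows = sc.cassandraTable("my_ks", "site_users")
    println(rows.count()) // should be 3, ideally in far fewer tasks

The same values can also be passed with --conf on the command line when launching the shell or submitting the job.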