Hi, is there a paper or a document where one can read how Spark reads Cassandra data in parallel? And how it writes data back from RDDs? It's a bit hard to form a clear picture of this.
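
For reference, this is roughly the pattern I'm trying to understand, sketched in Scala against the DataStax connector. The keyspace "ks", table "kv", its columns, and the host are placeholders I made up; the connector is assumed to be on the classpath:

  // Sketch only: keyspace "ks", table "kv" (key text, value int) and the host
  // are placeholders; spark-cassandra-connector is assumed on the classpath.
  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._

  val conf = new SparkConf()
    .setAppName("cassandra-io-sketch")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // Read: the connector turns the table's token ranges into RDD partitions,
  // so each partition can be scanned in parallel against the nodes owning it.
  val rows = sc.cassandraTable("ks", "kv")

  // Write: each RDD partition is written back by its executor as batched inserts.
  rows.map(r => (r.getString("key"), r.getInt("value") + 1))
    .saveToCassandra("ks", "kv", SomeColumns("key", "value"))

What I'd like to understand is how the partitioning and the write path behave underneath this.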
Thank you,
Pavel Velikhov

> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote:
>
> Hi all,
>
> I didn't find the issues button on
> https://github.com/datastax/spark-cassandra-connector/ so posting here.
>
> Does anyone have an idea why token ranges are grouped into one partition per
> executor? I expected at least one per core. Any suggestions on how to work
> around this? Doing a repartition is way too expensive, as I just want more
> partitions for parallelism, not a reshuffle ...
>
> Thanks in advance!
> Frens Jan
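
P.S. On the quoted question: one possible workaround, sketched under the assumption that the connector version in use supports the spark.cassandra.input.split.size_in_mb setting (the property name has changed between connector releases, so please check the docs for your version), is to shrink the input split size so the token ranges are grouped into more Spark partitions in the first place:

  // Sketch only: the split-size property name and default differ between
  // connector releases; keyspace/table names are placeholders.
  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._

  val conf = new SparkConf()
    .setAppName("more-input-partitions")
    .set("spark.cassandra.connection.host", "127.0.0.1")
    // Smaller splits -> more RDD partitions from the same token ranges,
    // without the shuffle that repartition() would trigger.
    .set("spark.cassandra.input.split.size_in_mb", "16")
  val sc = new SparkContext(conf)

  val rows = sc.cassandraTable("ks", "kv")
  println(rows.partitions.length)  // expect more than one partition per executor

Since the split size only changes how token ranges are grouped when the RDD is planned, no data moves between executors, unlike repartition().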