Hi, is there a paper or a document where one can read how Spark reads Cassandra 
data in parallel, and how it writes data back from RDDs? It's a bit hard to form 
a clear picture of this.
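
For concreteness, the read/write path I'm trying to understand looks roughly 
like the following (a minimal sketch based on the connector's documented Scala 
API; the host, keyspace, table, and column names are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._  // adds cassandraTable / saveToCassandra

    object ReadWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("cassandra-read-write-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)

        // Read: the connector turns Cassandra token ranges into Spark
        // partitions, and each partition issues its queries in parallel.
        val rows = sc.cassandraTable("my_keyspace", "my_table")

        // Write back: each Spark partition writes its rows to Cassandra.
        val doubled = rows.map(r => (r.getInt("id"), r.getInt("value") * 2))
        doubled.saveToCassandra("my_keyspace", "my_table", SomeColumns("id", "value"))

        sc.stop()
      }
    }

What I'd like to understand is how token ranges become partitions on the read 
side, and how the rows of an RDD are grouped and sent back on the write side.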

Thank you,
Pavel Velikhov

> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote:
> 
> Hi all,
> 
> I didn't find the issues button on 
> https://github.com/datastax/spark-cassandra-connector/, so I'm posting here.
> 
> Does anyone have an idea why token ranges are grouped into one partition per 
> executor? I expected at least one per core. Any suggestions on how to work 
> around this? Doing a repartition is way too expensive, as I just want more 
> partitions for parallelism, not a reshuffle ...
> 
> Thanks in advance!
> Frens Jan
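
Frens Jan, regarding your question above: one possible workaround might be to 
lower the connector's split-size estimate so that more, smaller input partitions 
are created directly from the token ranges, with no shuffle. This is a sketch 
only, assuming a 1.x-era connector where spark.cassandra.input.split.size sets 
the approximate number of Cassandra rows per Spark partition; the exact property 
name and default vary by connector version, so please check your version's 
reference:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object SplitSizeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("split-size-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1")
          // Assumption: a smaller rows-per-split estimate yields more input
          // partitions straight from the token ranges, avoiding repartition().
          .set("spark.cassandra.input.split.size", "10000")
        val sc = new SparkContext(conf)

        val rows = sc.cassandraTable("my_keyspace", "my_table")
        // Check how many partitions the connector actually produced.
        println(s"Input partitions: ${rows.partitions.length}")

        sc.stop()
      }
    }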
