Interesting. As I understand it, the Spark connector is basically a client that executes a CQL query and fills a Spark RDD with the results. Spark then handles the partitioning of the data. Again, this is my understanding, and it may be incorrect.
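For context, the Hadoop concern in the quoted question comes down to simple arithmetic. Here is a rough sketch, assuming Hadoop creates at least one input split (and so one mapper) per token range and that each node owns `tokens_per_node` ranges. The function name and the 6-node cluster are made up for illustration, and none of this says anything about what the Spark connector actually does.

```python
def min_input_splits(nodes: int, tokens_per_node: int) -> int:
    """Lower bound on input splits, ASSUMING at least one split per token range.

    Hypothetical helper for illustration only; not part of any Cassandra,
    Hadoop, or Spark API.
    """
    if nodes < 1 or tokens_per_node < 1:
        raise ValueError("nodes and tokens_per_node must be positive")
    # Each node owns tokens_per_node ranges, so the ring has this many in total.
    return nodes * tokens_per_node


# Hypothetical 6-node cluster: single-token nodes vs. 256 vnodes per node.
single_token = min_input_splits(nodes=6, tokens_per_node=1)
with_vnodes = min_input_splits(nodes=6, tokens_per_node=256)
print(single_token, with_vnodes)
```

If that assumption holds, enabling vnodes multiplies the minimum number of mappers by the per-node token count, which is where the per-task overhead would come from.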
On Monday, September 15, 2014, Robert Coli <rc...@eventbrite.com> wrote:
> On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe <eric.pl...@gmail.com> wrote:
>
>> Based on this stackoverflow question, vnodes affect the number of mappers
>> Hadoop needs to spawn, which in turn affects performance.
>>
>> With the spark connector for cassandra would the same situation happen?
>> Would vnodes affect performance in a similar situation to Hadoop?
>
> I don't know what specifically Spark does here, but if it has the same
> locality expectations as Hadoop generally, my belief would be: "yes."
>
> =Rob