Ran into this performance report: https://github.com/datastax/spark-cassandra-connector/issues/200
Does the Spark connector in its current state issue one CQL query per vnode, or one task per vnode? Regards.

On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
> Look into the source code of the Spark connector. CassandraRDD tries to find
> all token ranges (even when using vnodes) for each node (endpoint) and
> creates RDD partitions to match this distribution of token ranges. Thus data
> locality is guaranteed.
>
> On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe <eric.pl...@gmail.com> wrote:
>
>> Interesting. The way I understand the Spark connector is that it's
>> basically a client executing a CQL query and filling a Spark RDD. Spark
>> will then handle the partitioning of data. Again, this is my understanding,
>> and it may be incorrect.
>>
>> On Monday, September 15, 2014, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe <eric.pl...@gmail.com> wrote:
>>>
>>>> Based on this Stack Overflow question, vnodes affect the number of
>>>> mappers Hadoop needs to spawn, which in turn affects performance.
>>>>
>>>> With the Spark connector for Cassandra, would the same situation
>>>> happen? Would vnodes affect performance in a similar way to Hadoop?
>>>
>>> I don't know what specifically Spark does here, but if it has the same
>>> locality expectations as Hadoop generally, my belief would be: "yes."
>>>
>>> =Rob
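To illustrate DuyHai's point above: with vnodes, each node owns many small, scattered token ranges, and the connector builds its partitions by grouping ranges by the node (endpoint) that owns them, so each task reads data local to one node. Here is a minimal sketch of that grouping idea in Python; the range list and function name are hypothetical illustrations, not the connector's actual API:

```python
from collections import defaultdict

# Hypothetical token-range metadata: (start_token, end_token, owning endpoint).
# With vnodes (e.g. num_tokens=256), each node owns many such small ranges.
token_ranges = [
    (0, 100, "10.0.0.1"),
    (100, 200, "10.0.0.2"),
    (200, 300, "10.0.0.1"),
    (300, 400, "10.0.0.3"),
    (400, 500, "10.0.0.2"),
]

def partitions_by_endpoint(ranges):
    """Group token ranges by owning endpoint, yielding one partition-like
    group per node, so a task scheduled there reads only local data."""
    groups = defaultdict(list)
    for start, end, endpoint in ranges:
        groups[endpoint].append((start, end))
    return dict(groups)

for endpoint, owned in sorted(partitions_by_endpoint(token_ranges).items()):
    print(endpoint, owned)
```

The key consequence, and the contrast with the Hadoop situation in the linked issue, is that grouping many vnode ranges into one partition per endpoint avoids spawning one task per token range.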