Re: Cassandra, vnodes, and spark

2014-09-16 Thread George Stergiou
Ran into this performance report: https://github.com/datastax/spark-cassandra-connector/issues/200 Does the Spark connector in its current state issue one CQL query per vnode, or one task per vnode? Regards. On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan wrote: > Look into the source code of the Spark connecto

Re: Cassandra, vnodes, and spark

2014-09-15 Thread DuyHai Doan
Look into the source code of the Spark connector. CassandraRDD tries to find all token ranges (even when using vnodes) for each node (endpoint) and creates RDD partitions to match this distribution of token ranges. Thus data locality is guaranteed. On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe wrote: >
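
To make the idea in the message above concrete, here is a minimal sketch of grouping token ranges by the node that owns them, so that each group could back one partition scheduled near its data. This is a simplified illustration only, not the connector's actual code: `TokenRange`, `group_ranges_by_endpoint`, the addresses, and the token values are all hypothetical names and sample data chosen for this example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one token range (with vnodes, a node owns many)."""
    start: int
    end: int
    endpoint: str  # address of the node assumed to own this range


def group_ranges_by_endpoint(ranges: List[TokenRange]) -> Dict[str, List[TokenRange]]:
    """Collect the token ranges owned by each endpoint into one group per node."""
    grouped = defaultdict(list)
    for token_range in ranges:
        grouped[token_range.endpoint].append(token_range)
    return dict(grouped)


# Made-up sample data: two nodes, each owning two non-contiguous ranges,
# which is the kind of layout vnodes produce.
ranges = [
    TokenRange(0, 100, "10.0.0.1"),
    TokenRange(100, 200, "10.0.0.2"),
    TokenRange(200, 300, "10.0.0.1"),
    TokenRange(300, 400, "10.0.0.2"),
]
partitions = group_ranges_by_endpoint(ranges)
for endpoint, owned in partitions.items():
    print(endpoint, [(r.start, r.end) for r in owned])
```

In this sketch the number of groups follows the number of nodes rather than the number of vnodes. Whether the real connector splits or merges ranges further is not stated in the thread and is left open here.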

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Interesting. The way I understand the Spark connector is that it's basically a client executing a CQL query and filling a Spark RDD. Spark will then handle the partitioning of data. Again, this is my understanding, and it may be incorrect. On Monday, September 15, 2014, Robert Coli wrote: > On Mo

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Robert Coli
On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe wrote: > Based on this stackoverflow question, vnodes affect the number of mappers > Hadoop needs to spawn, which in turn affects performance. > > With the spark connector for cassandra would the same situation happen? > Would vnodes affect performance i

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
As hadoop* again sorry.. On Monday, September 15, 2014, Eric Plowe wrote: > Sorry. Trigger finger on the send. > > Would vnodes affect performance for spark in a similar fashion for spark. > > On Monday, September 15, 2014, Eric Plowe > wrote: > >> Hello. >> >> >> http://stackoverflow.com/quest

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Sorry. Trigger finger on the send. Would vnodes affect performance for spark in a similar fashion for spark. On Monday, September 15, 2014, Eric Plowe wrote: > Hello. > > > http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 > > Based on t

Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Hello. http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the spark connector for cassandra would the s