Re: Exception writing to two Cassandra tables: NoHostAvailableException: All host(s) tried for query failed (no host was tried)

2015-06-01 Thread Helena Edelson
astax.driver.core.AbstractSession.prepareAsync(AbstractSession.java:103) > at > com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:89) > ... 24 more > > Driver stacktrace: > at org.apache.spark.scheduler.DAGScheduler.org

Re: Exception writing to two Cassandra tables: NoHostAvailableException: All host(s) tried for query failed (no host was tried)

2015-06-01 Thread Helena Edelson
Hi Antonio, First, what version of the Spark Cassandra Connector are you using? You are using Spark 1.3.1, which the Cassandra connector today supports only in builds from the master branch - the release with public artifacts supporting Spark 1.3.1 is coming soon ;) Please see https://github.

Re: Grouping and storing unordered time series data stream to HDFS

2015-05-16 Thread Helena Edelson
Consider using Cassandra with Spark Streaming for time series; Cassandra has been doing time series for years. Here are some snippets with Kafka streaming and writing/reading the data back: https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrwea

Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
war Rizal wrote: > > Helena, > > The CassandraInputDStream sounds interesting. I don't find many things in the > jira though. Do you have more details on what it tries to achieve? > > Thanks, > Anwar. > > On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson

Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
Streaming _from_ Cassandra, CassandraInputDStream, is coming BTW: https://issues.apache.org/jira/browse/SPARK-6283 - I am working on it now. Helena @helenaedelson > On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail > wrote: > > Akhil > > You

Re: How to parse Json formatted Kafka message in spark streaming

2015-03-05 Thread Helena Edelson
parse(v).extract[MonthlyCommits]} .saveToCassandra("githubstats","monthly_commits") HELENA EDELSON Senior Software Engineer, DSE Analytics On Mar 5, 2015, at 9:33 AM, Ted Yu wrote: > Cui: > You can check messages.partitions.size to determine whether messages i
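For context, a minimal sketch of the pattern that snippet comes from - parse each Kafka message with json4s and save the result to Cassandra. The MonthlyCommits fields, the topic name, and the githubstats.monthly_commits table are hypothetical, and an existing StreamingContext ssc and kafkaParams map are assumed:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.json4s._
    import org.json4s.jackson.JsonMethods._
    import com.datastax.spark.connector.streaming._

    // Hypothetical message schema; field names must match the Cassandra columns.
    case class MonthlyCommits(user: String, month: String, commits: Int)

    KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Map("github-stats" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
      .map { case (_, v) =>
        implicit val formats = DefaultFormats
        parse(v).extract[MonthlyCommits]   // JSON string -> case class
      }
      .saveToCassandra("githubstats", "monthly_commits")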

Re: How to parse Json formatted Kafka message in spark streaming

2015-03-05 Thread Helena Edelson
Hi Cui, What version of Spark are you using? There was a bug ticket that may be related to this, fixed in core/src/main/scala/org/apache/spark/rdd/RDD.scala and merged into versions 1.3.0 and 1.2.1. If you are using 1.1.1, that may be the reason, but it's a stretch: https://issues.apache.org/

Re: JSON Input files

2014-12-13 Thread Helena Edelson
One solution can be found here: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets - Helena @helenaedelson On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar wrote: > Hi Team, > > I have a large JSON file in Hadoop. Could you please let me know > > 1. How to
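For reference, a minimal sketch of the JSON-datasets approach that guide describes, assuming Spark SQL 1.1, an existing SparkContext sc, and a hypothetical HDFS path and column names:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // jsonFile expects one JSON object per line; the path is hypothetical.
    val events = sqlContext.jsonFile("hdfs:///data/events.json")
    events.printSchema()   // schema is inferred from the data

    events.registerTempTable("events")
    sqlContext.sql("SELECT name FROM events WHERE year = 2014")
      .collect()
      .foreach(println)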

Re: Error: Spark-streaming to Cassandra

2014-12-13 Thread Helena Edelson
I am curious why you use the 1.0.4 Java artifact with the latest 1.1.0? This might be your compilation problem - the older Java version: com.datastax.spark:spark-cassandra-connector_2.10:1.1.0 together with com.datastax.spark:spark-cassandra-connector-java_2.10:1.0.4. See: - doc https:
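In sbt terms, aligning the two artifacts would look roughly like this (1.1.0 is used here only because it is the version mentioned above, not as a recommendation):

    // build.sbt - keep the Scala and Java connector modules on the same version.
    libraryDependencies ++= Seq(
      "com.datastax.spark" %% "spark-cassandra-connector"      % "1.1.0",
      "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.1.0"
    )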

Re: Spark-Streaming: output to cassandra

2014-12-05 Thread Helena Edelson
at I can subsequently use > JavaRDD rdd = sc.parallelize(list); > javaFunctions(rdd, TestTable.class).saveToCassandra("testkeyspace", > "test_table"); > to save the RDD data into Cassandra. > > I had tried coding this way: > messages.foreachRDD(new Function,

Re: Spark-Streaming: output to cassandra

2014-12-05 Thread Helena Edelson
You can just do something like this; the Spark Cassandra Connector handles the rest: KafkaUtils.createStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, Map(KafkaTopicRaw -> 10), StorageLevel.DISK_ONLY_2) .map { case (_, line) => line.split(",")} .map(Ra
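A minimal sketch of the remaining steps implied by that snippet - map the split columns into a case class and save the stream - with RawWeatherData and the keyspace/table names as hypothetical stand-ins:

    import com.datastax.spark.connector.streaming._

    // Hypothetical record type; fields map to the columns of the target table.
    case class RawWeatherData(wsid: String, year: Int, month: Int, temp: Double)

    // stream: DStream[Array[String]] produced by the createStream + split above (assumed).
    stream
      .map(a => RawWeatherData(a(0), a(1).toInt, a(2).toInt, a(3).toDouble))
      .saveToCassandra("raw_keyspace", "raw_weather_data")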

Re: Spark streaming cannot receive any message from Kafka

2014-11-13 Thread Helena Edelson
I encounter no issues with streaming from kafka to spark in 1.1.0. Do you perhaps have a version conflict? Helena On Nov 13, 2014 12:55 AM, "Jay Vyas" wrote: > Yup , very important that n>1 for spark streaming jobs, If local use > local[2] > > The thing to remember is that your spark receiv
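To make the local[2] point quoted above concrete - a receiver-based stream occupies one core, so at least one more is needed to process the batches - a minimal setup might look like this (the app name and batch interval are arbitrary):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[2]: one core for the Kafka receiver, at least one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("kafka-streaming-check")
    val ssc  = new StreamingContext(conf, Seconds(5))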

Re: Cassandra spark connector exception: "NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;"

2014-11-11 Thread Helena Edelson
Hi, It looks like you are building from master (spark-cassandra-connector-assembly-1.2.0). - Append this to your com.google.guava declaration: % "provided" - Be sure your version of the connector dependency is the same as the assembly build. For instance, if you are using 1.1.0-beta1, build your
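In sbt terms, the guava suggestion amounts to something like the line below (the guava version shown is only a placeholder for whatever is already declared in the build):

    // build.sbt - mark guava as provided so it is not bundled alongside the
    // version shipped in the connector assembly.
    libraryDependencies += "com.google.guava" % "guava" % "16.0.1" % "provided"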

Re: Accessing Cassandra with SparkSQL, Does not work?

2014-10-31 Thread Helena Edelson
.0.0", > > "org.scalatest" %% "scalatest" % "1.9.1" % "test", > > "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided", > > "org.apache.spark" %% "spark-hive&quo

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, This is a great use case, and here is how you could do it, for example, with Spark Streaming: Using a Kafka stream: https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L50 Save raw data to Cassand
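A minimal sketch of that flow - transform each micro-batch inside the DStream and then write it out to Cassandra - assuming an existing DStream of (word, count) pairs and a hypothetical keyspace/table:

    import com.datastax.spark.connector._

    // wordCounts: DStream[(String, Int)] built earlier in the job (assumed).
    wordCounts.foreachRDD { rdd =>
      // Any RDD transformation is available inside foreachRDD.
      val filtered = rdd.filter { case (_, count) => count > 1 }
      filtered.saveToCassandra("demo", "word_counts", SomeColumns("word", "count"))
    }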

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
r-assembly-1.2.0-SNAPSHOT > > Best wishes, > > Harold > > On Fri, Oct 31, 2014 at 10:31 AM, Helena Edelson > wrote: > Hi Harold, > Can you include the versions of spark and spark-cassandra-connector you are > using? > > Thanks! > > Helena > @helenaedelson &

Re: Accessing Cassandra with SparkSQL, Does not work?

2014-10-31 Thread Helena Edelson
Hi Shahab, I'm just curious, do you explicitly need to use Thrift? Just using the connector with Spark does not require any Thrift dependencies. Simply: "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1". But to your question, you declare the keyspace but also unnecessarily r
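For illustration, a build without any Thrift artifacts can be as small as this sketch (versions follow the ones mentioned in these threads):

    // build.sbt - the connector alone is enough; no cassandra-thrift dependency is needed.
    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-core"                % "1.1.0" % "provided",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1"
    )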

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, Can you include the versions of spark and spark-cassandra-connector you are using? Thanks! Helena @helenaedelson On Oct 30, 2014, at 12:58 PM, Harold Nguyen wrote: > Hi all, > > I'd like to be able to modify values in a DStream, and then send it off to an > external source like C

Re: Spark SQL on Cassandra

2014-10-31 Thread Helena Edelson
You can use https://github.com/datastax/spark-cassandra-connector to integrate Cassandra using Spark SQL. Docs are in progress, but for now: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraSQLContext.sca
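A minimal sketch of querying Cassandra through that CassandraSQLContext (the keyspace, table, and column names are hypothetical):

    import org.apache.spark.sql.cassandra.CassandraSQLContext

    val cc = new CassandraSQLContext(sc)   // sc: an existing SparkContext

    // Tables are addressed as keyspace.table directly in the SQL statement.
    cc.sql("SELECT user_id, name FROM store.users WHERE active = true")
      .collect()
      .foreach(println)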

Re: Best way to partition RDD

2014-10-30 Thread Helena Edelson
-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L26-L37 Cheers, Helena @helenaedelson On Oct 30, 2014, at 1:12 PM, Helena Edelson wrote: > Hi Shahab, > -How many spark/cassandra nodes are in your cluster? > -What is your deploy topology for

Re: Best way to partition RDD

2014-10-30 Thread Helena Edelson
Hi Shahab, -How many spark/cassandra nodes are in your cluster? -What is your deploy topology for spark and cassandra clusters? Are they co-located? - Helena @helenaedelson On Oct 30, 2014, at 12:16 PM, shahab wrote: > Hi. > > I am running an application in the Spark which first loads data f

Re: PySpark and Cassandra 2.1 Examples

2014-10-29 Thread Helena Edelson
Nice! - Helena @helenaedelson On Oct 29, 2014, at 12:01 PM, Mike Sukmanowsky wrote: > Hey all, > > Just thought I'd share this with the list in case any one else would benefit. > Currently working on a proper integration of PySpark and DataStax's new > Cassandra-Spark connector, but that'

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
d to piece together 6 > different forums and sites to get that working (being absolutely new to Spark > and Scala and sbt). I'll write a blog post on how to get this working later, > in case it can help someone. > > I really appreciate the help! > > Harold >

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
Hi Harold, It seems like, based on your previous post, you are using one version of the connector as a dependency yet building the assembly jar from master? You were using 1.1.0-alpha3 (you can upgrade to alpha4, beta coming this week) yet your assembly is spark-cassandra-connector-assembly-1.2.

Re: NoSuchMethodError: cassandra.thrift.ITransportFactory.openTransport()

2014-10-27 Thread Helena Edelson
Hi Sasi, Thrift is not needed to integrate Cassandra with Spark. In fact the only dep you need is spark-cassandra-connector_2.10-1.1.0-alpha3.jar, and you can upgrade to alpha4; we're publishing beta very soon. For future reference, questions/tickets can be created here: https://github.com/data

Re: Spark as Relational Database

2014-10-26 Thread Helena Edelson
Hi, It is very easy to integrate Cassandra in a use case such as this. For instance, do your joins in Spark and do your data storage in Cassandra, which allows a very flexible schema, unlike a relational DB, and is much faster and fault tolerant, and with Spark and colocation WRT data locality
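A minimal sketch of that joins-in-Spark, storage-in-Cassandra pattern (the keyspace, tables, and columns below are hypothetical):

    import com.datastax.spark.connector._

    // Read two hypothetical tables and key both RDDs by user_id.
    val users  = sc.cassandraTable("store", "users")
      .map(row => (row.getInt("user_id"), row.getString("name")))
    val orders = sc.cassandraTable("store", "orders")
      .map(row => (row.getInt("user_id"), row.getDouble("total")))

    // The relational-style join runs in Spark rather than in the database.
    users.join(orders)
      .map { case (userId, (name, total)) => (userId, name, total) }
      .saveToCassandra("store", "user_totals", SomeColumns("user_id", "name", "total"))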