Re: saveAsTextFile error

2014-11-14 Thread Harold Nguyen
Hi Niko, It looks like you are calling a method that does not exist on DStream. Check out: https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html#output-operations-on-dstreams for the method "saveAsTextFiles" Harold On Fri, Nov 14, 2014 at 10:39 AM, Niko Gamulin wrote: > Hi,
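For reference, a minimal sketch of the naming behind the linked docs: the DStream method is saveAsTextFiles (plural), and it writes one output directory per batch named "prefix-TIME_IN_MS[.suffix]". The helper below is hypothetical and only reproduces that naming so it runs without Spark:

```scala
// DStream has no saveAsTextFile; the streaming API's saveAsTextFiles(prefix, suffix)
// writes one directory per batch, named "prefix-TIME_IN_MS[.suffix]".
// Hypothetical helper reproducing that naming, runnable without Spark:
def batchOutputName(prefix: String, timeMs: Long, suffix: String = ""): String =
  if (suffix.isEmpty) s"$prefix-$timeMs" else s"$prefix-$timeMs.$suffix"

println(batchOutputName("counts", 1415990000000L, "txt"))  // counts-1415990000000.txt

// In a streaming job (sketch, assuming wordCounts: DStream[(String, Int)]):
// wordCounts.saveAsTextFiles("hdfs:///out/counts", "txt")
```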

Re: Can spark read and write to cassandra without HDFS?

2014-11-12 Thread Harold Nguyen
Hi Kevin, Yes, Spark can read and write to Cassandra without Hadoop. Have you seen this: https://github.com/datastax/spark-cassandra-connector Harold On Wed, Nov 12, 2014 at 9:28 PM, Kevin Burton wrote: > We have all our data in Cassandra so I’d prefer to not have to bring up > Hadoop/HDFS as
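A sketch of how this wires up with the linked connector; the version below is taken from the Including-jars thread further down and may not match your Spark release, and the keyspace/table names are hypothetical:

```scala
// build.sbt fragment (sketch; check the connector's compatibility table):
libraryDependencies ++= Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-alpha3",
  "org.apache.spark"   %% "spark-core"                % "1.1.0" % "provided"
)

// Reading and writing then look like (connector API, see the linked repo):
// import com.datastax.spark.connector._
// val rows = sc.cassandraTable("my_keyspace", "my_table")
// rows.saveToCassandra("my_keyspace", "other_table")
```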

Spark Streaming - Most popular Twitter Hashtags

2014-11-03 Thread Harold Nguyen
Hi all, I was just reading this nice documentation here: http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html And got to the end of it, which says: "Note that there are more efficient ways to get the top 10 hashtags. For example, instead of sorting the entire of
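The "more efficient" approach the exercise hints at is to keep only the k largest counts rather than sorting every (hashtag, count) pair; on an RDD that is `counts.top(10)(Ordering.by(_._2))`. The same idea on plain Scala collections (a sketch, written without Spark so it runs standalone):

```scala
// Keep a bounded buffer of size k instead of sorting the whole collection:
def topK(counts: Seq[(String, Int)], k: Int): Seq[(String, Int)] =
  counts.foldLeft(List.empty[(String, Int)]) { (acc, pair) =>
    (pair :: acc).sortBy(p => -p._2).take(k)  // never holds more than k+1 items
  }

println(topK(Seq(("#spark", 5), ("#scala", 9), ("#bigdata", 2), ("#ml", 7)), 2))
// List((#scala,9), (#ml,7))
```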

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
Thanks Lalit, and Helena, What I'd like to do is manipulate the values within a DStream like this: DStream.foreachRDD( rdd => { val arr = record.toArray } I'd then like to be able to insert results from the arr back into Cassandra, after I've manipulated the arr array. However, for all

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
d spark-cassandra-connector you > are using? > > Thanks! > > Helena > @helenaedelson > > On Oct 30, 2014, at 12:58 PM, Harold Nguyen wrote: > > > Hi all, > > > > I'd like to be able to modify values in a DStream, and then send it off > to an external s

Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi all, I'd like to be able to modify values in a DStream, and then send it off to an external source like Cassandra, but I keep getting Serialization errors and am not sure how to use the correct design pattern. I was wondering if you could help me. I'd like to be able to do the following: wor

Re: Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi, Sorry, there's a typo there: val arr = rdd.toArray Harold On Thu, Oct 30, 2014 at 9:58 AM, Harold Nguyen wrote: > Hi all, > > I'd like to be able to modify values in a DStream, and then send it off to > an external source like Cassandra, but I keep getting Seriali

NonSerializable Exception in foreachRDD

2014-10-30 Thread Harold Nguyen
Hi all, In Spark Streaming, when I do "foreachRDD" on my DStreams, I get a NonSerializable exception when I try to do something like: DStream.foreachRDD( rdd => { var sc.parallelize(Seq(("test", "blah"))) }) Is there any way around that ? Thanks, Harold
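The usual cause: Spark java-serializes the foreachRDD closure, and everything the closure captures must be Serializable; SparkContext is not, so referring to the driver's `sc` inside the body fails. A standalone reproduction of the same mechanism (the `Driver` class is a stand-in for SparkContext; names are hypothetical):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Driver  // not Serializable, like SparkContext

def serializes(obj: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
  catch { case _: NotSerializableException => false }

val driver = new Driver
class Captures(d: Driver) extends Serializable { def run(): String = d.toString }
class CapturesNothing extends Serializable { def run(): String = "ok" }

println(serializes(new Captures(driver)))  // false: drags the Driver field along
println(serializes(new CapturesNothing))   // true

// Spark-side sketch of the fix: foreachRDD runs on the driver, so take the
// context from the RDD instead of capturing `sc` from the outer scope:
// stream.foreachRDD { rdd =>
//   val extra = rdd.sparkContext.parallelize(Seq(("test", "blah")))
// }
```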

Re: Convert DStream to String

2014-10-29 Thread Harold Nguyen
mean to make a DStream into a String? it's inherently a > sequence of things over time, each of which might be a string but > which are usually RDDs of things. > > On Wed, Oct 29, 2014 at 11:15 PM, Harold Nguyen > wrote: > > Hi all, > > > > How do I convert a DStrea

Convert DStream to String

2014-10-29 Thread Harold Nguyen
Hi all, How do I convert a DStream to a string ? For instance, I want to be able to: val myword = words.filter(word => word.startsWith("blah")) And use "myword" in other places, like tacking it onto (key, value) pairs, like so: val pairs = words.map(word => (myword+"_"+word, 1)) Thanks for an
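As the reply in this thread notes, a DStream is a sequence of RDDs over time, so there is no single String to pull out; the matching has to happen per record, inside a transformation. One way to read the intent (a sketch; `keyed` is a hypothetical helper, runnable without Spark):

```scala
// Tag each word with a matched prefix instead of extracting a String first:
def keyed(word: String): (String, Int) =
  if (word.startsWith("blah")) (s"blah_$word", 1) else (word, 1)

println(keyed("blahfoo"))  // (blah_blahfoo,1)
println(keyed("other"))    // (other,1)

// In the streaming job (sketch, assuming words: DStream[String]):
// val pairs = words.map(keyed)
```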

Re: Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
On Wed, Oct 29, 2014 at 9:22 AM, Harold Nguyen wrote: > Hi all, > > I followed the guide here: > http://spark.apache.org/docs/latest/streaming-kinesis-integration.html > > But got this error: > Exception in thread "main" java.lang.NoClassDefFoundError: > com/am

Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
Hi all, I followed the guide here: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html But got this error: Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider Would you happen to know what dependency or jar is needed ? Harold
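A NoClassDefFoundError for com.amazonaws.auth.AWSCredentialsProvider means the AWS SDK is not on the classpath; the Kinesis ASL module pulls it in. A build.sbt sketch for Spark 1.1 (verify the version against your cluster):

```scala
// build.sbt fragment (sketch): spark-streaming-kinesis-asl brings in the
// AWS SDK that provides com.amazonaws.auth.AWSCredentialsProvider.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.1.0"
```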

Spark Streaming from Kafka

2014-10-28 Thread Harold Nguyen
Hi, Just wondering if you've seen the following error when reading from Kafka: ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest at kafka.utils.Log4jController$.(Log4jController.scala:29) at kafka.uti
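A NoClassDefFoundError for scala/reflect/ClassManifest coming out of Kafka's own classes usually points to a Scala binary-version mismatch, e.g. a Kafka jar built for an older Scala on a 2.10+ classpath. A build.sbt sketch (this diagnosis is an assumption; the fix is to let Spark's Kafka module pick the matching Kafka build):

```scala
// build.sbt fragment (sketch): keep Kafka and Spark artifacts on the same
// Scala binary version rather than adding a standalone Kafka jar.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"
```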

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
"spark-cassandra-connector" % "1.1.0-alpha3" > withSources() withJavadoc(), > "org.apache.spark" %% "spark-sql" % "1.1.0" > ) > > - Helena > > On Oct 28, 2014, at 2:08 PM, Harold Nguyen wrote: > > > Hi all, > > >

Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
Hi all, The following works fine when submitting dependency jars through Spark-Shell: ./bin/spark-shell --master spark://ip-172-31-38-112:7077 --jars /home/ubuntu/spark-cassandra-connector/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2
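A spark-submit sketch of the equivalent invocation: with spark-submit, `--jars` ships the jar to executors, but in Spark 1.1 the driver may also need it on its own classpath via `--driver-class-path`. The application class and jar names below are hypothetical, and the assembly path is shortened from the thread:

```shell
./bin/spark-submit \
  --master spark://ip-172-31-38-112:7077 \
  --jars /home/ubuntu/.../spark-cassandra-connector-assembly.jar \
  --driver-class-path /home/ubuntu/.../spark-cassandra-connector-assembly.jar \
  --class MyStreamingApp my-app.jar
```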

Saving to Cassandra from Spark Streaming

2014-10-28 Thread Harold Nguyen
Hi all, I'm having trouble troubleshooting this particular block of code for Spark Streaming and saving to Cassandra: val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x,
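The word-count half of that pipeline, restated on plain collections so it runs without a cluster, followed by a hedged sketch of the Cassandra write (the connector's streaming import and the keyspace/table names are assumptions):

```scala
// Same flatMap/split/count logic as the streaming job, minus Spark:
def wordCounts(lines: Seq[String]): Map[String, Int] =
  lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

println(wordCounts(Seq("a b a")).toList.sorted)  // List((a,2), (b,1))

// Writing each batch to Cassandra (sketch, spark-cassandra-connector API):
// import com.datastax.spark.connector.streaming._
// wordCounts.saveToCassandra("my_ks", "word_counts", SomeColumns("word", "count"))
```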

Re: Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
a) On Mon, Oct 27, 2014 at 9:22 PM, Harold Nguyen wrote: > Hi Spark friends, > > I'm trying to connect Spark Streaming into Cassandra by modifying the > NetworkWordCount.scala streaming example, and doing the "make as few > changes as possible" but having it insert d

Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
Hi Spark friends, I'm trying to connect Spark Streaming into Cassandra by modifying the NetworkWordCount.scala streaming example, and doing the "make as few changes as possible" but having it insert data into Cassandra. Could you let me know if you see any errors? I'm using the spark-cassandra-c