Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-15 Thread Abraham Jacob
ugh > which you may be referring to one of its members. If not, it's still > possible the closure cleaner isn't removing the reference even though > it could. > > Is ReduceWords actually an inner class? > > Or on another tangent, when you remove reduceByKey, you ar

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
, Michael Campbell < michael.campb...@gmail.com> wrote: > Do you get any different results if you have ReduceWords actually > implement java.io.Serializable? > > On Tue, Oct 14, 2014 at 7:35 PM, Abraham Jacob > wrote: > >> Yeah... it totally should be... T

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
first + second; } } On Tue, Oct 14, 2014 at 4:16 PM, Stephen Boesch wrote: > Is ReduceWords serializable? > > 2014-10-14 16:11 GMT-07:00 Abraham Jacob : > > >> Hi All, >> >> I am trying to understand what is going on in my simple WordCount Spark >> S

Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
Hi All, I am trying to understand what is going on in my simple WordCount Spark Streaming application. Here is the setup - I have a Kafka producer that is streaming words (lines of text). On the flip side, I have a spark streaming application that uses the high-level Kafka/Spark connector to read
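The replies in this thread ask whether ReduceWords is serializable and whether it is a non-static inner class capturing its outer instance. Below is a minimal sketch of the two fixes suggested (class name and shape are taken from the thread; the Spark wiring itself is only in comments): make the class top-level (or static) and have it implement java.io.Serializable, then verify with a plain-Java serialization round trip — the same thing Spark does when shipping the function to executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for the ReduceWords class discussed in the thread. Two fixes the
// replies suggest: implement java.io.Serializable, and keep the class
// top-level (or static) so it does not capture an outer, non-serializable
// instance.
class ReduceWords implements Serializable {
    // Same shape as Spark's Function2<Integer, Integer, Integer>.call,
    // matching the "first + second" snippet quoted later in the thread.
    public Integer call(Integer first, Integer second) {
        return first + second;
    }
}

public class ReduceWordsCheck {
    public static void main(String[] args) throws Exception {
        ReduceWords rw = new ReduceWords();

        // Round-trip through Java serialization, which is what happens when
        // Spark ships the closure to executors; a NotSerializableException
        // here is the same failure seen inside reduceByKey.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(rw);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        ReduceWords copy = (ReduceWords) in.readObject();

        System.out.println(copy.call(2, 3)); // prints 5
    }
}
```

In the streaming job this would then be passed as `pairs.reduceByKey(new ReduceWords())`.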

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
d of partition > (Kafka sets “auto.offset.reset” to “largest” by default). > > > > If you want to keep the same semantics as Kafka, you need to remove the > above code path manually and recompile Spark. > > > > Thanks > > Jerry > > > > *From:*

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
sues.apache.org/jira/browse/SPARK-2492 > ). > > > > Thanks > > Jerry > > > > *From:* Abraham Jacob [mailto:abe.jac...@gmail.com] > *Sent:* Saturday, October 11, 2014 6:57 AM > *To:* Sean McNamara > *Cc:* user@spark.apache.org > *Subject:* Re: Spark Streamin

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
che-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-tp3347p3387.html> and that discussion <http://markmail.org/message/257a5l3oqyftsjxj>. Hmm interesting... Wondering what happens if I set it as largest...? On Fri, Oct 10, 2014 at 3:47 PM, Abraham Jacob wr

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size())); } else { unifiedStream = kafkaStreams.get(0); } unifiedStream.print(); jssc.start(); jssc.awaitTermination(); -abe On Fri, Oct 10, 2014 at 3:37 PM, Sean McNamara wrote: > Would you mind sharing the code leading to y

Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Hi Folks, I am seeing some strange behavior when using the Spark Kafka connector in Spark streaming. I have a Kafka topic which has 8 partitions. I have a kafka producer that pumps some messages into this topic. On the consumer side I have a spark streaming application that has 8 executors

Re: SparkStreaming program does not start

2014-10-07 Thread Abraham Jacob
I have not played around with spark-shell much (especially for spark streaming), but was just suggesting that spark-submit logs could possibly tell you what's going on, and yes, for that you would need to create a jar. I am not even sure that you can give a .scala file to spark-shell. Usage: ./bin/sp

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Never mind... my bad... made a typo. Looks good. Thanks, On Tue, Oct 7, 2014 at 3:57 PM, Abraham Jacob wrote: > Thanks Sean, > > Sorry in my earlier question I meant to type CDH5.1.3 not CDH5.1.2 > > I presume it's included in spark-streaming_2.10-1.0.0-cdh5.1.3 >

Re: SparkStreaming program does not start

2014-10-07 Thread Abraham Jacob
Try using spark-submit instead of spark-shell On Tue, Oct 7, 2014 at 3:47 PM, spr wrote: > I'm probably doing something obviously wrong, but I'm not seeing it. > > I have the program below (in a file try1.scala), which is similar but not > identical to the examples. > > import org.apache.spark
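The suggestion above is to launch with spark-submit rather than spark-shell; a minimal command sketch (the class name, jar name, and master are hypothetical — the .scala file must first be compiled into a jar):

```shell
# Hypothetical names: compile try1.scala into try1.jar, then launch it.
./bin/spark-submit \
  --class Try1 \
  --master "local[2]" \
  try1.jar
```

spark-submit's driver logs then show why the program does or does not start, which spark-shell tends to hide.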

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
spark-streaming_2.10-1.0.0-cdh5.1.3.jar file in the project. Where can I find it in the CDH5.1.3 spark distribution? On Tue, Oct 7, 2014 at 3:40 PM, Sean Owen wrote: > Yes, it is the entire Spark distribution. > On Oct 7, 2014 11:36 PM, "Abraham Jacob" wrote: > >> Hi All, >

Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Hi All, Does anyone know if CDH5.1.2 packages the spark streaming kafka connector under the spark externals project?
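The answer later in the thread is that the connector ships with the CDH Spark distribution (spark-streaming_2.10-1.0.0-cdh5.1.3.jar). For a Maven build, the coordinates below are an assumption inferred from that jar name — verify against the Cloudera repository, which must be declared for `-cdh` versions:

```xml
<!-- Assumed coordinates for the Kafka connector matching the CDH5.1.3
     Spark build mentioned in this thread; requires the Cloudera repo. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.0.0-cdh5.1.3</version>
</dependency>
```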

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-07 Thread Abraham Jacob
stream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } Less confusion, more readability and better consistency... -abe On Mon, Oct 6, 2014 at 1:51 PM, Abraham Jacob wrote: > Sean, > > Thanks a ton Sean... This is exactly what I was looki

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Abraham Jacob
Sean... On Mon, Oct 6, 2014 at 12:23 PM, Sean Owen wrote: > Here's an example: > > > https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L131 > > On Mon, Oct 6, 2014 at 7:39 PM, Abraham Jac

Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Abraham Jacob
Hi All, Would really appreciate it if anyone in the community has implemented the saveAsNewAPIHadoopFiles method (in Java) found in org.apache.spark.streaming.api.java.JavaPairDStream. Any code snippet or online link would be greatly appreciated. Regards, Jacob
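The replies point to Oryx's BatchLayer as a worked example and quote the call shape. The sketch below restates that shape in comments (the actual write needs a running cluster); the runnable part only illustrates the per-batch directory naming convention ("prefix-&lt;batchTimeMs&gt;.suffix") that saveAsNewAPIHadoopFiles applies — path and time values here are hypothetical:

```java
public class SaveAsHadoopFilesSketch {
    // Call shape from the thread, for a JavaPairDStream<K, V> stream:
    //   stream.saveAsNewAPIHadoopFiles(prefix, suffix,
    //       keyClass, valueClass, outputFormatClass, conf);
    // Spark writes one output directory per batch interval, named from the
    // prefix, the batch time in milliseconds, and the suffix.

    // Plain-Java illustration of that naming convention.
    static String batchDir(String prefix, long batchTimeMs, String suffix) {
        return prefix + "-" + batchTimeMs + "." + suffix;
    }

    public static void main(String[] args) {
        // Hypothetical HDFS prefix and batch time.
        System.out.println(batchDir("hdfs:///wordcounts/wc", 1412630400000L, "txt"));
        // prints hdfs:///wordcounts/wc-1412630400000.txt
    }
}
```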

Re: Spark Streaming writing to HDFS

2014-10-04 Thread Abraham Jacob
1:33 AM, Sean Owen wrote: > Are you importing the '.mapred.' version of TextOutputFormat instead > of the new API '.mapreduce.' version? > > On Sat, Oct 4, 2014 at 1:08 AM, Abraham Jacob > wrote: > > Hi All, > > > > > > Would really appre

Spark Streaming writing to HDFS

2014-10-03 Thread Abraham Jacob
Hi All, Would really appreciate it if someone in the community could help me with this. I have a simple Java spark streaming application - NetworkWordCount SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount"); JavaStreamingContext jssc = ne
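The reply to this thread asks whether the old-API '.mapred.' TextOutputFormat was imported instead of the new-API '.mapreduce.' one — saveAsNewAPIHadoopFiles requires the new API. The import distinction is shown in comments below; the runnable part is only the plain-Java word-count core that the streaming job applies per batch before saving (method names here are illustrative, not Spark API):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountCoreSketch {
    // saveAsNewAPIHadoopFiles needs the *new* Hadoop API class:
    //   import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; // correct
    // not the old-API class with the nearly identical name:
    //   import org.apache.hadoop.mapred.TextOutputFormat;               // wrong here

    // The per-batch word-count transformation, kept as plain Java so it runs
    // without a cluster; in the job this is the flatMap/mapToPair/reduceByKey
    // chain whose output is handed to saveAsNewAPIHadoopFiles.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```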