Caching in Spark

2016-01-22 Thread Sourabh Chandak
Hi, I have a Spark app which internally splits into 2 jobs because we write to 2 different Cassandra tables. The input data comes from the same Cassandra table, so after reading the data from Cassandra and applying a few transformations I cache one of the RDDs and fork the program to compute both metrics. I…
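A minimal sketch of the cache-and-fork pattern described above, assuming the spark-cassandra-connector API; parse, toMetricA, toMetricB, and the keyspace/table names are hypothetical stand-ins, not the original code:

    import com.datastax.spark.connector._
    import org.apache.spark.storage.StorageLevel

    // Read the source table once and apply the shared transformations.
    val source = sc.cassandraTable("ks", "events")
      .map(parse)                             // hypothetical shared transformation
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Fork: each saveToCassandra launches its own job, but the persisted
    // RDD is computed only once instead of being re-read from Cassandra.
    source.map(toMetricA).saveToCassandra("ks", "metric_a")
    source.map(toMetricB).saveToCassandra("ks", "metric_b")

    source.unpersist()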

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
…which saves the actual intermediate RDD data) TD On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak wrote: > Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what can be wrong? > Thanks, Sour…

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what can be wrong? Thanks, Sourabh On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak wrote: > I can see the entries processed in the table very fast, but after that it takes a long time f…
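For reference, the two variants being compared, sketched against the RDD API (localCheckpoint exists from Spark 1.5 onward; the path and the RDD are examples):

    // Variant 1: reliable checkpoint, written to the checkpoint directory.
    sc.setCheckpointDir("hdfs:///checkpoints")  // path is an example
    val rdd = sc.parallelize(1 to 1000)
    rdd.checkpoint()

    // Variant 2 (an alternative, not to be combined with the above): local
    // checkpoint keeps blocks on executors only, trading fault tolerance
    // for speed.
    // rdd.localCheckpoint()

    rdd.count()  // checkpointing happens when the RDD is first computed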

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
…Are you sure it's checkpointing speed? Have you compared it against checkpointing to HDFS, S3, or local disk? On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak wrote: > Hi, I have a receiverless Kafka streaming job which was started yesterday…

Checkpointing is super slow

2015-10-01 Thread Sourabh Chandak
Hi, I have a receiverless Kafka streaming job which was started yesterday evening and was running fine till 4 PM today. Suddenly, after that, checkpoint writing slowed down and it is now not able to catch up with the incoming data. We are using the DSE stack with Spark 1.2 and Cassandra for checkpointing…
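For context, the slow piece here is the streaming checkpoint write, which happens every batch; a minimal setup looks like the sketch below. The cfs:// path is an assumption based on the DSE stack mentioned above, and the batch interval is an example:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("kafka-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Streaming metadata and RDD checkpoints are written here every batch;
    // on DSE this would typically be a Cassandra-backed cfs:// directory.
    ssc.checkpoint("cfs:///checkpoints/kafka-stream")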

Re: spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
Sent: Thursday, October 1, 2015 11:46 PM; To: Sourabh Chandak; Cc: user; Subject: Re: spark.streaming.kafka.maxRatePerPartition for direct stream. That depends on your job, your cluster resources, the number of seconds per batch... You'll need to do s…

spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
Hi, I am writing a Spark streaming job using the direct stream method for Kafka and wanted to handle the case of checkpoint failure, when we'll have to reprocess the entire data from the start. By default, for every new checkpoint it tries to load everything from each partition, and that takes a lot of…
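The knob named in the subject is a plain SparkConf setting; a sketch, with the value an arbitrary example:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("direct-stream")
      // Cap each Kafka partition at this many messages per second, so the
      // first batches after a restart don't try to pull the whole backlog.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")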

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Sourabh Chandak
I also have the same use case as Augustus, and have some basic questions about recovery from checkpoint. I have a 10-node Kafka cluster and a 30-node Spark cluster running a streaming job; how is the (topic, partition) data handled in checkpointing? The scenario I want to understand is, in case of no…
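For reference, driver recovery in this setup goes through StreamingContext.getOrCreate, which rebuilds the context, including the Kafka (topic, partition) offsets saved in the checkpoint, after a restart. A minimal sketch, with the checkpoint path an example:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/kafka-stream"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("kafka-stream")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... create the direct stream and wire up output operations here ...
      ssc
    }

    // Fresh start: createContext() runs. After a failure: the context,
    // including saved offsets, is restored from the checkpoint instead.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()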

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-25 Thread Sourabh Chandak
…each individual error, instead of only printing the message. On Thu, Sep 24, 2015 at 5:00 PM, Sourabh Chandak wrote: > I was able to get past this issue. I was pointing at the SSL port, whereas SimpleConsumer should point to the PLAINTEXT port…

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
ing("Throwing this errir\n")), ok => ok ) } On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak wrote: > I was able to get pass this issue. I was pointing the SSL port whereas > SimpleConsumer should point to the PLAINTEXT port. But after fixing that I > am getting the follo

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
…a profiler on it to see what's taking up heap. On Thu, Sep 24, 2015 at 3:51 PM, Sourabh Chandak wrote: > Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak wrote: >> Hi, …

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak wrote: > Hi, I have ported receiverless Spark streaming for Kafka to Spark 1.2 and am trying to run a Spark streaming job to consume data from my broker, but I am getting the following error: …

ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Hi, I have ported receiverless Spark streaming for Kafka to Spark 1.2 and am trying to run a Spark streaming job to consume data from my broker, but I am getting the following error: 15/09/24 20:17:45 ERROR BoundedByteBufferReceive: OOME with size 352518400 java.lang.OutOfMemoryError: Java heap…
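The ~336 MB allocation in the error matches the size of a single Kafka fetch response; one common mitigation (an assumption here, not a confirmed fix for this job) is bounding the old consumer's fetch size in kafkaParams:

    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "broker1:9092",  // example broker address
      // Bound the bytes a single fetch request can ask for, so one
      // response cannot blow out the heap.
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString
    )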

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Sourabh Chandak
Can we use the existing Kafka Spark streaming jar to connect to a Kafka server running in SSL mode? We are fine with a non-SSL consumer, as our Kafka cluster and Spark cluster are in the same network. Thanks, Sourabh On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira wrote: > I can't speak for the Sp…
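For anyone finding this thread later: the Spark 1.x Kafka integration uses the legacy consumer, which has no SSL support, while the later spark-streaming-kafka-0-10 integration passes security settings straight through kafkaParams. A sketch against that newer API, with the broker address, paths, and passwords as placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9093",  // example SSL listener
      "security.protocol" -> "SSL",
      "ssl.truststore.location" -> "/path/to/truststore.jks",
      "ssl.truststore.password" -> "changeit",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group"
    )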

Re: Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
…direct Kafka approach. That is quite flexible, can give an exactly-once guarantee without the WAL, and is more robust and performant. Consider using it. On Wed, Aug 5, 2015 at 5:48 PM, Sourabh Chandak wrote: > Hi, I am trying to replicate the Kafka Stre…

Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
Hi, I am trying to replicate the Kafka Streaming Receiver for a custom version of Kafka and want to create a reliable receiver. The current implementation uses BlockGenerator, which is a private class inside Spark Streaming, hence I can't use it in my code. Can someone help me with some resources…
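For the archives: the public Receiver API is enough here without BlockGenerator; what makes a receiver reliable is calling the blocking, multi-record store(...) and acknowledging the source only after it returns. A minimal sketch, with fetchFromSource and ackSource as hypothetical stand-ins for the custom Kafka client:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver
    import scala.collection.mutable.ArrayBuffer

    class MyReliableReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        new Thread("custom-kafka-receiver") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = {}  // the fetch loop exits once isStopped() is true

      private def receive(): Unit = {
        while (!isStopped()) {
          val batch = fetchFromSource()  // hypothetical: pull one batch of records
          store(batch)                   // blocks until Spark has stored the block
          ackSource(batch)               // hypothetical: ack only after store() returns
        }
      }

      // Hypothetical stubs standing in for the custom Kafka client:
      private def fetchFromSource(): ArrayBuffer[String] = ArrayBuffer.empty
      private def ackSource(batch: ArrayBuffer[String]): Unit = ()
    }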