Re: Checkpointing is super slow

2015-10-02 Thread Tathagata Das
Could you get the log4j INFO/DEBUG level logs that show the error and, if possible, the time taken to write the checkpoints? On Fri, Oct 2, 2015 at 6:28 PM, Sourabh Chandak wrote: > Offset checkpoints (partition, offset) when using Kafka direct streaming > approach > > > On Friday, October 2, 2015,

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Offset checkpoints (partition, offset), using the Kafka direct streaming approach. On Friday, October 2, 2015, Tathagata Das wrote: > Which checkpointing are you talking about? DStream checkpoints (which > save the DAG of DStreams, that is, only metadata), or RDD checkpointing > (which saves the

Re: Checkpointing is super slow

2015-10-02 Thread Tathagata Das
Which checkpointing are you talking about? DStream checkpoints (which save the DAG of DStreams, that is, only metadata), or RDD checkpointing (which saves the actual intermediate RDD data)? TD On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak wrote: > Tried using local checkpointing as well, and
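
[Editor's sketch] The distinction drawn above matters for write speed because the two kinds of checkpoint differ greatly in size. The toy model below is not Spark code and does not reproduce Spark's checkpoint format; it only serializes a small metadata-like structure and a larger data-like structure with Python's pickle to make the size gap concrete. The operator names, the topic name "events", the partition count, and the record count are all hypothetical.

```python
import pickle

# Toy stand-in for a metadata checkpoint: a description of the streaming
# graph plus per-partition offsets. All names and numbers are hypothetical.
metadata_checkpoint = {
    "dstream_graph": ["kafka_direct_input", "map", "foreachRDD"],
    "pending_batches": [1443800000000],
    "offsets": {("events", p): 1_000_000 + p for p in range(64)},
}

# Toy stand-in for a data checkpoint: the actual intermediate records.
data_checkpoint = [("key-%d" % i, "value-%d" % i) for i in range(100_000)]

metadata_bytes = len(pickle.dumps(metadata_checkpoint))
data_bytes = len(pickle.dumps(data_checkpoint))

print("metadata-like checkpoint: %d bytes" % metadata_bytes)
print("data-like checkpoint:     %d bytes" % data_bytes)
```

If a metadata-only checkpoint of this kind is slow to write, the cause is more likely the latency of the backing store than the volume of data being written.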

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what could be wrong? Thanks, Sourabh On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak wrote: > I can see the entries processed in the table very fast but after that it > takes a long time for the checkpoin

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
I can see the entries processed in the table very quickly, but after that it takes a long time for the checkpoint update. I haven't tried other methods of checkpointing yet; we are using DSE on Azure. Thanks, Sourabh On Fri, Oct 2, 2015 at 6:52 AM, Cody Koeninger wrote: > Why are you sure it's check

Re: Checkpointing is super slow

2015-10-02 Thread Cody Koeninger
Why are you sure it's checkpointing speed? Have you compared it against checkpointing to HDFS, S3, or local disk? On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak wrote: > Hi, > > I have a receiverless Kafka streaming job which was started yesterday > evening and was running fine till 4 PM today
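
[Editor's sketch] One way to get a local-disk baseline for the comparison suggested above is to time small offset snapshots written to a local directory. This is a minimal sketch using only the Python standard library; it does not use Spark, Kafka, or Cassandra, and the JSON layout, the file name offsets.json, and the helper names write_offsets_atomically and baseline are assumptions made for illustration, not the format any of those systems use.

```python
import json
import os
import tempfile
import time


def write_offsets_atomically(directory, offsets):
    """Write a {partition: offset} snapshot as JSON and return seconds taken.

    The snapshot goes to a temporary file first and is then renamed, so a
    reader never sees a half-written file.
    """
    start = time.perf_counter()
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(offsets, f)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before renaming
    os.replace(tmp_path, os.path.join(directory, "offsets.json"))
    return time.perf_counter() - start


def baseline(directory, partitions=64, rounds=20):
    """Write `rounds` snapshots of `partitions` offsets; return the timings."""
    timings = []
    for batch in range(rounds):
        # Hypothetical offsets: they simply advance with each batch.
        offsets = {str(p): batch * 1000 + p for p in range(partitions)}
        timings.append(write_offsets_atomically(directory, offsets))
    return timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t = sorted(baseline(d))
        print("median %.2f ms, max %.2f ms" % (t[len(t) // 2] * 1000, t[-1] * 1000))
```

If the per-batch checkpoint time reported by the job is far above a baseline like this and keeps growing, that points at the checkpoint store rather than at the size of the offset data.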

Checkpointing is super slow

2015-10-01 Thread Sourabh Chandak
Hi, I have a receiverless Kafka streaming job which was started yesterday evening and was running fine till 4 PM today. After that, checkpoint writes suddenly slowed down, and the job is now unable to catch up with the incoming data. We are using the DSE stack with Spark 1.2 and Cassandra for ch