Which checkpointing are you talking about? DStream checkpointing (which saves the DAG of DStreams, that is, only metadata), or RDD checkpointing (which saves the actual intermediate RDD data)?
TD

On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote:

> Tried using local checkpointing as well, and even that becomes slow after
> sometime. Any idea what can be wrong?
>
> Thanks,
> Sourabh
>
> On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak <sourabh3...@gmail.com>
> wrote:
>
>> I can see the entries processed in the table very fast but after that it
>> takes a long time for the checkpoint update.
>>
>> Haven't tried other methods of checkpointing yet, we are using DSE on
>> Azure.
>>
>> Thanks,
>> Sourabh
>>
>> On Fri, Oct 2, 2015 at 6:52 AM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> Why are you sure it's checkpointing speed?
>>>
>>> Have you compared it against checkpointing to hdfs, s3, or local disk?
>>>
>>> On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak <sourabh3...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a receiverless kafka streaming job which was started yesterday
>>>> evening and was running fine till 4 PM today. Suddenly post that writing of
>>>> checkpoint has slowed down and it is now not able to catch up with the
>>>> incoming data. We are using the DSE stack with Spark 1.2 and Cassandra for
>>>> checkpointing. Spark streaming is done using a backported code.
>>>>
>>>> Running nodetool shows that the Read latency of the cfs keyspace is
>>>> ~8.5 ms.
>>>>
>>>> Can someone please help me resolve this?
>>>>
>>>> Thanks,
>>>> Sourabh