Re: Write ahead Logs and checkpoint

2015-02-23 Thread Tathagata Das
> Kafka 0.8.2 has built-in offset management, how would that affect direct
> stream in spark?
> Please see KAFKA-1012
>
> --- Original Message ---
>
> From: "Tathagata Das"
> Sent: February 23, 2015 9:53 PM
> To: "V Dineshkumar"
> Cc: "

Re: Write ahead Logs and checkpoint

2015-02-23 Thread Felix C
Kafka 0.8.2 has built-in offset management. How would that affect the direct stream in Spark? Please see KAFKA-1012. --- Original Message --- From: "Tathagata Das" Sent: February 23, 2015 9:53 PM To: "V Dineshkumar" Cc: "user" Subject: Re: Write ahead Logs and ch

Re: Write ahead Logs and checkpoint

2015-02-23 Thread Tathagata Das
Exactly, that is the reason. To avoid that, in the to-be-released Spark 1.3 we have added a new Kafka API (called direct stream) which does not use ZooKeeper at all to keep track of progress, and instead maintains offsets within Spark Streaming. That can guarantee that all records are received exactly once. Its
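
To illustrate the idea of keeping offsets inside the application rather than in an external store, here is a toy model in plain Python. It is not Spark's implementation and uses no Spark or Kafka API: `run_batches`, the `state` dictionary, and `crash_in_batch` are all hypothetical names made up for this sketch. The assumption is that each batch's results and its end offset are saved together in one step, so a crash can never leave them out of sync.

```python
def run_batches(log, state, batch_size=2, crash_in_batch=None):
    """Consume `log` in fixed-size batches, starting at state["offset"].

    `state` holds both the progress marker and the results, and is only
    ever replaced as a whole (a stand-in for an atomic checkpoint).
    `crash_in_batch` simulates killing the application during that batch.
    """
    batch_no = 0
    while state["offset"] < len(log):
        start = state["offset"]
        end = min(start + batch_size, len(log))
        # Work for this batch is done, but nothing is saved yet.
        pending = state["results"] + log[start:end]
        if crash_in_batch == batch_no:
            # Killed mid-batch: the uncommitted work is simply discarded.
            return state
        # Offset and results are saved together in a single step.
        state = {"offset": end, "results": pending}
        batch_no += 1
    return state


log = ["a", "b", "c", "d", "e"]
initial = {"offset": 0, "results": []}

# First run is killed during its second batch (records "c" and "d").
after_crash = run_batches(log, initial, crash_in_batch=1)

# Restart from the saved state: the interrupted batch is recomputed
# from the same offset range, so nothing is lost or repeated.
final = run_batches(log, after_crash)
```

In this model the restarted run picks up at offset 2 and ends with each record present exactly once, because progress and results can never disagree.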

Write ahead Logs and checkpoint

2015-02-23 Thread V Dineshkumar
Hi, My Spark Streaming application is pulling data from Kafka. To prevent data loss I have implemented WAL and enabled checkpointing. On killing my application and restarting it, I am now able to prevent data loss, but I am getting duplicate messages. Is it because the application got killed b
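
One way duplicates like this can arise is sketched below as a toy model in plain Python. It assumes that progress is recorded in an external store only after a record has been processed, and that the application is killed in between. The names `run` and `crash_after` are hypothetical and no Spark or Kafka API is involved.

```python
def run(log, committed_offset, crash_after=None):
    """Process records starting at `committed_offset`.

    Progress is recorded only after a record has been processed.
    `crash_after` is the index of the record after whose processing
    (but before the progress update) the application is killed.
    Returns (records processed in this run, last recorded offset).
    """
    outputs = []
    offset = committed_offset
    while offset < len(log):
        outputs.append(log[offset])  # record is processed, output is visible
        if crash_after is not None and offset == crash_after:
            # Killed before progress for this record was recorded.
            return outputs, committed_offset
        offset += 1
        committed_offset = offset    # progress saved externally
    return outputs, committed_offset


log = ["a", "b", "c", "d"]

# First run is killed right after processing "c" (index 2).
first, saved = run(log, 0, crash_after=2)

# Restart resumes from the last recorded offset, so "c" is replayed.
second, saved = run(log, saved)

delivered = first + second
```

No record is lost here, but `"c"` is delivered twice: the data was processed, yet the progress marker still pointed at it when the application restarted.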