Re: RE: Spark or Storm

2015-06-19 Thread Tathagata Das

Re: RE: Spark or Storm

2015-06-19 Thread Cody Koeninger

Re: RE: Spark or Storm

2015-06-19 Thread Cody Koeninger

Re: RE: Spark or Storm

2015-06-19 Thread Ashish Soni

Re: RE: Spark or Storm

2015-06-19 Thread bit1...@163.com
My question is not directly related: about the "exactly-once semantic", the document (copied below) said Spark Streaming gives exactly-once semantics, but actually from my test result, with ch…

RE: RE: Spark or Storm

2015-06-19 Thread Haopu Wang
> Fair enough, on second thought, just saying that it should be idempotent …

Re: RE: Spark or Storm

2015-06-19 Thread Enno Shioji
> … use of checkpoints to persist the Kafka offsets in Spark Streaming itself, and not in ZooKeeper.
>
> Also this statement: ".. This allows one to build a Spark Streaming + Kafka pipelines with end-to-end exactly-once semantics (if …"

Re: RE: Spark or Storm

2015-06-19 Thread Tathagata Das
> "… idempotent or transactional)."

Re: RE: Spark or Storm

2015-06-18 Thread Enno Shioji

RE: RE: Spark or Storm

2015-06-18 Thread prajod.vettiyattil

Re: RE: Spark or Storm

2015-06-18 Thread Cody Koeninger
That general description is accurate, but not really a specific issue of the direct stream. It applies to anything consuming from Kafka (or, as Matei already said, any streaming system really). You can't have exactly-once semantics unless you know something more about how you're storing results.
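One way the store can provide that "something more" is idempotence: a hypothetical sketch (not from the thread) that keys each output record by a deterministic ID such as its Kafka partition and offset, so reprocessing the same records overwrites instead of duplicating.

```python
# Hypothetical sketch of an idempotent sink: key each record by
# (partition, offset), so a replayed batch leaves the store unchanged.
def write_idempotent(store, records):
    for partition, offset, value in records:
        store[(partition, offset)] = value  # upsert, not append
    return store

store = {}
write_idempotent(store, [(0, 1, "a"), (0, 2, "b")])
write_idempotent(store, [(0, 1, "a"), (0, 2, "b")])  # replay: no duplicates
```

With such a sink, at-least-once delivery from the stream plus idempotent writes yields effectively-once results, which is the practical reading of the "exactly-once" claim being debated here.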

Re: RE: Spark or Storm

2015-06-18 Thread bit1...@163.com
I am wondering how the direct stream API ensures end-to-end exactly-once semantics. I think there are two things involved:
1. From the Spark Streaming end, the driver will replay the offset range when it's down and restarted, which means that the new tasks will process some already-processed data.
2. …
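Point 1 above can be simulated with a hypothetical sketch (names are invented, not from Spark): if the driver dies after the output was written but before the checkpoint persisted, the restart re-runs the same offset range, and a plain append-only sink sees every record twice.

```python
# Simulated replay after a driver restart: the same offset range is
# reprocessed, so an append-only sink receives duplicates.
def process_range(log, start, end, sink):
    for offset in range(start, end):
        sink.append(log[offset])

log = ["m0", "m1", "m2"]
sink = []
process_range(log, 0, 3, sink)  # first attempt: output written, crash before checkpoint
process_range(log, 0, 3, sink)  # restart replays the same offset range
# sink now holds each record twice: at-least-once, not exactly-once
```

This is why the replies in this thread keep returning to the sink: the replay itself is unavoidable, and only an idempotent or transactional store turns those duplicates into exactly-once results.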