Hi, I'd appreciate your expertise on the questions below.
Thanks,
Prabhjot

On Thu, Nov 26, 2015 at 12:09 PM, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
> Hello Folks,
>
> I am trying to build fault tolerance on the consumer side, so as to make
> sure that all failure scenarios are handled.
> On the data integrity side, there are 2 primary requirements:
>
> 1. No data loss
> 2. No data duplication
>
> I'm particularly interested in data duplication. For example, the following
> steps happen, in this order, on the consumer during each consume cycle
> (a sketch of this cycle follows below this message):
>
> 1. connect
> 2. consume
> 3. write the offset back to zookeeper/kafka (0.8/0.9)
> 4. process the message (which is done by other code, not the consumer API)
>
> Please correct the above steps if I'm wrong.
>
> Now, failures (machine down, process down, unhandled exceptions or bugs)
> can occur at each of the above steps.
> In particular, if a failure occurs after consuming a message but before
> writing its offset back to zookeeper/kafka, then on restart the consumer
> could re-consume the same message, leading to duplication, especially if
> the 4th step is asynchronous.
> For example, if processing the message happens before writing back the
> offset, a consumer restart in between could cause data duplication (the
> second sketch below shows this ordering).
>
> Is this a valid scenario?
> Also, are there any other scenarios that need to be taken into
> consideration when consuming?
>
>
> Thanks,
> Prabhjot

--
---------------------------------------------------------
"There are only 10 types of people in the world:
Those who understand binary, and those who don't"
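
To make the four steps in the quoted message concrete, here is a minimal sketch of one consume cycle, assuming the 0.9 new-consumer API (KafkaConsumer) with auto-commit disabled. The broker address, group id, topic name, and the process() helper are placeholders invented for illustration, not a definitive implementation. Offsets are committed in step 3 before the messages are handed to the processing code in step 4, matching the order listed above; the comment in the loop marks the failure window that this ordering creates (committed-but-unprocessed messages are lost, not duplicated).

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumeCycleSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-consumer-group");         // placeholder group id
        props.put("enable.auto.commit", "false");           // commit offsets manually
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // Step 1: connect and subscribe
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));  // placeholder topic

        try {
            while (true) {
                // Step 2: consume a batch of messages
                ConsumerRecords<String, String> records = consumer.poll(1000);

                // Step 3: write the offsets back (to Kafka with the 0.9 consumer;
                // the 0.8 high-level consumer stored them in zookeeper)
                consumer.commitSync();

                // Failure window: a crash here means the offsets are already
                // committed but the records below were never processed, so
                // those messages are lost on restart (no duplication).

                // Step 4: hand the messages to the processing code
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
            }
        } finally {
            consumer.close();
        }
    }

    // Stand-in for the separate processing code mentioned in step 4
    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                          record.partition(), record.offset(), record.value());
    }
}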
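
For the duplication question itself, here is just the poll loop from the full sketch with steps 3 and 4 swapped, i.e. processing before committing. If the consumer dies in the marked window, the last committed offset still points at the start of the batch, so the same messages are re-consumed and re-processed after restart: duplication rather than loss.

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000);   // step 2: consume

    // Step 4 first: process the batch before committing
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }

    // Failure window: a crash here means the messages above were processed
    // but their offsets were never committed, so the same batch is
    // re-consumed on restart -> duplication (but no loss).
    consumer.commitSync();                                            // step 3 last: write offsets
}

This only illustrates the trade-off between the two orderings; one common way to tolerate the resulting duplicates is to make the processing side idempotent.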