Hello Folks,

Requesting your expertise on these questions of mine.
Thanks,
Prabhjot

On Thu, Nov 26, 2015 at 11:28 PM, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:

> Hi,
>
> Request your expertise on these doubts of mine
>
> Thanks,
> Prabhjot
>
> On Thu, Nov 26, 2015 at 12:09 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> wrote:
>
>> Hello Folks,
>>
>> I am trying to build fault tolerance on the consumer side, so as to make
>> sure that all failure scenarios are handled.
>> On the data integrity side, there are 2 primary requirements:
>>
>> 1. No data loss
>> 2. No data duplication
>>
>> I'm particularly interested in data duplication. For example, the
>> following steps happen in order on the consumer during each
>> consume cycle:
>>
>> 1. connect
>> 2. consume
>> 3. write offset back to zookeeper/kafka (0.8/0.9)
>> 4. process the message (which will be done by other code, not the
>> consumer api)
>>
>> Please correct the above steps if I'm wrong.
>>
>> Now, failures (machine down/process down/unhandled exceptions or bugs)
>> can occur at each of the above steps.
>> In particular, if a failure occurs after consuming the message and before
>> writing the offset back to zookeeper/kafka, then on restart of the
>> consumer the same message could be reconsumed, leading to duplication of
>> this message if the 4th step is asynchronous.
>> e.g. if processing the message happens before writing back the offset, it
>> could cause data duplication after the consumer restarts!
>>
>> Is this a valid scenario?
>> Also, are there any other scenarios that need to be taken into
>> consideration when consuming?
>>
>>
>> Thanks,
>> Prabhjot
>>
>
>
>
> --
> ---------------------------------------------------------
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"
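
To make the failure scenario in the quoted message concrete, the two orderings (write the offset before processing, or process before writing the offset) can be compared with a small simulation. This is only a sketch under stated assumptions: `FakeBroker`, `run_consumer`, `simulate`, and `CrashError` are hypothetical names, no real Kafka or ZooKeeper client is involved, and the crash is injected artificially between the two steps for one message.

```python
class CrashError(Exception):
    """Simulated consumer failure (machine down, process down, bug)."""


class FakeBroker:
    """Hypothetical in-memory stand-in for one partition and its committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset the consumer resumes from after a restart


def run_consumer(broker, sink, commit_first, crash_at=None):
    """Consume from the committed offset onward.

    If crash_at is set, fail between the commit step and the process step
    (in whichever order they run) for the message at that offset.
    """
    offset = broker.committed
    while offset < len(broker.messages):
        msg = broker.messages[offset]
        if commit_first:
            broker.committed = offset + 1      # step 3: write offset back
            if offset == crash_at:
                raise CrashError()
            sink.append((offset, msg))         # step 4: process the message
        else:
            sink.append((offset, msg))         # process first
            if offset == crash_at:
                raise CrashError()
            broker.committed = offset + 1      # then write offset back
        offset += 1


def simulate(commit_first):
    """Run once with a crash on offset 1, then restart and run to completion."""
    broker = FakeBroker(["m0", "m1", "m2"])
    sink = []
    try:
        run_consumer(broker, sink, commit_first, crash_at=1)
    except CrashError:
        pass  # the consumer process died; fall through to the restart
    run_consumer(broker, sink, commit_first)
    return sink


print(simulate(commit_first=True))    # [(0, 'm0'), (2, 'm2')]: m1 is lost
print(simulate(commit_first=False))   # [(0, 'm0'), (1, 'm1'), (1, 'm1'), (2, 'm2')]: m1 is duplicated
print(dict(simulate(commit_first=False)))  # keyed by offset: {0: 'm0', 1: 'm1', 2: 'm2'}
```

In this model, committing first trades duplication for possible loss, while processing first trades loss for possible duplication. The last line hints at one common mitigation: if the processing step is idempotent (here, keyed by offset), a replayed message overwrites rather than duplicates.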