Are you using snappy compression? There was a bug with snappy that caused corrupt messages.
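For reference, a minimal sketch of how you might rule snappy in or out on the producer side, assuming the new Java producer that ships with 0.8.2 (the broker address, topic name, and payload are placeholders; the old Scala producer takes "compression.codec" instead of "compression.type"):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressionCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // If this is currently "snappy", switch it to "none" (or "gzip")
        // and see whether the corruption disappears.
        props.put("compression.type", "none");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<byte[], byte[]>(
                "events", new byte[] {1, 2, 3})); // placeholder topic/payload
        producer.close();
    }
}

If the corruption disappears with compression off, upgrading snappy-java (or moving to 0.8.2.2, which bundles a fixed version) would be worth trying.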
Sent from my iPhone

> On Mar 29, 2016, at 8:15 AM, sunil kalva <kalva.ka...@gmail.com> wrote:
>
> Hi
> Do we also store the message CRC on disk, and does the server verify it
> when we read messages back from disk?
> And how do we handle errors when we use async publish?
>
>> On Fri, Mar 25, 2016 at 4:17 AM, Becket Qin <becket....@gmail.com> wrote:
>>
>> You mentioned that you saw a few corrupted messages (< 0.1%). If so, are
>> you able to see some corrupted messages if you produce, say, 10M messages?
>>
>> On Wed, Mar 23, 2016 at 9:40 PM, sunil kalva <kalva.ka...@gmail.com> wrote:
>>
>>> I am using the Java client and Kafka 0.8.2. Since the events are
>>> corrupted on the Kafka broker, I can't read and replay them.
>>>
>>> On Thu, Mar 24, 2016 at 9:42 AM, Becket Qin <becket....@gmail.com> wrote:
>>>
>>>> Hi Sunil,
>>>>
>>>> The messages in Kafka have a CRC stored with each of them. When a
>>>> consumer receives a message, it computes the CRC from the message bytes
>>>> and compares it to the stored CRC. If the computed CRC and the stored
>>>> CRC do not match, that indicates the message has been corrupted. I am
>>>> not sure why the messages are corrupted in your case. Corrupted messages
>>>> should be pretty rare because the broker actually validates the CRC
>>>> before it stores the messages on disk.
>>>>
>>>> Is this problem reproducible? If so, can you find out which messages
>>>> are corrupted? Also, are you using the Java clients or some other
>>>> clients?
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Wed, Mar 23, 2016 at 8:28 PM, sunil kalva <kalva.ka...@gmail.com> wrote:
>>>>
>>>>> Can someone help me out here?
>>>>>
>>>>> On Wed, Mar 23, 2016 at 7:36 PM, sunil kalva <kalva.ka...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>> I am seeing a few messages getting corrupted in Kafka. It is not
>>>>>> happening frequently, and the percentage is very low (less than 0.1%).
>>>>>>
>>>>>> Basically, I am publishing Thrift events in byte-array form to Kafka
>>>>>> topics (without any encoding such as Base64), and I also see more
>>>>>> events than I publish (I confirmed this by looking at the offset for
>>>>>> that topic). For example, if I publish 100 events, I see 110 as the
>>>>>> offset for that topic. (Since this is in production I could not get
>>>>>> the exact messages causing the problem, and we only notice it when we
>>>>>> consume, because our Thrift deserialization fails.)
>>>>>>
>>>>>> So my question is: is there any magic byte that determines the
>>>>>> boundary of a message, which might be the same as a byte I am sending?
>>>>>> Or, for any network issues, could messages get chopped and stored as
>>>>>> multiple messages on the server side?
>>>>>>
>>>>>> tx
>>>>>> SunilKalva
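Regarding the async-publish question at the top of the thread: with the new Java producer, one way to surface send failures is to pass a Callback to send(). A minimal sketch, assuming a producer configured as in the snippet above; the topic "events" and serializeThriftEvent() are hypothetical placeholders:

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// producer is a KafkaProducer<byte[], byte[]> as configured above
byte[] payload = serializeThriftEvent(); // hypothetical Thrift serializer
producer.send(new ProducerRecord<byte[], byte[]>("events", payload),
        new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception e) {
                if (e != null) {
                    // The async send failed; log the error and/or
                    // re-queue the payload here.
                    System.err.println("publish failed: " + e);
                }
            }
        });

The callback fires once the send either succeeds or gives up after any configured retries, so it is the natural place to log failures or re-queue payloads for replay.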