Double-checking: the `deserialize(byte[] message)` method already receives a byte[] with too many bytes?
I wonder if this might be an issue in Kafka then, or in the specific way Kafka is configured.

On Wed, Mar 7, 2018 at 5:40 PM, Philip Doctor <philip.doc...@physiq.com> wrote:

> Hi Stephan,
>
> Sorry for the slow response.
>
> I added some logging inside of my DeserializationSchema’s `deserialize(byte[] message)` method.
>
> I see the extra bytes appearing in that method.
>
> If there’s another place I should add logging, please let me know and I’m happy to do so.
>
> Additionally (and this is weird), I write all my messages to the DB, so I was looking for which messages didn’t make it (i.e. of input messages 1 to 10,000, which aren’t in the DB). Turns out all 10k are in the DB. I’m not sure if that indicates the message is read and then retried, or what. I would have guessed that somehow extra data got written to my topic, but kafka tool tells me otherwise. So from my application’s perspective it just looks like I get extra garbage data every now and then.
>
> This is actually a big relief; I toss out the garbage data and keep rolling.
>
> I hope this helps, thank you.
>
> From: Stephan Ewen <se...@apache.org>
> Date: Thursday, March 1, 2018 at 9:26 AM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Cc: Philip Doctor <philip.doc...@physiq.com>
> Subject: Re: Flink Kafka reads too many bytes .... Very rarely
>
> Can you specify exactly where you have that excess of data?
>
> Flink uses basically Kafka's standard consumer and passes byte[] unmodified to the DeserializationSchema. Can you help us check whether the "too many bytes" happens already before or after the DeserializationSchema?
>
> - If the "too many bytes" already arrive at the DeserializationSchema, then we should dig into the way that Kafka's consumer is configured.
>
> - If the "too many bytes" appear after the DeserializationSchema, then we should look into the DeserializationSchema, for example whether it is stateful, accidentally shared across threads, etc.
>
> On Thu, Mar 1, 2018 at 11:08 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Phil,
>
> I've created a JIRA ticket for the problem that you described and linked it to this thread: FLINK-8820 [1].
>
> Thank you, Fabian
>
> [1] https://issues.apache.org/jira/browse/FLINK-8820
>
> 2018-02-28 5:13 GMT+01:00 Philip Doctor <philip.doc...@physiq.com>:
>
> > - The fact that I seem to get all of my data is currently leading me to discard and ignore this error
>
> Please ignore this statement; I typed this email as I was testing a theory and meant to delete this line. This is still a very real issue for me. I was looking to try a workaround tomorrow: I saw that the Kafka 11 consumer supported transactions for exactly-once processing, so I was going to read about that and see if I could somehow fail a read that I couldn’t deserialize and try again, and whether that might make a difference (can I just retry this?). I’m not sure how that’ll go. If you’ve got an idea for a workaround, I’d be all ears too.
>
> From: Philip Doctor <philip.doc...@physiq.com>
> Date: Tuesday, February 27, 2018 at 10:02 PM
> To: "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>, Fabian Hueske <fhue...@gmail.com>
> Cc: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Re: Flink Kafka reads too many bytes .... Very rarely
>
> Honestly this has been a very frustrating issue to dig into.
> The fact that I seem to get all of my data is currently leading me to discard and ignore this error; it’s rare, and Flink still seems to work, but something is very hard to debug here, and despite some confusing observations, most of my evidence suggests that this originates in the Flink Kafka consumer.
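For reference, a minimal sketch of the debugging approach discussed in this thread: log the raw record size inside the DeserializationSchema (to see whether the extra bytes already arrive there, which would point at the consumer or its configuration rather than the deserialization logic), and return null for records that cannot be deserialized so the Kafka consumer skips them. The class name LoggingStringSchema, the size threshold, and the parsing logic are hypothetical placeholders, and the DeserializationSchema package has moved between Flink versions, so treat the imports as illustrative.

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LoggingStringSchema implements DeserializationSchema<String> {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingStringSchema.class);

    // Placeholder threshold; set to whatever your records should never exceed.
    private static final int EXPECTED_MAX_BYTES = 1024;

    @Override
    public String deserialize(byte[] message) throws IOException {
        // Log the raw size first: if oversized payloads already show up here,
        // the extra bytes arrive before the DeserializationSchema.
        if (message.length > EXPECTED_MAX_BYTES) {
            LOG.warn("Record larger than expected: {} bytes", message.length);
        }
        try {
            return parse(message);
        } catch (RuntimeException e) {
            LOG.warn("Skipping record of {} bytes that could not be deserialized", message.length, e);
            // Returning null lets the Flink Kafka consumer skip the record
            // instead of failing the job ("toss out the garbage data and keep rolling").
            return null;
        }
    }

    private String parse(byte[] message) {
        // Real schemas would decode JSON/Avro/etc. here; UTF-8 is just a stand-in.
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return TypeInformation.of(String.class);
    }
}

With the 0.11 connector this would be wired in as, e.g., new FlinkKafkaConsumer011<>("my-topic", new LoggingStringSchema(), props), where "my-topic" and props stand in for the actual topic and consumer properties.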