Double-checking: the "deserialize(byte[] message)" method already receives a
byte[] that contains too many bytes?

I wonder if this might be an issue in Kafka then, or in the specific way
Kafka is configured.

On Wed, Mar 7, 2018 at 5:40 PM, Philip Doctor <philip.doc...@physiq.com>
wrote:

> Hi Stephan,
>
> Sorry for the slow response.
>
>
>
> I added some logging inside of my DeserializationSchema’s
> `deserialize(byte[] message)` method.
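>
> Roughly along these lines (a simplified, illustrative sketch rather than my
> actual schema; the class name, String type, and size threshold are
> placeholders):
>
>     import java.nio.charset.StandardCharsets;
>     import org.apache.flink.api.common.serialization.DeserializationSchema;
>     import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
>     import org.apache.flink.api.common.typeinfo.TypeInformation;
>     import org.slf4j.Logger;
>     import org.slf4j.LoggerFactory;
>
>     public class LoggingSchema implements DeserializationSchema<String> {
>         private static final Logger LOG = LoggerFactory.getLogger(LoggingSchema.class);
>         // placeholder: the size we normally expect a record to have
>         private static final int EXPECTED_MAX_BYTES = 1024;
>
>         @Override
>         public String deserialize(byte[] message) {
>             if (message.length > EXPECTED_MAX_BYTES) {
>                 // this is where the oversized byte[] shows up
>                 LOG.warn("Got {} bytes, expected at most {}", message.length, EXPECTED_MAX_BYTES);
>             }
>             return new String(message, StandardCharsets.UTF_8);
>         }
>
>         @Override
>         public boolean isEndOfStream(String nextElement) {
>             return false;
>         }
>
>         @Override
>         public TypeInformation<String> getProducedType() {
>             return BasicTypeInfo.STRING_TYPE_INFO;
>         }
>     }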
>
>
>
> I see the extra bytes appearing in that method.
>
>
>
> If there’s another place I should add logging, please let me know and I’m
> happy to do so.
>
>
>
> Additionally (and this is weird), I write all my messages to the DB, so I
> was looking for which messages didn’t make it (i.e., of input messages 1
> through 10,000, which ones aren’t in the DB).  It turns out all 10k are in
> the DB.  I’m not sure whether that indicates the message is read and then
> retried, or what.  I would have guessed that extra data somehow got written
> to my topic, but the Kafka tooling tells me otherwise.  So from my
> application’s perspective it just looks like I get extra garbage data every
> now and then.
>
>
>
> This is actually a big relief: I can toss out the garbage data and keep
> rolling.
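>
> (In practice, the "toss it out" can happen right inside deserialize(): as
> far as I can tell, the Flink Kafka consumer simply skips a record when the
> schema returns null.  Purely illustratively, a variant of the logging
> sketch above:)
>
>     @Override
>     public String deserialize(byte[] message) {
>         if (message.length > EXPECTED_MAX_BYTES) {
>             LOG.warn("Dropping record with {} bytes", message.length);
>             // returning null should make the Flink Kafka consumer skip it
>             return null;
>         }
>         return new String(message, StandardCharsets.UTF_8);
>     }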
>
>
>
> I hope this helps, thank you.
>
>
>
> *From: *Stephan Ewen <se...@apache.org>
> *Date: *Thursday, March 1, 2018 at 9:26 AM
> *To: *"user@flink.apache.org" <user@flink.apache.org>
> *Cc: *Philip Doctor <philip.doc...@physiq.com>
> *Subject: *Re: Flink Kafka reads too many bytes .... Very rarely
>
>
>
> Can you specify exactly where you see that excess data?
>
>
>
> Flink basically uses Kafka's standard consumer and passes the byte[]
> unmodified to the DeserializationSchema. Can you help us check whether the
> "too many bytes" happens before or after the DeserializationSchema?
>
>
>
>   - If the "too many bytes" already show up at the DeserializationSchema,
> then we should dig into the way Kafka's consumer is configured.
>
>
>
>   - If the "too many bytes" only appear after the DeserializationSchema,
> then we should look into the DeserializationSchema itself, for example
> whether it is stateful or accidentally shared across threads (sketched
> below).
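>
> A purely hypothetical sketch of what such a stateful schema could look like
> (not taken from your code): a schema that keeps a buffer as a field, so
> leftover bytes from one call can leak into the next if the buffer is not
> reset or if one instance is shared across threads.
>
>     import java.io.ByteArrayOutputStream;
>     import java.io.IOException;
>     import org.apache.flink.api.common.serialization.DeserializationSchema;
>     import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
>     import org.apache.flink.api.common.typeinfo.TypeInformation;
>
>     public class StatefulSchema implements DeserializationSchema<String> {
>         // state that survives across calls to deserialize()
>         private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
>
>         @Override
>         public String deserialize(byte[] message) throws IOException {
>             buffer.write(message);
>             String result = buffer.toString("UTF-8");
>             // if reset() is ever skipped (or calls interleave across threads),
>             // the next record appears to contain "too many bytes"
>             buffer.reset();
>             return result;
>         }
>
>         @Override
>         public boolean isEndOfStream(String nextElement) {
>             return false;
>         }
>
>         @Override
>         public TypeInformation<String> getProducedType() {
>             return BasicTypeInfo.STRING_TYPE_INFO;
>         }
>     }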
>
>
>
> On Thu, Mar 1, 2018 at 11:08 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Phil,
>
> I've created a JIRA ticket for the problem that you described and linked
> it to this thread: FLINK-8820 [1].
>
> Thank you, Fabian
>
> [1] https://issues.apache.org/jira/browse/FLINK-8820
>
>
>
> 2018-02-28 5:13 GMT+01:00 Philip Doctor <philip.doc...@physiq.com>:
>
>
>    - The fact that I seem to get all of my data is currently leading me
>    to discard and ignore this error
>
>
>
> Please ignore this statement.  I typed this email as I was testing a
> theory and meant to delete this line.  This is still a very real issue for
> me.  I was planning to try a workaround tomorrow: I saw that the Kafka 0.11
> consumer supports transactions for exactly-once processing, so I was going
> to read up on that and see whether I could somehow fail a read that I
> couldn’t deserialize and try it again, and whether that might make a
> difference (can I just retry it?).  I’m not sure how that’ll go.  If you’ve
> got an idea for a workaround, I’d be all ears too.
>
>
>
> *From: *Philip Doctor <philip.doc...@physiq.com>
> *Date: *Tuesday, February 27, 2018 at 10:02 PM
> *To: *"Tzu-Li (Gordon) Tai" <tzuli...@apache.org>, Fabian Hueske <
> fhue...@gmail.com>
> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: Flink Kafka reads too many bytes .... Very rarely
>
>
>
> Honestly, this has been a very frustrating issue to dig into.  The fact
> that I seem to get all of my data is currently leading me to discard and
> ignore this error: it’s rare, and Flink still seems to work.  But something
> here is very hard to debug, and despite some confusing observations, most
> of my evidence suggests that this originates in the Flink Kafka consumer.
>
>
>
