Hi Lasse, your reported issue [1] will be fixed in the next release of 1.10 and the upcoming 1.11. Thank you for your detailed report.
[1] https://issues.apache.org/jira/browse/FLINK-17322

On Wed, Apr 22, 2020 at 12:54 PM Lasse Nedergaard <[email protected]> wrote:

> Hi Yun
>
> Thanks for looking into it and forwarding it to the right place.
>
> Best regards
> Lasse Nedergaard
>
> On Apr 22, 2020, at 11:06, Yun Tang <[email protected]> wrote:
>
> Hi Lasse
>
> After debugging locally, I believe this is a bug in Flink (even in the
> latest version). However, the bug appears to be in the network stack, with
> which I am not very familiar, so the root cause is not easy to find
> directly. After discussing with our network guys in Flink, we decided to
> first create FLINK-17322 [1] to track this problem, and the related owner
> will take a look at it.
>
> Thank you very much for reporting this bug.
>
> [1] https://issues.apache.org/jira/browse/FLINK-17322
>
> Best
> Yun Tang
> ------------------------------
> *From:* Yun Tang <[email protected]>
> *Sent:* Wednesday, April 22, 2020 1:43
> *To:* Lasse Nedergaard <[email protected]>
> *Cc:* user <[email protected]>
> *Subject:* Re: Latency tracking together with broadcast state can cause
> job failure
>
> Hi Lasse
>
> Really sorry for missing your reply. I'll run your project and look for
> the root cause during my daytime. And thanks to @Robert Metzger
> <[email protected]> for the kind reminder.
>
> Best
> Yun Tang
> ------------------------------
> *From:* Robert Metzger <[email protected]>
> *Sent:* Tuesday, April 21, 2020 20:01
> *To:* Lasse Nedergaard <[email protected]>
> *Cc:* Yun Tang <[email protected]>; user <[email protected]>
> *Subject:* Re: Latency tracking together with broadcast state can cause
> job failure
>
> Hey Lasse,
> has the problem been resolved?
>
> (I'm also responding to this to make sure the thread gets attention again
> :) )
>
> Best,
> Robert
>
> On Wed, Apr 1, 2020 at 10:03 PM Lasse Nedergaard <[email protected]> wrote:
>
> Hi
>
> I have attached a simple project with a test that reproduces the problem.
> The usual failure is a mixed string, but you can also get an EOF exception.
> Please let me know if you have any questions about the solution.
>
> Best regards
> Lasse Nedergaard
>
> On Apr 1, 2020, at 09:15, Yun Tang <[email protected]> wrote:
>
> Hi Lasse
>
> I have never encountered this problem before, but could you share an
> exception stack trace so that we can take a look? A simple project to
> reproduce it would also be a good option.
>
> Best
> Yun Tang
> ------------------------------
> *From:* Lasse Nedergaard <[email protected]>
> *Sent:* Tuesday, March 31, 2020 19:10
> *To:* user <[email protected]>
> *Subject:* Latency tracking together with broadcast state can cause job
> failure
>
> Hi
>
> In both Flink 1.9.2 and 1.10 we have struggled with random deserialization
> and index-out-of-range exceptions in one of our jobs. We also get
> out-of-memory exceptions.
> We have now identified latency tracking together with broadcast state as
> the cause of the problem. When we run integration tests locally we don’t
> see any problem; it only fails when running on the cluster.
> We have concluded that latency tracking packets sent over broadcast
> corrupt the data stream, which causes the exceptions.
> We are working on preparing a simple project on GitHub to reproduce the
> problem so that the underlying problem can be solved.
>
> Has anyone else seen this kind of problem?
>
> Best regards
> Lasse Nedergaard

--
Arvid Heise | Senior Java Developer

<https://www.ververica.com/>
