Hi,
Could you post the full output of the mvn dependency:tree command for your project?
Can you reproduce this issue with a minimal project stripped of any
custom code/external dependencies except for Flink itself?
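For example (assuming the standard maven-dependency-plugin), something like

    mvn dependency:tree -Dincludes=org.apache.kafka

should show whether more than one kafka-clients version ends up on your
classpath.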
Thanks,
Piotrek
> On 28 May 2018, at 20:13, Elias Levy wrote:
>
On Mon, May 28, 2018 at 1:48 AM, Piotr Nowojski wrote:
> The most likely suspect is the standard Java problem of a dependency
> convergence issue. Please check that you are not pulling multiple Kafka
> versions into your classpath. In particular, your job shouldn’t pull in any
> Kafka library except for the one that comes transitively from the Flink
> Kafka connector.
Hi,
I think that’s unlikely to happen. As far as I know, the only way to actually
unload classes in the JVM is for their class loader to be garbage collected,
which means all references to it in the code must vanish. In other words, it
should never happen that a class is not found while anyone still holds a
reference to it.
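A minimal sketch of that invariant, assuming a hypothetical jar at
/tmp/foo.jar containing a class com.example.Foo (names are illustrative
only):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class UnloadDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical jar and class name, purely for illustration.
            URLClassLoader loader =
                new URLClassLoader(new URL[] { new URL("file:/tmp/foo.jar") }, null);
            Class<?> foo = Class.forName("com.example.Foo", true, loader);

            // While 'foo' (or any instance of it) is reachable, 'loader'
            // stays reachable too, so the JVM cannot unload the class.
            System.out.println(foo.getName());

            // Only after every reference to the loader and its classes is
            // dropped may a later GC cycle actually unload them.
            foo = null;
            loader.close();
            loader = null;
            System.gc();  // a hint only; unloading is still up to the JVM
        }
    }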
Piotr & Stephan,
Thanks for the replies. Apologies for the late response. I've been
traveling for the past month.
We've not observed this issue (spilling) again, but it is good to know that
1.5 will use back-pressure-based alignment. I think for now we'll leave
task.checkpoint.alignment.max-size at its default.
Concerning the connectivity issue - it is hard to say anything more without
any logs or details.
Does the JM log that it is trying to send tasks to the 3rd TM, but the TM
does not show signs of executing them?
On Thu, May 3, 2018 at 10:22 AM, Stephan Ewen wrote:
Hi Elias!
Concerning the spilling of alignment data to disk:
- In 1.4.x, you can set an upper limit via "task.checkpoint.alignment.max-size";
see [1] and the example below.
- In 1.5.x, the default is back-pressure-based alignment, which does not
spill anymore.
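For example, in flink-conf.yaml (the value is in bytes; 10 MB here is just an
illustrative number):

    # Upper limit for buffered/spilled alignment data per checkpoint (1.4.x)
    task.checkpoint.alignment.max-size: 10485760

If I remember correctly, exceeding the limit aborts the checkpoint in
progress rather than buffering further data.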
Best,
Stephan
[1] https://ci.apache.org/projec
Hi,
It might be some Kafka issue.
From what you described, your reasoning seems sound. For some reason TM3 fails
and is unable to restart and process any data, thus forcing TM1 and TM2 to
spill data while aligning checkpoint barriers.
I don’t know the reason behind java.lang.NoClassDefFoundError:
org/apach
We had a job on a Flink 1.4.2 cluster with three TMs experience an odd
failure the other day. It seems that it started as some sort of network
event.
It began with the 3rd TM starting to warn every 30 seconds about socket
timeouts while sending metrics to DataDog. This lasted for the whole
outage.