Hi,
Could you post the full output of the mvn dependency:tree command for your project?
Can you reproduce this issue with a minimal project stripped of any
custom code/external dependencies except for Flink itself?
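For example (assuming the standard maven-dependency-plugin), something like

    mvn dependency:tree -Dincludes=org.apache.kafka

should show whether more than one kafka-clients version ends up on your
classpath.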
Thanks,
Piotrek
> On 28 May 2018, at 20:13, Elias Levy wrote:
>
On Mon, May 28, 2018 at 1:48 AM, Piotr Nowojski wrote:
> The most likely suspect is the standard Java problem of a dependency
> convergence issue. Please check that you are not pulling multiple Kafka
> versions into your classpath. In particular, your job shouldn’t pull in any
> Kafka library except for the one that comes transitively from the Flink
> Kafka connector.
Hi,
I think that’s unlikely to happen. As far as I know, the only way to actually
unload classes in the JVM is for their class loader to be garbage collected,
which means all references to it in the code must vanish. In other words, it
should never happen that a class is not found while anyone still holds a
reference to it.
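A minimal sketch of that invariant, assuming a hypothetical jar at
/tmp/foo.jar containing a class com.example.Foo (names are illustrative
only):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class UnloadDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical jar and class name, purely for illustration.
            URLClassLoader loader =
                new URLClassLoader(new URL[] { new URL("file:/tmp/foo.jar") }, null);
            Class<?> foo = Class.forName("com.example.Foo", true, loader);

            // While 'foo' (or any instance of it) is reachable, 'loader'
            // stays reachable too, so the JVM cannot unload the class.
            System.out.println(foo.getName());

            // Only after every reference to the loader and its classes is
            // dropped may a later GC cycle actually unload them.
            foo = null;
            loader.close();
            loader = null;
            System.gc();  // a hint only; unloading is still up to the JVM
        }
    }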
Piotr & Stephan,
Thanks for the replies. Apologies for the late response. I've been
traveling for the past month.
We've not observed this issue (spilling) again, but it is good to know that
1.5 will use back-pressure-based alignment. I think for now we'll leave
task.checkpoint.alignment.max-size at its default.
Concerning the connectivity issue - it is hard to say anything more without
any logs or details.
Does the JM log that it is trying to send tasks to the 3rd TM, but the TM
does not show signs of executing them?
On Thu, May 3, 2018 at 10:22 AM, Stephan Ewen wrote:
Hi Elias!
Concerning the spilling of alignment data to disk:
- In 1.4.x, you can set an upper limit via "task.checkpoint.alignment.max-size";
see [1] and the example below.
- In 1.5.x, the default is back-pressure-based alignment, which does not
spill anymore.
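For example, in flink-conf.yaml (the value is in bytes; 10 MB here is just an
illustrative number):

    # Upper limit for buffered/spilled alignment data per checkpoint (1.4.x)
    task.checkpoint.alignment.max-size: 10485760

If I remember correctly, exceeding the limit aborts the checkpoint in
progress rather than buffering further data.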
Best,
Stephan
[1] https://ci.apache.org/projec
Hi,
It might be some Kafka issue.
From what you described, your reasoning seems sound. For some reason TM3 fails
and is unable to restart and process any data, thus forcing TM1 and TM2 to
spill data while aligning checkpoint barriers.
I don’t know the reason behind java.lang.NoClassDefFoundError:
org/apach
We had a job on a Flink 1.4.2 cluster with three TMs experience an odd
failure the other day. It seems that it started as some sort of network
event.
It began with the 3rd TM starting to warn every 30 seconds about socket
timeouts while sending metrics to DataDog. This lasted for the whole
outage.