Great to hear :-)
On Tue, May 29, 2018 at 4:56 PM, Amit Jain wrote:
Thanks, Till. The `taskmanager.network.request-backoff.max` option helped in my
case. We tried this on 1.5.0 and the jobs are running fine.
--
Thanks
Amit
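For reference, the change Amit describes lives in flink-conf.yaml on the TaskManagers. A minimal sketch; the value below is only an illustration, not necessarily what was used here:

    # flink-conf.yaml -- raise the maximum partition-request backoff (milliseconds)
    # so that slow task deployments do not exhaust the request retries
    taskmanager.network.request-backoff.max: 30000

The TaskManagers need to be restarted to pick up the new value.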
On Thu 24 May, 2018, 4:58 PM Amit Jain wrote:
Thanks, Till! I'll give your suggestions a try and update the thread.
On Wed, May 23, 2018 at 4:43 AM, Till Rohrmann wrote:
Hi Amit,
it looks as if the current cancellation cause is not the same as the
initially reported cancellation cause. In the current case, it looks as if
the deployment of your tasks takes so long that the maximum
`taskmanager.network.request-backoff.max` value has been reached. When this
happens …
Hi Amit,
thanks for providing the logs, I'll look into it. We currently suspect
this is caused by https://issues.apache.org/jira/browse/FLINK-9406, which
we found by looking over the surrounding code. RC4 has been cancelled since
we see this as a release blocker.
To rule out further …
Also, please have a look at the other TaskManagers' logs, in particular
the one that is running the operator that was mentioned in the
exception. You should look out for the ID 98f5976716234236dc69fb0e82a0cc34.
Nico
PS: Flink log files should compress quite nicely if they grow too big :)
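A quick way to act on this is to grep every TaskManager's logs for that execution ID and compress whatever needs to be shared. A rough sketch, assuming you are in Flink's log/ directory on each TaskManager host; the exact log file names depend on the setup:

    # list the log files that mention the execution ID Nico points at
    grep -l 98f5976716234236dc69fb0e82a0cc34 *.log
    # compress a copy for sharing without touching the file the process is still writing to
    gzip -c flink-taskmanager-0.log > flink-taskmanager-0.log.gz

(flink-taskmanager-0.log is a placeholder name, not the actual file from this cluster.)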
Google Drive would be great.
Thanks!
On Thu, May 3, 2018 at 1:33 PM, Amit Jain wrote:
Hi Stephan,
The JM log file is 122 MB. Could you suggest another medium for sharing
it? We can use Google Drive if that's fine with you.
--
Thanks,
Amit
On Thu, May 3, 2018 at 12:58 PM, Stephan Ewen wrote:
Hi Amit!
Thanks for sharing this; it looks like a regression from the network
stack changes.
The log you shared from the TaskManager gives some hints, but that exception
alone should not be a problem. That exception can occur under a race
between the deployment of some tasks while the whole job is …
Thanks, Fabian!
I will try using the current release-1.5 branch and update this thread.
--
Thanks,
Amit
On Wed, May 2, 2018 at 3:42 PM, Fabian Hueske wrote:
Hi Amit,
We recently fixed a bug in the network stack that affected batch jobs
(FLINK-9144).
The fix was added after your commit.
Do you have a chance to build the current release-1.5 branch and check if
the fix also resolves your problem?
Otherwise it would be great if you could open a blocker issue.
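For anyone who wants to run the same check, building the release-1.5 branch is roughly the following. A sketch assuming a standard git/Maven/JDK 8 setup; exact flags and paths may differ:

    # build Flink from the release-1.5 branch (tests skipped to keep it short)
    git clone https://github.com/apache/flink.git
    cd flink
    git checkout release-1.5
    mvn clean install -DskipTests
    # the built distribution typically ends up under flink-dist/target/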
The cluster is running on commit 2af481a.
On Sun, Apr 29, 2018 at 9:59 PM, Amit Jain wrote:
> Hi,
>
> We are running a number of batch jobs on a Flink 1.5 cluster and a few of them
> are getting stuck at random. These jobs hit the following failure, after which
> the operator status changes to CANCELED and …