Great to hear :-)
On Tue, May 29, 2018 at 4:56 PM, Amit Jain wrote:
Thanks, Till. The `taskmanager.network.request-backoff.max` option helped in my
case. We tried this on 1.5.0 and the jobs are running fine.
--
Thanks
Amit
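For reference, the change Amit describes lives in flink-conf.yaml on the TaskManagers. A minimal sketch; the value below is only an illustration, not necessarily what was used here:

    # flink-conf.yaml -- raise the maximum partition-request backoff (milliseconds)
    # so that slow task deployments do not exhaust the request retries
    taskmanager.network.request-backoff.max: 30000

The TaskManagers need to be restarted to pick up the new value.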
On Thu 24 May, 2018, 4:58 PM Amit Jain wrote:
Thanks, Till! I'll give your suggestions a try and update the thread.
On Wed, May 23, 2018 at 4:43 AM, Till Rohrmann wrote:
Hi Amit,
it looks as if the current cancellation cause is not the same as the
initially reported cancellation cause. In the current case, it looks as if
the deployment of your tasks takes so long that the maximum
`taskmanager.network.request-backoff.max` value has been reached. When this
happens …
Hi Amit,
thanks for providing the logs, I'll look into it. We currently suspect
this is caused by https://issues.apache.org/jira/browse/FLINK-9406, which
we found by looking over the surrounding code. RC4 has been cancelled since
we see this as a release blocker.
To rule out further …
Also, please have a look at the other TaskManagers' logs, in particular
the one that is running the operator that was mentioned in the
exception. You should look out for the ID 98f5976716234236dc69fb0e82a0cc34.
Nico
PS: Flink log files should compress quite nicely if they grow too big :)
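A quick way to act on this is to grep every TaskManager's logs for that execution ID and compress whatever needs to be shared. A rough sketch, assuming you are in Flink's log/ directory on each TaskManager host; the exact log file names depend on the setup:

    # list the log files that mention the execution ID Nico points at
    grep -l 98f5976716234236dc69fb0e82a0cc34 *.log
    # compress a copy for sharing without touching the file the process is still writing to
    gzip -c flink-taskmanager-0.log > flink-taskmanager-0.log.gz

(flink-taskmanager-0.log is a placeholder name, not the actual file from this cluster.)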
Google Drive would be great.
Thanks!
On Thu, May 3, 2018 at 1:33 PM, Amit Jain wrote:
Hi Stephan,
The JM log file is 122 MB. Could you suggest another medium for sharing
it? We can use Google Drive if that's fine with you.
--
Thanks,
Amit
On Thu, May 3, 2018 at 12:58 PM, Stephan Ewen wrote:
Hi Amit!
Thanks for sharing this; it looks like a regression from the network
stack changes.
The log you shared from the TaskManager gives some hints, but that exception
alone should not be a problem. That exception can occur under a race
between the deployment of some tasks while the whole job is …
Thanks, Fabian!
I will try using the current release-1.5 branch and update this thread.
--
Thanks,
Amit
On Wed, May 2, 2018 at 3:42 PM, Fabian Hueske wrote:
Hi Amit,
We recently fixed a bug in the network stack that affected batch jobs
(FLINK-9144).
The fix was added after your commit.
Do you have a chance to build the current release-1.5 branch and check if
the fix also resolves your problem?
Otherwise it would be great if you could open a blocker issue.
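For anyone who wants to run the same check, building the release-1.5 branch is roughly the following. A sketch assuming a standard git/Maven/JDK 8 setup; exact flags and paths may differ:

    # build Flink from the release-1.5 branch (tests skipped to keep it short)
    git clone https://github.com/apache/flink.git
    cd flink
    git checkout release-1.5
    mvn clean install -DskipTests
    # the built distribution typically ends up under flink-dist/target/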
The cluster is running on commit 2af481a.
On Sun, Apr 29, 2018 at 9:59 PM, Amit Jain wrote:
> Hi,
>
> We are running a number of batch jobs on a Flink 1.5 cluster and a few of them
> are getting stuck at random. These jobs hit the following failure, after which
> the operator status changes to CANCELED and …