Hi,
Thanks for the feedback!
As Till explained, the problem is that the JM first tries to schedule the
job to the failed TM (which hasn't been detected as failed yet).
The configured three restart attempts are "consumed" by these attempts and
the job fails afterwards.
Best, Fabian
2018-04-05 8:1
Just for the record,
It did not work with RestartStrategies.fixedDelayRestart(3, 5000) but worked
with RestartStrategies.fixedDelayRestart(20, 5000)
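To illustrate why 3 attempts were not enough while 20 were, here is a minimal sketch in plain Java (no Flink dependency). It is a simplified model, not Flink's actual scheduling logic: it assumes restart attempt k starts k * delay after the failure, and that every attempt started before the JM has detected the dead TM fails again. The class name `RestartBudget`, its methods, and the 60-second detection time are hypothetical and not taken from this thread; the real detection time depends on the cluster's heartbeat and timeout settings.

```java
public class RestartBudget {

    /**
     * Simplified model (an assumption, not Flink internals): restart attempt k
     * starts k * delayMs after the failure, and any attempt that starts before
     * the dead TM has been detected is scheduled onto it and fails again.
     */
    public static boolean survives(int maxAttempts, long delayMs, long detectionMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long startedAt = attempt * delayMs;
            if (startedAt >= detectionMs) {
                return true; // this attempt only sees live TMs
            }
        }
        return false; // all attempts were consumed before detection
    }

    /** Smallest number of attempts that outlasts the detection time in this model. */
    public static int minAttempts(long delayMs, long detectionMs) {
        return (int) Math.max(1, (detectionMs + delayMs - 1) / delayMs);
    }

    public static void main(String[] args) {
        long delayMs = 5000L;       // delay used in this thread
        long detectionMs = 60000L;  // hypothetical detection time, not from the thread

        System.out.println(survives(3, delayMs, detectionMs));   // false
        System.out.println(survives(20, delayMs, detectionMs));  // true
        System.out.println(minAttempts(delayMs, detectionMs));   // 12
    }
}
```

Under these assumptions, the total retry window (attempts times delay) has to be longer than the time the JM needs to notice the lost TM, so a longer delay should help just as much as a higher attempt count.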
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
As suggested by Till, it works perfectly fine after increasing the number of
retries. Thanks, everyone.
There is a JIRA issue for the problem:
https://issues.apache.org/jira/browse/FLINK-9120. Mirroring my response to
this thread:
The logs (attached to the JIRA ticket) show that the JM did not yet
recognize the killed TM as killed when trying to restart. Thus, it tries to
re-deploy tasks to this mac
@Till: Do you have any advice for this issue?
On 03.04.18 at 11:54, dhirajpraj wrote:
What I have found is that the TM fault tolerance behaviour is not consistent.
Sometimes it works and sometimes it doesn't. I am attaching my Java code file
(which is the main class).
What I did was:
1) Run cluster with JM on machine A, one TM on machine B and one TM on
machine C
2) Submit a job to
Could you provide a little reproducible example? Which file system are
you using? This sounds like a bug to me that should be fixed if valid.
On 03.04.18 at 11:28, dhirajpraj wrote:
I have not specified any parallelism in the job code, so I guess the
parallelism falls back to parallelism.default as defined in
flink-conf.yaml.
An update: The TMs were on different machines and I was using FsStateBackend
with state backend directories pointing to instance specific file pa
Hi,
Does your job code declare a parallelism higher than 2? Or is it submitted
with a higher parallelism? What is the Web UI displaying?
Regards,
Timo
On 03.04.18 at 10:48, dhirajpraj wrote:
Hi,
I have done that
env.enableCheckpointing(5000L);
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 5000));
Please make sure you have set a number of retries and have checkpointing
activated if you use streaming.
On Fri, Mar 30, 2018 at 1:59 PM, dhirajpraj wrote:
> Hi,
> I have set up a Flink 1.4 cluster with one job manager and two task managers.
> The configs taskmanager.numberOfTaskSlots and paralle