Hi
I did some tests, and it turns out I was really overloading the cluster,
which caused the problems.
I tried the timeout setting but that didn't help. Simply 'not overloading'
the system did help.
Thanks.
Niels
On Thu, Oct 12, 2017 at 10:42 AM, Ufuk Celebi wrote:
Hey Niels,
Flink currently restarts the complete job if you have a restart
strategy configured:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/restart_strategies.html.
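For reference, a minimal sketch of what a configured restart strategy can look like in flink-conf.yaml. The keys are the fixed-delay options described on the page linked above; the attempt count and delay are illustrative assumptions, not values taken from this thread:

```yaml
# Illustrative values only: restart the whole job up to 3 times,
# waiting 10 seconds between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```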
I agree that only restarting the required parts of the pipeline is an
important optimization. Flink has not impl
Hi,
I'm currently doing some tests to see if this info helps.
I was running a different high-CPU task on one of the nodes outside YARN,
so I took that one out of the cluster to see if that helps.
What I do find strange is that in this kind of error scenario the entire job
fails.
I would have expecte
Hey Niels,
any update on this?
– Ufuk
On Mon, Oct 9, 2017 at 10:16 PM, Ufuk Celebi wrote:
Hey Niels,
thanks for the detailed report. I don't think that it is related to
the Hadoop or Scala version. I think the following happens:
- Occasionally, one of your tasks seems to be extremely slow in
registering its produced intermediate result (the data shuffled
between TaskManagers)
- Anothe
Hi,
I'm having some trouble running a Java-based Flink job in a yarn-session.
The job itself consists of reading a set of files resulting in a DataStream
(I use DataStream because in the future I intend to replace the files with a
Kafka feed), then does some parsing and eventually writes the data i