Thanks for the heads-up and for explaining how you resolved the issue!
Best, Fabian
2017-10-18 3:50 GMT+02:00 ShB :
I just wanted to leave an update about this issue, for someone else who might
come across it. The problem was with memory, but it was disk memory and not
heap/off-heap memory. YARN was killing off my containers as they exceeded
the threshold for disk utilization, and this was manifesting as TaskManagers being lost/killed.
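For anyone who wants to check the same thing: the NodeManager setting that I believe governs this is the disk health checker's utilization threshold (default around 90%). In yarn-site.xml it looks roughly like the following, though the exact property name and default may differ across Hadoop versions:

  <property>
    <!-- Illustrative value: utilization percentage above which YARN considers the
         local disk "bad" and the node unhealthy, which can get containers on it
         killed. Check your Hadoop version's documentation before relying on this. -->
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>95.0</value>
  </property>

Raising it only buys head room, of course; the underlying disk usage still has to be dealt with.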
On further investigation, it seems to me that the I/O exception I posted previously
is not the cause of the TM being lost; it's the after-effect of the TM being
shut down and the channel being closed after a record is emitted but before
it's processed.
Previously, the logs didn't throw up this error and I
Hi Stephan,
Apologies, I hit send too soon on the last email.
So, while trying to debug this, I ran it multiple times on different
instance types (to increase the RAM available), and while digging into the logs
I found this to be the error in the task manager logs:
java.lang.RuntimeException: Emit
Hi Stephan,
Thanks for your response!
TaskManagers being lost/killed has been a recurring problem I've had with Flink
for the last few months, as I try to scale to larger and larger amounts of
data. I would be very grateful for some help figuring out how I can avoid
this.
The program is set up someth
Hi!
The garbage collection stats actually look okay, not terribly bad; I'm almost
surprised that this seems to cause failures.
Can you check whether you find messages in the TM / JM log about heartbeat
timeouts, actor systems being "gated" or "quarantined"?
Would also be interesting to know how the
Thanks for your response!
Regarding the recommendation to decrease the allotted memory: which allotted memory
should be decreased?
I tried decreasing taskmanager.memory.fraction to give more memory to user
operations, but that doesn't work beyond a point. I also tried increasing
containerized.heap-cutoff-ratio.
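For concreteness, both of those are flink-conf.yaml settings; what I tried looks roughly like the following (the values are only examples, not my exact configuration):

  # Fraction of TaskManager memory that Flink reserves as managed memory for its
  # internal sorting/hashing; lowering it leaves more heap for user code.
  taskmanager.memory.fraction: 0.5

  # Fraction of the YARN container memory cut off before sizing the JVM heap,
  # to leave head room for off-heap and native allocations.
  containerized.heap-cutoff-ratio: 0.3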
Late response, but a common reason for disappearing TaskManagers is termination
by the Linux out-of-memory killer; the usual recommendation in that case is to
decrease the allotted memory.
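By "allotted memory" I mean the memory the TaskManager JVM is started with. Assuming the configuration keys of the Flink versions current at the time, decreasing it would look something like this in flink-conf.yaml (the value is only an example):

  # Smaller TaskManager heap, so the whole process stays below what the machine
  # (or the YARN container) can actually back with physical memory.
  taskmanager.heap.mb: 4096

or the equivalent -tm flag when starting the YARN session.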
> On Sep 5, 2017, at 9:09 AM, ShB wrote:
Hi,
I'm running a Flink batch job that reads almost 1 TB of data from S3 and
then performs operations on it. A list of filenames is distributed among
the TMs, and each TM reads its own subset of files from S3. This job
errors out at the read step due to the following error:
java.lang.Excepti
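For anyone reading along, here is a minimal sketch of the pattern described above. It is not the original job; listS3FileNames() and readS3File() are hypothetical placeholders for the actual S3 access.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.Collections;
import java.util.List;

public class S3ReadSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // File names to read, e.g. collected from the S3 API up front.
        List<String> fileNames = listS3FileNames();

        DataSet<String> records = env
                .fromCollection(fileNames)
                .rebalance()  // spread the file names evenly across the TaskManager subtasks
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String fileName, Collector<String> out) throws Exception {
                        // Each subtask downloads and parses only its own subset of files.
                        for (String line : readS3File(fileName)) {
                            out.collect(line);
                        }
                    }
                });

        // ... further operations on `records` would follow here ...
        records.writeAsText("s3://some-output-bucket/result");  // placeholder sink
        env.execute("s3-batch-read-sketch");
    }

    // Placeholder: the real job would list the input objects via the S3 client.
    private static List<String> listS3FileNames() {
        return Collections.singletonList("s3://some-input-bucket/part-00000");
    }

    // Placeholder: the real job would stream and parse the object from S3.
    private static List<String> readS3File(String fileName) {
        return Collections.emptyList();
    }
}

The actual S3 reads would typically go through the AWS SDK or Flink's file system support; the point of the sketch is only that fromCollection(...).rebalance() is what spreads the file names across the TaskManagers.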