Hi Rainie,
I believe we need the full JobManager log to understand what's going on
with your job. The logs you've provided so far only tell us that a
TaskManager has died (which is expected when a node goes down). What is
interesting to see is what happens next: whether there are enough resources
to recover the job.
Thank you Yang, I checked "yarn.application-attempts" is already set to 10.
Here is the exception part from the JobManager log. The full log file is too
big, and I also redacted it to remove some company-specific info.
Any suggestion to this exception would be appreciated!
2020-07-15 20:04:52,265 INFO org.
Could you check whether the JobManager was also running on the lost
Yarn NodeManager?
If it is the case, you need to configure "yarn.application-attempts" to a
value bigger than 1.
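For reference, this is typically set in flink-conf.yaml before submitting the job to YARN. A minimal sketch (the value 10 here is only an illustration; any value bigger than 1 enables re-attempts):

```yaml
# Allow YARN to restart the Flink ApplicationMaster (which hosts the
# JobManager) after the container or its NodeManager fails.
# Must be bigger than 1 for the job to survive a JobManager loss.
yarn.application-attempts: 10
```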
BTW, the logs you provided are not Yarn NodeManager logs. If you could
provide the full JobManager log, that would help.