NPE when checkpointing

2020-10-04 Thread Binh Nguyen Van
Hi, I have a streaming job that is written in Apache Beam and uses Flink as its runner. The job worked as expected for about 15 hours and then started to have checkpointing errors. The error message looks like this: java.lang.Exception: Could not perform checkpoint 910 for operator Source:

Re: [External Sender] Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-10-04 Thread Shubham Kumar
@Kye, thanks for your suggestions. We are using one-YARN-app-per-job mode, and your point is still valid in Flink 1.10 as per the docs; it does make sense to avoid dynamic classloading for such jobs. Also, we seemed to have enough off-heap memory for the resources mentioned, and what turned out to be the issue was