Hi Anuj,

SIGTERM (signal 15) means that the TaskManager was killed by an external process. Look into the YARN logs for the specific reason.
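As a side note on how signal 15 shows up in YARN: by shell convention, a process terminated by a signal is reported with exit status 128 + signal number, so SIGTERM appears as 143. A minimal sketch that decodes such a status (the helper name `describe_exit_code` is made up for illustration and is not part of Flink or YARN):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a process exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 143 is the container exit code YARN typically reports after SIGTERM.
    print(describe_exit_code(143))
```

This only tells you that the container was terminated from outside; the reason for the kill still has to come from the YARN logs.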
Usually, YARN kills a container with exit code 143 when it exceeds its memory limits. This is something the community keeps improving, but it may still happen because of the various types of memory that are allocated (in particular native memory). Please recheck [1] to see how you can increase some safety margins.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_setup.html

On Wed, Dec 9, 2020 at 6:25 PM aj <ajainje...@gmail.com> wrote:

> I have a Flink stream job that reads data from Kafka and writes it to S3.
> This job keeps failing after running for 2-3 days.
> I am not able to find anything in logs why it's failing. Can somebody help
> me how to find out the cause of failure?
>
> I can only see this in logs :
>
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 7 received completion notification for checkpoint with id=608.
> 2020-12-09 16:41:56,110 INFO org.apache.flink.yarn.YarnTaskExecutorRunner [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2020-12-09 16:41:56,111 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Shutting down BLOB cache
> 2020-12-09 16:41:56,111 INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Shutting down BLOB cache
> 2020-12-09 16:41:56,111 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.filecache.FileCache [] - removed file cache directory /mnt1/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-dist-cache-fd5d7eae-bff7-4d74-89d8-0a40f174b7b8
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.filecache.FileCache [] - removed file cache directory /mnt2/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-dist-cache-c5833412-5944-4b41-a502-5d952f5156af
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt1/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-io-e290b3fd-9110-47c4-9463-1bd08003afc9
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.filecache.FileCache [] - removed file cache directory /mnt3/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-dist-cache-bf0d69fe-0f00-4483-8b20-0056a049f86b
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt2/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-io-55b8467d-8c16-441a-83d6-393462a0b4ca
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.filecache.FileCache [] - removed file cache directory /mnt/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-dist-cache-8bc77a7c-f62b-4f06-b963-41f174a0db8e
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt3/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-io-bf57f8db-0152-4697-b743-d07b4e46c9d7
> 2020-12-09 16:41:56,115 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-io-d650ed68-9c44-45b9-9b41-d501152b3f0f
> 2020-12-09 16:41:56,120 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt1/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-netty-shuffle-9311e006-fee0-4317-9355-5d981c558a08
> 2020-12-09 16:41:56,120 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt2/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-netty-shuffle-c633cf9f-8220-433a-8f3e-04d45e81efde
> 2020-12-09 16:41:56,120 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt3/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-netty-shuffle-671eda78-9981-4f6d-bff4-25cca973d76d
> 2020-12-09 16:41:56,120 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /mnt/yarn/usercache/hadoop/appcache/application_1603267081962_94843/flink-netty-shuffle-5efa7701-72da-4d91-b9f7-7e6963ffefdb
>
> End of LogType:taskmanager.log
> ********************************************************************************
>
> End of LogType:taskmanager.out
> ********************************************************************************
>
> --
> Thanks & Regards,
> Anuj Jain

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>