Hi, we're using Flink 1.7.2 on an EMR cluster (release emr-5.22.0), which runs Hadoop "Amazon 2.8.5". We've recently noticed that some TaskManagers fail with "java.lang.OutOfMemoryError: GC overhead limit exceeded", which causes every job running on them to fail. The TaskManager (and the jobs that should be running on it) stays down until it is manually restarted.
I managed to take and analyze a heap dump from one of the affected TaskManagers. It showed that 85% of the heap was taken up by the java.io.DeleteOnExitHook.files hash set. The vast majority of the strings in that set (9,041,060 out of ~9,041,100) were paths to files beginning with /tmp/hadoop-yarn/s3a/s3ablock.

The problem seems to affect jobs that use the StreamingFileSink. Every TaskManager crash has been on a TaskManager running at least one job that uses this sink, and a cluster running only a single TaskManager with a single StreamingFileSink job also crashed with the "GC overhead limit exceeded" error.

I've looked for advice on handling this error more generally, without luck. Any suggestions or advice gratefully received.

Best regards,
Mark Harris
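Why the heap can fill up this way: java.io.DeleteOnExitHook records a path each time File.deleteOnExit() is called, and it only drains that set when the JVM shuts down. Deleting the file afterwards does not unregister it. The sketch below is a simplified model, not Hadoop's or the JDK's actual code. It assumes, as the heap dump suggests, that each temporary s3ablock buffer file gets registered with deleteOnExit() before it is uploaded and deleted. The helper names (`bufferAndUploadBlock`, `simulate`) and the `hookModel` set are hypothetical. `hookModel` only mirrors what the JDK's private set records, because that set cannot be inspected portably.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: a simplified model of why java.io.DeleteOnExitHook can grow
// without bound when short-lived temp files are registered with File.deleteOnExit().
// The real hook keeps its set private; hookModel below merely mirrors what it records.
public class DeleteOnExitLeakDemo {

    // Hypothetical stand-in for a disk-buffered upload block: create a temp file,
    // register it for deletion at JVM exit, then delete it once it is "uploaded".
    static void bufferAndUploadBlock(File dir, Set<String> hookModel) {
        try {
            File block = File.createTempFile("s3ablock-", ".tmp", dir);
            block.deleteOnExit();              // real JDK call: path is kept until the JVM exits
            hookModel.add(block.getPath());    // mirror that registration so we can count it
            // ... the block would be written and uploaded here ...
            if (!block.delete()) {             // file disappears from disk after the "upload"
                throw new IOException("could not delete " + block);
            }
            // Deleting the file does not unregister it: the path string stays on the heap.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the cycle `blocks` times and returns {files left on disk, paths still registered}.
    static int[] simulate(int blocks) {
        Set<String> hookModel = new LinkedHashSet<>();
        try {
            File dir = Files.createTempDirectory("leak-demo").toFile();
            for (int i = 0; i < blocks; i++) {
                bufferAndUploadBlock(dir, hookModel);
            }
            String[] remaining = dir.list();
            int onDisk = remaining == null ? 0 : remaining.length;
            dir.delete();                      // clean up the (now empty) demo directory
            return new int[] {onDisk, hookModel.size()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = simulate(1000);
        System.out.println("files on disk: " + result[0]);
        System.out.println("paths still registered: " + result[1]);
    }
}
```

Running it should report 0 files on disk but 1000 registered paths. In a long-running TaskManager that keeps writing part files through the sink, the same pattern would let the registered paths pile up until the JVM exits, which matches the millions of s3ablock strings in the dump.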