Trying a few different approaches to the fs.s3a.fast.upload settings has brought 
me no joy - the taskmanagers simply end up crashing or complaining of high GC 
load. Heap dumps suggest that this time they're clogged with buffers instead, 
which makes sense.

Our job has a parallelism of 6 and checkpoints every 15 minutes - if anything, 
we'd like to checkpoint more frequently. I suspect this could also be affected 
by the partition structure we are bucketing to: at any given moment we could 
be receiving data for up to 280 buckets at once. Could this be a factor?
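
To put rough numbers on that question, here is a minimal back-of-the-envelope 
sketch. It assumes the worst case, which I have not verified against the sink's 
actual behaviour: every one of the 6 subtasks writes to all 280 buckets in every 
checkpoint interval, and each open part file costs one s3ablock temp file (and 
therefore one DeleteOnExitHook entry). The class and method names are 
hypothetical, purely for illustration.

```java
public class TempFileEstimate {

    // Worst case (assumption): each subtask has one open part file per active bucket.
    public static long filesPerCheckpoint(int parallelism, int activeBuckets) {
        return (long) parallelism * activeBuckets;
    }

    // Number of checkpoints triggered in 24 hours at a fixed interval.
    public static long checkpointsPerDay(int intervalMinutes) {
        return (24L * 60L) / intervalMinutes;
    }

    // Temp files (and hook entries) registered per day under the assumptions above.
    public static long filesPerDay(int parallelism, int activeBuckets, int intervalMinutes) {
        return filesPerCheckpoint(parallelism, activeBuckets) * checkpointsPerDay(intervalMinutes);
    }

    public static void main(String[] args) {
        int parallelism = 6;          // from the job description
        int activeBuckets = 280;      // upper bound quoted above
        int intervalMinutes = 15;     // current checkpoint interval
        long observedEntries = 9_041_060L; // s3ablock entries seen in the heap dump

        long perDay = filesPerDay(parallelism, activeBuckets, intervalMinutes);
        System.out.println("files per checkpoint: " + filesPerCheckpoint(parallelism, activeBuckets));
        System.out.println("files per day:        " + perDay);
        System.out.println("days to reach dump:   " + (double) observedEntries / perDay);
    }
}
```

Under those assumptions that is 1,680 files per checkpoint and 161,280 per day, 
so the entry count from the heap dump would take on the order of eight weeks of 
uptime to accumulate. Halving the checkpoint interval would double the rate, 
which is why the bucket count and checkpoint frequency both look relevant.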

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@ververica.com>
Sent: 27 January 2020 16:16
To: Cliff Resnick <cre...@gmail.com>
Cc: David Magalhães <speeddra...@gmail.com>; Mark Harris 
<mark.har...@hivehome.com>; Till Rohrmann <trohrm...@apache.org>; 
flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for 
S3a files

Hi,

I think reducing the frequency of the checkpoints and decreasing the parallelism 
of the operators using the S3AOutputStream class would help to mitigate the issue.

I don’t know about other solutions. I would suggest asking this question 
directly to Steve L. in the bug ticket [1], as he is the one who fixed the 
issue. If there is no workaround, maybe it would be possible to put some 
pressure on the Hadoop developers to backport the fix to older versions?

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 27 Jan 2020, at 15:41, Cliff Resnick <cre...@gmail.com> wrote:

I know from experience that Flink's shaded S3A FileSystem does not reference 
core-site.xml, though I don't remember offhand what file(s) it does reference. 
However since it's shaded, maybe this could be fixed by building a Flink FS 
referencing 3.3.0? Last I checked I think it referenced 3.1.0.

On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddra...@gmail.com> wrote:
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load 
any configurations from core-site.xml.

On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.har...@hivehome.com> wrote:
Hi Piotr,

Thanks for the link to the issue.

Do you know if there's a workaround? I've tried setting the following in my 
core-site.xml:

fs.s3a.fast.upload.buffer=true

To try and avoid writing the buffer files, but the taskmanager breaks with the 
same problem.

Best regards,

Mark
________________________________
From: Piotr Nowojski <pi...@data-artisans.com> 
on behalf of Piotr Nowojski <pi...@ververica.com>
Sent: 22 January 2020 13:29
To: Till Rohrmann <trohrm...@apache.org>
Cc: Mark Harris <mark.har...@hivehome.com>; 
flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for 
S3a files

Hi,

This is probably a known issue of Hadoop [1]. Unfortunately it was only fixed 
in 3.3.0.

Piotrek

[1] https://issues.apache.org/jira/browse/HADOOP-15658

On 22 Jan 2020, at 13:56, Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more about 
the StreamingFileSink, into this conversation. @Klou does the StreamingFileSink 
rely on DeleteOnExitHooks to clean up files?

Cheers,
Till

On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.har...@hivehome.com> wrote:
Hi,

We're using Flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs Hadoop v 
"Amazon 2.8.5". We've recently noticed that some TaskManagers fail (causing all 
the jobs running on them to fail) with a "java.lang.OutOfMemoryError: GC 
overhead limit exceeded". The taskmanager (and the jobs that should be running 
on it) remain down until manually restarted.

I managed to take and analyze a memory dump from one of the afflicted 
taskmanagers.

It showed that 85% of the heap was made up of the 
java.io.DeleteOnExitHook.files hashset. The majority of the strings in that 
hashset (9041060 out of ~9041100) were paths that began with 
/tmp/hadoop-yarn/s3a/s3ablock
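
For anyone unfamiliar with why that set grows, here is a minimal sketch of the 
pattern as I understand it. File.deleteOnExit() records the path in a JVM-wide 
set held by java.io.DeleteOnExitHook, and deleting the file afterwards does not 
remove that entry, so a long-running process that keeps creating temp files 
this way retains one string per file until the JVM exits. The class name, 
method names and the "s3ablock-demo-" prefix are hypothetical; this is not the 
actual S3A code, only an illustration of the mechanism.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteOnExitSketch {

    // Leaky pattern: the file is removed from disk, but its path string stays
    // registered in java.io.DeleteOnExitHook for the lifetime of the JVM.
    public static File leakyTempFile() {
        try {
            File f = File.createTempFile("s3ablock-demo-", ".tmp");
            f.deleteOnExit(); // registers the path; there is no public API to unregister it
            f.delete();       // frees the disk space, not the hook entry
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Alternative: clean up explicitly and never register a shutdown hook entry.
    public static Path explicitTempFile() {
        try {
            Path p = Files.createTempFile("s3ablock-demo-", ".tmp");
            try {
                Files.write(p, new byte[] {1, 2, 3}); // stand-in for buffered upload data
            } finally {
                Files.deleteIfExists(p);
            }
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each iteration leaves one more path string on the heap.
        for (int i = 0; i < 1000; i++) {
            leakyTempFile();
        }
        // This variant leaves nothing behind once the method returns.
        explicitTempFile();
        System.out.println("done");
    }
}
```

Both variants leave the disk clean, which is why the problem only shows up in a 
heap dump rather than in /tmp.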

The problem seems to affect jobs that make use of the StreamingFileSink - all 
of the taskmanager crashes have been on a taskmanager running at least one job 
using this sink, and a cluster running only a single taskmanager / job that 
uses the StreamingFileSink crashed with the GC overhead limit exceeded error.

I've had a look for advice on handling this error more broadly without luck.

Any suggestions or advice gratefully received.

Best regards,

Mark Harris



The information contained in or attached to this email is intended only for the 
use of the individual or entity to which it is addressed. If you are not the 
intended recipient, or a person responsible for delivering it to the intended 
recipient, you are not authorised to and must not disclose, copy, distribute, 
or retain this message or any part of it. It may contain information which is 
confidential and/or covered by legal professional or other privilege under 
applicable law.

The views expressed in this email are not necessarily the views of Centrica plc 
or its subsidiaries, and the company, its directors, officers or employees make 
no representation or accept any liability for its accuracy or completeness 
unless expressly stated to the contrary.

Additional regulatory disclosures may be found here: 
https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email

PH Jones is a trading name of British Gas Social Housing Limited. British Gas 
Social Housing Limited (company no: 01026007), British Gas Trading Limited 
(company no: 03078711), British Gas Services Limited (company no: 3141243), 
British Gas Insurance Limited (company no: 06608316), British Gas New Heating 
Limited (company no: 06723244), British Gas Services (Commercial) Limited 
(company no: 07385984) and Centrica Energy (Trading) Limited (company no: 
02877397) are all wholly owned subsidiaries of Centrica plc (company no: 
3033654). Each company is registered in England and Wales with a registered 
office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

British Gas Insurance Limited is authorised by the Prudential Regulation 
Authority and regulated by the Financial Conduct Authority and the Prudential 
Regulation Authority. British Gas Services Limited and Centrica Energy 
(Trading) Limited are authorised and regulated by the Financial Conduct 
Authority. British Gas Trading Limited is an appointed representative of 
British Gas Services Limited which is authorised and regulated by the Financial 
Conduct Authority.



