Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-11 Thread Arvid Heise
Hi Vishal, if possible, could you create a memory dump? It would be interesting to know why the TransferManager is never released. Note that it's impossible to release all objects/classes loaded through a particular ClassLoader; all we can do is make sure that the ClassLoader is not

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
As in https://github.com/aws/aws-sdk-java/blob/41a577e3f667bf5efb3d29a46aaf210bf70483a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java#L2378 never gets called as it is never GCed... On Wed, Feb 10, 2021 at 10:47 AM Vishal Santoshi wrote: > Thank you, > > Th
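To illustrate the point about cleanup that only runs on garbage collection, here is a minimal sketch using only the JDK. It is not the SDK's or Flink's code: `PoolLifetimeSketch`, `newUploadPool`, `runOneTask`, and `shutdownAndWait` are hypothetical names, and the fixed pool is only an assumed stand-in for the upload executor. It shows that idle worker threads of a pool stay alive after their task finishes, and only exit once the pool is shut down explicitly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class PoolLifetimeSketch {

    // Hypothetical stand-in for an upload pool: a fixed pool of non-daemon threads.
    static ExecutorService newUploadPool() {
        return Executors.newFixedThreadPool(2);
    }

    // Runs one trivial task to completion and returns the worker thread that executed it.
    static Thread runOneTask(ExecutorService pool) throws Exception {
        AtomicReference<Thread> worker = new AtomicReference<>();
        pool.submit(() -> worker.set(Thread.currentThread())).get();
        return worker.get();
    }

    // Explicit shutdown: stop accepting work and wait for the pool to terminate.
    static boolean shutdownAndWait(ExecutorService pool) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newUploadPool();
        Thread worker = runOneTask(pool);

        // The task is done, but the idle worker is still alive. A live thread
        // also keeps its context ClassLoader reachable.
        System.out.println("alive after task: " + worker.isAlive());

        boolean terminated = shutdownAndWait(pool);
        worker.join(5000); // give the worker a moment to finish exiting
        System.out.println("pool terminated: " + terminated);
        System.out.println("alive after shutdown: " + worker.isAlive());
    }
}
```

If the only shutdown path is a finalizer on an object that stays strongly reachable, the explicit step in `shutdownAndWait` never happens and the worker threads persist.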

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Chesnay Schepler
FileSystems must not be bundled in the user jar. You must place them in lib/ or plugins/, because by bundling them you break our assumption that they exist for the lifetime of the cluster (which in turn means we don't really have to worry about cleaning up). On 2/10/2021 4:01 PM, Vishal Santosh

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
com/amazonaws/services/s3/transfer/TransferManager.class is in flink-s3-fs-hadoop-1.11.2.jar, which is in the plugins directory, and AFAIK each plugin should have a dedicated ClassLoader. So does it make sense that these classes remain beyond the job and so does the executor service for multipart upload

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
We do put the flink-hadoop-uber*.jar in the flink lib (and thus available to the root classloader). That still does not explain the executor service outliving the job. On Tue, Feb 9, 2021 at 6:49 PM Vishal Santoshi wrote: > Hello folks, > We see threads from > https://github.c

ClassLoader leak when using s3a upload through DataSet.output

2021-02-09 Thread Vishal Santoshi
Hello folks, We see threads from https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/TransferManagerUtils.java#L49 outlive a batch job that writes Parquet files to S3, causing a ClassLoader leak. Is this a known