Hi everyone,

We are migrating our ETL tasks from Spark 3.2.1 (Java 11) to Spark 3.5.2 (Java 
17).

One of these applications, which works fine on 3.2.1, completely kills our 
cluster on 3.5.2.

The clusters consist of five 256GB workers and a 256GB master.

The task is run with "--executor-memory 200G" and completes in about 15 
minutes on 3.2.1.

However, when I run with "--executor-memory 200G" on 3.5.2, the workers all 
eventually die, as far as I can tell because they are unable to allocate more 
shared memory (they have to be rebooted). I then tried 
"--executor-memory 100G". This chugs along for about half an hour and then runs 
out of disk space for shared memory (/tmp has about 125GB).

The 3.2.1 Physical Plan is 11268 lines.
The 3.5.2 Physical Plan is 12923 lines.
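
For comparing two plans of this size by operator rather than by line count, a 
minimal stdlib-only sketch follows. It is not part of the application. It 
assumes each plan has been saved as plain text (for example from 
Dataset.explain() output) and that each plan line starts with tree-drawing 
characters, an optional "*(n)" codegen marker, and then the operator name. The 
class name PlanOperatorCount and the two file arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlanOperatorCount {
    // Assumed line shape: tree prefix (spaces, ':', '|', '+', '-'),
    // optional "*(n) " whole-stage-codegen marker, then the operator name.
    private static final Pattern OPERATOR =
        Pattern.compile("^[\\s:|+\\-]*(?:\\*\\(\\d+\\)\\s+)?([A-Za-z][A-Za-z0-9]*)");

    // Counts how often each operator name appears in a plan dump.
    public static Map<String, Integer> count(String planText) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : planText.split("\\R")) {
            Matcher m = OPERATOR.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns (after - before) per operator, omitting operators whose count is unchanged.
    public static Map<String, Integer> delta(Map<String, Integer> before,
                                             Map<String, Integer> after) {
        Map<String, Integer> diff = new TreeMap<>();
        after.forEach((op, n) -> diff.merge(op, n, Integer::sum));
        before.forEach((op, n) -> diff.merge(op, -n, Integer::sum));
        diff.values().removeIf(v -> v == 0);
        return diff;
    }

    // Usage: java PlanOperatorCount <old-plan.txt> <new-plan.txt>
    public static void main(String[] args) throws IOException {
        Map<String, Integer> before = count(Files.readString(Path.of(args[0])));
        Map<String, Integer> after = count(Files.readString(Path.of(args[1])));
        delta(before, after).forEach((op, d) -> System.out.printf("%-30s %+d%n", op, d));
    }
}
```

Run against the 3.2.1 and 3.5.2 plan files, this prints only the operators 
whose counts differ, which may narrow down where the extra lines come from.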

All the consumed data consists of parquet files that live on S3 and are 
accessed using the s3a protocol configured as:

spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider

# Enables the hadoop s3a committer
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory

spark.hadoop.fs.s3a.threads.max 40
spark.hadoop.fs.s3a.connection.maximum 40

The query itself is basically:


final var catalogParts = partsSelectorA.selectParts()
    .union(partsSelectorB.selectParts())
    .union(partsSelectorC.selectParts())
    .union(partsSelectorD.selectParts())
    .distinct()
    .persist();

This is followed by some further "lightweight" unions, which can be ignored: I 
have tried excluding them with no effect.

Each selectParts() method is a select statement on a huge table (~156M rows) 
combined with half a dozen or more left joins with large (~3M rows) tables.

I’m considering trying the 3.5.3 RC, which resolves some left join issues.

Any ideas?

I can share more details privately if that would help.

Regards,

Steve Coy






