   1. MinIO does actually have atomic object renames, but as it is file by
   file, task commit is non-atomic.
   2. v2 task commit is also unsafe -it just writes to the destination.
   There is no way a committer which supports task failure can be as fast as
   this.
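
For comparison, here's a minimal sketch of pointing a Spark job at the S3A
magic committer rather than FileOutputCommitter v1/v2 -assuming the
spark-hadoop-cloud module is on the classpath; the app name is a placeholder:

```
// Hedged sketch: SparkSession configured for the S3A "magic" committer.
// Assumes spark-hadoop-cloud (PathOutputCommitProtocol) is on the classpath.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-to-parquet")                       // placeholder app name
  // route Spark's commit protocol through the Hadoop committer factory
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  // select the magic committer for s3a:// destinations
  .config("spark.hadoop.fs.s3a.committer.name", "magic")
  .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
  .getOrCreate()
```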

further reading
https://github.com/steveloughran/zero-rename-committer/releases/download/tag_release_2021-05-17/a_zero_rename_committer.pdf

The S3A committers -and the manifest committer, for ABFS
performance/resilience and GCS correctness- all save statistics of the
commit work to the file _SUCCESS, which is now a JSON file. I'd be curious
what those numbers are -though they only measure task/job commit, not
all the work (that's not quite true, but...).
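
For example, here's a quick way to dump that JSON from a spark-shell (the
path is hypothetical; the file is small, so reading it in one go is fine):

```
// Hedged sketch: print the JSON _SUCCESS file written by the S3A/manifest
// committers; it contains the committer name, host, timings and IO statistics.
import java.io.ByteArrayOutputStream
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

val successPath = new Path("s3a://mybucket/output/table1/_SUCCESS") // hypothetical
val fs = FileSystem.get(successPath.toUri, spark.sparkContext.hadoopConfiguration)

val bytes = new ByteArrayOutputStream()
val in = fs.open(successPath)
try IOUtils.copyBytes(in, bytes, 4096, false) finally in.close()

println(bytes.toString("UTF-8"))   // pretty-print with your favourite JSON tool
```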

You can get a log of all S3 IO performed for an entire Spark job, across all
worker threads, via S3A auditing:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/auditing.html
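
If auditing has been disabled in your deployment, turning it (and the
referrer header) back on is roughly this -option names are from the
hadoop-aws auditing docs, so double-check them against your Hadoop release:

```
// Hedged sketch: make sure S3A auditing is on and the audit span is attached
// to each request as the HTTP referrer header, so it shows up in server logs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.hadoop.fs.s3a.audit.enabled", "true")          // enable auditing
  .config("spark.hadoop.fs.s3a.audit.referrer.enabled", "true") // emit the referrer header
  .getOrCreate()
```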

On AWS S3, if you turn on server access logging, the HTTP referrer header of
every request includes a synthetic referrer URL containing everything the S3A
client knows about that operation:

```
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8
stevel-london [03/Mar/2025:18:33:25 +0000] 109.157.214.141
arn:aws:iam::152813717728:user/stevel-dev M7T1KD9CJ3RKEQ2N REST.GET.OBJECT
job-00-fork-0004/test/testCostOfSavingLoadingPendingFile/file.pending/__magic_job-123/__base/file.pending
"GET
/job-00-fork-0004/test/testCostOfSavingLoadingPendingFile/file.pending/__magic_job-123/__base/file.pending?versionId=VUTVBVgdJlUxFWXzmOQ6acvjarLxzwVb
HTTP/1.1" 206 - 548 548 17 16
"*https://audit.example.org/hadoop/1/op_open/28a5f56e-8f8d-4ba8-ab2d-738eed3c551b-00000011/?op=op_open&p1=job-00-fork-0004/test/testCostOfSavingLoadingPendingFile/file.pending/__magic_job-123/__base/file.pending&pr=stevel&ps=3c7dc974-e570-4247-a784-3d25e6e1aeec&rg=0-547&id=28a5f56e-8f8d-4ba8-ab2d-738eed3c551b-00000011&t0=32&fs=28a5f56e-8f8d-4ba8-ab2d-738eed3c551b&t1=12&ts=1741026805915
<https://audit.example.org/hadoop/1/op_open/28a5f56e-8f8d-4ba8-ab2d-738eed3c551b-00000011/?op=op_open&p1=job-00-fork-0004/test/testCostOfSavingLoadingPendingFile/file.pending/__magic_job-123/__base/file.pending&pr=stevel&ps=3c7dc974-e570-4247-a784-3d25e6e1aeec&rg=0-547&id=28a5f56e-8f8d-4ba8-ab2d-738eed3c551b-00000011&t0=32&fs=28a5f56e-8f8d-4ba8-ab2d-738eed3c551b&t1=12&ts=1741026805915>*"
"Hadoop 3.5.0-SNAPSHOT, aws-sdk-java/2.27.14 Mac_OS_X/15.3.1
OpenJDK_64-Bit_Server_VM/25.362-b09 Java/1.8.0_362
vendor/Azul_Systems__Inc. io/sync http/Apache cfg/retry-mode/adaptive
hll/cross-region ft/s3-transfer" VUTVBVgdJlUxFWXzmOQ6acvjarLxzwVb
P05f5j1b5NqernDT+rKL8KwaWZyDGVsO9/SewGkf/I7XYjM+UYa8vrEXa1ClDV6N59vZspoUqZpv7WZ3uJ/Wrw==
SigV4 ECDHE-RSA-AES128-SHA AuthHeader
stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2 - -
```

This is really good for identifying when some inefficient IO is taking
place against the source files...for CSV files you want one big GET; for
ORC/Parquet you want to read the footer and then the column ranges, ideally
as parallel GET requests.
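
One knob worth knowing here is the S3A input policy, which can be set per
job. A hedged sketch -the option name is the long-standing "experimental"
key; check the docs for your Hadoop release:

```
// Hedged sketch: match the S3A read policy to the file format.
// "sequential" favours one long GET (CSV/text scans);
// "random" favours ranged GETs (ORC/Parquet footer + column chunks);
// "normal" adapts to the observed seek pattern.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // for a CSV-scanning job; use "random" for Parquet/ORC-heavy reads
  .config("spark.hadoop.fs.s3a.experimental.input.fadvise", "sequential")
  .getOrCreate()
```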

Here are my suggestions:

   1. Create a JIRA on this.
   2. Download the cloudstore library, run its storediag app and read its
   performance hints: https://github.com/steveloughran/cloudstore
   3. Attach a _SUCCESS file of a slow job.
   4. Ideally, also attach the audit logs, which MinIO should collect -though
   that does require the referrer header to be retained.
   5. If you can't get the server logs, you can collect them locally on every
   process by logging org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor at
   debug (sketch below) -though then you're left to collect and aggregate them
   yourself so they are time-ordered; server logs are much better.
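
A hedged sketch of that logger setting, in the log4j2 properties syntax
Spark 3.5 ships with (the logger id and file location are up to you):

```
# log4j2.properties fragment: log every audited S3A operation at debug
logger.s3a_audit.name = org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor
logger.s3a_audit.level = debug
```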


Hadoop 2.7.6 is over a decade old. I'd expect reading and writing
files to be way, way faster, even if safely committing work does have
overhead. There are a lot of parameters related to HTTP pool size, worker
threads &c which can be expanded given your MinIO store is local.
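
As a hedged example of the kind of tuning meant here -values are
illustrative, not recommendations; the option names are the standard
fs.s3a.* ones documented in hadoop-aws:

```
// Hedged sketch: widen the S3A HTTP connection pool and thread pools for a
// local, low-latency MinIO endpoint. Tune and benchmark rather than copy.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.hadoop.fs.s3a.connection.maximum", "200")      // HTTP connection pool size
  .config("spark.hadoop.fs.s3a.threads.max", "64")              // upload/copy thread pool
  .config("spark.hadoop.fs.s3a.max.total.tasks", "100")         // queued background work
  .config("spark.hadoop.fs.s3a.fast.upload.active.blocks", "8") // parallel block uploads per stream
  .config("spark.hadoop.fs.s3a.committer.threads", "32")        // task/job commit parallelism
  .getOrCreate()
```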





On Sun, 23 Mar 2025 at 03:05, Kristopher Kane <kk...@etsy.com.invalid>
wrote:

> We've seen significant performance gains in CSV going from 3.1 -> 3.5.
>
> You've pointed out exactly the change in fileoutputcommitter.  v1
> (safe, serial, slow) -> v2 (object store unsafe if no atomic rename,
> parallel, faster).  In V1, the output files are moved by the driver
> serially from the staging directory to the final directory.  In V2 they are
> done at the task level.  It's possible the MinIO implementation is
> overwhelmed by concurrent inode renames, but not likely at 2.6GB.  V2 is
> much, much faster in high-performing object stores and HDFS.
>
> The difference between 27 seconds and 34 seconds in Spark can be caused by many
> things and it wouldn't have surfaced on my radar.
>
> Probably an email for the user mailing list.
>
> Kris
>
> On Sat, Mar 22, 2025 at 10:30 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> This is inside my current project; I can't move the data to the public
>> domain. But it seems something has changed which caused this slowness.
>> Sent from my iPhone
>>
>> On Mar 22, 2025, at 10:23 PM, Ángel Álvarez Pascua <
>> angel.alvarez.pas...@gmail.com> wrote:
>>
>> 
>> Could you take three thread dumps from one of the executors while Spark
>> is performing the conversion? You can use the Spark UI for that.
>>
>> El dom, 23 mar 2025 a las 3:20, Ángel Álvarez Pascua (<
>> angel.alvarez.pas...@gmail.com>) escribió:
>>
>>> Without the data, it's difficult to analyze. Could you provide some
>>> synthetic data so I can investigate this further? The schema and a few
>>> sample fake rows should be sufficient.
>>>
>>> El dom, 23 mar 2025 a las 3:17, Prem Sahoo (<prem.re...@gmail.com>)
>>> escribió:
>>>
>>>> I am providing the schema, and the schema is actually correct - it has
>>>> all the columns available in the CSV. So we can rule this issue out for the
>>>> slowness. Maybe there are some other contributing factors.
>>>> Sent from my iPhone
>>>>
>>>> On Mar 22, 2025, at 10:05 PM, Ángel Álvarez Pascua <
>>>> angel.alvarez.pas...@gmail.com> wrote:
>>>>
>>>> 
>>>>
>>>> Hey, just this week I found some issues with the Univocity library that
>>>> Spark internally uses to read CSV files.
>>>>
>>>> *Spark CSV Read Low Performance: EOFExceptions in Univocity Parser*
>>>> https://issues.apache.org/jira/projects/SPARK/issues/SPARK-51579
>>>>
>>>> I initially assumed this issue had existed since Spark started using
>>>> this library, but perhaps something changed in the versions you mentioned.
>>>>
>>>> Are you providing a schema, or are you letting Spark infer it? I've
>>>> also noticed that when the schema doesn't match the columns in the CSV
>>>> files (for example, different number of columns), exceptions are thrown
>>>> internally.
>>>>
>>>> Given all this, my initial hypothesis is that thousands upon thousands
>>>> of exceptions are being thrown internally, only to be handled by the
>>>> Univocity parser—so the user isn't even aware of what's happening.
>>>>
>>>>
>>>> El dom, 23 mar 2025 a las 2:40, Prem Sahoo (<prem.re...@gmail.com>)
>>>> escribió:
>>>>
>>>>> Hello ,
>>>>> I read a CSV file of size 2.7 GB which has 100 columns. When I
>>>>> convert it to Parquet with Spark 3.2 and Hadoop 2.7.6 it takes 28 secs,
>>>>> but with Spark 3.5.2 and Hadoop 3.4.1 it takes 34 secs. This stat is bad.
>>>>> Sent from my iPhone
>>>>>
>>>>> On Mar 22, 2025, at 9:21 PM, Ángel Álvarez Pascua <
>>>>> angel.alvarez.pas...@gmail.com> wrote:
>>>>>
>>>>> 
>>>>> Sure. I love performance challenges and mysteries!
>>>>>
>>>>> Please, could you provide an example project or the steps to build one?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> El dom, 23 mar 2025, 2:17, Prem Sahoo <prem.re...@gmail.com> escribió:
>>>>>
>>>>>> Hello Team,
>>>>>> I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO
>>>>>> object storage. It was slower compared to writing to MapR FS with the
>>>>>> above tech stack. Then I moved on to the upgraded versions, Spark 3.5.2
>>>>>> and Hadoop 3.4.1, which write to MinIO with the V2 fileoutputcommitter,
>>>>>> and checked the performance, which is worse than with the old tech stack.
>>>>>> Then I tried the magic committer and it came out slower than V2, so with
>>>>>> the latest tech stack the performance is degraded. Could someone please
>>>>>> assist.
>>>>>> Sent from my iPhone
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>
>>>>>>
