On Mon, 7 Dec 2020 at 07:36, Chang Chen <baibaic...@gmail.com> wrote:
> Since S3A now works perfectly with S3Guard turned off, could the Magic
> Committer work with S3Guard off? If yes, will performance degrade? Or
> if HADOOP-17400 is fixed, will it then have comparable performance?

Yes, it works really well.

* It doesn't have problems with race conditions in job IDs (SPARK-3320), because it does all its work under the dest dir and only supports one job at a time there.

Performance-wise:

* Expect no degradation if you are not working with directories marked as authoritative (Hive does that for managed tables). Indeed, you will save on DDB writes.
* HADOOP-17400 speeds up all listing code, but for maximum directory listing performance you need to use the (existing) incremental listing APIs. See SPARK-33135 for some work there which matches this.

The list performance enhancements will only ship in hadoop-3.3.1. If you use the incremental list APIs today (listStatusIterator, listFiles), everything is lined up, HDFS scales better, and it helps motivate the abfs dev team to do the same.

There are some extra fixes coming in related to this; credit to Dongjoon for contributing and/or reviewing this work:

HADOOP-17258. Magic S3Guard Committer to overwrite existing pendingSet file on task commit
HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID (for staging; for magic you can disable aborting all uploads under the dest dir and so have >1 job use the same dest dir)
HADOOP-16798. S3A Committer thread pool shutdown problems.

I'm also actively working on HADOOP-17414, "Magic committer files don't have the count of bytes written collected by Spark":
https://github.com/apache/hadoop/pull/2530

Spark doesn't track bytes written because it is only measuring the 0-byte marker file. The Hadoop-side patch:

* Returns all S3 object headers as XAttr attributes prefixed "header."
* Sets the custom header x-hadoop-s3a-magic-data-length to the length of the data in the marker file.

There's a matching Spark change which looks for the header through the getXAttr API when the output file is 0 bytes long. If it is present and parses to a positive long, it is used as the declared output size.

Hadoop branch-3.3 also has a very leading-edge patch to stop deleting superfluous directory markers when files are created. See
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
for details.

This avoids throttling when many files are being written to the same part of an S3 bucket, and stops creating tombstone markers in versioned S3 buckets; those tombstones slow down subsequent LIST calls, so over time listings get slower. The change is new: it needs a patch on older clients so they don't mistake a marker for an empty directory, and it needs broader testing. It is in all the maintained Hadoop 3.x branches, but not yet shipped anywhere other than hadoop-3.3.2.

If you do want leading-edge performance, yes, grab those latest patches in your own build. I plan to cut a new 3.3.x release soon to get it into people's hands; it will be the one with Arm M1 binary support in the libs and codecs. Building and testing now means that the problems you find get fixed before that release. Hey, you even have an excuse for the new MacBooks: "I wanted to test Spark on it".

-Steve
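
PS: for anyone wanting to try the magic committer from Spark, a minimal sketch of the wiring, assuming hadoop-aws 3.3.x plus the spark-hadoop-cloud module on the classpath. The option names are the standard s3a committer settings; fs.s3a.committer.abort.pending.uploads is the switch mentioned above for letting >1 job share a dest dir, so check it against the docs for your exact version.

  import org.apache.spark.sql.SparkSession

  // Sketch only: bind Spark's commit protocol to the S3A magic committer.
  val spark = SparkSession.builder()
    .appName("magic-committer-demo")
    // route commits through the cloud committer binding classes
    .config("spark.sql.sources.commitProtocolClass",
      "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
    .config("spark.sql.parquet.output.committer.class",
      "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
    // select the magic committer and enable magic path support in S3A
    .config("spark.hadoop.fs.s3a.committer.name", "magic")
    .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
    // optional, per HADOOP-17318: don't abort other jobs' pending uploads
    // under the same destination directory
    .config("spark.hadoop.fs.s3a.committer.abort.pending.uploads", "false")
    .getOrCreate()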
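
The incremental listing calls are just the existing FileSystem APIs which return a RemoteIterator, so the store can page results in as you consume them instead of building the whole listing up front. A rough sketch (the helper name is mine, for illustration):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{LocatedFileStatus, Path}

  // Stream a recursive listing through listFiles() rather than
  // materialising everything with listStatus().
  def foreachFile(dir: Path, conf: Configuration)(f: LocatedFileStatus => Unit): Unit = {
    val fs = dir.getFileSystem(conf)
    val it = fs.listFiles(dir, true) // RemoteIterator: results paged on demand
    while (it.hasNext) {
      f(it.next())
    }
  }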
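
And a sketch of the Spark-side logic for HADOOP-17414: if a committed file reports zero bytes, ask the filesystem for the custom header through getXAttr and use that as the length. The header name is the one from the patch above; the helper and its error handling are illustrative rather than the actual Spark change:

  import java.nio.charset.StandardCharsets
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.Path
  import scala.util.Try

  // If the marker file is 0 bytes long, look for the length the magic
  // committer recorded in the x-hadoop-s3a-magic-data-length header,
  // surfaced by S3A as an XAttr with the "header." prefix.
  def bytesWritten(path: Path, conf: Configuration): Long = {
    val fs = path.getFileSystem(conf)
    val len = fs.getFileStatus(path).getLen
    if (len > 0) len
    else Try(fs.getXAttr(path, "header.x-hadoop-s3a-magic-data-length"))
      .map(bytes => new String(bytes, StandardCharsets.UTF_8).toLong)
      .filter(_ > 0)
      .getOrElse(0L)
  }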