[
https://issues.apache.org/jira/browse/HADOOP-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-18757.
-------------------------------------
Fix Version/s: 3.4.0
3.3.9
Resolution: Fixed
> S3A Committer only finalizes the commits in a single thread
> -----------------------------------------------------------
>
> Key: HADOOP-18757
> URL: https://issues.apache.org/jira/browse/HADOOP-18757
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.5
> Reporter: Moditha Hewasinghage
> Assignee: Moditha Hewasinghage
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> S3A Committer is being bottle-necked on the driver when finalizing the
> commits. It seems like only a single thread is being used to finalize the
> commit. In the experiment we are saving 36,000 files ending committing for
> almost 2 hours each file taking 0.1 - 0.5 seconds while all the executors
> stay idle while the driver commits. I have attached the driver log snippets
> to support this theory in comparison to spark 3.4.0.
> The most likely reason is the usage of
> [ThreadPoolExecutor]([https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html])
> in
> [https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244]
> where an LinkedBlockingQueue is used with a corePoolSize of 0 which ends up
> with a single thread and no new threads created.
> From the documentation
> {code:java}
> Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue
> without a predefined capacity) will cause new tasks to wait in the queue when
> all corePoolSize threads are busy. Thus, no more than corePoolSize threads
> will ever be created. (And the value of the maximumPoolSize therefore doesn't
> have any effect.) This may be appropriate when each task is completely
> independent of others, so tasks cannot affect each others execution; for
> example, in a web page server. While this style of queuing can be useful in
> smoothing out transient bursts of requests, it admits the possibility of
> unbounded work queue growth when commands continue to arrive on average
> faster than they can be processed.{code}
> Magic Committer spark 3.5.0-SNAPSHOT
> {code:java}
> 2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of
> size 32
> 2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total
> number of 0 executor(s) for resource profile id: 0.
> 2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting:
> committing the output of 36000 task(s)
> 2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448
> e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data
> from file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/
> task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of
> size 32
> 2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit
> SinglePendingCommit
> {version=2,
> uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-
> 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu
> 26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--',
> created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26
> 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354',
> taskId='attemp t_202305261454174742717183892533843_0031_m_000000_1920855',
> notes='',
> etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498
>
> e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0ff
>
> f2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]}
> 2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> s
> ize 110636173
> 2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of
> file length 110636173
> 2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> size 110636
> 173: duration 0:00.162s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and
> committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e
> 0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset:
> duration 0:00.294s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448
> e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data
> from file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/
> task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing single commit
> SinglePendingCommit
> {version=2,
> uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-
> 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtc
> UFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--',
> created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26
> 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354',
> taskId='attemp t_202305261454174742717183892533843_0031_m_000001_1920856',
> notes='',
> etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a
>
> 6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d2
>
> 0522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]}
> 2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> s
> ize 110570244
> 2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of
> file length 110570244
> 2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> size 110570
> 244: duration 0:00.206s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and
> committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e
> 0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset:
> duration 0:00.240s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448
> e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data
> from file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/
> task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit
> SinglePendingCommit
> {version=2,
> uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-
> 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet',
>
> uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5
> EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--',
> created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26
> 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354',
> taskId='attemp t_202305261454174742717183892533843_0031_m_000002_1920857',
> notes='',
> etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b
>
> 3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b2
>
> 58773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]}
> 2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> s
> ize 110136058
> 2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of
> file length 110136058
> 2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file
> <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet
> size 110136058: duration 0:00.157s
> {code}
> Magic Committer spark 3.4.0
>
> {code:java}
> 2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total
> number of 524 executor(s) for resource profile id: 0.
> 2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting:
> committing the output of 36000 task(s)
> 2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer
> attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread
> pool of size 32
> 2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a
> e3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading
> and committing files in pendingset
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a
> e3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a
> e3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a
> e3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a
> e3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in
> file
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]