[ https://issues.apache.org/jira/browse/HADOOP-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-18757.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
       Resolution: Fixed

> S3A Committer only finalizes the commits in a single thread
> -----------------------------------------------------------
>
>                 Key: HADOOP-18757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18757
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Moditha Hewasinghage
>            Assignee: Moditha Hewasinghage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> The S3A committer is bottlenecked on the driver when finalizing commits: only a single thread is used to finalize them. In our experiment, committing the output of 36,000 files takes almost 2 hours, with each file taking 0.1 - 0.5 seconds, while all the executors sit idle and only the driver works. I have attached driver log snippets to support this theory, in comparison with Spark 3.4.0 (a minimal sketch of the underlying JDK behaviour also follows the log snippets below).
> The most likely cause is the use of [ThreadPoolExecutor](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html) in
> [https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244]
> where a LinkedBlockingQueue is used with a corePoolSize of 0, which results in a single thread and no new threads ever being created.
> From the documentation:
> {code:java}
> Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue
> without a predefined capacity) will cause new tasks to wait in the queue when
> all corePoolSize threads are busy. Thus, no more than corePoolSize threads
> will ever be created. (And the value of the maximumPoolSize therefore doesn't
> have any effect.) This may be appropriate when each task is completely
> independent of others, so tasks cannot affect each others execution; for
> example, in a web page server. While this style of queuing can be useful in
> smoothing out transient bursts of requests, it admits the possibility of
> unbounded work queue growth when commands continue to arrive on average
> faster than they can be processed.{code}
> Magic Committer spark 3.5.0-SNAPSHOT
> {code:java}
> 2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of size 32
> 2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s) for resource profile id: 0.
> 2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting: > committing the output of 36000 task(s) > 2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448 > e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset > 2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data > from file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/ > task_202305261454174742717183892533843_0031_m_000000.pendingset > 2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of > size 32 > 2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit > SinglePendingCommit > {version=2, > uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5- > 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu > 26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--', > created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26 > 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', > taskId='attemp t_202305261454174742717183892533843_0031_m_000000_1920855', > notes='', > etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498 > > e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0ff > > f2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]} > 2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > s > ize 110636173 > 2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of > file length 110636173 > 2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > size 110636 > 173: duration 0:00.162s > 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and > committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e > 0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset: > duration 0:00.294s > 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448 > e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset > 2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data > from file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/ > task_202305261454174742717183892533843_0031_m_000001.pendingset > 2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing 
single commit > SinglePendingCommit > {version=2, > uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5- > 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtc > UFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--', > created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26 > 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', > taskId='attemp t_202305261454174742717183892533843_0031_m_000001_1920856', > notes='', > etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a > > 6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d2 > > 0522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]} > 2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > s > ize 110570244 > 2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of > file length 110570244 > 2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > size 110570 > 244: duration 0:00.206s > 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and > committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e > 0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset: > duration 0:00.240s > 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448 > e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset > 2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data > from file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/ > task_202305261454174742717183892533843_0031_m_000002.pendingset > 2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit > SinglePendingCommit > {version=2, > uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5- > 448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', > > uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5 > EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--', > created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26 > 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', > taskId='attemp 
t_202305261454174742717183892533843_0031_m_000002_1920857', > notes='', > etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b > > 3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b2 > > 58773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]} > 2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > s > ize 110136058 > 2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of > file length 110136058 > 2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file > <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet > size 110136058: duration 0:00.157s > {code} > Magic Committer spark 3.4.0 > > {code:java} > 2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total > number of 524 executor(s) for resource profile id: 0. > 2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting: > committing the output of 36000 task(s) > 2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer > attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread > pool of size 32 > 2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset > 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset > 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset > 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset > 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset > 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset > 
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset > 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a > e3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset > 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading > and committing files in pendingset > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic > /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a > e3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a > e3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a > e3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7a > e3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset > 2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset > 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in > file > s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset{code}
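
For reference, below is a minimal, standalone sketch of the JDK behaviour quoted above. It is plain java.util.concurrent code, not the Hadoop committer itself, and the class name and task shape are made up for illustration: with a corePoolSize of 0 and an unbounded LinkedBlockingQueue the pool never grows past one worker, so the maximumPoolSize of 32 has no effect. The second pool, with corePoolSize equal to maximumPoolSize and allowCoreThreadTimeOut enabled, is only one possible way to restore parallelism; the actual HADOOP-18757 patch may differ.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Standalone demo (not Hadoop code) of the ThreadPoolExecutor behaviour
 * quoted from the JDK docs above: with corePoolSize = 0 and an unbounded
 * LinkedBlockingQueue, at most one worker thread is ever created, so the
 * maximumPoolSize of 32 is ignored and the work is effectively serialized.
 */
public class SingleThreadPoolDemo {

  /** Submit 64 short tasks and report how many threads the pool ever used. */
  private static void runTasks(String label, ThreadPoolExecutor pool)
      throws InterruptedException {
    for (int i = 0; i < 64; i++) {
      pool.execute(() -> {
        try {
          Thread.sleep(100);          // stand-in for one commit taking ~0.1s
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println(label + ": largest pool size = " + pool.getLargestPoolSize());
  }

  public static void main(String[] args) throws InterruptedException {
    // The shape reported in the issue: core 0, max 32, unbounded queue.
    // All 64 tasks queue behind a single worker: prints "largest pool size = 1".
    runTasks("core 0 + unbounded queue",
        new ThreadPoolExecutor(0, 32, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>()));

    // corePoolSize == maximumPoolSize is one possible way to get real
    // parallelism (the actual HADOOP-18757 patch may differ):
    // prints "largest pool size = 32".
    ThreadPoolExecutor fixed = new ThreadPoolExecutor(32, 32, 60L,
        TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    fixed.allowCoreThreadTimeOut(true);  // let idle workers exit after 60s
    runTasks("core == max", fixed);
  }
}
{code}

With the first pool the 64 tasks take roughly 6.4 seconds on a single thread; with the second they finish in a few hundred milliseconds across 32 threads, which matches the serialized commit pattern in the 3.5.0-SNAPSHOT driver log versus the interleaved 3.4.0 log above.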