[
https://issues.apache.org/jira/browse/HADOOP-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948485#comment-17948485
]
ASF GitHub Bot commented on HADOOP-19399:
-----------------------------------------
steveloughran commented on code in PR #7443:
URL: https://github.com/apache/hadoop/pull/7443#discussion_r2068647513
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management.
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:
+* The Analytics accelerator will make parallel GETs for columns it predicts
will be required on a Parquet file open.
+* The Prefetch input stream can make up to 8 parallel 8MB GETs by default.
+* The Classic input stream will make async parallel GETs for column reads when
using VectoredIO.
+
+Since the CRT client is an async client, when enabled in S3A, it will
currently be used wherever
+* When reading data using the Analytics stream.
+* When copying files between buckets using the Transfer manager, for example,
on a rename.
+* When copying from the local file system to S3.
+
+This is because all other operations in S3A currently use the S3 Sync client,
and so cannot be replaced by an
+asynchronous client. The move to an async client is tracked in
+[HADOOP-18877](https://issues.apache.org/jira/browse/HADOOP-18877).
+
+## Enabling the CRT Client
+The CRT client can be enabled as follows:
+
+```
Review Comment:
nit: xml
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -371,14 +426,23 @@ private static URI buildURI(String scheme, String host,
int port) {
* @param clientConfig AWS SDK configuration to update
*/
private static void initUserAgent(Configuration conf,
- ClientOverrideConfiguration.Builder clientConfig) {
+ ClientOverrideConfiguration.Builder clientConfig) {
+ clientConfig.putAdvancedOption(SdkAdvancedClientOption.USER_AGENT_PREFIX,
getUserAgent(conf));
+ }
+
+ /**
+ * Builds user agent string
Review Comment:
nit: .
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management.
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is
evenly distributed across S3. This can be
Review Comment:
S3 front end servers, S3 back end stores -or both?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -212,6 +246,27 @@ public static RetryPolicy.Builder
createRetryPolicyBuilder(Configuration conf) {
return retryPolicyBuilder;
}
+
+ private static S3CrtProxyConfiguration mapCRTProxyConfiguration(
+ software.amazon.awssdk.http.nio.netty.ProxyConfiguration
proxyConfiguration) {
Review Comment:
this the same for the unshaded SDK JARs?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -175,6 +163,39 @@ public S3AsyncClient createS3AsyncClient(
return s3AsyncClientBuilder.build();
}
+ private S3AsyncClient createS3CrtAsyncClient(URI uri,
S3ClientCreationParameters parameters)
Review Comment:
Safe if these are the only places of this source file where the crt classes
are referenced.
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management.
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:
Review Comment:
`>=`
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -50,6 +50,7 @@ full details.
* [Auditing Architecture](./auditing_architecture.html).
* [Testing](./testing.html)
* [S3Guard](./s3guard.html)
+* [Using the S3 CRT Client](./crt_client.html).
Review Comment:
1. point to this in the reading.md file
2. in performance.md, say that the analytics stream and crt client may also
provide performance improvements
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
*/
public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
-
/**
* Prefix to configure Analytics Accelerator Library.
* Value: {@value}.
*/
public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
"fs.s3a.analytics.accelerator";
+ /**
+ * Flag to enable the CRT client. Value {@value}.
+ */
+ public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";
+
+ /**
+ * Default value for {@link #DEFAULT_CRT_ENABLED}.
+ * Value: {@value}.
+ */
+ public static final boolean DEFAULT_CRT_ENABLED = false;
+
+ /**
+ * Requester pays header value. Value {@value}.
+ */
+ public static final String REQUESTER_PAYS_HEADER_VALUE = "requester";
Review Comment:
make an InternalConstant unless there's a need to expose it
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
*/
public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
-
/**
* Prefix to configure Analytics Accelerator Library.
* Value: {@value}.
*/
public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
"fs.s3a.analytics.accelerator";
+ /**
+ * Flag to enable the CRT client. Value {@value}.
+ */
+ public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";
Review Comment:
can you export this in the filesystem hasPathCapability, and add to
InternalContants.S3A_DYNAMIC_CAPABILITIES , for bucket-info and storediag.
thanks.
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java:
##########
@@ -709,6 +742,18 @@ public void setEncryptionSecrets(final EncryptionSecrets
secrets) {
encryptionSecrets = secrets;
}
+ private void addRequestLevelHeaders(AwsRequest.Builder requestBuilder) {
+ AwsRequestOverrideConfiguration.Builder builder =
AwsRequestOverrideConfiguration.builder();
+
+ if (isRequesterPays) {
+ builder.putHeader(REQUESTER_PAYS_HEADER, REQUESTER_PAYS_HEADER_VALUE);
+ }
+
+ builder.putHeader(USER_AGENT, userAgent);
Review Comment:
not for this PR, but maybe in future we could add the auditor ? strings here
so it gets through the CRT *and* all the way to cloudtrail
> S3A: Add support for CRT client
> -------------------------------
>
> Key: HADOOP-19399
> URL: https://issues.apache.org/jira/browse/HADOOP-19399
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.1
> Reporter: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.2
>
>
> * Allow ClientManager to initialise a CRT client
> * Should be optional, java async client to still be supported as defaultĀ
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]