FrankChen021 commented on code in PR #19609:
URL: https://github.com/apache/druid/pull/19609#discussion_r3452101591
##########
processing/src/main/java/org/apache/druid/data/input/impl/CloudObjectLocation.java:
##########
@@ -64,8 +67,9 @@ public CloudObjectLocation(@JsonProperty("bucket") String
bucket, @JsonProperty(
"bucket name cannot be null. Please verify if bucket name
adheres to naming rules");
this.path =
Preconditions.checkNotNull(StringUtils.maybeRemoveLeadingSlash(path));
Preconditions.checkArgument(
- this.bucket.equals(StringUtils.urlEncode(this.bucket)),
- "bucket must follow DNS-compliant naming conventions"
+ this.bucket.equals(StringUtils.urlEncode(this.bucket)) ||
isS3Arn(this.bucket),
Review Comment:
[P2] Normalize slash-form ARN buckets before accepting them
This now accepts both `accesspoint:` and `accesspoint/` ARN buckets, but
stores the slash form unchanged. AWS documents access point/MRAP ARNs with the
slash separator, and direct JSON `objects` or catalog bucket+path inputs can
now construct `CloudObjectLocation` with that value. Later `toUri()` writes the
raw bucket into `s3://%s/%s`; with `arn:...:accesspoint/bucket.mrap`, the
access-point name becomes part of the URI path, so code that re-reads the URI
via `getAuthority()` sees only `arn:...:accesspoint` as the bucket. The
deep-storage config/loadSpec paths normalize this, but direct
`CloudObjectLocation` construction does not. Please normalize slash-form S3 ARN
buckets here, or reject slash-form ARNs from this class and require the
internal colon form.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]