vivek807 opened a new issue, #19608:
URL: https://github.com/apache/druid/issues/19608

   ### Description
   
   Add support for AWS S3 Multi-Region Access Points (MRAPs) and S3 Access 
Point ARNs in Druid's S3 extension.
   
   Currently, the bucket field in Druid's S3 configuration only accepts 
standard DNS-compliant bucket names. AWS Access Point ARNs (e.g., 
`arn:aws:s3::123456789123:accesspoint:bucket.mrap`) are rejected at 
construction time in `CloudObjectLocation` because they fail the URL-encoding 
equality check used to enforce DNS naming rules. Additionally, some tools 
produce ARNs with a slash separator (accesspoint/alias) instead of the 
colon-delimited form (accesspoint:alias) expected by the AWS SDK, causing 
further failures downstream. 
    
   This change: 
   - Relaxes the bucket name validation in CloudObjectLocation to permit valid 
S3 Access Point ARNs alongside DNS-compliant names. 
   - Adds S3Utils.normalizeBucketName() to canonicalize the slash-delimited 
form to the colon-delimited form at ingestion points 
(S3DataSegmentPusherConfig, S3LoadSpec). 
   - Supports both regional Access Point ARNs 
(`arn:aws:s3:<region>:<account>:accesspoint:<name>`) and MRAP ARNs 
(`arn:aws:s3::<account>:accesspoint:<name>.mrap`). 
    
   No API surface changes; the bucket configuration field continues to accept 
plain bucket names unchanged.
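
The normalization step described above could be sketched roughly as follows. This is an illustration only, not the code in the proposed change: the class name `BucketNameNormalizer` and the regular expression are assumptions, and only the method name `normalizeBucketName` comes from the description. The pattern assumes a 12-digit account ID and an empty region segment for MRAPs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the proposed S3Utils.normalizeBucketName(); not Druid's actual code. */
final class BucketNameNormalizer
{
  // Assumed ARN shape: arn:<partition>:s3:<region, empty for MRAP>:<12-digit account>:accesspoint/<name>
  private static final Pattern SLASH_FORM =
      Pattern.compile("^(arn:[^:]+:s3:[^:]*:\\d{12}:accesspoint)/(.+)$");

  private BucketNameNormalizer()
  {
  }

  static String normalizeBucketName(String bucket)
  {
    if (bucket == null) {
      return null;
    }
    Matcher m = SLASH_FORM.matcher(bucket);
    // Only the separator directly after "accesspoint" is rewritten.
    // Plain bucket names and colon-delimited ARNs are returned untouched.
    return m.matches() ? m.group(1) + ":" + m.group(2) : bucket;
  }
}
```

Under these assumptions, `normalizeBucketName("arn:aws:s3::123456789123:accesspoint/bucket.mrap")` returns the colon-delimited ARN, while a plain name such as `my-bucket` passes through unchanged.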
   
   ### Motivation
   
   **Use case**
   
   AWS Multi-Region Access Points provide a single global S3 endpoint that 
routes requests to the nearest healthy bucket replica across regions. Operators 
use MRAPs for:
   
   - Active-active multi-region Druid deployments backed by S3 Cross-Region 
Replication (CRR).
   - Disaster recovery setups where deep storage must remain accessible during 
a regional outage.
   - Simplifying Druid configuration across regions — one ARN in 
druid.storage.bucket instead of per-region overrides.
   
   Single-region Access Point ARNs are also used more broadly to enforce 
fine-grained IAM access controls on shared buckets without exposing the bucket 
name.
   
   **Why the current behavior blocks this**
   
   CloudObjectLocation enforces:
   ```java
   Preconditions.checkArgument(
       this.bucket.equals(StringUtils.urlEncode(this.bucket)),
       "bucket must follow DNS-compliant naming conventions"
   );
   ```
   
   An ARN like `arn:aws:s3::123456789123:accesspoint:bucket.mrap` URL-encodes 
to `arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap` because each 
colon becomes `%3A`, so the equality check always fails. There is no escape 
hatch. Users who configure an MRAP ARN as the Druid storage bucket receive an 
`IllegalArgumentException` at startup with no workaround short of patching the 
code.
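
The failure can be reproduced outside Druid with a small sketch. It assumes that Druid's `StringUtils.urlEncode` behaves like `java.net.URLEncoder` with UTF-8 and `+` rewritten to `%20`. The class `BucketCheckDemo` and its methods are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical demo of the encode-and-compare check; not Druid's actual code. */
final class BucketCheckDemo
{
  // Assumption: approximates Druid's StringUtils.urlEncode with plain percent-encoding.
  static String urlEncode(String s)
  {
    try {
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  // Mirrors the equality test quoted from CloudObjectLocation.
  static boolean passesCurrentCheck(String bucket)
  {
    return bucket.equals(urlEncode(bucket));
  }

  public static void main(String[] args)
  {
    String arn = "arn:aws:s3::123456789123:accesspoint:bucket.mrap";
    // prints arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap
    System.out.println(urlEncode(arn));
    // prints false: the colons are percent-encoded, so the strings differ
    System.out.println(passesCurrentCheck(arn));
    // prints true: letters, digits and hyphens are left as-is
    System.out.println(passesCurrentCheck("my-druid-bucket"));
  }
}
```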
   
   The AWS SDK for Java (v1 and v2) accepts ARN strings wherever a bucket name 
is expected, so no SDK-level changes are required. The fix is purely a 
validation relaxation and a normalization helper.
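
One way the validation relaxation might look is sketched below. It is an assumption-laden illustration, not the proposed patch: `RelaxedBucketValidator`, the `ACCESS_POINT_ARN` pattern, and the error message are all hypothetical, and `urlEncode` again approximates Druid's `StringUtils.urlEncode` with `java.net.URLEncoder`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

/** Hypothetical relaxed bucket check; not Druid's actual code. */
final class RelaxedBucketValidator
{
  // Assumed shape of a colon-delimited access point ARN; the region is empty for MRAPs.
  private static final Pattern ACCESS_POINT_ARN =
      Pattern.compile("^arn:[^:]+:s3:[a-z0-9-]*:\\d{12}:accesspoint:[A-Za-z0-9.-]+$");

  private RelaxedBucketValidator()
  {
  }

  // Assumption: approximates Druid's StringUtils.urlEncode.
  private static String urlEncode(String s)
  {
    try {
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  static boolean isValidBucket(String bucket)
  {
    if (bucket == null) {
      return false;
    }
    // Keep the existing encode-and-compare rule, and additionally allow access point ARNs.
    return bucket.equals(urlEncode(bucket)) || ACCESS_POINT_ARN.matcher(bucket).matches();
  }

  static void validate(String bucket)
  {
    if (!isValidBucket(bucket)) {
      throw new IllegalArgumentException(
          "bucket must be a DNS-compliant name or an S3 Access Point ARN"
      );
    }
  }
}
```

In this sketch the slash-delimited form is still rejected, which is why normalization would need to run before validation.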
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

