For Iceberg tables stored in AWS S3 buckets, knowing the bucket's region is
critical for engines that use vended credentials (when configured) to access a
table.
For example, the vended credentials for AWS look like this:

{ "s3.access-key-id": "ASI....",
  "s3.secret-access-key": "gbVT9PpFBY...",
  "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...",
  "expiration-time": "1725572949000" }
An engine consuming this would need to either infer the region itself (e.g. via
s3api get-bucket-location) or ask the end user to provide the region
separately, which defeats part of the purpose of vended credentials.
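For illustration, here is a rough sketch (AWS SDK for Java v2; the class and
method names are mine) of the lookup an engine would otherwise have to do on
its own:

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetBucketLocationRequest;

class BucketRegionLookup {
    // Hypothetical helper: resolve a bucket's region the way an engine would
    // have to today. This call requires s3:GetBucketLocation, which (as noted
    // below) the vended credentials do not grant.
    static String lookup(S3Client s3, String bucket) {
        String constraint = s3
            .getBucketLocation(GetBucketLocationRequest.builder().bucket(bucket).build())
            .locationConstraintAsString();
        // Buckets in us-east-1 report an empty/null location constraint.
        return (constraint == null || constraint.isEmpty()) ? "us-east-1" : constraint;
    }
}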
An engine cannot use get-bucket-location, because the credential generation
explicitly allows only s3:GetObject, s3:GetObjectVersion, s3:PutObject,
s3:DeleteObject and s3:ListBucket for the table location prefix.
Refer -
org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
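For context, the scoped-down policy generated there is roughly of this shape
(a simplified illustration with placeholder bucket/prefix values; the exact
statements are built in policyString and may differ in detail):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<bucket>/<table-prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": { "StringLike": { "s3:prefix": "<table-prefix>/*" } }
    }
  ]
}

Note that s3:GetBucketLocation is not in the allowed actions.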
I propose that:
- the storage setup for S3 should have a parameter for the bucket region
  (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
- if the parameter is not specified, Polaris attempts to look up the region
  itself (get-bucket-location)
- the information is returned in the vended credentials (if enabled) as
  "s3.region": ... (see the example response below)
Note - another option could be to allow 's3:GetBucketLocation' in the
policyString when generating the vended credentials' IAM role policy, but that
is suboptimal and therefore I am not proposing it. It would require engines to
make multiple get-bucket-location calls - one per table being looked up.
--
aniket