Hi Jack/Dev Team,

We want to use one set of credentials for reading source data from S3 and a 
separate set of credentials for writing target data to S3 using the Glue 
catalog, but we are unable to set credentials at the bucket level and have not 
been able to get help from any forum.

Could you please look into this and help us as soon as possible, or point us to 
the right forum to get it resolved?

Currently we are following the two approaches below to set S3 credentials 
through code.

Approach 1: Setting S3 credentials through system properties.

import org.apache.spark.sql.SparkSession

// Glue catalog with S3FileIO; S3 access points for the source/target regions
val spark = SparkSession.builder().master("local[*]")
  .config("spark.sql.defaultCatalog", "AwsDataCatalog")
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.AwsDataCatalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.AwsDataCatalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.AwsDataCatalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.AwsDataCatalog.s3.use-arn-region-enabled", "true")
  .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxx", "arn:aws:s3:us-west-2:xxxxx")
  .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxxx", "arn:aws:s3:ap-south-1:xxxxx")
  .getOrCreate()


 System.setProperty("aws.region", "XXXXXXXXXXXX");
     System.setProperty("aws.accessKeyId", "XXXXXXXXXXXXXXXXx")
    System.setProperty("aws.secretAccessKey", "XXXXXXXXXXXXXXXXXXx")

Approach 2: Setting S3 credentials through a custom credentials provider configured on the catalog.

val spark = SparkSession.builder().master("local[*]")
  .config("spark.sql.defaultCatalog", "AwsDataCatalog")
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.AwsDataCatalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.AwsDataCatalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.AwsDataCatalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  // custom credentials provider plus the properties passed to it
  .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider", "CustomAwsClientFactory")
  .config("spark.sql.catalog.AwsDataCatalog.client.region", "xxxx")
  .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.accessKeyId", "XXXXXXXXXXXXXxxx")
  .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.secretAccessKey", "XXXXXXXXXXXXXXXXXXXXx")
  .getOrCreate()





Problem: We want to pass one set of credentials for reading source data from S3 
and a separate set of credentials for writing target data to S3 using the Glue 
catalog.


Expected solution: something along these lines:

    spark.hadoop.fs.s3a.access.key: <YOURACCESSKEY>
    spark.hadoop.fs.s3a.secret.key: <YOURSECRETKEY>

    .config("spark.hadoop.fs.s3a.access.key", "XXXXXXXXXXXXXXxxx")
    .config("spark.hadoop.fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXXxx")



Dependencies consumed: should iceberg-spark-runtime-3.5_2.12-1.5.0 + 
iceberg-aws-bundle-1.5.0 be enough, or is something else needed? We are 
currently following the official Iceberg Spark configuration guide 
(https://iceberg.apache.org/docs/nightly/spark-configuration/) to integrate 
Iceberg with Spark, using the Glue catalog.
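
For completeness, our build currently pulls in just these two artifacts (sbt 
sketch; coordinates as published on Maven Central, Scala 2.12 / Spark 3.5):

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.iceberg" % "iceberg-spark-runtime-3.5_2.12" % "1.5.0",
  "org.apache.iceberg" % "iceberg-aws-bundle"             % "1.5.0"
)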


Could you please let us know whether it is possible to pass credentials at the 
bucket level, or whether this is a limitation on the Iceberg side?

Thanks,
Somesh.




