Hi Awasthi,

How about configuring two catalogs in Spark, one pointing at the source data
and the other at the target? You can then configure different credentials for
each catalog.
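
Roughly something like the sketch below (a minimal example; the catalog
names, table names, and keys are placeholders, and it assumes S3FileIO's
per-catalog s3.access-key-id / s3.secret-access-key properties):

  val spark = SparkSession.builder().master("local[*]")
    .config("spark.sql.extensions",
      "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    // source catalog: credentials that can read the source bucket
    .config("spark.sql.catalog.src", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.src.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.src.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.src.s3.access-key-id", "<SOURCE_ACCESS_KEY>")
    .config("spark.sql.catalog.src.s3.secret-access-key", "<SOURCE_SECRET_KEY>")
    // target catalog: credentials that can write to the target bucket
    .config("spark.sql.catalog.tgt", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.tgt.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.tgt.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.tgt.s3.access-key-id", "<TARGET_ACCESS_KEY>")
    .config("spark.sql.catalog.tgt.s3.secret-access-key", "<TARGET_SECRET_KEY>")
    .getOrCreate()

  // read through the source catalog, write through the target catalog
  spark.table("src.db.source_table")
    .writeTo("tgt.db.target_table")
    .append()

Both catalogs can point at the same Glue account; since the S3 credentials are
scoped to each catalog, reads and writes end up using different keys.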


Yufei


On Mon, Apr 22, 2024 at 8:49 AM Awasthi, Somesh
<soawas...@informatica.com.invalid> wrote:

> Hi Jack/Dev Team,
>
>
>
> We want to pass separate credentials for reading source data from S3 and
> separate credentials for writing target data to S3 using the Glue catalog,
> but we are unable to set credentials at the bucket level and have not been
> able to get help from any forum.
>
> Could you please check and help me as soon as possible, or guide me to the
> right forum to get this resolved.
>
>
>
> Currently we are following the two approaches below to set S3 credentials
> through code.
>
>
>
> *Approach 1: Setting S3 credentials through system properties.*
>
>
>
> val spark = SparkSession.builder().master("local[*]")
>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>   .config("spark.sql.extensions",
>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>   .config("spark.sql.catalog.AwsDataCatalog",
>     "org.apache.iceberg.spark.SparkCatalog")
>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>     "org.apache.iceberg.aws.glue.GlueCatalog")
>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>     "org.apache.iceberg.aws.s3.S3FileIO")
>   .config("spark.sql.catalog.AwsDataCatalog.s3.use-arn-region-enabled", "true")
>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxx",
>     "arn:aws:s3:us-west-2:xxxxx")
>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxxx",
>     "arn:aws:s3:ap-south-1:xxxxx")
>   .getOrCreate()
>
> System.setProperty("aws.region", "XXXXXXXXXXXX")
> System.setProperty("aws.accessKeyId", "XXXXXXXXXXXXXXXXx")
> System.setProperty("aws.secretAccessKey", "XXXXXXXXXXXXXXXXXXx")
>
>
>
> *Approach 2: A custom credentials provider to set S3 credentials through Spark.*
>
>
>
> val spark = SparkSession.builder().master("local[*]")
>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>   .config("spark.sql.extensions",
>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>   .config("spark.sql.catalog.AwsDataCatalog",
>     "org.apache.iceberg.spark.SparkCatalog")
>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>     "org.apache.iceberg.aws.glue.GlueCatalog")
>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>     "org.apache.iceberg.aws.s3.S3FileIO")
>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider",
>     "CustomAwsClientFactory")
>   .config("spark.sql.catalog.AwsDataCatalog.client.region", "xxxx")
>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.accessKeyId",
>     "XXXXXXXXXXXXXxxx")
>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.secretAccessKey",
>     "XXXXXXXXXXXXXXXXXXXXx")
>   .getOrCreate()
>
> *Problem: We want to pass separate credentials for reading source data from
> S3 and separate credentials for writing target data to S3 using the Glue
> catalog.*
>
>
>
> *Expected Solution:*
>
>   spark.hadoop.fs.s3a.access.key: <YOURACCESSKEY>
>   spark.hadoop.fs.s3a.secret.key: <YOURSECRETKEY>
>
>   .config("spark.hadoop.fs.s3a.access.key", "XXXXXXXXXXXXXXxxx")
>   .config("spark.hadoop.fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXXxx")
>
> *TLPs consumed:* Is having iceberg-spark-runtime-3.5_2.12-1.5.0 +
> iceberg-aws-bundle-1.5.0 enough in terms of dependencies? Currently we are
> following the official website to integrate Iceberg with Spark, using the
> Glue catalog:
> https://iceberg.apache.org/docs/nightly/spark-configuration/
>
>
>
>
>
> Could you please help me understand whether it is possible to pass
> credentials at the bucket level, or whether this is a limitation on the
> Iceberg side?
>
>
>
> Thanks,
>
> Somesh.