Hello Team,
Does Spark support role-based authentication for access to Amazon S3 in a
Kubernetes deployment?
*Note: we have deployed our Spark application in a Kubernetes cluster.*
Below is the hadoop-aws dependency we are using:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.3.4</version>
</dependency>
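
For reference, hadoop-aws 3.3.4 is built against aws-java-sdk-bundle
1.12.262 (normally pulled in transitively); if the SDK version is pinned
elsewhere in the build, it should match, e.g.:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>1.12.262</version>
</dependency>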
We set the following configuration when creating the Spark session, but it
is not working:
import org.apache.hadoop.conf.Configuration;
import com.amazonaws.regions.Regions;

Configuration hadoopConf = sparkSession.sparkContext().hadoopConfiguration();

// S3A assumes the role below for all S3 access.
hadoopConf.set("fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider");
hadoopConf.set("fs.s3a.assumed.role.arn", System.getenv("AWS_ROLE_ARN"));

// Credentials used for the STS AssumeRole call itself.
hadoopConf.set("fs.s3a.assumed.role.credentials.provider",
    "com.amazonaws.auth.WebIdentityTokenCredentialsProvider");

// STS endpoint and region for the AssumeRole call.
hadoopConf.set("fs.s3a.assumed.role.sts.endpoint",
    "s3.eu-central-1.amazonaws.com");
hadoopConf.set("fs.s3a.assumed.role.sts.endpoint.region",
    Regions.EU_CENTRAL_1.getName());

// Web-identity token file mounted into the pod by Kubernetes.
hadoopConf.set("fs.s3a.web.identity.token.file",
    System.getenv("AWS_WEB_IDENTITY_TOKEN_FILE"));

// Lifetime of each assumed-role session.
hadoopConf.set("fs.s3a.assumed.role.session.duration", "30m");
Thank you!
Regards,
Atul