Hi Denis,

Were you able to use checkpointing on S3 with native Kubernetes? I am using Flink 1.13.1 and tried your solution of passing the WebIdentityTokenCredentialsProvider:
-Dfs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider

I am getting this error in the job-manager logs:

Caused by: com.amazonaws.SdkClientException: Unable to locate specified web identity token file: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Describing the pod shows that the volume is mounted. Is there anything specific that needs to be done? For testing, on the same EKS cluster I ran a sample pod with the aws-cli image and it is able to run "ls" on the same S3 bucket.

Thanks,
Hemant

On Mon, Oct 11, 2021 at 1:56 PM Denis Nutiu <denis.nu...@gmail.com> wrote:

> Hi Rommel,
>
> Thanks for getting back to me and for your time.
>
> I switched to the Hadoop plugin and used the following authentication
> method, which worked:
>
> fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
>
> It turns out that I was using the wrong credentials provider. Reading
> AWSCredentialsProvider [1] and seeing that I have the
> AWS_WEB_IDENTITY_TOKEN_FILE variable in the container allowed me to find
> the correct one.
>
> [1] https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/AWSCredentialsProvider.html
>
> Best,
> Denis
>
> From: Rommel Holmes <rommelhol...@gmail.com>
> Sent: Saturday, October 9, 2021 02:09
> To: Denis Nutiu <denis.nu...@gmail.com>
> Cc: user <user@flink.apache.org>
> Subject: Re: Flink S3 Presto Checkpointing Permission Forbidden
>
> You already have the S3 request ID, so you can easily reach out to AWS tech
> support to find out which account was used to write to S3. I guess that
> account probably doesn't have permission to do the following:
>
> "s3:GetObject",
> "s3:PutObject",
> "s3:DeleteObject",
> "s3:ListBucket"
>
> Then grant the account those permissions in k8s, and you should be good
> to go.
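The four S3 actions Rommel lists can be written out as an IAM policy attached to the role the Flink service account assumes. A minimal sketch, assuming the checkpoint bucket is simply named `bucket` (replace with your own bucket and ARNs); note that `s3:ListBucket` applies to the bucket ARN itself, while the object actions apply to `bucket/*`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FlinkCheckpointObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::bucket/*"
    },
    {
      "Sid": "FlinkCheckpointList",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucket"
    }
  ]
}
```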
> On Fri, Oct 8, 2021 at 6:06 AM Denis Nutiu <denis.nu...@gmail.com> wrote:
>
> Hello,
>
> I'm trying to deploy my Flink cluster inside AWS EKS using Flink Native.
> I want to use S3 as a filesystem for checkpointing, and I'm passing the
> following options related to flink-s3-fs-presto:
>
> "-Dhive.s3.endpoint": "https://s3.eu-central-1.amazonaws.com"
> "-Dhive.s3.iam-role": "arn:aws:iam::xxx:role/s3-flink"
> "-Dhive.s3.use-instance-credentials": "true"
> "-Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS": "flink-s3-fs-presto-1.13.2.jar"
> "-Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS": "flink-s3-fs-presto-1.13.2.jar"
> "-Dstate.backend": "rocksdb"
> "-Dstate.backend.incremental": "true"
> "-Dstate.checkpoints.dir": "s3://bucket/checkpoints/"
> "-Dstate.savepoints.dir": "s3://bucket/savepoints/"
>
> But my job fails with:
>
> 2021-10-08 11:38:49,771 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Could not properly dispose the private states in the pending checkpoint 45 of job 75bdd6fb6e689961ef4e096684e867bc.
> com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JEZ3X8YPDZ2TF4T9; S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=; Proxy: null), S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c= (Path: s3://bucket/checkpoints/75bdd6fb6e689961ef4e096684e867bc/chk-45)
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:573) ~[?:?]
>   at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.getS3ObjectMetadata(PrestoS3FileSystem.java:560) ~[?:?]
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.getFileStatus(PrestoS3FileSystem.java:311) ~[?:?]
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.directory(PrestoS3FileSystem.java:450) ~[?:?]
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.delete(PrestoS3FileSystem.java:427) ~[?:?]
>   at org.apache.flink.fs.s3presto.common.HadoopFileSystem.delete(HadoopFileSystem.java:160) ~[?:?]
>   at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.delete(PluginFileSystemFactory.java:155) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>   at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.disposeOnFailure(FsCheckpointStorageLocation.java:117) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>   at org.apache.flink.runtime.checkpoint.PendingCheckpoint.discard(PendingCheckpoint.java:588) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>   at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:60) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>   at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanup$2(CheckpointsCleaner.java:85) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
>   at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>   at java.lang.Thread.run(Unknown Source) [?:?]
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JEZ3X8YPDZ2TF4T9; S3 Extended Request ID: u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=; Proxy: null)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[?:?]
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[?:?]
>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062) ~[?:?]
>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008) ~[?:?]
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1338) ~[?:?]
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1312) ~[?:?]
>   at com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:563) ~[?:?]
>   ... 17 more
>
> I can't figure out whether it's a permission error or a configuration error
> of the Presto S3 plugin.
>
> The EKS pod has the following environment variables:
>
> Environment:
>   ENABLE_BUILT_IN_PLUGINS:     flink-s3-fs-presto-1.13.2.jar
>   FLINK_TM_JVM_MEM_OPTS:       -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456
>   AWS_DEFAULT_REGION:          eu-central-1
>   AWS_REGION:                  eu-central-1
>   AWS_ROLE_ARN:                arn:aws:iam::xxx:role/s3-flink
>   AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
>
> Has anyone managed to deploy Flink with IAM access to S3 for checkpointing
> on AWS? Could you please share a working flink-s3-fs-presto or
> flink-s3-fs-hadoop plugin configuration with IAM authentication to S3?
>
> --
> Best,
> Denis Nutiu

> --
> Yours
> Rommel
> ***************************************
> * I waited patiently for the LORD; he turned to me and heard my cry.
> He lifted me out of the slimy pit, out of the mud and mire; he set my
> feet on a rock and gave me a firm place to stand. *
> ***************************************
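On Hemant's "Unable to locate specified web identity token file" error at the top of the thread: a common cause on EKS is file permissions rather than the mount itself. The official Flink image runs as a non-root user (uid 9999), while the projected IRSA token is typically mounted root-owned with restrictive permissions, which is why a root-based aws-cli test pod can read it while Flink cannot. One commonly suggested fix is setting an fsGroup in the pod security context. A sketch, assuming native Kubernetes with Flink's pod template support (the service account name is illustrative):

```yaml
# pod-template.yaml -- supplied via
#   -Dkubernetes.pod-template-file=/path/to/pod-template.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  # The service account must carry the eks.amazonaws.com/role-arn annotation
  # so the IRSA webhook injects AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE.
  serviceAccountName: flink-service-account
  securityContext:
    # Make the projected token file readable by the non-root flink user.
    fsGroup: 9999
```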