Good to hear the issue is fixed!

ACL is optional, as the javadoc says, "If not set, ACL will not be set for
requests".

But I think that to use MinIO you need a custom client factory that sets the
S3 endpoint to your MinIO endpoint.
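
For example (a rough sketch I have not run against MinIO: the endpoint and keys
are placeholders, and the exact set of methods on
org.apache.iceberg.aws.AwsClientFactory depends on the Iceberg version), this is
the kind of S3 client such a factory's s3() method could return, with the
factory class registered through the catalog's "client.factory" property:

import java.net.URI
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.{S3Client, S3Configuration}

// Build an AWS SDK v2 S3 client that talks to MinIO instead of AWS.
val minioS3Client: S3Client = S3Client.builder()
  .endpointOverride(URI.create("http://192.168.176.5:9000"))  // MinIO endpoint (placeholder)
  .region(Region.US_EAST_1)  // MinIO ignores the region, but the SDK requires one
  .credentialsProvider(StaticCredentialsProvider.create(
    AwsBasicCredentials.create("minio", "minio123")))  // MinIO access/secret keys (placeholders)
  .serviceConfiguration(S3Configuration.builder()
    .pathStyleAccessEnabled(true)  // MinIO buckets are usually addressed path-style
    .build())
  .build()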

-Jack

On Tue, Aug 17, 2021 at 11:36 AM Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi Ryan,
>
> S3FileIO needs a canned ACL according to:
>
>   /**
>    * Used to configure canned access control list (ACL) for S3 client to use during write.
>    * If not set, ACL will not be set for requests.
>    * <p>
>    * The input must be one of {@link software.amazon.awssdk.services.s3.model.ObjectCannedACL},
>    * such as 'public-read-write'
>    * For more details:
>    * https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
>    */
>   public static final String S3FILEIO_ACL = "s3.acl";
>
>
> MinIO does not support canned ACLs, according to
> https://docs.min.io/docs/minio-server-limits-per-tenant.html:
>
> List of Amazon S3 Bucket API's not supported on MinIO
>
>    - BucketACL (Use bucket policies
>    <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
>    - BucketCORS (CORS enabled by default on all buckets for all HTTP
>    verbs)
>    - BucketWebsite (Use caddy <https://github.com/caddyserver/caddy> or
>    nginx <https://www.nginx.com/resources/wiki/>)
>    - BucketAnalytics, BucketMetrics, BucketLogging (Use bucket
>    notification
>    <https://docs.min.io/docs/minio-client-complete-guide#events> APIs)
>    - BucketRequestPayment
>
> List of Amazon S3 Object API's not supported on MinIO
>
>    - ObjectACL (Use bucket policies
>    <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
>    - ObjectTorrent
>
>
>
> Hope this makes sense.
>
> BTW, Iceberg + Hive + S3A works now that the issue with Hive using S3A has
> been fixed. Thanks, Jack, for helping debug.
>
>
>
> On Tue, Aug 17, 2021 at 8:38 AM Ryan Blue <b...@tabular.io> wrote:
>
>> I'm not sure that I'm following why MinIO won't work with S3FileIO.
>> S3FileIO assumes that the credentials are handled by a credentials provider
>> outside of S3FileIO. How does MinIO handle credentials?
>>
>> Ryan
>>
>> On Mon, Aug 16, 2021 at 7:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Talked with Lian on Slack; the user is running a Hadoop 3.2.1 + Hive
>>> (Postgres) + Spark + MinIO Docker installation. Based on the stack trace,
>>> some S3A-related dependencies seem to be missing on the Hive server side.
>>> Let's see if adding them fixes the issue.
>>> -Jack
>>>
>>> On Mon, Aug 16, 2021 at 7:32 PM Lian Jiang <jiangok2...@gmail.com>
>>> wrote:
>>>
>>>> This is my full script launching spark-shell:
>>>>
>>>> # add Iceberg dependency
>>>> export AWS_REGION=us-east-1
>>>> export AWS_ACCESS_KEY_ID=minio
>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>
>>>> ICEBERG_VERSION=0.11.1
>>>>
>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>
>>>> MINIOSERVER=192.168.176.5
>>>>
>>>>
>>>> # add AWS dependency
>>>> AWS_SDK_VERSION=2.15.40
>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>> AWS_PACKAGES=(
>>>>     "bundle"
>>>>     "url-connection-client"
>>>> )
>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>> done
>>>>
>>>> # start Spark SQL client shell
>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>
>>>>
>>>> Let me know if anything is missing. Thanks.
>>>>
>>>> On Mon, Aug 16, 2021 at 7:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> Have you included the hadoop-aws jar?
>>>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
>>>>> -Jack
>>>>>
>>>>> On Mon, Aug 16, 2021 at 7:09 PM Lian Jiang <jiangok2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Jack,
>>>>>>
>>>>>> You are right. S3FileIO will not work on MinIO since MinIO does not
>>>>>> support ACLs:
>>>>>> https://docs.min.io/docs/minio-server-limits-per-tenant.html
>>>>>>
>>>>>> To use Iceberg with MinIO + S3A, I used the script below to launch
>>>>>> spark-shell:
>>>>>>
>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>
>>>>>>
>>>>>>
>>>>>> The Spark code:
>>>>>>
>>>>>> import org.apache.spark.sql.SparkSession
>>>>>> val values = List(1,2,3,4,5)
>>>>>>
>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>> import spark.implicits._
>>>>>> val df = values.toDF()
>>>>>>
>>>>>> val core = "mytable"
>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>> val s3IcePath = s"s3a://east/${core}.ice"
>>>>>>
>>>>>> df.writeTo(table)
>>>>>>     .tableProperty("write.format.default", "parquet")
>>>>>>     .tableProperty("location", s3IcePath)
>>>>>>     .createOrReplace()
>>>>>>
>>>>>>
>>>>>> Still the same error:
>>>>>> java.lang.ClassNotFoundException: Class
>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>
>>>>>>
>>>>>> What else could be wrong? Thanks for any clue.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry for the late reply; I thought I replied on Friday, but the
>>>>>>> email did not send successfully.
>>>>>>>
>>>>>>> As Daniel said, you don't need to set up S3A if you are using
>>>>>>> S3FileIO.
>>>>>>>
>>>>>>> By default, S3FileIO uses the default credentials chain, which checks
>>>>>>> credential sources one by one:
>>>>>>> https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
>>>>>>>
>>>>>>> If you would like to use a specialized credential provider, you can
>>>>>>> directly customize your S3 client:
>>>>>>> https://iceberg.apache.org/aws/#aws-client-customization
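>>>>>>>
>>>>>>> A sketch of what the Spark side of that customization could look like,
>>>>>>> assuming a class implementing org.apache.iceberg.aws.AwsClientFactory is
>>>>>>> already on the classpath. "com.example.MinioAwsClientFactory" is a made-up
>>>>>>> name, and "hive_test" is just the catalog name used elsewhere in this thread:
>>>>>>>
>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>
>>>>>>> val spark = SparkSession.builder()
>>>>>>>   .master("local")
>>>>>>>   .config("spark.sql.catalog.hive_test", "org.apache.iceberg.spark.SparkCatalog")
>>>>>>>   .config("spark.sql.catalog.hive_test.type", "hive")
>>>>>>>   .config("spark.sql.catalog.hive_test.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
>>>>>>>   // "client.factory" tells Iceberg which AwsClientFactory to instantiate;
>>>>>>>   // the class name below is hypothetical.
>>>>>>>   .config("spark.sql.catalog.hive_test.client.factory", "com.example.MinioAwsClientFactory")
>>>>>>>   .getOrCreate()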
>>>>>>>
>>>>>>> It looks like you are trying to use MinIO as the backing store for the
>>>>>>> S3A file system? If you have to use MinIO, then there is not a way to
>>>>>>> integrate it with S3FileIO right now (maybe I am wrong on this; I don't
>>>>>>> know much about MinIO).
>>>>>>>
>>>>>>> To directly use S3FileIO with HiveCatalog, simply do:
>>>>>>>
>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
>>>>>>>
>>>>>>> Best,
>>>>>>> Jack Ye
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks. I prefer S3FileIO since it is recommended by Iceberg. Do you
>>>>>>>> have a sample using the Hive catalog, S3FileIO, the Spark API (as opposed
>>>>>>>> to SQL), and S3 access.key and secret.key? It is hard to get all the
>>>>>>>> settings right for this combination without an example. Appreciate any
>>>>>>>> help.
>>>>>>>>
>>>>>>>> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <
>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> So, if I recall correctly, the hive server does need access to
>>>>>>>>> check and create paths for table locations.
>>>>>>>>>
>>>>>>>>> There may be an option to disable this behavior, but otherwise the
>>>>>>>>> fs implementation probably needs to be available to the hive 
>>>>>>>>> metastore.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Daniel.
>>>>>>>>>>
>>>>>>>>>> After modifying the script to the following,
>>>>>>>>>>
>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>
>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>
>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>>>>>>>
>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> # add AWS dependency
>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>     "bundle"
>>>>>>>>>>     "url-connection-client"
>>>>>>>>>> )
>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>> done
>>>>>>>>>>
>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>
>>>>>>>>>> I got: MetaException: java.lang.RuntimeException:
>>>>>>>>>> java.lang.ClassNotFoundException: Class
>>>>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is not
>>>>>>>>>> using S3, so it should not cause this error. Any idea what dependency I
>>>>>>>>>> could be missing? Thanks.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <
>>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Lian,
>>>>>>>>>>>
>>>>>>>>>>> At a cursory glance, it appears that you might be mixing two
>>>>>>>>>>> different FileIO implementations, which may be why you are not 
>>>>>>>>>>> getting the
>>>>>>>>>>> expected result.
>>>>>>>>>>>
>>>>>>>>>>> When you set --conf
>>>>>>>>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
>>>>>>>>>>> you're actually switching over to the native S3 implementation within
>>>>>>>>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>>>>>>>>> the following settings to set up access are then set for the
>>>>>>>>>>> S3AFileSystem (which would not be used with S3FileIO).
>>>>>>>>>>>
>>>>>>>>>>> You might try just removing that line since it should use the
>>>>>>>>>>> HadoopFileIO at that point and may work.
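>>>>>>>>>>>
>>>>>>>>>>> To make the pairing concrete, here is a rough Scala sketch of the
>>>>>>>>>>> HadoopFileIO/S3AFileSystem combination (endpoint and keys are
>>>>>>>>>>> placeholders); with S3FileIO the fs.s3a.* settings below would simply
>>>>>>>>>>> be ignored and credentials would come from the AWS SDK chain instead:
>>>>>>>>>>>
>>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>>>
>>>>>>>>>>> // HadoopFileIO path: s3a:// locations, fs.s3a.* settings apply.
>>>>>>>>>>> val spark = SparkSession.builder()
>>>>>>>>>>>   .master("local")
>>>>>>>>>>>   .config("spark.sql.catalog.hive_test", "org.apache.iceberg.spark.SparkCatalog")
>>>>>>>>>>>   .config("spark.sql.catalog.hive_test.type", "hive")
>>>>>>>>>>>   .config("spark.sql.catalog.hive_test.warehouse", "s3a://east/warehouse")
>>>>>>>>>>>   .config("spark.hadoop.fs.s3a.endpoint", "http://192.168.176.5:9000")
>>>>>>>>>>>   .config("spark.hadoop.fs.s3a.access.key", "minio")
>>>>>>>>>>>   .config("spark.hadoop.fs.s3a.secret.key", "minio123")
>>>>>>>>>>>   .config("spark.hadoop.fs.s3a.path.style.access", "true")
>>>>>>>>>>>   .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>>>>>>>>   .getOrCreate()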
>>>>>>>>>>>
>>>>>>>>>>> Hope that's helpful,
>>>>>>>>>>> -Dan
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <
>>>>>>>>>>> jiangok2...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to create an Iceberg table on MinIO S3 and Hive.
>>>>>>>>>>>>
>>>>>>>>>>>> This is how I launch spark-shell:
>>>>>>>>>>>>
>>>>>>>>>>>> # add Iceberg dependency
>>>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>>>
>>>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>>>
>>>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>>>>>>>>
>>>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> # add AWS dependency
>>>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>>>     "bundle"
>>>>>>>>>>>>     "url-connection-client"
>>>>>>>>>>>> )
>>>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>>>> done
>>>>>>>>>>>>
>>>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the Spark code to create the Iceberg table:
>>>>>>>>>>>>
>>>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>>>> val values = List(1,2,3,4,5)
>>>>>>>>>>>>
>>>>>>>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>>>>>>>> import spark.implicits._
>>>>>>>>>>>> val df = values.toDF()
>>>>>>>>>>>>
>>>>>>>>>>>> val core = "mytable8"
>>>>>>>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>>>>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>>>>>>>>
>>>>>>>>>>>> df.writeTo(table)
>>>>>>>>>>>>     .tableProperty("write.format.default", "parquet")
>>>>>>>>>>>>     .tableProperty("location", s3IcePath)
>>>>>>>>>>>>     .createOrReplace()
>>>>>>>>>>>>
>>>>>>>>>>>> I got an error: "The AWS Access Key Id you provided does not
>>>>>>>>>>>> exist in our records."
>>>>>>>>>>>>
>>>>>>>>>>>> I have verified that I can log in to the MinIO UI using the same
>>>>>>>>>>>> username and password that I passed to spark-shell via the
>>>>>>>>>>>> AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables.
>>>>>>>>>>>> https://github.com/apache/iceberg/issues/2168 is related but does
>>>>>>>>>>>> not help me. Not sure why the credentials do not work for Iceberg +
>>>>>>>>>>>> AWS. Any idea, or an example of writing an Iceberg table to S3 using
>>>>>>>>>>>> the Hive catalog, would be highly appreciated! Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
>
