Hi Ryan,

S3FileIO needs canned ACL support, according to:
/**
 * Used to configure canned access control list (ACL) for S3 client to use during write.
 * If not set, ACL will not be set for requests.
 * <p>
 * The input must be one of {@link software.amazon.awssdk.services.s3.model.ObjectCannedACL},
 * such as 'public-read-write'
 * For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
 */
public static final String S3FILEIO_ACL = "s3.acl";

MinIO does not support canned ACLs, according to https://docs.min.io/docs/minio-server-limits-per-tenant.html:

List of Amazon S3 Bucket APIs not supported on MinIO
- BucketACL (Use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- BucketCORS (CORS enabled by default on all buckets for all HTTP verbs)
- BucketWebsite (Use caddy <https://github.com/caddyserver/caddy> or nginx <https://www.nginx.com/resources/wiki/>)
- BucketAnalytics, BucketMetrics, BucketLogging (Use bucket notification <https://docs.min.io/docs/minio-client-complete-guide#events> APIs)
- BucketRequestPayment

List of Amazon S3 Object APIs not supported on MinIO
- ObjectACL (Use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- ObjectTorrent

Hope this makes sense. (A sketch of how s3.acl and a MinIO bucket policy could be configured appears further below.) BTW, Iceberg + Hive + S3A works now that the issue with Hive using S3A has been fixed. Thanks, Jack, for helping debug.

On Tue, Aug 17, 2021 at 8:38 AM Ryan Blue <b...@tabular.io> wrote:

> I'm not sure that I'm following why MinIO won't work with S3FileIO.
> S3FileIO assumes that the credentials are handled by a credentials provider
> outside of S3FileIO. How does MinIO handle credentials?
>
> Ryan
>
> On Mon, Aug 16, 2021 at 7:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Talked with Lian on Slack, the user is using a Hadoop 3.2.1 + Hive
>> (Postgres) + Spark + MinIO Docker installation. There might be some
>> S3A-related dependencies missing on the Hive server side, based on the
>> stack trace. Let's see if that fixes the issue.
>> -Jack
>>
>> On Mon, Aug 16, 2021 at 7:32 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> This is my full script for launching spark-shell:
>>>
>>> # add Iceberg dependency
>>> export AWS_REGION=us-east-1
>>> export AWS_ACCESS_KEY_ID=minio
>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>
>>> ICEBERG_VERSION=0.11.1
>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>
>>> MINIOSERVER=192.168.176.5
>>>
>>> # add AWS dependency
>>> AWS_SDK_VERSION=2.15.40
>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>> AWS_PACKAGES=(
>>>   "bundle"
>>>   "url-connection-client"
>>> )
>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>> done
>>>
>>> # start Spark SQL client shell
>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>
>>> Let me know if anything is missing. Thanks.
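To make the ACL discussion at the top concrete: s3.acl is an Iceberg catalog property, so it can be passed straight through the spark-shell catalog configuration, and on MinIO the closest equivalent of a canned ACL is an anonymous-access bucket policy set with the mc client. The following is only a minimal sketch: the hive_test catalog name and east bucket come from the scripts in this thread, while the public-read value, the myminio alias, and the exact mc subcommand are assumptions.

# Sketch: against real S3, the S3FileIO canned ACL can be passed through as a catalog
# property; the value must be an ObjectCannedACL name such as public-read or
# public-read-write.
/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.hive_test.warehouse=s3://bucket \
  --conf spark.sql.catalog.hive_test.s3.acl=public-read

# Sketch: on MinIO, ACL requests are rejected, so an anonymous-access bucket policy is
# the rough equivalent ("myminio" is an assumed mc alias; newer mc releases use
# "mc anonymous set" instead of "mc policy set").
mc policy set public myminio/east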
>>>
>>> On Mon, Aug 16, 2021 at 7:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Have you included the hadoop-aws jar?
>>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
>>>> -Jack
>>>>
>>>> On Mon, Aug 16, 2021 at 7:09 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>
>>>>> Jack,
>>>>>
>>>>> You are right. S3FileIO will not work on MinIO since MinIO does not
>>>>> support ACLs: https://docs.min.io/docs/minio-server-limits-per-tenant.html
>>>>>
>>>>> To use Iceberg with MinIO + S3A, I used the script below to launch spark-shell:
>>>>>
>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>
>>>>> *The Spark code:*
>>>>>
>>>>> import org.apache.spark.sql.SparkSession
>>>>> val values = List(1,2,3,4,5)
>>>>>
>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>> import spark.implicits._
>>>>> val df = values.toDF()
>>>>>
>>>>> val core = "mytable"
>>>>> val table = s"hive_test.mydb.${core}"
>>>>> val s3IcePath = s"s3a://east/${core}.ice"
>>>>>
>>>>> df.writeTo(table)
>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>   .tableProperty("location", s3IcePath)
>>>>>   .createOrReplace()
>>>>>
>>>>> *Still the same error:*
>>>>> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>
>>>>> What else could be wrong? Thanks for any clue.
>>>>>
>>>>> On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Sorry for the late reply, I thought I replied on Friday but the email
>>>>>> did not send successfully.
>>>>>>
>>>>>> As Daniel said, you don't need to set up S3A if you are using S3FileIO.
>>>>>>
>>>>>> S3FileIO by default reads the default credentials chain, checking
>>>>>> credential setups one by one:
>>>>>> https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
>>>>>>
>>>>>> If you would like to use a specialized credential provider, you can
>>>>>> directly customize your S3 client:
>>>>>> https://iceberg.apache.org/aws/#aws-client-customization
>>>>>>
>>>>>> It looks like you are trying to use MinIO to mount the S3A file system?
>>>>>> If you have to use MinIO, then there is no way to integrate with S3FileIO
>>>>>> right now.
>>>>>> (maybe I am wrong on this; I don't know much about MinIO)
>>>>>>
>>>>>> To directly use S3FileIO with HiveCatalog, simply do:
>>>>>>
>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
>>>>>>
>>>>>> Best,
>>>>>> Jack Ye
>>>>>>
>>>>>> On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have
>>>>>>> a sample using the Hive catalog, S3FileIO, the Spark API (as opposed to
>>>>>>> SQL), and S3 access.key and secret.key? It is hard to get all the settings
>>>>>>> right for this combination without an example. Appreciate any help.
>>>>>>>
>>>>>>> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>
>>>>>>>> So, if I recall correctly, the Hive server does need access to
>>>>>>>> check and create paths for table locations.
>>>>>>>>
>>>>>>>> There may be an option to disable this behavior, but otherwise the
>>>>>>>> fs implementation probably needs to be available to the Hive metastore.
>>>>>>>>
>>>>>>>> -Dan
>>>>>>>>
>>>>>>>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Daniel.
>>>>>>>>>
>>>>>>>>> After modifying the script to:
>>>>>>>>>
>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>
>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>>>>>>
>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>
>>>>>>>>> # add AWS dependency
>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>   "bundle"
>>>>>>>>>   "url-connection-client"
>>>>>>>>> )
>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> # start Spark SQL client shell
>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>
>>>>>>>>> I got: MetaException: java.lang.RuntimeException:
>>>>>>>>> java.lang.ClassNotFoundException: Class
>>>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is
>>>>>>>>> not using S3, so it should not be causing this error. Any idea what
>>>>>>>>> dependency I could be missing? Thanks.
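On the repeated ClassNotFoundException above, a hedged note rather than a definitive fix: the hadoop-aws artifact has to match the Hadoop version that the Spark distribution bundles, and since the failure surfaces as a MetaException, the same jars may also need to be on the Hive metastore's classpath (which is consistent with Jack's and Daniel's comments above about the Hive side). A quick check, assuming the /spark install from these scripts and a hypothetical /opt/hive/lib directory for the metastore:

# Sketch: check which Hadoop version this Spark distribution bundles, so the
# hadoop-aws coordinate passed to --packages can match it exactly.
ls /spark/jars | grep -E '^hadoop-(common|client)'   # e.g. hadoop-common-3.2.0.jar

# hadoop-aws pulls in com.amazonaws:aws-java-sdk-bundle; the two travel together.
# If the MetaException is raised by the Hive metastore rather than by Spark, the same
# jars also have to be visible to the metastore process (the path is an assumption):
cp hadoop-aws-3.2.0.jar aws-java-sdk-bundle-*.jar /opt/hive/lib/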
>>>>>>>>>
>>>>>>>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Lian,
>>>>>>>>>>
>>>>>>>>>> At a cursory glance, it appears that you might be mixing two
>>>>>>>>>> different FileIO implementations, which may be why you are not
>>>>>>>>>> getting the expected result.
>>>>>>>>>>
>>>>>>>>>> When you set --conf
>>>>>>>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
>>>>>>>>>> you're actually switching over to the native S3 implementation within
>>>>>>>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>>>>>>>> the following settings to set up access are then set for the
>>>>>>>>>> S3AFileSystem (which would not be used with S3FileIO).
>>>>>>>>>>
>>>>>>>>>> You might try just removing that line, since it should use the
>>>>>>>>>> HadoopFileIO at that point and may work.
>>>>>>>>>>
>>>>>>>>>> Hope that's helpful,
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to create an Iceberg table on MinIO S3 and Hive.
>>>>>>>>>>>
>>>>>>>>>>> *This is how I launch spark-shell:*
>>>>>>>>>>>
>>>>>>>>>>> # add Iceberg dependency
>>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>>
>>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>>>>>>>
>>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>>
>>>>>>>>>>> # add AWS dependency
>>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>>   "bundle"
>>>>>>>>>>>   "url-connection-client"
>>>>>>>>>>> )
>>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>>> done
>>>>>>>>>>>
>>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>>
>>>>>>>>>>> *Here is the Spark code to create the Iceberg table:*
>>>>>>>>>>>
>>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>>> val values = List(1,2,3,4,5)
>>>>>>>>>>>
>>>>>>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>>>>>>> import spark.implicits._
>>>>>>>>>>> val df = values.toDF()
>>>>>>>>>>>
>>>>>>>>>>> val core = "mytable8"
>>>>>>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>>>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>>>>>>>
>>>>>>>>>>> df.writeTo(table)
>>>>>>>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>>>>>>>   .tableProperty("location", s3IcePath)
>>>>>>>>>>>   .createOrReplace()
>>>>>>>>>>>
>>>>>>>>>>> I got an error: "The AWS Access Key Id you provided does not exist
>>>>>>>>>>> in our records."
>>>>>>>>>>>
>>>>>>>>>>> I have verified that I can log in to the MinIO UI using the same
>>>>>>>>>>> username and password that I passed to spark-shell via the
>>>>>>>>>>> AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
>>>>>>>>>>> https://github.com/apache/iceberg/issues/2168 is related but does
>>>>>>>>>>> not help me. I am not sure why the credentials do not work for
>>>>>>>>>>> Iceberg + AWS. Any idea, or an example of writing an Iceberg table
>>>>>>>>>>> to S3 using the Hive catalog, would be highly appreciated! Thanks.
>
> --
> Ryan Blue
> Tabular
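A closing note on the original error: "The AWS Access Key Id you provided does not exist in our records" is the message the real AWS S3 service returns for an unknown access key, which is consistent with Daniel's point above. With io-impl set to S3FileIO, the fs.s3a.* endpoint settings are ignored, so the write very likely went to the default AWS endpoint, where the MinIO keys do not exist. A hedged way to confirm which endpoint the credentials actually work against, assuming the AWS CLI is available in the container and reusing MINIOSERVER from the scripts above:

# Sketch: the MinIO credentials should list the buckets (e.g. "east") when pointed
# at the MinIO endpoint explicitly.
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 AWS_DEFAULT_REGION=us-east-1 \
  aws --endpoint-url http://$MINIOSERVER:9000 s3 ls

# The same keys against the default (real AWS) endpoint should reproduce the
# "does not exist in our records" error, because AWS has never seen them.
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 AWS_DEFAULT_REGION=us-east-1 \
  aws s3 ls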