Hi Ryan,

S3FileIO needs canned ACL support, according to:
/**
 * Used to configure canned access control list (ACL) for S3 client to use during write.
 * If not set, ACL will not be set for requests.
 * <p>
 * The input must be one of {@link software.amazon.awssdk.services.s3.model.ObjectCannedACL},
 * such as 'public-read-write'
 * For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
 */
public static final String S3FILEIO_ACL = "s3.acl";

MinIO does not support canned ACLs, according to https://docs.min.io/docs/minio-server-limits-per-tenant.html:

List of Amazon S3 Bucket APIs not supported on MinIO
- BucketACL (Use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- BucketCORS (CORS enabled by default on all buckets for all HTTP verbs)
- BucketWebsite (Use caddy <https://github.com/caddyserver/caddy> or nginx <https://www.nginx.com/resources/wiki/>)
- BucketAnalytics, BucketMetrics, BucketLogging (Use bucket notification <https://docs.min.io/docs/minio-client-complete-guide#events> APIs)
- BucketRequestPayment

List of Amazon S3 Object APIs not supported on MinIO
- ObjectACL (Use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- ObjectTorrent

Hope this makes sense. (A sketch of how s3.acl and a MinIO bucket policy could be configured appears further below.) BTW, Iceberg + Hive + S3A works now that the issue with Hive using S3A has been fixed. Thanks, Jack, for helping debug.

On Tue, Aug 17, 2021 at 8:38 AM Ryan Blue <b...@tabular.io> wrote:

> I'm not sure that I'm following why MinIO won't work with S3FileIO.
> S3FileIO assumes that the credentials are handled by a credentials provider
> outside of S3FileIO. How does MinIO handle credentials?
>
> Ryan
>
> On Mon, Aug 16, 2021 at 7:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Talked with Lian on Slack, the user is using a Hadoop 3.2.1 + Hive
>> (Postgres) + Spark + MinIO Docker installation. There might be some
>> S3A-related dependencies missing on the Hive server side, based on the
>> stack trace. Let's see if that fixes the issue.
>> -Jack
>>
>> On Mon, Aug 16, 2021 at 7:32 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> This is my full script for launching spark-shell:
>>>
>>> # add Iceberg dependency
>>> export AWS_REGION=us-east-1
>>> export AWS_ACCESS_KEY_ID=minio
>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>
>>> ICEBERG_VERSION=0.11.1
>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>
>>> MINIOSERVER=192.168.176.5
>>>
>>> # add AWS dependency
>>> AWS_SDK_VERSION=2.15.40
>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>> AWS_PACKAGES=(
>>>   "bundle"
>>>   "url-connection-client"
>>> )
>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>> done
>>>
>>> # start Spark SQL client shell
>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>
>>> Let me know if anything is missing. Thanks.
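To make the ACL discussion at the top concrete: s3.acl is an Iceberg catalog property, so it can be passed straight through the spark-shell catalog configuration, and on MinIO the closest equivalent of a canned ACL is an anonymous-access bucket policy set with the mc client. The following is only a minimal sketch: the hive_test catalog name and east bucket come from the scripts in this thread, while the public-read value, the myminio alias, and the exact mc subcommand are assumptions.

# Sketch: against real S3, the S3FileIO canned ACL can be passed through as a catalog
# property; the value must be an ObjectCannedACL name such as public-read or
# public-read-write.
/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.hive_test.warehouse=s3://bucket \
  --conf spark.sql.catalog.hive_test.s3.acl=public-read

# Sketch: on MinIO, ACL requests are rejected, so an anonymous-access bucket policy is
# the rough equivalent ("myminio" is an assumed mc alias; newer mc releases use
# "mc anonymous set" instead of "mc policy set").
mc policy set public myminio/east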
>>>
>>> On Mon, Aug 16, 2021 at 7:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Have you included the hadoop-aws jar?
>>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
>>>> -Jack
>>>>
>>>> On Mon, Aug 16, 2021 at 7:09 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>
>>>>> Jack,
>>>>>
>>>>> You are right. S3FileIO will not work on MinIO since MinIO does not
>>>>> support ACLs: https://docs.min.io/docs/minio-server-limits-per-tenant.html
>>>>>
>>>>> To use Iceberg with MinIO + S3A, I used the script below to launch spark-shell:
>>>>>
>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>
>>>>> *The Spark code:*
>>>>>
>>>>> import org.apache.spark.sql.SparkSession
>>>>> val values = List(1,2,3,4,5)
>>>>>
>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>> import spark.implicits._
>>>>> val df = values.toDF()
>>>>>
>>>>> val core = "mytable"
>>>>> val table = s"hive_test.mydb.${core}"
>>>>> val s3IcePath = s"s3a://east/${core}.ice"
>>>>>
>>>>> df.writeTo(table)
>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>   .tableProperty("location", s3IcePath)
>>>>>   .createOrReplace()
>>>>>
>>>>> *Still the same error:*
>>>>> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>
>>>>> What else could be wrong? Thanks for any clue.
>>>>>
>>>>> On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Sorry for the late reply, I thought I replied on Friday but the email
>>>>>> did not send successfully.
>>>>>>
>>>>>> As Daniel said, you don't need to set up S3A if you are using S3FileIO.
>>>>>>
>>>>>> S3FileIO by default reads the default credentials chain, checking
>>>>>> credential setups one by one:
>>>>>> https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
>>>>>>
>>>>>> If you would like to use a specialized credential provider, you can
>>>>>> directly customize your S3 client:
>>>>>> https://iceberg.apache.org/aws/#aws-client-customization
>>>>>>
>>>>>> It looks like you are trying to use MinIO to mount the S3A file system?
>>>>>> If you have to use MinIO, then there is no way to integrate with S3FileIO
>>>>>> right now.
>>>>>> (maybe I am wrong on this; I don't know much about MinIO)
>>>>>>
>>>>>> To directly use S3FileIO with HiveCatalog, simply do:
>>>>>>
>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
>>>>>>
>>>>>> Best,
>>>>>> Jack Ye
>>>>>>
>>>>>> On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have
>>>>>>> a sample using the Hive catalog, S3FileIO, the Spark API (as opposed to
>>>>>>> SQL), and S3 access.key and secret.key? It is hard to get all the settings
>>>>>>> right for this combination without an example. Appreciate any help.
>>>>>>>
>>>>>>> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>
>>>>>>>> So, if I recall correctly, the Hive server does need access to
>>>>>>>> check and create paths for table locations.
>>>>>>>>
>>>>>>>> There may be an option to disable this behavior, but otherwise the
>>>>>>>> fs implementation probably needs to be available to the Hive metastore.
>>>>>>>>
>>>>>>>> -Dan
>>>>>>>>
>>>>>>>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Daniel.
>>>>>>>>>
>>>>>>>>> After modifying the script to:
>>>>>>>>>
>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>
>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>>>>>>
>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>
>>>>>>>>> # add AWS dependency
>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>   "bundle"
>>>>>>>>>   "url-connection-client"
>>>>>>>>> )
>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> # start Spark SQL client shell
>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>
>>>>>>>>> I got: MetaException: java.lang.RuntimeException:
>>>>>>>>> java.lang.ClassNotFoundException: Class
>>>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is
>>>>>>>>> not using S3, so it should not be causing this error. Any idea what
>>>>>>>>> dependency I could be missing? Thanks.
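On the repeated ClassNotFoundException above, a hedged note rather than a definitive fix: the hadoop-aws artifact has to match the Hadoop version that the Spark distribution bundles, and since the failure surfaces as a MetaException, the same jars may also need to be on the Hive metastore's classpath (which is consistent with Jack's and Daniel's comments above about the Hive side). A quick check, assuming the /spark install from these scripts and a hypothetical /opt/hive/lib directory for the metastore:

# Sketch: check which Hadoop version this Spark distribution bundles, so the
# hadoop-aws coordinate passed to --packages can match it exactly.
ls /spark/jars | grep -E '^hadoop-(common|client)'   # e.g. hadoop-common-3.2.0.jar

# hadoop-aws pulls in com.amazonaws:aws-java-sdk-bundle; the two travel together.
# If the MetaException is raised by the Hive metastore rather than by Spark, the same
# jars also have to be visible to the metastore process (the path is an assumption):
cp hadoop-aws-3.2.0.jar aws-java-sdk-bundle-*.jar /opt/hive/lib/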
>>>>>>>>>
>>>>>>>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Lian,
>>>>>>>>>>
>>>>>>>>>> At a cursory glance, it appears that you might be mixing two
>>>>>>>>>> different FileIO implementations, which may be why you are not
>>>>>>>>>> getting the expected result.
>>>>>>>>>>
>>>>>>>>>> When you set --conf
>>>>>>>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
>>>>>>>>>> you're actually switching over to the native S3 implementation within
>>>>>>>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>>>>>>>> the following settings to set up access are then set for the
>>>>>>>>>> S3AFileSystem (which would not be used with S3FileIO).
>>>>>>>>>>
>>>>>>>>>> You might try just removing that line, since it should use the
>>>>>>>>>> HadoopFileIO at that point and may work.
>>>>>>>>>>
>>>>>>>>>> Hope that's helpful,
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to create an Iceberg table on MinIO S3 and Hive.
>>>>>>>>>>>
>>>>>>>>>>> *This is how I launch spark-shell:*
>>>>>>>>>>>
>>>>>>>>>>> # add Iceberg dependency
>>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>>
>>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>>>>>>>
>>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>>
>>>>>>>>>>> # add AWS dependency
>>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>>   "bundle"
>>>>>>>>>>>   "url-connection-client"
>>>>>>>>>>> )
>>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>>   DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>>> done
>>>>>>>>>>>
>>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>>   --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>>
>>>>>>>>>>> *Here is the Spark code to create the Iceberg table:*
>>>>>>>>>>>
>>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>>> val values = List(1,2,3,4,5)
>>>>>>>>>>>
>>>>>>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>>>>>>> import spark.implicits._
>>>>>>>>>>> val df = values.toDF()
>>>>>>>>>>>
>>>>>>>>>>> val core = "mytable8"
>>>>>>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>>>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>>>>>>>
>>>>>>>>>>> df.writeTo(table)
>>>>>>>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>>>>>>>   .tableProperty("location", s3IcePath)
>>>>>>>>>>>   .createOrReplace()
>>>>>>>>>>>
>>>>>>>>>>> I got an error: "The AWS Access Key Id you provided does not exist
>>>>>>>>>>> in our records."
>>>>>>>>>>>
>>>>>>>>>>> I have verified that I can log in to the MinIO UI using the same
>>>>>>>>>>> username and password that I passed to spark-shell via the
>>>>>>>>>>> AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
>>>>>>>>>>> https://github.com/apache/iceberg/issues/2168 is related but does
>>>>>>>>>>> not help me. I am not sure why the credentials do not work for
>>>>>>>>>>> Iceberg + AWS. Any idea, or an example of writing an Iceberg table
>>>>>>>>>>> to S3 using the Hive catalog, would be highly appreciated! Thanks.
>
> --
> Ryan Blue
> Tabular
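A closing note on the original error: "The AWS Access Key Id you provided does not exist in our records" is the message the real AWS S3 service returns for an unknown access key, which is consistent with Daniel's point above. With io-impl set to S3FileIO, the fs.s3a.* endpoint settings are ignored, so the write very likely went to the default AWS endpoint, where the MinIO keys do not exist. A hedged way to confirm which endpoint the credentials actually work against, assuming the AWS CLI is available in the container and reusing MINIOSERVER from the scripts above:

# Sketch: the MinIO credentials should list the buckets (e.g. "east") when pointed
# at the MinIO endpoint explicitly.
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 AWS_DEFAULT_REGION=us-east-1 \
  aws --endpoint-url http://$MINIOSERVER:9000 s3 ls

# The same keys against the default (real AWS) endpoint should reproduce the
# "does not exist in our records" error, because AWS has never seen them.
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 AWS_DEFAULT_REGION=us-east-1 \
  aws s3 ls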