Good to hear the issue is fixed! ACL is optional; as the javadoc says, "If not set, ACL will not be set for requests."
But I think to use MinIO you need to use a custom client factory so that the S3 client points at your MinIO endpoint (see the sketch below).
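A minimal, untested sketch of such a client, using only the AWS SDK v2 that S3FileIO builds on; the endpoint, region, and credentials are simply the MinIO values used elsewhere in this thread, so adjust them to your setup. A custom client factory (see the AWS client customization doc linked later in this thread) would return a client configured along these lines:

import java.net.URI
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.{S3Client, S3Configuration}

// MINIOSERVER value from the launch scripts in this thread; change to your host.
val minioEndpoint = "http://192.168.176.5:9000"

// S3 client pointed at MinIO instead of AWS.
val s3 = S3Client.builder()
  .endpointOverride(URI.create(minioEndpoint))
  .region(Region.US_EAST_1)
  .credentialsProvider(
    StaticCredentialsProvider.create(AwsBasicCredentials.create("minio", "minio123")))
  .serviceConfiguration(
    S3Configuration.builder().pathStyleAccessEnabled(true).build()) // MinIO uses path-style URLs
  .build()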
-Jack

On Tue, Aug 17, 2021 at 11:36 AM Lian Jiang <jiangok2...@gmail.com> wrote:

Hi Ryan,

S3FileIO needs a canned ACL according to:

/**
 * Used to configure canned access control list (ACL) for S3 client to use during write.
 * If not set, ACL will not be set for requests.
 * <p>
 * The input must be one of {@link software.amazon.awssdk.services.s3.model.ObjectCannedACL},
 * such as 'public-read-write'
 * For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
 */
public static final String S3FILEIO_ACL = "s3.acl";

MinIO does not support canned ACLs according to
https://docs.min.io/docs/minio-server-limits-per-tenant.html:

List of Amazon S3 Bucket APIs not supported on MinIO

- BucketACL (use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- BucketCORS (CORS enabled by default on all buckets for all HTTP verbs)
- BucketWebsite (use caddy <https://github.com/caddyserver/caddy> or nginx <https://www.nginx.com/resources/wiki/>)
- BucketAnalytics, BucketMetrics, BucketLogging (use bucket notification <https://docs.min.io/docs/minio-client-complete-guide#events> APIs)
- BucketRequestPayment

List of Amazon S3 Object APIs not supported on MinIO

- ObjectACL (use bucket policies <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
- ObjectTorrent

Hope this makes sense.

BTW, Iceberg + Hive + S3A works after the Hive-using-S3A issue was fixed. Thanks Jack for helping debug.

On Tue, Aug 17, 2021 at 8:38 AM Ryan Blue <b...@tabular.io> wrote:

I'm not sure that I'm following why MinIO won't work with S3FileIO. S3FileIO assumes that the credentials are handled by a credentials provider outside of S3FileIO. How does MinIO handle credentials?

Ryan

On Mon, Aug 16, 2021 at 7:57 PM Jack Ye <yezhao...@gmail.com> wrote:

Talked with Lian on Slack; the user is running a hadoop 3.2.1 + hive (postgres) + spark + minio docker installation. There might be some S3A-related dependencies missing on the Hive server side, based on the stack trace. Let's see if that fixes the issue.
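One untested way to tell which side is missing the class (an aside, not from the original messages): check from spark-shell whether S3AFileSystem resolves locally.

// Run inside spark-shell. If this succeeds but the MetaException from the
// Hive metastore still reports ClassNotFoundException for S3AFileSystem,
// the hadoop-aws jar is missing on the Hive server, not in the Spark session.
try {
  Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
  println("S3AFileSystem is on the Spark driver classpath")
} catch {
  case _: ClassNotFoundException =>
    println("S3AFileSystem is NOT on the Spark driver classpath; check --packages / jars")
}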
-Jack

On Mon, Aug 16, 2021 at 7:32 PM Lian Jiang <jiangok2...@gmail.com> wrote:

This is my full script launching spark-shell:

# add Iceberg dependency
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"

MINIOSERVER=192.168.176.5

# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
  "bundle"
  "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
  DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
  --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
  --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minio \
  --conf spark.hadoop.fs.s3a.secret.key=minio123 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

Let me know if anything is missing. Thanks.

On Mon, Aug 16, 2021 at 7:29 PM Jack Ye <yezhao...@gmail.com> wrote:

Have you included the hadoop-aws jar?
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
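A related thing worth checking (an untested aside): it is usually safest for the org.apache.hadoop:hadoop-aws version to match the Hadoop version Spark and Hive are actually running; this thread mentions Hadoop 3.2.1 while the script pulls hadoop-aws:3.2.0.

// Print the Hadoop version the Spark driver is actually running against,
// so the hadoop-aws artifact version can be matched to it.
println(org.apache.hadoop.util.VersionInfo.getVersion())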
-Jack

On Mon, Aug 16, 2021 at 7:09 PM Lian Jiang <jiangok2...@gmail.com> wrote:

Jack,

You are right. S3FileIO will not work on MinIO since MinIO does not support ACLs:
https://docs.min.io/docs/minio-server-limits-per-tenant.html

To use Iceberg with MinIO + S3A, I used the script below to launch spark-shell:

/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
  --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
  --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minio \
  --conf spark.hadoop.fs.s3a.secret.key=minio123 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

The spark code:

import org.apache.spark.sql.SparkSession
val values = List(1,2,3,4,5)

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._
val df = values.toDF()

val core = "mytable"
val table = s"hive_test.mydb.${core}"
val s3IcePath = s"s3a://east/${core}.ice"

df.writeTo(table)
  .tableProperty("write.format.default", "parquet")
  .tableProperty("location", s3IcePath)
  .createOrReplace()

Still the same error:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

What else could be wrong? Thanks for any clue.

On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:

Sorry for the late reply, I thought I replied on Friday but the email did not send successfully.

As Daniel said, you don't need to set up S3A if you are using S3FileIO.

The S3FileIO by default reads the default credentials chain to check credential setups one by one:
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain

If you would like to use a specialized credential provider, you can directly customize your S3 client:
https://iceberg.apache.org/aws/#aws-client-customization

It looks like you are trying to use MinIO to mount an S3A file system? If you have to use MinIO, then there is not a way to integrate with S3FileIO right now. (Maybe I am wrong on this; I don't know much about MinIO.)

To directly use S3FileIO with HiveCatalog, simply do:

/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.hive_test.warehouse=s3://bucket

Best,
Jack Ye

On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com> wrote:

Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have a sample using the Hive catalog, S3FileIO, the Spark API (as opposed to SQL), and S3 access.key and secret.key? It is hard to get all settings right for this combination without an example. Appreciate any help.
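A rough sketch of such a combination, pieced together from the configs in this thread and untested here. It relies on S3FileIO picking up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY through the SDK's default credentials chain (as Jack describes above), so no fs.s3a.* settings appear; the warehouse bucket is a placeholder.

import org.apache.spark.sql.SparkSession

// Hive catalog + S3FileIO configured through the Spark API instead of
// spark-shell --conf flags. Credentials come from the default AWS
// credentials chain, e.g. the environment variables exported in the
// launch scripts earlier in this thread.
val spark = SparkSession.builder()
  .master("local")
  .config("spark.sql.catalog.hive_test", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive_test.type", "hive")
  .config("spark.sql.catalog.hive_test.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.hive_test.warehouse", "s3://bucket/warehouse") // placeholder bucket
  .getOrCreate()

import spark.implicits._
val df = List(1, 2, 3, 4, 5).toDF()

// Write through the catalog; the table location defaults to the warehouse path.
df.writeTo("hive_test.mydb.mytable")
  .tableProperty("write.format.default", "parquet")
  .createOrReplace()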
On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:

So, if I recall correctly, the Hive server does need access to check and create paths for table locations.

There may be an option to disable this behavior, but otherwise the fs implementation probably needs to be available to the Hive metastore.

-Dan

On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com> wrote:

Thanks Daniel.

After modifying the script to:

export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"

MINIOSERVER=192.168.160.5

# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
  "bundle"
  "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
  DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minio \
  --conf spark.hadoop.fs.s3a.secret.key=minio123 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

I got: MetaException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is not using S3 and should not cause this error. Any ideas? Thanks.

I got "ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found". Any idea what dependency I could be missing?

On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:

Hey Lian,

At a cursory glance, it appears that you might be mixing two different FileIO implementations, which may be why you are not getting the expected result.

When you set --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO you're actually switching over to the native S3 implementation within Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of the following settings to set up access are then set for the S3AFileSystem (which would not be used with S3FileIO).

You might try just removing that line since it should use the HadoopFileIO at that point and may work.

Hope that's helpful,
-Dan
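To summarize the two self-consistent setups Daniel describes (a hedged sketch using only values from this thread; untested, and the S3FileIO warehouse bucket is a placeholder):

// Option A: HadoopFileIO (the default when io-impl is not set) goes through
// Hadoop's S3AFileSystem, so the fs.s3a.* settings are the ones that matter.
val hadoopFileIoConf = Map(
  "spark.sql.catalog.hive_test"           -> "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.hive_test.type"      -> "hive",
  "spark.sql.catalog.hive_test.warehouse" -> "s3a://east/warehouse",
  "spark.hadoop.fs.s3a.endpoint"          -> "http://192.168.160.5:9000",
  "spark.hadoop.fs.s3a.access.key"        -> "minio",
  "spark.hadoop.fs.s3a.secret.key"        -> "minio123",
  "spark.hadoop.fs.s3a.path.style.access" -> "true",
  "spark.hadoop.fs.s3a.impl"              -> "org.apache.hadoop.fs.s3a.S3AFileSystem")

// Option B: S3FileIO talks to S3 with the AWS SDK directly; the fs.s3a.*
// settings are ignored, and endpoint/credentials must come from the SDK side
// (default credentials chain or a custom client factory).
val s3FileIoConf = Map(
  "spark.sql.catalog.hive_test"           -> "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.hive_test.type"      -> "hive",
  "spark.sql.catalog.hive_test.io-impl"   -> "org.apache.iceberg.aws.s3.S3FileIO",
  "spark.sql.catalog.hive_test.warehouse" -> "s3://bucket/warehouse") // placeholder bucket

Either set can be applied via spark-shell --conf flags or SparkSession.builder().config(...). Mixing them (S3FileIO plus only fs.s3a.* credentials) leaves the SDK pointed at AWS with MinIO credentials, which is one likely reason AWS, rather than MinIO, rejected the access key in the original message below.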
On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:

Hi,

I am trying to create an Iceberg table on MinIO S3 and Hive.

This is how I launch spark-shell:

# add Iceberg dependency
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"

MINIOSERVER=192.168.160.5

# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
  "bundle"
  "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
  DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
  --conf spark.sql.catalog.hive_test.type=hive \
  --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minio \
  --conf spark.hadoop.fs.s3a.secret.key=minio123 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

Here is the spark code to create the iceberg table:

import org.apache.spark.sql.SparkSession
val values = List(1,2,3,4,5)

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._
val df = values.toDF()

val core = "mytable8"
val table = s"hive_test.mydb.${core}"
val s3IcePath = s"s3a://spark-test/${core}.ice"

df.writeTo(table)
  .tableProperty("write.format.default", "parquet")
  .tableProperty("location", s3IcePath)
  .createOrReplace()

I got an error: "The AWS Access Key Id you provided does not exist in our records."

I have verified that I can log in to the MinIO UI using the same username and password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. https://github.com/apache/iceberg/issues/2168 is related but does not help me. Not sure why the credential does not work for Iceberg + AWS. Any idea or an example of writing an Iceberg table to S3 using the Hive catalog would be highly appreciated! Thanks.