Jack,

You are right. S3FileIO will not work on MinIO since MinIO does not support ACLs:
https://docs.min.io/docs/minio-server-limits-per-tenant.html
To use Iceberg with MinIO + S3A, I used the script below to launch spark-shell:

/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.type=hive \
    *--conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \*
    --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
    --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
    --conf spark.hadoop.fs.s3a.access.key=minio \
    --conf spark.hadoop.fs.s3a.secret.key=minio123 \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

*The spark code:*

import org.apache.spark.sql.SparkSession

val values = List(1,2,3,4,5)
val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._
val df = values.toDF()

val core = "mytable"
val table = s"hive_test.mydb.${core}"
val s3IcePath = s"s3a://east/${core}.ice"

df.writeTo(table)
  .tableProperty("write.format.default", "parquet")
  .tableProperty("location", s3IcePath)
  .createOrReplace()

*Still the same error:*

java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

What else could be wrong? Thanks for any clue.
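A quick check that sometimes narrows this down, runnable from the same spark-shell session: confirm the S3A class actually reached the driver classpath, and see which Hadoop version Spark reports, since the org.apache.hadoop:hadoop-aws artifact generally needs to match it. This is only a diagnostic sketch, assuming nothing beyond the spark-shell launched above; the class and artifact names are the stock Hadoop ones, not anything specific to this setup.

    // Throws ClassNotFoundException if hadoop-aws never made it onto the classpath,
    // in which case the --packages list (or a version mismatch) is the first suspect.
    Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")

    // The Hadoop version Spark is actually running with; the hadoop-aws version
    // pulled in via --packages should line up with this.
    println(org.apache.hadoop.util.VersionInfo.getVersion())

Even when this check passes on the driver, the MetaException variant that shows up further down the thread can still appear, because (as Daniel points out below) the Hive metastore may also need the same filesystem implementation available when it validates table locations.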
On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:

> Sorry for the late reply, I thought I replied on Friday but the email did
> not send successfully.
>
> As Daniel said, you don't need to set up S3A if you are using S3FileIO.
>
> The S3FileIO by default reads the default credentials chain to check
> credential setups one by one:
> https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
>
> If you would like to use a specialized credential provider, you can
> directly customize your S3 client:
> https://iceberg.apache.org/aws/#aws-client-customization
>
> It looks like you are trying to use MinIO to mount an S3A file system? If you
> have to use MinIO then there is not a way to integrate with S3FileIO right
> now. (Maybe I am wrong on this, I don't know much about MinIO.)
>
> To directly use S3FileIO with HiveCatalog, simply do:
>
> /spark/bin/spark-shell --packages $DEPENDENCIES \
>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>     --conf spark.sql.catalog.hive_test.type=hive \
>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>     --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
>
> Best,
> Jack Ye
>
> On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have a
>> sample using the Hive catalog, S3FileIO, the Spark API (as opposed to SQL),
>> and S3 access.key and secret.key? It is hard to get all the settings right
>> for this combination without an example. Appreciate any help.
>>
>> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>
>>> So, if I recall correctly, the Hive server does need access to check and
>>> create paths for table locations.
>>>
>>> There may be an option to disable this behavior, but otherwise the fs
>>> implementation probably needs to be available to the Hive metastore.
>>>
>>> -Dan
>>>
>>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>
>>>> Thanks Daniel.
>>>>
>>>> After modifying the script to:
>>>>
>>>> export AWS_REGION=us-east-1
>>>> export AWS_ACCESS_KEY_ID=minio
>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>
>>>> ICEBERG_VERSION=0.11.1
>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>
>>>> MINIOSERVER=192.168.160.5
>>>>
>>>> # add AWS dependency
>>>> AWS_SDK_VERSION=2.15.40
>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>> AWS_PACKAGES=(
>>>>     "bundle"
>>>>     "url-connection-client"
>>>> )
>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>> done
>>>>
>>>> # start Spark SQL client shell
>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>
>>>> I got: MetaException: java.lang.RuntimeException:
>>>> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem
>>>> not found. My Hive server is not using S3 and should not cause this error.
>>>> Any idea what dependency I could be missing? Thanks.
>>>>
>>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>
>>>>> Hey Lian,
>>>>>
>>>>> At a cursory glance, it appears that you might be mixing two different
>>>>> FileIO implementations, which may be why you are not getting the expected
>>>>> result.
>>>>>
>>>>> When you set --conf
>>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO
>>>>> you're actually switching over to the native S3 implementation within
>>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>>> the following settings to set up access are then set for the S3AFileSystem
>>>>> (which would not be used with S3FileIO).
>>>>>
>>>>> You might try just removing that line, since it should use the
>>>>> HadoopFileIO at that point and may work.
>>>>>
>>>>> Hope that's helpful,
>>>>> -Dan
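To make the "don't mix the two" point concrete, here is a minimal sketch of the two separate configurations expressed through SparkSession builder config instead of spark-shell flags. The catalog name hive_test, the bucket names, and the MinIO endpoint are placeholders taken from the messages above, and it assumes the Iceberg runtime, AWS SDK bundle, and hadoop-aws jars are already on the classpath via --packages; only one of the two maps should be applied in a given session.

    import org.apache.spark.sql.SparkSession

    // Option A: Iceberg's native S3FileIO. Credentials come from the AWS default
    // chain (e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY); no fs.s3a.* settings.
    val s3FileIoConf: Map[String, String] = Map(
      "spark.sql.catalog.hive_test" -> "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.hive_test.type" -> "hive",
      "spark.sql.catalog.hive_test.io-impl" -> "org.apache.iceberg.aws.s3.S3FileIO",
      "spark.sql.catalog.hive_test.warehouse" -> "s3://bucket/warehouse"
    )

    // Option B: HadoopFileIO over S3A. Omit io-impl (or set it to
    // org.apache.iceberg.hadoop.HadoopFileIO) and configure fs.s3a.* instead.
    val s3aConf: Map[String, String] = Map(
      "spark.sql.catalog.hive_test" -> "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.hive_test.type" -> "hive",
      "spark.sql.catalog.hive_test.warehouse" -> "s3a://east/warehouse",
      "spark.hadoop.fs.s3a.endpoint" -> "http://192.168.160.5:9000",
      "spark.hadoop.fs.s3a.access.key" -> "minio",
      "spark.hadoop.fs.s3a.secret.key" -> "minio123",
      "spark.hadoop.fs.s3a.path.style.access" -> "true",
      "spark.hadoop.fs.s3a.impl" -> "org.apache.hadoop.fs.s3a.S3AFileSystem"
    )

    // Apply exactly one of the maps; mixing keys from both is what leads to the
    // confusing behavior described above.
    val chosen = s3FileIoConf // or s3aConf
    val spark = chosen
      .foldLeft(SparkSession.builder().master("local")) { case (b, (k, v)) => b.config(k, v) }
      .getOrCreate()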
>>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to create an Iceberg table on MinIO S3 with a Hive catalog.
>>>>>>
>>>>>> *This is how I launch spark-shell:*
>>>>>>
>>>>>> # add Iceberg dependency
>>>>>> export AWS_REGION=us-east-1
>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>
>>>>>> ICEBERG_VERSION=0.11.1
>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>>
>>>>>> MINIOSERVER=192.168.160.5
>>>>>>
>>>>>> # add AWS dependency
>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>> AWS_PACKAGES=(
>>>>>>     "bundle"
>>>>>>     "url-connection-client"
>>>>>> )
>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>> done
>>>>>>
>>>>>> # start Spark SQL client shell
>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>
>>>>>> *Here is the spark code to create the iceberg table:*
>>>>>>
>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>
>>>>>> val values = List(1,2,3,4,5)
>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>> import spark.implicits._
>>>>>> val df = values.toDF()
>>>>>>
>>>>>> val core = "mytable8"
>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>>
>>>>>> df.writeTo(table)
>>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>>   .tableProperty("location", s3IcePath)
>>>>>>   .createOrReplace()
>>>>>>
>>>>>> I got an error: "The AWS Access Key Id you provided does not exist in
>>>>>> our records."
>>>>>>
>>>>>> I have verified that I can log in to the MinIO UI using the same username
>>>>>> and password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and
>>>>>> AWS_SECRET_ACCESS_KEY env variables.
>>>>>> https://github.com/apache/iceberg/issues/2168 is related but does not
>>>>>> help me. I am not sure why the credentials do not work for Iceberg + AWS.
>>>>>> Any idea, or an example of writing an Iceberg table to S3 using the Hive
>>>>>> catalog, would be highly appreciated! Thanks.
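A small sketch related to that last error: S3FileIO resolves credentials through the AWS SDK v2 default chain Jack links above, so the chain can be resolved directly inside the same spark-shell to confirm it is picking up the keys exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. It assumes the software.amazon.awssdk bundle from the --packages list is on the classpath; it does not prove the request is going to MinIO rather than AWS S3, which is the other half of the problem discussed at the top of this thread.

    import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider

    // Resolve the same credentials chain S3FileIO uses. This should print the
    // access key exported before launching spark-shell (here, "minio"); if it
    // prints something else, S3FileIO is signing requests with credentials the
    // target endpoint does not know about, which matches the error above.
    val creds = DefaultCredentialsProvider.create().resolveCredentials()
    println(creds.accessKeyId())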