Hey Lian,

At a cursory glance, it appears that you might be mixing two different FileIO implementations, which may be why you are not getting the expected result.
When you set:

--conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO

you're actually switching over to Iceberg's native S3 implementation (as opposed to S3AFileSystem via HadoopFileIO). However, all of the following settings for setting up access are then set for the S3AFileSystem, which would not be used with S3FileIO.

You might try just removing that line; the catalog should fall back to HadoopFileIO at that point and may work. A rough sketch of the adjusted spark-shell invocation follows the quoted message below.

Hope that's helpful,
-Dan

On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
> Hi,
>
> I am trying to create an Iceberg table on MinIO S3 and Hive.
>
> *This is how I launch spark-shell:*
>
> # add Iceberg dependency
> export AWS_REGION=us-east-1
> export AWS_ACCESS_KEY_ID=minio
> export AWS_SECRET_ACCESS_KEY=minio123
>
> ICEBERG_VERSION=0.11.1
> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>
> MINIOSERVER=192.168.160.5
>
> # add AWS dependency
> AWS_SDK_VERSION=2.15.40
> AWS_MAVEN_GROUP=software.amazon.awssdk
> AWS_PACKAGES=(
>     "bundle"
>     "url-connection-client"
> )
> for pkg in "${AWS_PACKAGES[@]}"; do
>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
> done
>
> # start Spark SQL client shell
> /spark/bin/spark-shell --packages $DEPENDENCIES \
>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>     --conf spark.sql.catalog.hive_test.type=hive \
>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>     --conf spark.hadoop.fs.s3a.access.key=minio \
>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>
> *Here is the Spark code to create the Iceberg table:*
>
> import org.apache.spark.sql.SparkSession
>
> val values = List(1, 2, 3, 4, 5)
>
> val spark = SparkSession.builder().master("local").getOrCreate()
> import spark.implicits._
> val df = values.toDF()
>
> val core = "mytable8"
> val table = s"hive_test.mydb.${core}"
> val s3IcePath = s"s3a://spark-test/${core}.ice"
>
> df.writeTo(table)
>   .tableProperty("write.format.default", "parquet")
>   .tableProperty("location", s3IcePath)
>   .createOrReplace()
>
> I got the error "The AWS Access Key Id you provided does not exist in our records."
>
> I have verified that I can log in to the MinIO UI using the same username and
> password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and
> AWS_SECRET_ACCESS_KEY environment variables.
> https://github.com/apache/iceberg/issues/2168 is related but does not
> help me. Not sure why the credentials do not work for Iceberg + AWS. Any
> idea or an example of writing an Iceberg table to S3 using a Hive catalog
> would be highly appreciated! Thanks.
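For reference, here is a rough, untested sketch of the spark-shell command with only the io-impl line removed and everything else left exactly as in your original invocation. Without that setting, the catalog should fall back to HadoopFileIO, so the S3AFileSystem picks up the fs.s3a.* endpoint and credentials you already pass in:

# same as before, minus the io-impl override, so HadoopFileIO/S3AFileSystem is used
/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
    --conf spark.sql.catalog.hive_test.type=hive \
    --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
    --conf spark.hadoop.fs.s3a.access.key=minio \
    --conf spark.hadoop.fs.s3a.secret.key=minio123 \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

If you do want to stay on S3FileIO instead, keep in mind that none of the fs.s3a.* settings apply to it: the credentials would have to come from the AWS SDK's own configuration (for example the environment variables you already export), and you would also need some way to point the SDK's S3 client at the MinIO endpoint, which fs.s3a.endpoint does not do for S3FileIO. I haven't verified that path against MinIO myself.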