Thanks Daniel. After modifying the script to:
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"

MINIOSERVER=192.168.160.5

# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
    "bundle"
    "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
    DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.type=hive \
    --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
    --conf spark.hadoop.fs.s3a.access.key=minio \
    --conf spark.hadoop.fs.s3a.secret.key=minio123 \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

I got:

MetaException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

My hive server is not using s3, so it should not be causing this error. Any idea which dependency I could be missing? Thanks.
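For what it's worth, here is a quick check I can run in the same spark-shell to confirm the S3A class is at least on the Spark driver's classpath (plain JVM reflection, nothing Iceberg-specific; the class name is copied from the error above):

// throws ClassNotFoundException right here if hadoop-aws never made it onto the driver classpath
Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")

If that resolves on the driver, the class would seem to be missing from whichever other JVM or classloader is raising the MetaException, rather than from the --packages passed to spark-shell.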
On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:

> Hey Lian,
>
> At a cursory glance, it appears that you might be mixing two different
> FileIO implementations, which may be why you are not getting the expected
> result.
>
> When you set:
>
>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO
>
> you're actually switching over to the native S3 implementation within
> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of the
> following settings to set up access are then set for the S3AFileSystem
> (which would not be used with S3FileIO).
>
> You might try just removing that line, since it should use the HadoopFileIO
> at that point and may work.
>
> Hope that's helpful,
> -Dan
>
> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Hi,
>>
>> I try to create an iceberg table on minio s3 and hive.
>>
>> *This is how I launch spark-shell:*
>>
>> # add Iceberg dependency
>> export AWS_REGION=us-east-1
>> export AWS_ACCESS_KEY_ID=minio
>> export AWS_SECRET_ACCESS_KEY=minio123
>>
>> ICEBERG_VERSION=0.11.1
>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>
>> MINIOSERVER=192.168.160.5
>>
>> # add AWS dependency
>> AWS_SDK_VERSION=2.15.40
>> AWS_MAVEN_GROUP=software.amazon.awssdk
>> AWS_PACKAGES=(
>>     "bundle"
>>     "url-connection-client"
>> )
>> for pkg in "${AWS_PACKAGES[@]}"; do
>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>> done
>>
>> # start Spark SQL client shell
>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>     --conf spark.sql.catalog.hive_test.type=hive \
>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>
>> *Here is the spark code to create the iceberg table:*
>>
>> import org.apache.spark.sql.SparkSession
>> val values = List(1,2,3,4,5)
>>
>> val spark = SparkSession.builder().master("local").getOrCreate()
>> import spark.implicits._
>> val df = values.toDF()
>>
>> val core = "mytable8"
>> val table = s"hive_test.mydb.${core}"
>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>
>> df.writeTo(table)
>>   .tableProperty("write.format.default", "parquet")
>>   .tableProperty("location", s3IcePath)
>>   .createOrReplace()
>>
>> I got an error: "The AWS Access Key Id you provided does not exist in our
>> records."
>>
>> I have verified that I can log in to the minio UI using the same username
>> and password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and
>> AWS_SECRET_ACCESS_KEY env variables.
>> https://github.com/apache/iceberg/issues/2168 is related but does not
>> help me. Not sure why the credentials do not work for Iceberg + AWS. Any
>> idea, or an example of writing an Iceberg table to S3 using the hive
>> catalog, would be highly appreciated! Thanks.
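For reference, a minimal read-back check that could be run in the same spark-shell once the write above goes through (the table name is the one built in the quoted snippet, hive_test.mydb.mytable8):

// read the rows back through the same Iceberg catalog to confirm the write landed
spark.sql("SELECT * FROM hive_test.mydb.mytable8").show()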