Hi,

I am trying to create an Iceberg table on MinIO S3 with a Hive catalog.

*This is how I launch spark-shell:*

# add Iceberg dependency
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"

MINIOSERVER=192.168.160.5


# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
    "bundle"
    "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
    DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
    --conf spark.sql.catalog.hive_test.type=hive \
    --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
    --conf spark.hadoop.fs.s3a.access.key=minio \
    --conf spark.hadoop.fs.s3a.secret.key=minio123 \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
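
In case it helps with diagnosing, here is a sanity check that can run in
the same spark-shell (a sketch, pasted with :paste; the endpoint and
bucket are hard-coded from the values above, and the AWS bundle jar
already puts the v2 SDK on the classpath). It builds an S3 client the way
I understand S3FileIO does, from the default credential chain, i.e. the
AWS_* environment variables:

import java.net.URI
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.{S3Client, S3Configuration}
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request

// SDK v2 client built like S3FileIO's, but pointed explicitly at MinIO
val s3 = S3Client.builder()
    .region(Region.US_EAST_1)
    .endpointOverride(URI.create("http://192.168.160.5:9000")) // $MINIOSERVER
    .serviceConfiguration(
        S3Configuration.builder().pathStyleAccessEnabled(true).build())
    .build()

// If this listing works, the env credentials reach the SDK v2 default
// chain fine, and the problem is likely the endpoint S3FileIO talks to.
s3.listObjectsV2(ListObjectsV2Request.builder().bucket("east").build())
    .contents()
    .forEach(o => println(o.key()))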

*Here is the spark code to create the iceberg table:*

import org.apache.spark.sql.SparkSession
val values = List(1,2,3,4,5)

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._
val df = values.toDF()

val core = "mytable8"
val table = s"hive_test.mydb.${core}"
val s3IcePath = s"s3a://spark-test/${core}.ice"

df.writeTo(table)
    .tableProperty("write.format.default", "parquet")
    .tableProperty("location", s3IcePath)
    .createOrReplace()
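
For completeness, the round-trip check I planned to run once the write
succeeds (it never gets this far):

// never reached for me, since the write itself fails
spark.table(table).show()
// the reported location should be the s3IcePath set above
spark.sql(s"DESCRIBE EXTENDED $table").show(truncate = false)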

I get the error "The AWS Access Key Id you provided does not exist in our
records."

I have verified that I can log in to the MinIO UI with the same username
and password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables.
https://github.com/apache/iceberg/issues/2168 is related but does not help
me. I am not sure why the credentials do not work for Iceberg + AWS. Any
idea, or an example of writing an Iceberg table to S3 using a Hive
catalog, would be highly appreciated! Thanks.
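
P.S. My current guess is that the spark.hadoop.fs.s3a.* settings only
configure Hadoop's S3AFileSystem, while io-impl=S3FileIO makes Iceberg
build its own AWS SDK v2 client that never hears about the MinIO endpoint,
so the minio key would be sent to real AWS. One workaround I am
considering is a custom client factory registered through the iceberg-aws
client.factory catalog property. A rough, untested sketch
(MinioClientFactory is my own name, and I am assuming the 0.11.1
AwsClientFactory interface with s3/glue/kms/initialize):

import java.net.URI
import org.apache.iceberg.aws.AwsClientFactory
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.glue.GlueClient
import software.amazon.awssdk.services.kms.KmsClient
import software.amazon.awssdk.services.s3.{S3Client, S3Configuration}

class MinioClientFactory extends AwsClientFactory {
    // S3FileIO would call this instead of building its own client
    override def s3(): S3Client =
        S3Client.builder()
            .region(Region.US_EAST_1)
            // credentials still come from the AWS_* env variables via
            // the default credential chain
            .endpointOverride(URI.create("http://192.168.160.5:9000"))
            .serviceConfiguration(
                S3Configuration.builder().pathStyleAccessEnabled(true).build())
            .build()

    // not used with a Hive catalog
    override def glue(): GlueClient =
        throw new UnsupportedOperationException("Glue is not used here")

    override def kms(): KmsClient =
        throw new UnsupportedOperationException("KMS is not used here")

    override def initialize(properties: java.util.Map[String, String]): Unit = ()
}

The class would have to be packaged into a jar on the classpath (a class
defined in the REPL is not loadable by name) and registered with
--conf spark.sql.catalog.hive_test.client.factory=MinioClientFactory.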
