Thanks, Daniel.

After modifying the script to:

export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

ICEBERG_VERSION=0.11.1
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"

MINIOSERVER=192.168.160.5


# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
    "bundle"
    "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
    DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# start Spark SQL client shell
/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.type=hive  \
    --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
    --conf spark.hadoop.fs.s3a.access.key=minio \
    --conf spark.hadoop.fs.s3a.secret.key=minio123 \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

I got: "MetaException: java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem
not found". My Hive server is not using S3, so I don't see why it should
cause this error.
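
In case it helps, here is the kind of sanity check I figured I could run
inside spark-shell (just a sketch, using the same fs.s3a.* keys as in the
launch command above) to see whether the S3A class and settings are actually
visible to the driver:

import scala.util.Try

// Is the S3A filesystem class on the driver classpath?
val s3aOnClasspath = Try(Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")).isSuccess
println(s"S3AFileSystem on classpath: $s3aOnClasspath")

// Did the fs.s3a.* settings from the launch command take effect?
val hadoopConf = spark.sparkContext.hadoopConfiguration
Seq("fs.s3a.impl", "fs.s3a.endpoint", "fs.s3a.path.style.access")
  .foreach(k => println(s"$k = ${hadoopConf.get(k)}"))

// Which hadoop-aws / iceberg jars did --packages actually pull in?
spark.sparkContext.listJars()
  .filter(j => j.contains("hadoop-aws") || j.contains("iceberg"))
  .foreach(println)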


I got "ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem
not found". Any idea what dependency could I miss?

On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com>
wrote:

> Hey Lian,
>
> At a cursory glance, it appears that you might be mixing two different
> FileIO implementations, which may be why you are not getting the expected
> result.
>
> When you set: --conf
> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO you're
> actually switching over to the native S3 implementation within Iceberg (as
> opposed to S3AFileSystem via HadoopFileIO).  However, all of the following
> settings to setup access are then set for the S3AFileSystem (which would
> not be used with S3FileIO).
>
> You might try just removing that line since it should use the HadoopFileIO
> at that point and may work.
>
> Hope that's helpful,
> -Dan
>
> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Hi,
>>
>> I try to create an iceberg table on minio s3 and hive.
>>
>> *This is how I launch spark-shell:*
>>
>> # add Iceberg dependency
>> export AWS_REGION=us-east-1
>> export AWS_ACCESS_KEY_ID=minio
>> export AWS_SECRET_ACCESS_KEY=minio123
>>
>> ICEBERG_VERSION=0.11.1
>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>
>> MINIOSERVER=192.168.160.5
>>
>>
>> # add AWS dependency
>> AWS_SDK_VERSION=2.15.40
>> AWS_MAVEN_GROUP=software.amazon.awssdk
>> AWS_PACKAGES=(
>>     "bundle"
>>     "url-connection-client"
>> )
>> for pkg in "${AWS_PACKAGES[@]}"; do
>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>> done
>>
>> # start Spark SQL client shell
>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>     --conf spark.sql.catalog.hive_test.type=hive  \
>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>
>> *Here is the spark code to create the iceberg table:*
>>
>> import org.apache.spark.sql.SparkSession
>> val values = List(1,2,3,4,5)
>>
>> val spark = SparkSession.builder().master("local").getOrCreate()
>> import spark.implicits._
>> val df = values.toDF()
>>
>> val core = "mytable8"
>> val table = s"hive_test.mydb.${core}"
>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>
>> df.writeTo(table)
>>     .tableProperty("write.format.default", "parquet")
>>     .tableProperty("location", s3IcePath)
>>     .createOrReplace()
>>
>> I got an error "The AWS Access Key Id you provided does not exist in our
>> records.".
>>
>> I have verified that I can login minio UI using the same username and
>> password that I passed to spark-shell via AWS_ACCESS_KEY_ID and
>> AWS_SECRET_ACCESS_KEY env variables.
>> https://github.com/apache/iceberg/issues/2168 is related but does not
>> help me. Not sure why the credential does not work for iceberg + AWS. Any
>> idea or an example of writing an iceberg table to S3 using hive catalog
>> will be highly appreciated! Thanks.
>>
>>
>>
