Sorry for the late reply; I thought I had replied on Friday, but the email
did not send successfully.

As Daniel said, you don't need to set up S3A if you are using S3FileIO.

The S3FileIO by default uses the AWS default credentials provider chain,
which checks credential setups one by one:
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
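
For example, if you export the standard environment variables before
launching spark-shell (placeholder values below), the default chain will
pick them up automatically:

export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>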

If you would like to use a specialized credential provider, you can
directly customize your S3 client:
https://iceberg.apache.org/aws/#aws-client-customization
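
As a rough sketch (com.example.MyAwsClientFactory is a hypothetical class
implementing org.apache.iceberg.aws.AwsClientFactory, per the doc above),
you would then register it through the catalog's client.factory property:

    --conf spark.sql.catalog.hive_test.client.factory=com.example.MyAwsClientFactory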

It looks like you are trying to use MinIO through the S3A file system? If you
have to use MinIO, then there is currently no way to integrate it with
S3FileIO (though I may be wrong on this; I don't know much about MinIO).

To directly use S3FileIO with HiveCatalog, simply do:

/spark/bin/spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_test.type=hive \
    --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
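
Since you asked about the Spark API rather than SQL: the DataFrame code from
your earlier email should work unchanged against this catalog, e.g. (the
table name below is just an example):

import spark.implicits._

val df = List(1, 2, 3, 4, 5).toDF()
df.writeTo("hive_test.mydb.mytable")
    .tableProperty("write.format.default", "parquet")
    .createOrReplace()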

Best,
Jack Ye



On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have a
> sample using the Hive catalog, S3FileIO, the Spark API (as opposed to SQL),
> and S3 access.key and secret.key? It is hard to get all the settings right
> for this combination without an example. Appreciate any help.
>
> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <daniel.c.we...@gmail.com>
> wrote:
>
>> So, if I recall correctly, the hive server does need access to check and
>> create paths for table locations.
>>
>> There may be an option to disable this behavior, but otherwise the fs
>> implementation probably needs to be available to the hive metastore.
>>
>> -Dan
>>
>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Thanks Daniel.
>>>
>>> After modifying the script to,
>>>
>>> export AWS_REGION=us-east-1
>>> export AWS_ACCESS_KEY_ID=minio
>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>
>>> ICEBERG_VERSION=0.11.1
>>>
>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>
>>> MINIOSERVER=192.168.160.5
>>>
>>>
>>> # add AWS dependency
>>> AWS_SDK_VERSION=2.15.40
>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>> AWS_PACKAGES=(
>>>     "bundle"
>>>     "url-connection-client"
>>> )
>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>> done
>>>
>>> # start Spark SQL client shell
>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>
>>> I got: MetaException: java.lang.RuntimeException:
>>> java.lang.ClassNotFoundException: Class
>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is not
>>> using S3, so it should not cause this error. Any idea what dependency I
>>> could be missing? Thanks.
>>>
>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <daniel.c.we...@gmail.com>
>>> wrote:
>>>
>>>> Hey Lian,
>>>>
>>>> At a cursory glance, it appears that you might be mixing two different
>>>> FileIO implementations, which may be why you are not getting the expected
>>>> result.
>>>>
>>>> When you set --conf
>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
>>>> you're actually switching over to the native S3 implementation within
>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>> the following settings to set up access are then set for the
>>>> S3AFileSystem (which would not be used with S3FileIO).
>>>>
>>>> You might try just removing that line since it should use the
>>>> HadoopFileIO at that point and may work.
>>>>
>>>> Hope that's helpful,
>>>> -Dan
>>>>
>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <jiangok2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to create an Iceberg table on MinIO S3 with a Hive catalog.
>>>>>
>>>>> *This is how I launch spark-shell:*
>>>>>
>>>>> # add Iceberg dependency
>>>>> export AWS_REGION=us-east-1
>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>
>>>>> ICEBERG_VERSION=0.11.1
>>>>>
>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>
>>>>> MINIOSERVER=192.168.160.5
>>>>>
>>>>>
>>>>> # add AWS dependency
>>>>> AWS_SDK_VERSION=2.15.40
>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>> AWS_PACKAGES=(
>>>>>     "bundle"
>>>>>     "url-connection-client"
>>>>> )
>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>> done
>>>>>
>>>>> # start Spark SQL client shell
>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>
>>>>> *Here is the spark code to create the iceberg table:*
>>>>>
>>>>> import org.apache.spark.sql.SparkSession
>>>>> val values = List(1,2,3,4,5)
>>>>>
>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>> import spark.implicits._
>>>>> val df = values.toDF()
>>>>>
>>>>> val core = "mytable8"
>>>>> val table = s"hive_test.mydb.${core}"
>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>
>>>>> df.writeTo(table)
>>>>>     .tableProperty("write.format.default", "parquet")
>>>>>     .tableProperty("location", s3IcePath)
>>>>>     .createOrReplace()
>>>>>
>>>>> I got an error "The AWS Access Key Id you provided does not exist in
>>>>> our records.".
>>>>>
>>>>> I have verified that I can log in to the MinIO UI using the same username
>>>>> and password that I passed to spark-shell via the AWS_ACCESS_KEY_ID and
>>>>> AWS_SECRET_ACCESS_KEY env variables.
>>>>> https://github.com/apache/iceberg/issues/2168 is related but does not
>>>>> help me. I am not sure why the credentials do not work for Iceberg + AWS.
>>>>> Any idea, or an example of writing an Iceberg table to S3 using a Hive
>>>>> catalog, would be highly appreciated! Thanks.
>>>>>
>>>>>
>>>>>
>>>
>>
>
