Hey Casel,

It looks like you might be using the wrong URI scheme when defining the warehouse location. If you look at the OSSURI implementation, it defines the valid schemes <http://9c912ac008c64edea7db6d047f0/aliyun/src/main/java/org/apache/iceberg/aliyun/oss/OSSURI.java#L40> ('s3' is not a valid scheme).
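To make the failure mode concrete, here is a toy shell sketch of the check OSSURI performs (the real check is Java, in the linked OSSURI constructor); it assumes, per my reading of the linked line, that only 'oss' and 'https' are accepted schemes — worth double-checking against the source:

```shell
# Illustrative only: mimic OSSURI's scheme validation in shell.
# Assumption: VALID_SCHEMES in OSSURI.java is {"https", "oss"}.
VALID_SCHEMES="https oss"

check_scheme() {
  uri="$1"
  scheme="${uri%%://*}"          # everything before "://"
  for s in $VALID_SCHEMES; do
    if [ "$s" = "$scheme" ]; then
      echo "OK: $scheme"
      return 0
    fi
  done
  # Mirrors the ValidationException message from the stack trace below
  echo "Invalid scheme: $scheme in OSS location $uri"
}

check_scheme "oss://warehouse/testdb/testtable"   # accepted
check_scheme "s3://warehouse/testdb/testtable"    # rejected, same error shape as yours
```

So any s3:// path handed to OSSFileIO will fail exactly the way your INSERT did.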
My guess is that the 's3' scheme is coming from the REST server, so you might want to take a look at how that is configured (it may default to S3FileIO, not OSSFileIO).

-Dan

On Thu, Jul 11, 2024 at 1:17 AM casel.chen <casel_c...@126.com> wrote:

> Hi All, I tried to use Spark SQL to write an Iceberg table to Aliyun OSS [1]
> with a REST catalog [2].
>
> When I inserted some data into the Iceberg table, it complained "Invalid
> scheme: s3 in OSS location
> s3://warehouse/testdb/testtable/data/00001-1-e551ee1f-f888-4506-8d60-b9b1b726852d-0-00001.parquet".
> What's wrong?
>
> How do I use the Iceberg REST catalog with Aliyun OSS? I can't find a
> related guideline in the official Iceberg documentation. Any suggestion is
> appreciated, thanks!
>
> spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
>   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>   --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
>   --conf spark.sql.catalog.rest.type=rest \
>   --conf spark.sql.catalog.rest.uri=http://xxx.xxx.xxx.xxx:8181 \
>   --conf spark.sql.catalog.rest.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \
>   --conf spark.sql.catalog.rest.oss.endpoint=oss-cn-shanghai-internal.aliyuncs.com \
>   --conf spark.sql.catalog.rest.client.access-key-id={AK} \
>   --conf spark.sql.catalog.rest.client.access-key-secret={SK} \
>   --conf spark.sql.catalog.rest.warehouse=oss://odps-prd/lakehouse/iceberg-rest/warehouse \
>   --conf spark.sql.defaultCatalog=rest
>
> spark-sql (testdb)> create database testdb;
> Time taken: 0.12 seconds
>
> spark-sql (testdb)> CREATE TABLE rest.testdb.testtable (id bigint, data string) USING iceberg;
> Time taken: 0.228 seconds
>
> spark-sql (testdb)> INSERT INTO rest.testdb.testtable VALUES (1, 'a'), (2, 'b'), (3, 'c');
> 24/07/11 15:25:43 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
> org.apache.iceberg.exceptions.ValidationException: Invalid scheme: s3 in
> OSS location s3://warehouse/testdb/testtable/data/00001-1-e551ee1f-f888-4506-8d60-b9b1b726852d-0-00001.parquet
>   at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
>   at org.apache.iceberg.aliyun.oss.OSSURI.<init>(OSSURI.java:66)
>   at org.apache.iceberg.aliyun.oss.OSSFileIO.newOutputFile(OSSFileIO.java:81)
>   at org.apache.iceberg.io.OutputFileFactory.newOutputFile(OutputFileFactory.java:105)
>   at org.apache.iceberg.io.RollingFileWriter.newFile(RollingFileWriter.java:113)
>   at org.apache.iceberg.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:106)
>   at org.apache.iceberg.io.RollingDataWriter.<init>(RollingDataWriter.java:47)
>   at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:717)
>   at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:707)
>   at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:691)
>   at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:668)
>   at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:436)
>   at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
>   at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
>   at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>   at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
>   at org.apache.spark.scheduler.Task.run(Task.scala:141)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
> [1] https://www.alibabacloud.com/product/oss
> [2] https://github.com/tabular-io/iceberg-rest-image
>
> At 2023-10-30 12:58:46, "Renjie Liu" <liurenjie2...@gmail.com> wrote:
>> Seems that the code is available, but the doc is missing.
>> https://github.com/apache/iceberg/blob/2268bd8acaaee2748b20ad93430ca7073ac53009/aliyun
>>
>> On Sun, Oct 29, 2023 at 7:04 PM casel.chen <casel_c...@126.com> wrote:
>>> Hi, I am seeking a guideline for integrating Apache Iceberg with
>>> Aliyun OSS, thanks in advance!
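Following up on the REST-server angle: since the data-file path in the error starts with s3://warehouse/..., the server is almost certainly planning locations with its own (S3) defaults, ignoring your Spark-side io-impl. A sketch of how you might point the tabular-io/iceberg-rest-image at OSS instead — the CATALOG_* env var names below follow that image's documented convention ("__" becomes "-", "_" becomes "."), but I have not run this against OSS, so treat every variable name as an assumption to verify against the image's README:

```shell
# Sketch, unverified: make the REST catalog server itself use OSSFileIO and
# an oss:// warehouse, so the locations it hands to Spark are oss:// paths.
docker run -d -p 8181:8181 \
  -e CATALOG_WAREHOUSE=oss://odps-prd/lakehouse/iceberg-rest/warehouse \
  -e CATALOG_IO__IMPL=org.apache.iceberg.aliyun.oss.OSSFileIO \
  -e CATALOG_OSS_ENDPOINT=oss-cn-shanghai-internal.aliyuncs.com \
  -e CATALOG_CLIENT_ACCESS__KEY__ID={AK} \
  -e CATALOG_CLIENT_ACCESS__KEY__SECRET={SK} \
  tabulario/iceberg-rest
```

One caveat: the stock image bundles the AWS bundle but, as far as I know, not iceberg-aliyun, so you would also need to get the OSSFileIO jar onto the server's classpath (e.g. by extending the image) for this to work.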