Hi Casel,

I think Daniel is correct; there is indeed no official document about the
Aliyun OSS integration, so I think we can write a doc for this.
There are some other related documents about Aliyun Iceberg integration
(though probably not exactly your case), if you are interested.

[1]
https://www.alibabacloud.com/help/en/flink/developer-reference/apache-iceberg-connector
[2] https://www.yuque.com/huzijin-og9kx/gywdy7/srwqht
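In the meantime, since Daniel suspects the s3:// locations come from the REST server's defaults, here is a rough sketch of how the server side might be pointed at OSSFileIO. This assumes the tabular-io REST image linked below, whose entrypoint maps CATALOG_* environment variables to catalog properties (double underscore becoming a dash, single underscore a dot); I have not tested this against OSS, so treat the variable names and values as assumptions to verify against that image's README:

```shell
# Untested sketch: run the REST catalog server so it hands out oss:// locations
# instead of defaulting to S3FileIO. The CATALOG_* -> property mapping follows
# the tabulario/iceberg-rest image convention; {AK}/{SK} are placeholders.
# Note: the iceberg-aliyun jars (and the OSS SDK) must also be on the server's
# classpath for org.apache.iceberg.aliyun.oss.OSSFileIO to load.
docker run -d -p 8181:8181 \
  -e CATALOG_WAREHOUSE=oss://odps-prd/lakehouse/iceberg-rest/warehouse \
  -e CATALOG_IO__IMPL=org.apache.iceberg.aliyun.oss.OSSFileIO \
  -e CATALOG_OSS_ENDPOINT=oss-cn-shanghai-internal.aliyuncs.com \
  -e CATALOG_CLIENT_ACCESS__KEY__ID={AK} \
  -e CATALOG_CLIENT_ACCESS__KEY__SECRET={SK} \
  tabulario/iceberg-rest
```

With the server configured this way, table metadata and data file locations should be generated under the oss:// warehouse path, so the client-side OSSFileIO no longer sees s3:// URIs.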

Best.

On Fri, Jul 12, 2024 at 2:17 AM Daniel Weeks <dwe...@apache.org> wrote:

> Hey Casel,
>
> It looks like you might be using the wrong URI scheme when defining the
> warehouse location.  If you look at the OSSURI implementation, it defines
> the valid schemes
> <http://9c912ac008c64edea7db6d047f0/aliyun/src/main/java/org/apache/iceberg/aliyun/oss/OSSURI.java#L40>
>  ('s3'
> is not a valid scheme).
>
> My guess is that the 's3' scheme is coming from the REST server, so you
> might want to take a look at how that is configured (it may default to
> S3FileIO, not OSSFileIO).
>
> -Dan
>
> On Thu, Jul 11, 2024 at 1:17 AM casel.chen <casel_c...@126.com> wrote:
>
>> Hi All, I tried to use Spark SQL to write an Iceberg table to Aliyun OSS
>> [1] with the REST catalog [2].
>>
>> When I inserted some data into the Iceberg table, it complained "Invalid
>> scheme: s3 in OSS location
>> s3://warehouse/testdb/testtable/data/00001-1-e551ee1f-f888-4506-8d60-b9b1b726852d-0-00001.parquet".
>> What's wrong?
>>
>> How can I use the Iceberg REST catalog with Aliyun OSS? I can't find a
>> related guideline in the official Iceberg documentation. Any suggestion is
>> appreciated, thanks!
>>
>>
>> spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
>>   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>>   --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
>>   --conf spark.sql.catalog.rest.type=rest \
>>   --conf spark.sql.catalog.rest.uri=http://xxx.xxx.xxx.xxx:8181 \
>>   --conf spark.sql.catalog.rest.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \
>>   --conf spark.sql.catalog.rest.oss.endpoint=oss-cn-shanghai-internal.aliyuncs.com \
>>   --conf spark.sql.catalog.rest.client.access-key-id={AK} \
>>   --conf spark.sql.catalog.rest.client.access-key-secret={SK} \
>>   --conf spark.sql.catalog.rest.warehouse=oss://odps-prd/lakehouse/iceberg-rest/warehouse \
>>   --conf spark.sql.defaultCatalog=rest
>>
>>
>> spark-sql (testdb)> create database testdb;
>>
>> Time taken: 0.12 seconds
>>
>> spark-sql (testdb)> CREATE TABLE rest.testdb.testtable (id bigint, data
>> string) USING iceberg;
>>
>> Time taken: 0.228 seconds
>>
>> spark-sql (testdb)> INSERT INTO rest.testdb.testtable VALUES (1, 'a'),
>> (2, 'b'), (3, 'c');
>>
>> 24/07/11 15:25:43 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
>> org.apache.iceberg.exceptions.ValidationException: Invalid scheme: s3 in OSS location s3://warehouse/testdb/testtable/data/00001-1-e551ee1f-f888-4506-8d60-b9b1b726852d-0-00001.parquet
>>   at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
>>   at org.apache.iceberg.aliyun.oss.OSSURI.<init>(OSSURI.java:66)
>>   at org.apache.iceberg.aliyun.oss.OSSFileIO.newOutputFile(OSSFileIO.java:81)
>>   at org.apache.iceberg.io.OutputFileFactory.newOutputFile(OutputFileFactory.java:105)
>>   at org.apache.iceberg.io.RollingFileWriter.newFile(RollingFileWriter.java:113)
>>   at org.apache.iceberg.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:106)
>>   at org.apache.iceberg.io.RollingDataWriter.<init>(RollingDataWriter.java:47)
>>   at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:717)
>>   at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:707)
>>   at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:691)
>>   at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:668)
>>   at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:436)
>>   at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
>>   at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
>>   at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>>   at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:141)
>>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>>
>>
>> [1] https://www.alibabacloud.com/product/oss
>>
>> [2] https://github.com/tabular-io/iceberg-rest-image
>>
>>
>>
>> At 2023-10-30 12:58:46, "Renjie Liu" <liurenjie2...@gmail.com> wrote:
>>
>> It seems the code is available, but the doc is missing.
>>
>>
>> https://github.com/apache/iceberg/blob/2268bd8acaaee2748b20ad93430ca7073ac53009/aliyun
>>
>> On Sun, Oct 29, 2023 at 7:04 PM casel.chen <casel_c...@126.com> wrote:
>>
>>> Hi, I am looking for a guideline on integrating Apache Iceberg with
>>> Aliyun OSS. Thanks in advance!
>>>
>>
