Hello Everyone,

I have been wanting to discuss the standardization of Iceberg properties, and 
I believe this thread is a good place to start.

I'm writing an Iceberg client in Elixir and using the Java, Python, and Rust 
implementations as references. However, I've had some difficulty determining 
which configurations we must support and what each client has implemented. 
Therefore, I agree with Xuanwo about having a separate section as a single 
source of truth (SSOT).

Additionally, I think it would be beneficial for each client to show what it 
does not support. This would make it easier for users to know that a particular 
client might not work with some configuration that their catalog could define 
as default or override. It would also help us, as contributors, to know which 
configurations we need to implement support for.

For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only exist 
in the Python implementation. Nothing makes it clear that they are exclusive 
to Python, yet a catalog could override them or define them as defaults in the 
get info endpoint. Without an SSOT, this is harder to track.

Another example is "rest.authorization-url" in Python and Rust versus 
"oauth2_server_uri" in Java. Since this is a bit out of scope for this thread, 
I will open another discussion topic about broader standardization of 
available properties.
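
To illustrate the kind of check a per-client "unsupported" list would enable, 
here is a minimal Python sketch. It assumes the catalog returns two maps of 
defaults and overrides; the SUPPORTED set and the helper name 
unsupported_properties are hypothetical and not taken from any existing client.

```python
from typing import Iterable, List, Mapping

# Hypothetical set of property keys that one client implements.
# A real list would come from the SSOT, not from this sketch.
SUPPORTED = frozenset({"s3.access-key-id", "s3.secret-access-key"})


def unsupported_properties(
    supported: Iterable[str], *sources: Mapping[str, str]
) -> List[str]:
    """Return, sorted, every key found in any source map but not in `supported`."""
    seen = set()
    for source in sources:
        seen.update(source.keys())
    return sorted(seen - set(supported))


# Example catalog response split into its two maps (values are made up).
defaults = {"s3.proxy-uri": "http://proxy.example:3128"}
overrides = {"s3.signer": "example-signer", "s3.access-key-id": "example-key"}

# Keys the catalog sets that this hypothetical client would silently ignore.
print(unsupported_properties(SUPPORTED, defaults, overrides))
```

A client could run something like this when it loads the catalog configuration 
and warn the user, instead of silently ignoring keys it does not understand.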

[1]: 
https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
[2]: 
https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code

On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <fo...@apache.org> 
wrote:

> Hey Xuanwo,
>
> Thanks for raising this.
>
> - The S3 properties are largely covered under the S3FileIO page: 
> https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks like 
> some important ones are missing indeed. I've raised [an issue 
> here](https://github.com/apache/iceberg/issues/10674).
> - PyIceberg supports only a subset of the functionality, and therefore 
> many properties are missing there as well.
> - For the REST Catalog, there is [an open PR to 
> add](https://github.com/apache/iceberg/pull/10576) the options for GCS and 
> ADLS. It would be great to get some more eyes on there.
>
> That being said, I do think there is value in formalizing them. When adding 
> configuration options to PyIceberg, I'll make sure to check out the Java 
> implementation to ensure that we use the same property.
>
> Kind regards,
> Fokko
>
> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>
>> Hello everyone
>>
>> I've been working on the iceberg-rust FileIO recently and have found it 
>> challenging to identify all the necessary IO properties we need to support.
>>
>> For instance, consider AWS S3. There are no documents specifying which 
>> properties are supported for S3.
>>
>> The only relevant documentation I could find includes:
>>
>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or 
>> `s3.secret-access-key`.
>> - Pyiceberg configuration[2]: Missing several S3-related properties.
>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>
>> To gather this information, we must refer to the S3FileIO Java code[4].
>>
>> I propose adding a separate section for agreeing upon these properties. We 
>> could create a specification that outlines all IO properties with 
>> indications of whether they are required or optional, along with their 
>> expected behaviors. This would help ensure consistency across different 
>> implementations without any conflicts.
>>
>> [1]: https://iceberg.apache.org/docs/latest/aws/
>> [2]: https://py.iceberg.apache.org/configuration/#s3
>> [3]: 
>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>> [4]: 
>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>
>> Xuanwo
>>
>> https://xuanwo.io/
