Hi,

Also +1 on standardizing properties. I'm looking forward to this discussion
topic. In particular, imho REST catalog properties should be standardized
with the "rest." prefix, and REST auth properties should have a prefix
like "rest.auth.<auth-type>.", e.g. "rest.auth.oauth2.issuer-url".
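
To make the prefix idea concrete, here is a rough Python sketch of how a
client might split a flat property map using those two prefixes. It is only
an illustration under assumptions: the function name split_rest_properties
and the key "rest.example-option" are hypothetical and not taken from any
Iceberg implementation, and the values are placeholders.

```python
from collections import defaultdict

# Prefixes as proposed in this email; not an agreed-upon standard.
REST_PREFIX = "rest."
REST_AUTH_PREFIX = "rest.auth."


def split_rest_properties(props):
    """Split a flat property map into REST catalog settings,
    per-auth-type settings, and everything else (hypothetical helper)."""
    catalog = {}
    auth = defaultdict(dict)
    other = {}
    for key, value in props.items():
        if key.startswith(REST_AUTH_PREFIX):
            # Expect "rest.auth.<auth-type>.<name>".
            remainder = key[len(REST_AUTH_PREFIX):]
            auth_type, sep, name = remainder.partition(".")
            if sep and auth_type and name:
                auth[auth_type][name] = value
            else:
                # No auth type or no property name: leave the key untouched.
                other[key] = value
        elif key.startswith(REST_PREFIX):
            catalog[key[len(REST_PREFIX):]] = value
        else:
            other[key] = value
    return catalog, dict(auth), other


props = {
    "rest.auth.oauth2.issuer-url": "https://idp.example.invalid",  # placeholder value
    "rest.example-option": "42",  # hypothetical key, for illustration only
    "s3.access-key-id": "dummy",  # non-REST key passes through unchanged
}
catalog, auth, other = split_rest_properties(props)
```

With a layout like this, a client could hand each auth type only its own
settings and could flag unknown keys per group instead of per property.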

Thanks,
Alex

On Wed, Jul 10, 2024 at 4:59 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Sounds reasonable to me
>
> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> Hi:
>>
>> +1 for standardizing Iceberg properties. This will help to align
>> different language implementations.
>>
>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I was considering discussing the standardization of Iceberg properties,
>>> and I believe this thread could be a great place to start.
>>>
>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and
>>> Rust implementations as references. However, I've had some difficulty
>>> determining which configurations we must support and what each client has
>>> implemented. Therefore, I agree with Xuanwo about having a separate
>>> section as a single source of truth (SSOT).
>>>
>>> Additionally, I think it would be beneficial for each client to show
>>> what it does not support. This would make it easier for users to know that
>>> a particular client might not work with some configuration that their
>>> catalog could define as default or override. It would also help us, as
>>> contributors, to know which configurations we need to implement support for.
>>>
>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations
>>> only exist in the Python implementation. I believe it is not clear that
>>> these configurations are exclusive to Python, and they might be
>>> configurations that the catalog could override or define as defaults in the
>>> get info endpoint. Without an SSOT, this could be harder to track.
>>>
>>> Another example is the "rest.authorization-url" in Python and Rust
>>> versus "oauth2-server-uri" in Java. Although this is a bit out of scope for
>>> this thread, I will open another discussion topic about broader
>>> standardization of available properties.
>>>
>>> [1]:
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>> [2]:
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <
>>> fo...@apache.org> wrote:
>>>
>>> Hey Xuanwo,
>>>
>>> Thanks for raising this.
>>>
>>>    - The S3 properties are largely covered under the S3FileIO page:
>>>    https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks
>>>    like some important ones are missing indeed. I've raised an issue
>>>    here <https://github.com/apache/iceberg/issues/10674>.
>>>    - PyIceberg supports only a subset of the functionality, and
>>>    therefore many properties are missing there as well.
>>>    - For the REST Catalog, there is an open PR to add
>>>    <https://github.com/apache/iceberg/pull/10576> the options for GCS
>>>    and ADLS. It would be great to get some more eyes on there.
>>>
>>> That being said, I do think there is value in formalizing them. When
>>> adding configuration options to PyIceberg, I'll make sure to check out the
>>> Java implementation to ensure that we use the same property.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I've been working on the iceberg-rust FileIO recently and have found it
>>>> challenging to identify all the necessary IO properties we need to support.
>>>>
>>>> For instance, consider AWS S3. There is no document specifying which
>>>> S3 properties are supported.
>>>>
>>>> The only relevant documentation I could find includes:
>>>>
>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or
>>>> `s3.secret-access-key`.
>>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>
>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>
>>>> I propose adding a separate section for agreeing upon these properties.
>>>> We could create a specification that outlines all IO properties with
>>>> indications of whether they are required or optional, along with their
>>>> expected behaviors. This would help ensure consistency across different
>>>> implementations without any conflicts.
>>>>
>>>>
>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>> [3]:
>>>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>> [4]:
>>>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/
>>>>
>>>
>>>
