Hi,

+1 on standardizing properties as well. I'm looking forward to this discussion topic. In particular, IMHO REST catalog properties should be standardized under the "rest." prefix, and REST auth properties should use a prefix like "rest.auth.<auth-type>.", e.g. "rest.auth.oauth2.issuer-url".
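To make the prefix idea a bit more concrete, here is a rough Python sketch of how a client could build and group keys under such a scheme. It is not taken from any existing implementation: the helper names (`auth_property`, `split_rest_properties`) and the "rest.uri" key are made up for illustration, and only "rest.auth.oauth2.issuer-url" and "s3.proxy-uri" come from this thread.

```python
# Hypothetical sketch of the proposed naming scheme; not an existing Iceberg API.
REST_PREFIX = "rest."
AUTH_PREFIX = "rest.auth."


def auth_property(auth_type: str, name: str) -> str:
    """Build a key of the form 'rest.auth.<auth-type>.<name>'."""
    return f"{AUTH_PREFIX}{auth_type}.{name}"


def split_rest_properties(props: dict[str, str]):
    """Partition flat properties into (auth by type, other REST, everything else)."""
    auth: dict[str, dict[str, str]] = {}
    rest: dict[str, str] = {}
    other: dict[str, str] = {}
    for key, value in props.items():
        if key.startswith(AUTH_PREFIX):
            auth_type, sep, name = key[len(AUTH_PREFIX):].partition(".")
            if sep and auth_type and name:
                auth.setdefault(auth_type, {})[name] = value
            else:
                # No '<auth-type>.<name>' part: keep it as a plain REST property.
                rest[key[len(REST_PREFIX):]] = value
        elif key.startswith(REST_PREFIX):
            rest[key[len(REST_PREFIX):]] = value
        else:
            other[key] = value
    return auth, rest, other


# Example input; "rest.uri" and all values are placeholders (assumptions).
example = {
    auth_property("oauth2", "issuer-url"): "https://idp.example.com",
    "rest.uri": "https://catalog.example.com",
    "s3.proxy-uri": "http://proxy.example.com:8080",
}
auth, rest, other = split_rest_properties(example)
```

With a fixed prefix per auth type, a client can pick out the settings for the auth mechanisms it supports and ignore (or warn about) the rest, without needing a hard-coded list of every key.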
Thanks,
Alex

On Wed, Jul 10, 2024 at 4:59 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Sounds reasonable to me
>
> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> Hi:
>>
>> +1 for standardizing iceberg properties. This will help to align
>> different language implementations.
>>
>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I was considering discussing the standardization of Iceberg properties,
>>> and I believe this thread could be a great place to start.
>>>
>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and
>>> Rust implementations as references. However, I've had some difficulty
>>> determining which configurations we must support and what each client has
>>> implemented. Therefore, I agree with Xuanwo about having a separate
>>> section as a single source of truth (SSOT).
>>>
>>> Additionally, I think it would be beneficial for each client to show
>>> what it does not support. This would make it easier for users to know that
>>> a particular client might not work with some configuration that their
>>> catalog could define as default or override. It would also help us, as
>>> contributors, to know which configurations we need to implement support for.
>>>
>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations
>>> only exist in the Python implementation. I believe it is not clear that
>>> these configurations are exclusive to Python, and they might be
>>> configurations that the catalog could override or define as defaults in the
>>> get info endpoint. Without an SSOT, this could be harder to track.
>>>
>>> Another example is the "rest.authorization-url" in Python and Rust
>>> versus "oauth2_server_uri" in Java. Although this is a bit out of scope for
>>> this thread, I will open another discussion topic about broader
>>> standardization of available properties.
>>>
>>> [1]: https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>> [2]: https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>>
>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>> Hey Xuanwo,
>>>
>>> Thanks for raising this.
>>>
>>> - The S3 properties are largely covered under the S3FileIO page:
>>>   https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks
>>>   like some important ones are indeed missing. I've raised an issue
>>>   here <https://github.com/apache/iceberg/issues/10674>.
>>> - PyIceberg supports only a subset of the functionality, so many
>>>   properties are missing there as well.
>>> - For the REST Catalog, there is an open PR to add
>>>   <https://github.com/apache/iceberg/pull/10576> the options for GCS
>>>   and ADLS. It would be great to get some more eyes on it.
>>>
>>> That being said, I do think there is value in formalizing them. When
>>> adding configuration options to PyIceberg, I'll make sure to check out the
>>> Java implementation to ensure that we use the same property.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Wed, Jul 10, 2024 at 9:22 AM Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Hello everyone
>>>>
>>>> I've been working on the iceberg-rust FileIO recently and have found it
>>>> challenging to identify all the necessary IO properties we need to support.
>>>>
>>>> For instance, consider AWS S3. There are no documents specifying which
>>>> properties are supported by S3.
>>>>
>>>> The only relevant documentation I could find includes:
>>>>
>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or
>>>>   `s3.secret-access-key`.
>>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>
>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>
>>>> I propose adding a separate section for agreeing upon these properties.
>>>> We could create a specification that outlines all IO properties with
>>>> indications of whether they are required or optional, along with their
>>>> expected behaviors. This would help ensure consistency across different
>>>> implementations without any conflicts.
>>>>
>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>> [3]: https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>> [4]: https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/