I'm not sure what the recommended way to start the standardization is. We could open one proposal per context, or a single proposal that covers all of them.

Suggested contexts to start with:

- REST Catalog
- FileIO

I believe that most of the other cases are supported by the configuration topic in the Table section[1], but this is about the Java implementation. Maybe we need to create a page in the project section[2] to handle the properties in the table section and the REST and FileIO contexts.

[1]: https://iceberg.apache.org/docs/latest/configuration/
[2]: https://iceberg.apache.org/community/

On Wednesday, July 10th, 2024 at 11:58 AM, Russell Spitzer <russell.spit...@gmail.com> wrote:

> Sounds reasonable to me
>
> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>
>> Hi:
>>
>> +1 for standardizing iceberg properties. This will help to align different
>> language implementations.
>>
>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I was considering discussing the standardization of Iceberg properties, and
>>> I believe this thread could be a great place to start.
>>>
>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and
>>> Rust implementations as references. However, I've had some difficulty
>>> determining which configurations we must support and what each client has
>>> implemented. Therefore, I agree with Xuanwo about having a separate section
>>> as a single source of truth (SSOT).
>>>
>>> Additionally, I think it would be beneficial for each client to show what
>>> it does not support. This would make it easier for users to know that a
>>> particular client might not work with some configuration that their catalog
>>> could define as default or override. It would also help us, as
>>> contributors, to know which configurations we need to implement support for.
>>>
>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only
>>> exist in the Python implementation. I believe it is not clear that these
>>> configurations are exclusive to Python, and they might be configurations
>>> that the catalog could override or define as defaults in the get info
>>> endpoint. Without an SSOT, this could be harder to track.
>>> Another example is the "rest.authorization-url" in Python and Rust versus
>>> "oauth2_server_uri" in Java. Although this is a bit out of scope for this
>>> thread, I will open another discussion topic about broader standardization
>>> of available properties.
>>>
>>> [1]:
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>> [2]:
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>>
>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong
>>> <fo...@apache.org> wrote:
>>>
>>>> Hey Xuanwo,
>>>>
>>>> Thanks for raising this.
>>>>
>>>> - The S3 properties are largely covered under the S3FileIO page:
>>>>   https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks like
>>>>   some important ones are missing indeed. I've raised [an issue
>>>>   here](https://github.com/apache/iceberg/issues/10674).
>>>> - For PyIceberg it only supports like a subset of the functionality, and
>>>>   therefore also many properties are missing there.
>>>> - For the REST Catalog, there is [an open PR to
>>>>   add](https://github.com/apache/iceberg/pull/10576) the options for GCS and
>>>>   ADLS. It would be great to get some more eyes on there.
>>>>
>>>> That being said, I do think there is value in formalizing them. When
>>>> adding configuration options to PyIceberg, I'll make sure to check out the
>>>> Java implementation to ensure that we use the same property.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>>>>
>>>>> Hello everyone
>>>>>
>>>>> I've been working on the iceberg-rust FileIO recently and have found it
>>>>> challenging to identify all the necessary IO properties we need to
>>>>> support.
>>>>>
>>>>> For instance, consider AWS S3. There are no documents specifying which
>>>>> properties are supported by S3.
>>>>>
>>>>> The only relevant documentation I could find includes:
>>>>>
>>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or
>>>>>   `s3.secret-access-key`.
>>>>> - Pyiceberg configuration[2]: Missing several S3-related properties.
>>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>>
>>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>>
>>>>> I propose adding a separate section for agreeing upon these properties.
>>>>> We could create a specification that outlines all IO properties with
>>>>> indications of whether they are required or optional, along with their
>>>>> expected behaviors. This would help ensure consistency across different
>>>>> implementations without any conflicts.
>>>>>
>>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>>> [3]:
>>>>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>>> [4]:
>>>>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>>
>>>>> Xuanwo
>>>>>
>>>>> https://xuanwo.io/
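
To make the single-source-of-truth idea from this thread a bit more concrete, here is a minimal Python sketch of what a shared property registry could look like. It is only an illustration, not an existing Iceberg API: `PropertySpec`, `REGISTRY`, and `translate` are hypothetical names, the choice of canonical key is an assumption, and the per-client entries simply restate the examples quoted above without being verified against the current implementations.

```python
from dataclasses import dataclass


@dataclass
class PropertySpec:
    """One entry in a hypothetical shared property registry."""
    name: str          # canonical key (which spelling is canonical is an assumption)
    context: str       # e.g. "fileio" or "rest-catalog"
    client_keys: dict  # client name -> key that client uses for this property


# Entries restate the examples from this thread; they are not verified
# against the current Java, Python, or Rust code.
REGISTRY = [
    PropertySpec("s3.signer", "fileio", {"python": "s3.signer"}),
    PropertySpec("s3.proxy-uri", "fileio", {"python": "s3.proxy-uri"}),
    PropertySpec(
        "rest.authorization-url",
        "rest-catalog",
        {
            "python": "rest.authorization-url",
            "rust": "rest.authorization-url",
            "java": "oauth2_server_uri",
        },
    ),
]


def translate(props, client):
    """Split canonical properties into those `client` supports and those it does not.

    Returns (supported, unsupported): `supported` maps the client's own key
    to the value; `unsupported` lists canonical keys that are unknown to the
    registry or have no entry for `client`.
    """
    by_name = {spec.name: spec for spec in REGISTRY}
    supported, unsupported = {}, []
    for key, value in props.items():
        spec = by_name.get(key)
        if spec is None or client not in spec.client_keys:
            unsupported.append(key)
        else:
            supported[spec.client_keys[client]] = value
    return supported, unsupported


# Placeholder values standing in for defaults a catalog might return.
catalog_defaults = {
    "s3.signer": "example-signer",
    "rest.authorization-url": "https://example.com/oauth/tokens",
}
supported, unsupported = translate(catalog_defaults, "java")
```

With a table like this, a client could warn up front when a catalog default or override falls in `unsupported`, and contributors could see which keys still need an implementation.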