+1 on standardizing, and possibly extending this to include catalog properties.

On the PyIceberg side, a recent development is the ability to separate S3
FileIO configurations from Glue Catalog configurations, with an optional
configuration that applies the same values to both when specified. See Unified
AWS Credentials
<https://py.iceberg.apache.org/configuration/#unified-aws-credentials> and
GitHub Issue #892 <https://github.com/apache/iceberg-python/issues/892>.

So for the AWS access key ID, there are currently 3 different properties:
* `s3.access-key-id` (S3 FileIO specific)
* `glue.access-key-id` (Glue Catalog specific)
* `client.access-key-id` (Unified)
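A minimal sketch of the lookup order this implies. It assumes, following the Unified AWS Credentials docs linked above, that a service-specific property takes precedence over its `client.*` counterpart. `resolve_property` is a hypothetical helper for illustration, not a PyIceberg API:

```python
from typing import Mapping, Optional


def resolve_property(
    props: Mapping[str, str], service_prefix: str, name: str
) -> Optional[str]:
    """Hypothetical helper (not PyIceberg's implementation).

    Prefer the service-specific key, e.g. "s3.access-key-id",
    and fall back to the unified "client." key if it is absent.
    """
    specific = props.get(f"{service_prefix}.{name}")
    if specific is not None:
        return specific
    return props.get(f"client.{name}")


props = {
    "client.access-key-id": "shared-key",
    "glue.access-key-id": "glue-only-key",
}

# S3 has no specific key here, so it falls back to the unified value.
s3_key = resolve_property(props, "s3", "access-key-id")
# Glue has its own key, which wins over the unified one.
glue_key = resolve_property(props, "glue", "access-key-id")
```

A shared spec would need to pin down exactly this kind of precedence rule, since each client otherwise has to infer it from another implementation's code.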

Kevin Liu

On Wed, Jul 31, 2024 at 10:05 AM Xuanwo <xua...@apache.org> wrote:

> Thank you all. I'm going to prepare a proposal PR for this.
> On Fri, Jul 12, 2024, at 10:06, Honah J. wrote:
> Hello everyone,
> Thank you all for the valuable insights. I am also +1 on having
> standardized names for File IO properties. Creating a dedicated section to
> summarize property names in the Java implementation is a good starting
> point. Since PyIceberg, iceberg-rust, and iceberg-go will support only
> subsets of these properties for some time (with the rest to be added in
> future development), the existing Java implementation will serve as a
> useful reference. Additionally, we could establish general naming
> conventions in the doc, such as using the “s3.” prefix for S3 properties
> and hyphens to connect words.
> Best regards,
> Honah
> On Wed, Jul 10, 2024 at 10:47 AM <ndrl...@proton.me.invalid> wrote:
> I don't know the recommended way to start standardizing. We could start
> one proposal per context or have a single proposal covering all of them.
> Suggested contexts to start with:
>    - REST Catalog
>    - FileIO
> I believe most of the other cases are covered by the Configuration page
> in the Table section[1], but that page documents the Java implementation.
> Maybe we need to create a page in the Project section[2] that covers the
> table properties as well as the REST and FileIO contexts.
> [1]: https://iceberg.apache.org/docs/latest/configuration/
> [2]: https://iceberg.apache.org/community/
> On Wednesday, July 10th, 2024 at 11:58 AM, Russell Spitzer <
> russell.spit...@gmail.com> wrote:
> Sounds reasonable to me
> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
> Hi:
> +1 for standardizing Iceberg properties. This will help to align different
> language implementations.
> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
> Hello Everyone,
> I was considering discussing the standardization of Iceberg properties,
> and I believe this thread could be a great place to start.
> I'm writing an Iceberg client in Elixir and using the Java, Python, and
> Rust implementations as references. However, I've had some difficulty
> determining which configurations we must support and what each client has
> implemented. Therefore, I agree with Xuanwo about having a separate
> section as a single source of truth (SSOT).
> Additionally, I think it would be beneficial for each client to show what
> it does not support. This would make it easier for users to know that a
> particular client might not work with some configuration that their catalog
> could define as default or override. It would also help us, as
> contributors, to know which configurations we need to implement support for.
> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only
> exist in the Python implementation. It is not obvious that these
> configurations are exclusive to Python, and the catalog might override
> them or define them as defaults in the get info endpoint. Without an SSOT,
> this is harder to track.
> Another example is the "rest.authorization-url" in Python and Rust versus
> "oauth2_server_uri" in Java. Although this is a bit out of scope for this
> thread, I will open another discussion topic about broader standardization
> of available properties.
> [1]:
> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
> [2]:
> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <
> fo...@apache.org> wrote:
> Hey Xuanwo,
> Thanks for raising this.
>    - The S3 properties are largely covered under the S3FileIO page:
>    https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks
>    like some important ones are missing indeed. I've raised an issue here
>    <https://github.com/apache/iceberg/issues/10674>.
>    - PyIceberg only supports a subset of the functionality, so many
>    properties are missing there as well.
>    - For the REST Catalog, there is an open PR
>    <https://github.com/apache/iceberg/pull/10576> to add the options for
>    GCS and ADLS. It would be great to get some more eyes on it.
> That being said, I do think there is value in formalizing them. When
> adding configuration options to PyIceberg, I'll make sure to check out the
> Java implementation to ensure that we use the same property.
> Kind regards,
> Fokko
> On Wed, Jul 10, 2024 at 09:22 Xuanwo <xua...@apache.org> wrote:
> Hello everyone
> I've been working on the iceberg-rust FileIO recently and have found it
> challenging to identify all the necessary IO properties we need to support.
> For instance, consider AWS S3. There is no document specifying which
> properties are supported for S3.
> The only relevant documentation I could find includes:
> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or
> `s3.secret-access-key`.
> - PyIceberg configuration[2]: Missing several S3-related properties.
> - Iceberg REST Catalog[3]: Does not cover all storage services.
> To gather this information, we must refer to the S3FileIO Java code[4].
> I propose adding a separate section for agreeing upon these properties. We
> could create a specification that outlines all IO properties with
> indications of whether they are required or optional, along with their
> expected behaviors. This would help ensure consistency across different
> implementations without any conflicts.
> [1]: https://iceberg.apache.org/docs/latest/aws/
> [2]: https://py.iceberg.apache.org/configuration/#s3
> [3]:
> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
> [4]:
> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
> Xuanwo
> https://xuanwo.io/
