Re: [DISCUSS] Spec clarifications on reading/writing Identity partitioned columns

2024-07-31 Thread Micah Kornfield
I might have missed it but in skimming I couldn't find a section in the spec about writing all columns to the data file. I posted https://github.com/apache/iceberg/pull/10835 which says implementations should write the column for redundancy but leaves the option open for others. Thanks, Micah O

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Micah Kornfield
It sounds like most of the opinions so far are waiting for the scope of work to finish before finalizing the specification. An alternative view: Would it make sense to start releasing the table specification on a regular cadence (e.g. quarterly, every 6 months or yearly)? I think the problem with

Re: [VOTE] Clarify "File System Tables" in the table spec

2024-07-31 Thread Micah Kornfield
+1 (non-binding) On Wed, Jul 31, 2024 at 5:12 PM Ryan Blue wrote: > As promised in the discussion thread, I've opened a PR to clarify the > "File System Tables" section and mark it deprecated since there appears to > be consensus for at least warning people that this is unsafe in most cases > an

Re: [DISCUSS] Use iceberg-rust as pyiceberg file io

2024-07-31 Thread Joe Stein
Kafka did this with librdkafka and it was wildly successful. Having the underlying bindings in Rust is great, with a layer for access in Python. +1 ~ Joe Stein On Wed, Jul 31, 2024 at 10:29 PM Xuanwo wrote: > Hello everyone > > I start this thread to discuss the idea about using iceberg-rust as >

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-31 Thread Manu Zhang
A reminder. GitHub Discussion has been enabled on iceberg-rust and there are already interesting ideas open for discussion. Please weigh in. On Mon, Jul 15, 2024 at 9:39 PM Renjie Liu wrote: > Hi: > > >> But one minor concern

Re: [EXTERNAL] Re: Case-insensitive schemas

2024-07-31 Thread Lessard, Steve
Ryan, I also replied in the PR. In my testing I do not see any runtime failure when trying to use a case-sensitive schema in a case-insensitive way. -Steve Lessard, Teradata From: Ryan Blue Date: Wednesday, July 31, 2024 at 5:02 PM To: dev@iceberg.apache.org Subject: [EXTERNAL] Re: Case-insen

Re: Meeting time for catalog community sync

2024-07-31 Thread Dmitri Bourlatchkov
The proposed schedule (Wednesdays) looks good to me. Cheers, Dmitri. On Sun, Jul 28, 2024 at 10:07 PM Jack Ye wrote: > Hi everyone, > > Looks like we have some general preference to do it on Wednesday when > there is no community sync, and also rotate the time to accommodate > participatio

[VOTE] Clarify "File System Tables" in the table spec

2024-07-31 Thread Ryan Blue
As promised in the discussion thread, I've opened a PR to clarify the "File System Tables" section and mark it deprecated since there appears to be consensus for at least warning people that this is unsafe in most cases and discouraged. The PR is here: https://github.com/apache/iceberg/pull/10833

Re: Case-insensitive schemas

2024-07-31 Thread Ryan Blue
Steve, I replied on the PR, but the gist is that you're right. Using a schema that has fields that would be considered identical in a case insensitive context will fail at runtime. That's the right behavior because Iceberg can't control the case sensitivity of applications or engines. Ryan On We

Re: [DISCUSS] Describing REST Server capabilities

2024-07-31 Thread Dmitri Bourlatchkov
> endpoint version should bump (e.g. GET /v1/namespaces to GET /v2/namespaces) when there is a significant backwards incompatible change. That makes sense to me too. > (2) version the entire catalog spec. A released catalog spec version will contain a list of configs it supports, and also a set o

Case-insensitive schemas

2024-07-31 Thread Lessard, Steve
Is there some kind of configuration or metadata flag that hints whether a Schema is intended to be used case-sensitive or case-insensitive? In my PR for adding case-insensitivity support to PartitionSpec Steven Wu asked: caseI

[DISCUSS] PyIceberg: Remove optional support for instance-level identifier in Catalog and Table APIs

2024-07-31 Thread Sung Yun
Today in PyIceberg, we have support for identifier parsing in public APIs belonging to two different classes: - Catalog class: load_table, purge_table, drop_table - Table class: scan These APIs currently have optional support for the identifier that the instance itself belongs to. For ex

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-31 Thread Ryan Blue
I think “is it worth it” is the right question. We *could* use putIfAbsent in some back-ends, exclusive file creation in others, atomic renames in some more. We *could* come up with an API that abstracts those things. In the end we would end up with a situation in which: - These operations need

Re: [VOTE] Drop Java 8 support in Iceberg 1.7.0

2024-07-31 Thread Daniel Weeks
+1 (binding) On Sun, Jul 28, 2024 at 11:35 PM Huang-Hsiang Cheng wrote: > +1 (non-binding) > > Thanks, > Huang-Hsiang > > On Jul 27, 2024, at 12:42 AM, Steve Zhang > wrote: > > +1 (non-binding) > > Thanks, > Steve Zhang > > > > On Jul 26, 2024, at 9:15 AM, Amogh Jahagirdar <2am...@gmail.com> wr

Bayarea Apache Iceberg meeting - Sept. 2024

2024-07-31 Thread Aihua Xu
Hi community, We're thrilled to announce an upcoming Apache Iceberg Community Meetup in the Bay Area! This is a fantastic opportunity to connect with fellow enthusiasts, share insights, and dive into the latest developments in the Apache Iceberg ecosystem. 📅 Date: September 5th, 2024 ⏰ Time: 5:0

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Russell Spitzer
I think this all sounds good, the real question is whether or not we have someone to actively work on the proposals. I think for things like Default Values and Geo Types we have folks actively working on them so it's not a big deal. On Wed, Jul 31, 2024 at 2:09 PM Szehon Ho wrote: > Sorry I miss

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Szehon Ho
Sorry I missed the sync this morning (sick), I'd like to push for geo too. I think on this front as per the last sync, Ryan recommended to wait for Parquet support to land, to avoid having two versions on Iceberg side (Iceberg-native vs Parquet-native). Parquet support is being actively worked on

Re: [ANNOUNCE] Apache PyIceberg release 0.7.0

2024-07-31 Thread Yufei Gu
Awesome. Thanks Sung and every contributor! Yufei On Wed, Jul 31, 2024 at 12:44 AM Honah J. wrote: > Thanks Sung for running the release and thanks everyone for contributing! > This is a great milestone for PyIceberg! > > Best regards, > Honah > > On Tue, Jul 30, 2024 at 10:47 PM Renjie Liu >

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Walaa Eldin Moustafa
Another feature that was planned for V3 is support for default values. Spec doc update was already merged a while ago [1]. Implementation is ongoing in this PR [2]. [1] https://iceberg.apache.org/spec/#default-values [2] https://github.com/apache/iceberg/pull/9502 Thanks, Walaa. On Wed, Jul 31,

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Russell Spitzer
Thanks for bringing this up. I would say that, from my perspective, I have time to really push through hopefully two things: Variant Type and Row Lineage (which I will have a proposal for on the mailing list next week). I'm using the Project to try to track logistics and minutiae required for the new

[DISCUSS] adoption of format version 3

2024-07-31 Thread Jacob Marble
Good morning, To continue the community sync today when format version 3 was discussed. Questions answered by consensus: - Format version releases should _not_ be tied to Iceberg version releases. - Several planned features will require format version releases; the process shouldn't be onerous.

Re: [DISCUSS] Formalized File IO Properties

2024-07-31 Thread Xuanwo
Thank you all. I'm going to prepare a proposal PR for this. On Fri, Jul 12, 2024, at 10:06, Honah J. wrote: > Hello everyone, > > Thank you all for the valuable insights. I am also +1 on having standardized > names for File IO properties. Creating a dedicated section to summarize > property n

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-31 Thread Jack Ye
I guess the problem with an OVERWRITE flag for rename is that, with this flag, file mutual exclusion seems to be more difficult to enforce, and the difference among file systems becomes really nuanced. If 2 writers both have OVERWRITE flag on, then it seems like the file system should just let one

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-31 Thread Jack Ye
Oh I remember now, I think it was because HDFS semantics of rename fails when a file already exists. However, I think in the latest HDFS with FileContext API, an OVERWRITE flag can be passed to the context to make the rename succeed [1]: > If OVERWRITE option is not passed as an argument, rename f

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-31 Thread Russell Spitzer
My guess would be to avoid complications with multiple committers attempting to swap at the same time. On Wed, Jul 31, 2024 at 9:50 AM Jack Ye wrote: > I see, thank you Fokko, this is a very helpful context. > > Looking at the discussion in the PR and discussions in it, it seems like > the versi

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-31 Thread Jack Ye
I see, thank you Fokko, this is a very helpful context. Looking at the discussion in the PR and discussions in it, it seems like the version hint file is the key problem here. The file system table spec [1] is technically correct and only uses a single rename operation to perform the atomic commit

Flink Table Maintenance - Tag based locking

2024-07-31 Thread Péter Váry
Hi Team, During the discussion around the Flink Table Maintenance [1], [2], I highlighted that one of the main decision points is how we prevent Maintenance Tasks from running concurrently. At that time we did not find a better solution than providing an interface for locking,

Re: [DISCUSS] Describing REST Server capabilities

2024-07-31 Thread Jack Ye
One thing to clarify, regarding per-endpoint versioning, my understanding is that endpoint version should bump (e.g. GET /v1/namespaces to GET /v2/namespaces) when there is a significant backwards incompatible change. -Jack On Tue, Jul 30, 2024 at 7:56 PM Jack Ye wrote: > > are you talking abou

Re: [ANNOUNCE] Apache PyIceberg release 0.7.0

2024-07-31 Thread Honah J.
Thanks Sung for running the release and thanks everyone for contributing! This is a great milestone for PyIceberg! Best regards, Honah On Tue, Jul 30, 2024 at 10:47 PM Renjie Liu wrote: > Thanks Sung for driving the release, and all contributors! > > On Wed, Jul 31, 2024 at 1:35 PM Fokko Driesp