Re: [DISCUSS] Spark 3.3 support?

2025-02-18 Thread Manu Zhang
Since 1.8.0 has been released, I submitted https://github.com/apache/iceberg/pull/12279 to remove Spark 3.3 support on the main branch. Please help review. Thanks, Manu On Wed, Nov 20, 2024 at 6:32 AM Anton Okolnychyi wrote: > Here we go then: > > https://github.com/apache/iceberg/pull/11596 >

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-18 Thread Steven Wu
Agree with Kevin to move "Blogs" and "Talks" into a new "Community" group, along with the current "Community" page, but it can be tackled in a separate PR. On Tue, Feb 18, 2025 at 8:55 AM Kevin Liu wrote: > Thanks for starting this discussion Manu! > > Big +1 to re-organizing the tabs on the mai

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Daniel Weeks
I agree that we shouldn't retroactively change the spec to align with the implementation. We're trying to strike a balance here and I think the notes are an effective way to convey that while omitting or producing null with v1/v2 is technically to spec, it's going to be incompatible with most impl

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Russell Spitzer
`For compatibility with existing libraries, we should maintain that `-1` is equivalent to no snapshot and it should be written for v1/v2.` The only issue I have with this is are we saying that for v1 and v2 we are changing the spec to say that current-snapshot-id is required? Or are we adding an i

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Daniel Weeks
I would agree the best path forward is to note the current behavior for v1/v2 since that's well established and address the behavior in v3. For compatibility with existing libraries, we should maintain that `-1` is equivalent to no snapshot and it should be written for v1/v2. With V3 we should su

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread rdb...@gmail.com
+1 to reverting PR 11560 in main and 1.8.1. That avoids unnecessary incompatibility with older readers. I also agree that we should update the spec to say what Russell suggests: > that -1 has meant "no current snapshot" in the past and is equivalent to missing/null. That's a correct description o
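
The equivalence discussed in this thread (a missing field, an explicit null, and the legacy `-1` all meaning "no current snapshot") could be sketched on the reader side roughly as follows. This is only an illustration, assuming the table metadata JSON has already been parsed into a plain dict; the function name `current_snapshot_id` and the constant `NO_SNAPSHOT_SENTINEL` are hypothetical and are not part of any Iceberg library.

```python
from typing import Optional

# Value that existing writers have emitted for "no current snapshot" (per the thread).
NO_SNAPSHOT_SENTINEL = -1


def current_snapshot_id(metadata: dict) -> Optional[int]:
    """Return the current snapshot ID, or None if the table has no current snapshot.

    A missing key, an explicit null, and the legacy -1 sentinel are all
    treated as equivalent. Hypothetical helper, not an Iceberg API.
    """
    value = metadata.get("current-snapshot-id")
    if value is None or value == NO_SNAPSHOT_SENTINEL:
        return None
    return value
```

A reader written this way accepts metadata from both older writers that emit `-1` and writers that omit the field or write null.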

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Bryan Keller
I had a couple of small fixes that would be great to get into 1.8.1: https://github.com/apache/iceberg/pull/12305 https://github.com/apache/iceberg/pull/12224 I added those to the GitHub 1.8.1 milestone in case this is possible. Thanks, Bryan > On Feb 18, 2025, at 8:53 AM, Robert Stupp wrote: >

Re: Remove deprecated table properties

2025-02-18 Thread Steven Wu
> When you migrate a table, I don't think everyone cleans up the old properties That is exactly the scenario we are trying to guard against. Maybe old tables are still using those deprecated properties. Hence we want to guard against silent behavior change. > and then jobs start failing. Warnin

Re: Remove deprecated table properties

2025-02-18 Thread Fokko Driesprong
I'm hesitant to fail the job. When you migrate a table, I don't think everyone cleans up the old properties, and then jobs start failing. Another approach is to warn until 2.0, and then remove them: https://github.com/apache/iceberg/pull/12315 LMKWYT Kind regards, Fokko On Tue, Feb 18, 2025 at 16:
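
The "warn rather than fail" approach suggested here might look something like the sketch below. It is not taken from the linked PR; the property names in `DEPRECATED_PROPERTIES` are placeholders, since the thread preview does not list the actual deprecated properties, and `warn_on_deprecated` is a hypothetical helper.

```python
import warnings

# Placeholder names only: the thread does not list the real deprecated properties.
# Maps each deprecated key to its suggested replacement.
DEPRECATED_PROPERTIES = {
    "example.old-property": "example.new-property",
}


def warn_on_deprecated(properties: dict) -> list:
    """Emit a DeprecationWarning for each deprecated key and return the keys found.

    The job is not failed; callers can decide what to do with the returned list.
    """
    found = sorted(key for key in properties if key in DEPRECATED_PROPERTIES)
    for key in found:
        warnings.warn(
            f"Table property '{key}' is deprecated; "
            f"use '{DEPRECATED_PROPERTIES[key]}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return found
```

Returning the matched keys leaves room for a stricter policy later (for example, failing once the properties are removed) without changing the call sites.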

Re: pre-proposal: schema_id on DataFile

2025-02-18 Thread Devin Smith
I'm coming at this from a mental model where a producer(s) to a given Table is tightly-coupled to a specific Schema. That is, even as the Table's Schema is evolved, the producer's logic will be unchanged - they produce parquet files that have the same parquet metadata and columns. (This model may p

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-18 Thread Kevin Liu
Thanks for starting this discussion Manu! Big +1 to re-organizing the tabs on the main site. I agree with Russell and think we should make an even bigger change. Happy to continue the conversation here or we can start a new thread. For one, I think we can move "Blogs", "Talks", and perhaps even "

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Robert Stupp
On 18.02.25 17:10, Fokko Driesprong wrote: Reality is that Iceberg did write '-1' into current-snapshot-id (and other "non-exist" marker values for schema/spec/sort) instead of omitting the field. Yes, but this is wrong. The spec dictates

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Fokko Driesprong
> > Reality is that Iceberg did write '-1' into current-snapshot-id (and other > "non-exist" marker values for schema/spec/sort) instead of omitting the > field. Yes, but this is wrong. The spec dictates under current-snapshot-id: long ID

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Russell Spitzer
The only thing I think I agree with is defining that -1 has meant "no current snapshot" in the past and is equivalent to a missing/null (if we have to specify that). I don't think there is any reason to change the behavior of writing null / missing unless that's really a point of confusion for fo

Re: Remove deprecated table properties

2025-02-18 Thread Robert Stupp
Also, as an idea, REST catalog services could return an error if those deprecated properties are being set. Thoughts? On 18.02.25 16:21, Robert Stupp wrote: Agree with both Steve's. Personally, I'm okay with removing those properties - but using the proposed phased approach. On 17.02.25 23:

Re: Remove deprecated table properties

2025-02-18 Thread Robert Stupp
Agree with both Steve's. Personally, I'm okay with removing those properties - but using the proposed phased approach. On 17.02.25 23:25, Steven Wu wrote: I have some concerns on the issue of silent behavior change that Steve Zhang raised in the PR comment. E.g., users may set the location base

Re: [DISCUSS] PyIceberg 0.9.0 release

2025-02-18 Thread Fokko Driesprong
Hey Kevin, Thanks for raising this. That sounds like a great idea to me, and thanks Drew for being the release manager for 0.9.0. Kind regards, Fokko On Mon, Feb 17, 2025 at 23:41, Kevin Liu wrote: > Thanks for volunteering! I'm happy to assist in any way I can. Let's > coordinate on Slack :) >

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Robert Stupp
Correcting myself: schema/spec/sort seem to be always present - please ignore that part in my previous email. The valid values for those fields however should be defined. On 18.02.25 14:29, Robert Stupp wrote: Reality is that Iceberg did write '-1' into current-snapshot-id (and other "non-ex

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Robert Stupp
Reality is that Iceberg did write '-1' into current-snapshot-id (and other "non-exist" marker values for schema/spec/sort) instead of omitting the field. Side note: the table-spec says that these fields are optional, but nothing about whether it is nullable. The spec should at least be amend

Re: FileRewrite API refactor

2025-02-18 Thread Péter Váry
Thanks for all the feedback! Created the PR for the API discussion: https://github.com/apache/iceberg/pull/12306 Thanks, Peter Steven Wu wrote (on Thu, Feb 13, 2025, 22:28): > looking at "RewriteDataFilesSparkAction" from your PR #11513, I am fine > that the RewriteExecutionContext i

Re: [DISCUSS] FileFormat API proposal

2025-02-18 Thread Péter Váry
Accidentally force-pushed :( The new links are here: - https://github.com/apache/iceberg/pull/12298/commits/583cccb6e036323ee74a74bf3b06a40bf16f8982 - The API Interface classes - https://github.com/apache/iceberg/pull/12298/commits/217e68caa61667032da3d710401078bb50b0a99f - Mov

Re: [DISCUSS] FileFormat API proposal

2025-02-18 Thread Péter Váry
Hi Renjie, Based on your feedback, I have created a PR which separates out the different logical parts to different commits: https://github.com/apache/iceberg/pull/12298 The following parts are separated: - https://github.com/apache/iceberg/pull/12298/commits/1ad230f67df014b424c3547603831f

Re: [DISCUSS] Cleanup unreferenced statistics files through DropTableData

2025-02-18 Thread Ajantha Bhat
I believe the reason stats files allow replacing statistics with the same snapshot ID is to enable the recomputation of optional stats for the same snapshot. This process does leave the old stats files orphaned, but they will be properly cleaned up by the `remove_orphan_files` action or procedure.