Re: [DISCUSS] Apache Iceberg 1.7.0 Release Cutoff

2024-10-21 Thread Russell Spitzer
volunteer as release manager for the 1.7.0 release. >> >> Regards >> JB >> >> On Fri, Oct 4, 2024 at 5:55 AM Russell Spitzer >> wrote: >> > >> > Hi y'all! >> > >> > As discussed at the community sync on Wednesday, October

Re: Spec changes for deletion vectors

2024-10-21 Thread Russell Spitzer
>>>> delete vectors. That means we either go with the current proposal (a) >>>>>>> that >>>>>>> preserves the ability for existing Delta clients to read, or we go with >>>>>>> a >>>>>>> different proposal tha

Re: Spec changes for deletion vectors

2024-10-17 Thread Russell Spitzer
all consistent, even if unused. >>> >>> >>> >>> Separately, I think it might pay to take a step back and restate >>> desired requirements of this design (in no particular order): >>> >>> 1. The best possible implementation of DVs (limit

[DISCUSS] Apache Iceberg 1.7.0 Release Cutoff

2024-10-03 Thread Russell Spitzer
Hi y'all! As discussed at the community sync on Wednesday, October has begun and we are beginning to flesh out the 1.7.0 release as well as the V3 Table Spec. Since we are a little worried that we won't have all of the Spec items we want by the end of October, we discussed that we may want to jus

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-28 Thread Russell Spitzer
generate > the operation field) then the table would be wedged. We can hopefully fix a > lot of this in catalog service implementations (reject bad commits) but I'm > worried about the idea of just not being able to read or fix tables. > > On Mon, Oct 28, 2024 at 9:35 AM Russell

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-29 Thread Russell Spitzer
> So we may need to look further at these, depending on the direction we >>> want to go. >>> >>> I was originally going to propose having a notion of an "unknown" >>> operation on read when the field is null but the operations are defined in >>> the s

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Russell Spitzer
cases, I’m not sure if it >>>> also affects Iceberg cases. >>>> >>>> https://github.com/apache/parquet-java/issues/3040 >>>> >>>> Thanks, >>>> Cheng Pan >>>> >>>> >>>> >>>> On Oct

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Russell Spitzer
s Iceberg cases. > > https://github.com/apache/parquet-java/issues/3040 > > Thanks, > Cheng Pan > > > > On Oct 31, 2024, at 06:06, Russell Spitzer > wrote: > > Hey Y'all, > > I propose that we release the following RC as the offic

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Russell Spitzer
https://github.com/apache/iceberg/pull/11462 <- For Main, I created a branch for 1.7.x off of RC0 which I will backport the fix to after we merge to main. On Mon, Nov 4, 2024 at 10:26 AM Russell Spitzer wrote: > Sounds good to me. I'll work on that this morning > > On Mon,

[VOTE] Release Apache Iceberg 1.7.0 RC1

2024-11-04 Thread Russell Spitzer
Hi y'all! I propose that we release the following RC as the official Apache Iceberg 1.7.0 release. The commit ID is 5f7c992ca673bf41df1d37543b24d646c24568a9 * This corresponds to the tag: apache-iceberg-1.7.0-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-1.7.0-rc1 * https://githu

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Russell Spitzer
New Release Candidate was made and a new vote started! Thanks everyone, let's do this again On Mon, Nov 4, 2024 at 11:06 AM Russell Spitzer wrote: > https://github.com/apache/iceberg/pull/11462 <- For Main, I created a > branch for 1.7.x off of RC0 which I will backport the f

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-04 Thread Russell Spitzer
Sounds good to me. I'll work on that this morning On Mon, Nov 4, 2024 at 10:16 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks for identifying this issue and bringing it up here Cheng, that's > really appreciated! @Russell Spitzer I do > think it is an issue,

Re: [VOTE] Deletion Vectors in V3

2024-10-30 Thread Russell Spitzer
+1 - My comments are clear on the other thread for future proposals On Wed, Oct 30, 2024 at 11:18 AM Szehon Ho wrote: > -0 > > Great work and exciting functionality, but transferring concerns from the > other thread about the decision. > > Thanks > Szehon > > On Wed, Oct 30, 2024 at 9:12 AM Ste

Re: [DISCUSS] Change Behavior for SchemaUpdate.UnionByName

2024-10-31 Thread Russell Spitzer
I'm in favor of 1 since previously these inputs would have thrown an exception that wasn't really that helpful. @Test public void testDowncastoLongToInt() { Schema currentSchema = new Schema(required(1, "aCol", LongType.get())); Schema newSchema = new Schema(required(1, "aCol", IntegerType.get

[DISCUSS] - Deprecate Equality Deletes

2024-10-30 Thread Russell Spitzer
Background: 1) Position Deletes Writers determine what rows are deleted and mark them in a 1 for 1 representation. With delete vectors this means every data file has at most 1 delete vector that it is read in conjunction with to excise deleted rows. Reader overhead is more or less constant and i
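
As a rough illustration of the position-delete idea in the background above, the sketch below models a delete vector as a plain set of row positions that is consulted while a data file is read. The function and variable names are hypothetical and this is not Iceberg's reader; real delete vectors are stored as bitmaps rather than Python sets.

```python
from typing import Iterable, Iterator, Set


def read_with_delete_vector(rows: Iterable[str], deleted_positions: Set[int]) -> Iterator[str]:
    """Yield only the rows whose position is not marked as deleted."""
    for position, row in enumerate(rows):
        # One membership check per row: reader overhead stays roughly constant.
        if position not in deleted_positions:
            yield row


# Hypothetical data file with four rows and a single delete vector for it.
data_file = ["a", "b", "c", "d"]
delete_vector = {1, 3}  # assumption: positions 1 and 3 were deleted

live_rows = list(read_with_delete_vector(data_file, delete_vector))
```

Because each data file pairs with at most one such vector, the reader never has to merge several delete sources for the same file.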

[VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-30 Thread Russell Spitzer
Hey Y'all, I propose that we release the following RC as the official Apache Iceberg 1.7.0 release. The commit ID is 91e04c9c88b63dc01d6c8e69dfdc8cd27ee811cc * This corresponds to the tag: apache-iceberg-1.7.0-rc0 * https://github.com/apache/iceberg/commits/apache-iceberg-1.7.0-rc0 * https://gith

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-30 Thread Russell Spitzer
eally wish this doc change[1] can be included into >> 1.7.0. >> >> [1] https://github.com/apache/iceberg/pull/11417 >> >> >> On Thu, Oct 31, 2024 at 7:00 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> Conve

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-30 Thread Russell Spitzer
Sorry forgot binary, doing that now On Wed, Oct 30, 2024 at 5:06 PM Russell Spitzer wrote: > Hey Y'all, > > I propose that we release the following RC as the official Apache Iceberg > 1.7.0 release. > > The commit ID is 91e04c9c88b63dc01d6c8e69dfdc8cd27ee811cc > * T

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-30 Thread Russell Spitzer
Convenience binary artifacts are staged on Nexus. The Maven repository URL is: https://repository.apache.org/content/repositories/orgapacheiceberg-1175/ On Wed, Oct 30, 2024 at 5:31 PM Russell Spitzer wrote: > Sorry forgot binary, doing that now > > On Wed, Oct 30, 2024 at 5:06 P

Re: [DISCUSS] Apache Iceberg 1.7.0 Release Cutoff

2024-10-23 Thread Russell Spitzer
). This is a serious issue, > and I'd like to see the fix go into 1.7.0. > >> Eduard has already approved the PR, but he asked if you or Amogh would > take a look as well. > >> Thanks, > >> Wing Yew > >> > >> > >> On Mon, Oct 21

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-28 Thread Russell Spitzer
This is one of the reasons I'm opposed to metadata we don't use/need. We end up forking the spec and then we have some odd behaviors, a metadata which is illegal in one implementation (PyIceberg) will be legal in another (Iceberg Java) (and with the suggestion above be silently modified to bring it

[REVIEW] 1.7.0 Remaining Milestone PR's

2024-10-28 Thread Russell Spitzer
We currently have the following PR's still in the 1.7.0 Milestone. I want to remove any that aren't ready to go by the end of the day so if you have strong feelings about some of these please coordinate with committers to review and get this code in. *Please do not tag any additional PR's as 1.7.0

Re: Changing default delete file granularity for Spark writes from partition to file scoped

2024-11-11 Thread Russell Spitzer
I don't think this is a bad idea from a theoretical perspective. Do we have any actual numbers to back up the change? I would think for most folks we would recommend just going to V3 rather than changing granularity for their new tables. It would just affect new tables though so I'm not opposed to

Re: [ANNOUNCE] Apache Iceberg release 1.7.0

2024-11-08 Thread Russell Spitzer
rom there: > Currently it still says "The latest version of Iceberg is 1.6.1 > <https://github.com/apache/iceberg/releases/tag/apache-iceberg-1.6.1>." > > > On Fri, Nov 8, 2024 at 7:33 AM Russell Spitzer > wrote: > >> I'm pleased to announce the release

Re: [DISCUSS] Duplicate KEYS files

2024-11-11 Thread Russell Spitzer
Sounds good to me, although I guess it's really just up to the Rust and Go maintainers to converge On Mon, Nov 11, 2024 at 9:13 AM Fokko Driesprong wrote: > Hi everyone, > > While looking at the release steps for iceberg-go > , I notice

Re: [DISCUSS] Duplicate KEYS files

2024-11-12 Thread Russell Spitzer
/downloads.apache.org/iceberg/KEYS | md5sum >>> 905987ebcc39a70ebcbce89f1939fe26 - >>> ➜ ~ curl -s https://dist.apache.org/repos/dist/release/iceberg/KEYS | >>> md5sum >>> 905987ebcc39a70ebcbce89f1939fe26 - >>> ``` >>> >>> Best, >

Re: [DISCUSS] Duplicate KEYS files

2024-11-12 Thread Russell Spitzer
I see it in downloads? ➜ icebergsvnrelease git:(master) ✗ curl https://downloads.apache.org/iceberg/KEYS | grep Topol uid [ultimate] Matt Topol sig 34B86A1E5E59C8B81 2024-10-10 Matt Topol uid [ultimate] Matthew Topol sig 34B86A1E5E59C8B81 2023-06-12 Matt T

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-31 Thread Russell Spitzer
ee to deprecate equality deletes, but -1 to commit any target >> for deletion before having a clear path for streaming platforms >> (Flink, Beam, ...) >> 2. In the meantime (during the deprecation period), I propose to >> explore possible improvements for st

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-10-31 Thread Russell Spitzer
found in the source distribution > - ASF header is present in all expected file > - Build is OK > - Tested using Spark SQL with JDBC Catalog and Apache Polaris without > problem > > Thanks ! > > Regards > JB > > On Wed, Oct 30, 2024 at 11:06 PM Russell Spitzer >

[ANNOUNCE] Apache Iceberg release 1.7.0

2024-11-08 Thread Russell Spitzer
I'm pleased to announce the release of Apache Iceberg 1.7.0! Apache Iceberg is an open table format for huge analytic datasets. Iceberg delivers high query performance for tables with tens of petabytes of data, along with atomic commits, concurrent writes, and SQL-compatible table evolution. This

Re: [DISCUSS] Column to Column filtering

2024-09-18 Thread Russell Spitzer
I have similar concerns to Ryan although I could see that if we were writing smaller and better correlated files that this could be a big help. Specifically with variant use cases this may be very useful. I would love to hear more about the use cases and rationale for adding this. Do you have any s

Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-25 Thread Russell Spitzer
uld just work around this by disabling staged create and replace if the delegate is being used but that would be a break in Iceberg behavior. Outside of these aspects I was able to get everything else working as expected but I think both of these are probably blockers. On Wed, Sep 25, 2024 at 3:51 

Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-25 Thread Russell Spitzer
I think it should be minimally difficult to switch this around on the Iceberg side, we only have to move the initialize code out and duplicate it. Not a huge cost On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan wrote: > It's a buggy behavior that a custom v2 catalog (without extending > DelegatingC

V3 Spec Changes

2024-09-24 Thread Russell Spitzer
Hi y’all! I’m excited to say that we have a lot of great Iceberg V3 Spec PR’s out right now. V3 Looks like it’s going to be awesome! A reminder if you haven’t had a chance yet to check them out: Row Lineage Materialized Views

Re: [DISCUSS] Iceberg Summit 2025 ?

2024-09-27 Thread Russell Spitzer
I am really excited about the prospect of another Summit and also had a great time last year. I think we had a great selection of talks and I'm hoping we can do so again. I'm very much in support of having an in person element, I would love to have a chance to talk face to face with other members

Re: [VOTE] Table v3 spec: Add unknown and new type promotion

2024-09-27 Thread Russell Spitzer
+1 (binding) On Fri, Sep 27, 2024 at 4:37 PM rdb...@gmail.com wrote: > Hi everyone, > > I'd like to vote on PR #10955 > that has been open for a > while with the changes to add new type promotion cases. After discussion, > the PR has been scoped dow

Re: Clarification on DayTransform Result Type

2024-09-27 Thread Russell Spitzer
Good thing DateType is an Integer :) https://github.com/apache/iceberg/blob/113c6e7d62e53d3e3cb15b1712f3a1db473ca940/api/src/main/java/org/apache/iceberg/types/Type.java#L37 On Thu, Sep 26, 2024 at 8:38 PM Kevin Liu wrote: > Hey folks, > > While reviewing a PR to fix DayTransform in PyIceberg (#

Re: [Discuss] Geospatial Support

2024-09-30 Thread Russell Spitzer
All my concerns are addressed, I'm ready to vote. On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho wrote: > Hi all, > > There have been several rounds of discussion on the PR: > https://github.com/apache/iceberg/pull/10981 and I think most of the main > points have been addressed. > > If anyone is inte

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Russell Spitzer
+1 (oink) If anyone really cares please chime in but seriously we should drop it On Thu, Oct 17, 2024 at 8:07 AM Jean-Baptiste Onofré wrote: > Hi folks, > > Even if it seems the project is pretty close to 0.18 release, Apache > Pig is a "dormant" project. > > I would like to discuss here if it

Re: [VOTE] Add Variant type to Iceberg Spec

2024-11-25 Thread Russell Spitzer
I'm +1, 1. I don't think we are going to change our decision on whether to include variants based on the timing of Parquet ratification 2. We aren't going to formally close V3 Spec yet, so if we do end up in a situation where we want to close the spec and Parquet has not removed the tag, we can re

Re: [VOTE] Deprecate and remove last-column-id

2024-11-19 Thread Russell Spitzer
+1 On Tue, Nov 19, 2024 at 4:11 AM Fokko Driesprong wrote: > Hey Manu, > > That's an excellent question. I took the following rationale: > >- For the code, the iceberg-core module, a minor release deprecation >cycle is required >

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Russell Spitzer
>>>>>>>>>> position lookup. This is fairly tricky to implement for >>>>>>>>>>>>>> streaming use cases >>>>>>>>>>>>>> without an external system. >>>>>>>>>>>>>

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Russell Spitzer
> [1] >> https://github.com/delta-io/delta/blob/master/PROTOCOL.md#change-data-files >> [2] >> https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-cdc-file >> [3] >> https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t

Re: [DISCUSS] Deprecate embedded manifests

2024-11-19 Thread Russell Spitzer
Deprecate On Tue, Nov 19, 2024 at 5:40 AM Jean-Baptiste Onofré wrote: > Hi Fokko > > As I don’t think it’s actually used, I think it’s fine to deprecate it. > > Regards > JB > > Le mar. 19 nov. 2024 à 12:32, Fokko Driesprong a > écrit : > >> Hi everyone, >> >> I would like to propose to depreca

Re: [DISCUSS] Add a implementation status page for iceberg

2024-11-08 Thread Russell Spitzer
Sounds like a great idea to me On Fri, Nov 8, 2024 at 7:58 AM Renjie Liu wrote: > Hi: > > As iceberg evolved to a multi-lang project, I would like to propose to > maintain a status page for iceberg. For more details, please refer to this > doc >

Re: [VOTE] Release Apache Iceberg 1.7.0 RC1

2024-11-08 Thread Russell Spitzer
Hi y'all! This vote has passed: +1 (Binding): Russell Spitzer, Yufei Gu, Amogh Jahagirdar, Jack Ye, Daniel Weeks, Eduard Tudenhöfner, Fokko Driesprong, Péter Váry. +1 (Non-Binding): Jean-Baptiste Onofré, Prashant Singh, Kevin Liu, Christian Thiel, Aihua Xu. I'm going to release right now, keep

[RESULT][VOTE] Release Apache Iceberg 1.7.0 RC1 - PASS

2024-11-08 Thread Russell Spitzer
Hi y'all! This vote has passed: +1 (Binding): Russell Spitzer, Yufei Gu, Amogh Jahagirdar, Jack Ye, Daniel Weeks, Eduard Tudenhöfner, Fokko Driesprong, Péter Váry. +1 (Non-Binding): Jean-Baptiste Onofré, Prashant Singh, Kevin Liu, Christian Thiel, Aihua Xu. I'm going to release right now, keep

Re: [ANNOUNCE] Apache Iceberg release 1.7.1

2024-12-09 Thread Russell Spitzer
Thanks so much Bryan! Great work getting this out! On Mon, Dec 9, 2024 at 2:28 PM Bryan Keller wrote: > I'm pleased to announce the release of Apache Iceberg 1.7.1! > > Apache Iceberg is an open table format for huge analytic datasets. Iceberg > delivers high query performance for tables with te

Re: Welcome Huaxin Gao as a committer!

2025-02-06 Thread Russell Spitzer
Congratulations! On Thu, Feb 6, 2025 at 11:35 AM Péter Váry wrote: > Congratulations! > > Matt Topol ezt írta (időpont: 2025. febr. 6., > Cs, 10:40): > >> Congrats! Welcome! >> >> On Thu, Feb 6, 2025, 10:19 AM Raúl Cumplido wrote: >> >>> Congrats Huaxin! >>> >>> El jue, 6 feb 2025 a las 10:16,

Re: [VOTE] Simplify multi-arg table metadata

2025-02-10 Thread Russell Spitzer
+1 On Mon, Feb 10, 2025 at 2:50 AM Eduard Tudenhöfner wrote: > +1 > > On Mon, Feb 10, 2025 at 7:40 AM Péter Váry > wrote: > >> +1 >> >> On Mon, Feb 10, 2025, 03:44 Manu Zhang wrote: >> >>> +1 (non-binding) >>> >>> On Mon, Feb 10, 2025 at 10:25 AM roryqi wrote: >>> +1 xianjin 于

Re: [VOTE] Add Geometry and Geography types for V3

2025-02-06 Thread Russell Spitzer
+1 On Fri, Feb 7, 2025 at 12:57 AM Denny Lee wrote: > +1 (non-binding) - super exciting! > > On Thu, Feb 6, 2025 at 3:52 PM rdb...@gmail.com wrote: > >> +1 >> >> Awesome to see this ready to go! >> >> On Thu, Feb 6, 2025 at 12:01 PM Szehon Ho >> wrote: >> >>> Hi everyone >>> >>> We would like

Re: Table metadata swap not work for REST Catalog (#12134)

2025-02-10 Thread Russell Spitzer
I still would like a "register table force" option On Mon, Feb 10, 2025 at 5:06 PM Steve Zhang wrote: > Thank you Dan for your detailed reply. Based on your explanation, do you > think it would be worthwhile to support non-linear or complete metadata > replacements in the REST implementation? I

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Russell Spitzer
+1 On Tue, Feb 11, 2025 at 9:15 AM Fokko Driesprong wrote: > +1 > > Op di 11 feb 2025 om 13:52 schreef Jean-Baptiste Onofré : > >> +1 (non binding) >> >> Regards >> JB >> >> On Tue, Feb 11, 2025 at 3:38 AM Gabor Kaszab >> wrote: >> > >> > Hi Iceberg Community, >> > >> > I'm working on removing

Re: FileRewrite API refactor

2025-02-01 Thread Russell Spitzer
te anyway, and focusing on data file rewriting would allow us to > remove some generics from the API. > > WDYT? > > Russell Spitzer ezt írta (időpont: 2025. jan. > 21., K, 17:11): > >> To bump this back up, I think this is a pretty important change to the >> core l

Re: guideline for interface change

2025-02-01 Thread Russell Spitzer
In the API module we tend to be very careful and follow a deprecation schedule. So we would deprecate the old method and make a new one with the different return type. This would then be removed in the next big release. On Sat, Feb 1, 2025 at 8:03 AM Péter Váry wrote: > Can we deprecate the old

Re: [DISCUSS] Clarify delete counts handling in partition stats

2025-02-01 Thread Russell Spitzer
Sounds reasonable, I think the intent was that N/A is different than 0 but that only makes sense for V1. For V2/V3, 0 makes sense. On Sat, Feb 1, 2025 at 3:15 AM Anton Okolnychyi wrote: > Hi all, > > I propose to clarify our delete counts handling in partition stats. We > have the following metric

Re: [VOTE] Update partition stats spec for V3

2025-02-01 Thread Russell Spitzer
+1 On Sat, Feb 1, 2025 at 3:01 AM Anton Okolnychyi wrote: > Hi all, > > I propose the following updates to our partition stats spec in V3: > > - Modify `position_delete_record_count` to include a sum of position > deletes across position delete files and DVs > - Keep `position_delete_file_count`

[DISCUSS] Spark Catalog - Drop vs Drop with Purge

2024-12-11 Thread Russell Spitzer
Hi Y'all! Today we had a little discussion on the Apache Iceberg Catalog Community Sync about DROP and DROP WITH PURGE. Currently the SparkCatalog implementation inside of the reference library has a unique method of DROP WITH PURGE vs other implementations. The pseudo code is essentially ``` us

Re: [Discuss] Document Snapshot Summary Optional Fields for Standardization

2024-12-11 Thread Russell Spitzer
I want to float this back up, I think this is a really good idea for cross engine support. I don't think we have to tie this to any specific Spec version since they are just recommendations so I think we can do this at any time On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho wrote: > This makes sense

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Russell Spitzer
I would much rather we switch to the "everything is a manifest" approach. Instead of manifest lists we only ever have manifests. A Manifest can then link to data files or additional manifests. In the case of streaming then you only ever have to read and write a single manifest. If we couple this wit

Re: [VOTE] Release Apache Iceberg 1.7.1 RC1

2024-12-06 Thread Russell Spitzer
I forgot to send this, left it in drafts :D +1 - Ran test suite for Polaris on https://github.com/apache/polaris/pull/442 - Ran all Iceberg Tests - Checked Signatures and Sums - Confirmed Jar is signed by a loving father and Iceberg PMC member On Thu, Dec 5, 2024 at 7:31 PM Bryan Keller wrote:

Re: Very strange (AI generated) issues

2025-01-22 Thread Russell Spitzer
This is pretty disturbing and I hope that any users out there see that using automated tools to submit issues is just adding noise to the project which makes it very hard for real issues to be addressed. On Wed, Jan 22, 2025 at 6:58 AM Jarek Potiuk wrote: > - Iceberg dev to not flood them :) (i

Iceberg Community Meeting Notes - Jan 15 2025

2025-01-22 Thread Russell Spitzer
Hey Y'all! Here are the notes and recording for the Jan 15th meeting! Video - https://www.youtube.com/watch?v=9ZLWQSZvLIw Notes - https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?tab=t.0 Notes : --- - Highlights - Java - Hive

Re: [DISCUSS, VOTE] OpenAPI Metadata Update for EnableRowLineage

2025-01-22 Thread Russell Spitzer
disable/enable (more forward > thinking as this is the first case quite like this). > > -Dan > > > > On Wed, Jan 22, 2025 at 3:55 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> +1 Thanks Russell >> >> On Wed, Jan 22, 2025 at 4:50 PM rdb...@gmail.com

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-21 Thread Russell Spitzer
I do think this comes up a lot and is one of the more confusing things about the snapshot expiration. Definitely one of my most answered questions is: "When I set min-snapshots to 1, why do I not get only 1 snapshot." I agree adding another behavior may be even more confusing but I wouldn't be oppo
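
The confusion quoted above may be easier to see in a small sketch. It assumes the commonly described retention rule: a snapshot is expired only when it is both older than the maximum age and outside the N most recent snapshots, so the minimum acts as a floor rather than a cap. The function name and parameters are hypothetical, not Iceberg's API.

```python
from typing import List


def snapshots_to_keep(timestamps_ms: List[int], now_ms: int,
                      max_age_ms: int, min_to_keep: int) -> List[int]:
    """Return the snapshot timestamps that survive expiration, newest first."""
    newest_first = sorted(timestamps_ms, reverse=True)
    kept = []
    for rank, ts in enumerate(newest_first):
        young_enough = now_ms - ts <= max_age_ms   # still inside the age window
        protected = rank < min_to_keep             # among the N most recent
        if young_enough or protected:
            kept.append(ts)
    return kept


now = 1_000_000
# Two recent snapshots and one old one, with min_to_keep=1.
recent = snapshots_to_keep([999_000, 998_000, 100], now, max_age_ms=10_000, min_to_keep=1)
# Only old snapshots: the floor of one snapshot applies.
all_old = snapshots_to_keep([100, 50], now, max_age_ms=10, min_to_keep=1)
```

In the first call two snapshots remain even though the minimum is 1, because both are still younger than the maximum age. Only in the second call, where every snapshot is too old, does the minimum decide the result.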

Re: [VOTE] Document Snapshot Summary Optional Fields as Subsection of Appendix F in Spec

2025-01-21 Thread Russell Spitzer
+1 On Tue, Jan 21, 2025 at 2:36 PM rdb...@gmail.com wrote: > +1 > > On Tue, Jan 21, 2025 at 12:20 PM Honah J. wrote: > >> Hi everyone, >> >> In the last VOTE >> thread >> on documenting snapshot summary optional fields, we decid

[DISCUSS, VOTE] OpenAPI Metadata Update for EnableRowLineage

2025-01-22 Thread Russell Spitzer
Hey Y'all Yet another Row Lineage Spec update. This adds a MetadataUpdate EnableRowLineage to the REST Spec. We briefly talked today about an alternative EnableFeature(Feature Name) API instead but in the absence of other features it doesn't seem like that's really a requirement now. I agreed tha

Re: [VOTE] REST API changes for freshness-aware table loading

2025-01-24 Thread Russell Spitzer
+1'd on the PR earlier, but for the record +1 here as well :) On Fri, Jan 24, 2025 at 1:57 PM rdb...@gmail.com wrote: > +1 > > Thanks, Gabor! > > On Fri, Jan 24, 2025 at 9:25 AM Christian Thiel < > christian.t.b...@gmail.com> wrote: > >> +1 (non binding). Thanks Gabor! >> >> Daniel Weeks schri

[DISCUSS/VOTE] Add in ChangeLog Reserved Field IDs to Spec and Decrement Row Lineage Reserved IDs

2025-01-24 Thread Russell Spitzer
We added reserved fields into the Apache Iceberg repo to use with ChangeLog views but these were never added to the spec. When Row Lineage was added, those IDs inadvertently collided with the ID's already set. In this PR I add in the ChangeLog

Re: Very strange (AI generated) issues

2025-01-23 Thread Russell Spitzer
>> * I guess whoever has the tool realised their mistake and either stopped >>> it or removed some confusion >>> * I have my own suspicions (which I am exploring) - but I asked the user >>> to provide information about what tooling they were using (and the user was

Re: [DISCUSS, VOTE] OpenAPI Metadata Update for EnableRowLineage

2025-01-23 Thread Russell Spitzer
025 at 10:55 AM Fokko Driesprong >>> wrote: >>> >>>> +1 >>>> >>>> Thanks Russell >>>> >>>> Op do 23 jan 2025 om 18:47 schreef Aihua Xu : >>>> >>>>> + (non binding). >>>>>

Re: [DISCUSS/VOTE] Add in ChangeLog Reserved Field IDs to Spec and Decrement Row Lineage Reserved IDs

2025-01-27 Thread Russell Spitzer
Thanks everyone, I'll be merging that fix ASAP On Mon, Jan 27, 2025 at 6:01 AM Fokko Driesprong wrote: > +1 > > Op ma 27 jan 2025 om 10:54 schreef Honah J. : > >> +1, thanks for driving this! >> >> Best Regards, >> Honah >> >> On Sun, Jan 26, 2025 at 3:20 PM Steven Wu wrote: >> >>> +1 >>> >>> O

Re: [VOTE] Release Apache Iceberg 1.7.2 rc0

2025-01-27 Thread Russell Spitzer
+1 (binding) Checked licensing and sha and gpg sig On Mon, Jan 27, 2025 at 3:02 PM Fokko Driesprong wrote: > +1 (binding) > > Ran signature/checksum/license: > > *➜ **Desktop* gpg --verify apache-iceberg-1.7.2.tar.gz.asc > > gpg: assuming signed data in 'apache-iceberg-1.7.2.tar.gz' > > gpg: S

Re: FileRewrite API refactor

2025-01-21 Thread Russell Spitzer
To bump this back up, I think this is a pretty important change to the core library so it's necessary that we get more folks involved in this discussion. I agree that the Rewrite Data Files needs to be broken up and realigned if we want to be able to reuse the code in Flink. I think I prefer t

Re: [VOTE] Deprecate IRC snapshot-id Field of SetStatisticsUpdate

2025-01-21 Thread Russell Spitzer
+1 On Tue, Jan 21, 2025 at 4:34 AM Alex Dutra wrote: > +1 (nb) > > On Tue, Jan 21, 2025 at 11:30 AM Piotr Findeisen < > piotr.findei...@gmail.com> wrote: > >> +1 non-binding >> >> On Tue, 21 Jan 2025 at 10:25, Fokko Driesprong wrote: >> >>> +1 >>> >>> Thanks for cleaning this up Christian! >>>

Re: [VOTE] Add Variant type to Iceberg Spec

2025-01-29 Thread Russell Spitzer
). >>> >>> >>>> 3. There is very little in our change set here that specifically >>>> references the Parquet spec except for our reference link to it. >>> >>> >>> This cuts both ways? What is the rush to get this into V3 if it can >>

Re: Changing default delete file granularity for Spark writes from partition to file scoped

2025-01-02 Thread Russell Spitzer
>> >>>> I support the idea of switching to file-scoped deletes for new tables. >>>> The absence of sync maintenance prevented us from doing that earlier. Given >>>> that Amogh recently merged that functionality into main, we should be fine. >>>>

Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage

2025-01-17 Thread Russell Spitzer
22 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Wed, Jan 15, 2025 at 5:59 PM Russell Spitzer > wrote: > > > > Hi Everyone! > > > > PR: https://github.com/apache/iceberg/pull/11976/files > > > > Split out from #11948

[Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage

2025-01-15 Thread Russell Spitzer
Hi Everyone! PR: https://github.com/apache/iceberg/pull/11976/files Split out from #11948 Working on the row-lineage implementation made it clear that we needed a way to get information from the Snapshot object propagated into the Metadata layer. Sp

Re: [VOTE] Document Snapshot Summary Optional Fields as Appendix in Spec

2025-01-14 Thread Russell Spitzer
+1 On Tue, Jan 14, 2025 at 2:00 PM Honah J. wrote: > Hi everyone, > > Based on good feedback on the [DISCUSS] thread > . and > the pull request > . I > would lik

Re: [VOTE] Document Snapshot Summary Optional Fields as Appendix in Spec

2025-01-15 Thread Russell Spitzer
@Daniel Weeks what do you think? I know both you and I had the opposite feeling here. On Tue, Jan 14, 2025 at 6:21 PM rdb...@gmail.com wrote: > The content looks correct to me, but because this states a requirement > ("Metrics must be accurate if written") I would rather move this content > int

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-12 Thread Russell Spitzer
t said, I'm not convinced that this worth the complexity and effort. > Especially since between maintenance job runs the lineage info is still > invalid. > > > On Wed, Feb 12, 2025, 19:06 Russell Spitzer > wrote: > >> I'm not sure I follow how one could figur

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-12 Thread Russell Spitzer
gt;>> >>>>> On Tue, Feb 11, 2025 at 7:39 PM Gang Wu wrote: >>>>> >>>>>> Hi Russell, >>>>>> >>>>>> Thanks for supporting equality deletes to row lineage! >>>>>> >>>>>> > accept that "

Re: [VOTE] Add overwriteRequested to RegisterTableRequest in REST spec

2025-02-13 Thread Russell Spitzer
+1 On Wed, Feb 12, 2025 at 5:30 PM Steve Zhang wrote: > Hi Iceberg Community, > > I'm working on supporting the registration of iceberg metadata for an > existing table in the catalog. As part of this work, I'm proposing to add > an optional boolean field in RegisterTableRequest. > > I'd lik

[Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Russell Spitzer
Hi Y'all, As we have been working on the row lineage implementation I've been reached out to by a few folks in the community who are interested in changing our defined behavior around equality deletes. Currently when Row Lineage is enabled, the spec says to disable equality deletes for the table

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-13 Thread Russell Spitzer
I think we should do an even bigger change. IMHO, Project should have information about interacting with the project, so: Community, Contributing, Implementation Status, Multi-engine Support, How to Release, ASF. Then have Concepts include all the technical details: Spec, Terms. On Thu, Feb 13, 2025 at 1

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-14 Thread Russell Spitzer
term, specs), "Concepts" is > probably not accurate. Maybe "Spec" is more appropriate. > > On Thu, Feb 13, 2025 at 9:41 AM Russell Spitzer > wrote: > >> I think we should do an even bigger change. IMHO, Project should have >> information about inter

[VOTE] Allow Row-Lineage with Equality Deletes

2025-02-19 Thread Russell Spitzer
The PR: https://github.com/apache/iceberg/pull/12230 is basically ready now. So let's do a last vote to make sure everyone is aware of the upcoming change. Before: Equality deletes are not allowed to be used when row-lineage is enabled After: Equality deletes are allowed to be used when row-line

Re: [VOTE] Java implementation notes around current-snapshot-id

2025-02-24 Thread Russell Spitzer
+1 On Mon, Feb 24, 2025 at 12:55 PM Fokko Driesprong wrote: > Hi everyone, > > Recently, there was confusion > about > valid values for the current-snapshot-id, which led to implementation > notes

Re: [VOTE] Allow Row-Lineage with Equality Deletes

2025-02-24 Thread Russell Spitzer
>>>>> Thanks Russell! >>>>>> >>>>>> On Thu, Feb 20, 2025 at 1:57 AM Fokko Driesprong >>>>>> wrote: >>>>>> >>>>>>> +1 >>>>>>> >>>>>>> Thanks Russell! >>>

Re: [VOTE] Release Apache Iceberg 1.8.1 RC1

2025-02-25 Thread Russell Spitzer
+1 Checked Sigs and Checksum Ran Rat Ran full build/test On Tue, Feb 25, 2025 at 11:30 AM Driesprong, Fokko wrote: > +1 (binding) > >- Checked signatures and checksum >- Checked licenses >- Spotchecked NOTICE/LICENSE > > Kind regards, > Fokko > > Op di 25 feb 2025 om 16:56 schree

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Russell Spitzer
ior in the "Implementation Notes" section. >> >> > How about reverting #11560 for 1.8.1, and then reinstating this for >> 2.0.0? >> >> I think we need to fix this at a format version boundary, not a library >> version boundary. I'd be up for reinsta

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread Russell Spitzer
nges for the fields > * Define the "schema/spec/sort not present" values (the fields are > optional for v1 but required for v2+v3). > * OR Define that "schema/spec/sort must be absent" if there is no current > schema/spec/sort. > > WDYT? > > On 17.02.25 21:07,

Re: Remove deprecated table properties

2025-02-17 Thread Russell Spitzer
+1 to remove in 1.9 On Mon, Feb 17, 2025 at 4:20 AM Fokko Driesprong wrote: > Hi everyone, > > While reviewing the LocationProvider equivalent of PyIceberg, I noticed > some old code in the Java codebase that I felt could be cleaned up. You > can find the PR over here

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-17 Thread Russell Spitzer
It sounds like the argument here is that we should change the Spec for V1, V2, and V3 to mark current-snapshot-id as required. Then we should change all other implementations to follow this new standard. I'm not sure that is a good solution going forwards but I'm not sure of how we can support cata

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Russell Spitzer
+0 - I would be surprised if post compression sizes were that different but minifying json is a pretty standard practice for over the wire transfers On Mon, Feb 17, 2025 at 1:51 PM Steve Zhang wrote: > +1. Configure table property `write.metadata.compression-codec` to gzip is > usually suggested
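
One way to check the intuition above is to compare pretty-printed and minified JSON before and after gzip. The sketch below uses a made-up, metadata-like document (the field names are only illustrative), so whatever numbers it produces say nothing definitive about real Iceberg metadata files.

```python
import gzip
import json

# Hypothetical document loosely shaped like table metadata.
doc = {
    "format-version": 2,
    "snapshots": [
        {"snapshot-id": i, "summary": {"operation": "append"}}
        for i in range(200)
    ],
}

# Same content, two serializations: indented vs. no insignificant whitespace.
pretty = json.dumps(doc, indent=2).encode("utf-8")
minified = json.dumps(doc, separators=(",", ":")).encode("utf-8")

sizes = {
    "pretty": len(pretty),
    "minified": len(minified),
    "pretty_gzip": len(gzip.compress(pretty)),
    "minified_gzip": len(gzip.compress(minified)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Running this against a real metadata file instead of the toy document would give a more meaningful answer to whether the post-compression difference matters.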

Re: [ANNOUNCE] Apache Iceberg Pre-Summit Community Meetup in SF

2025-03-10 Thread Russell Spitzer
I'm flying in early enough to join as well! On Mon, Mar 10, 2025 at 4:38 PM Fokko Driesprong wrote: > Nice, thanks for organizing this Sung, and thanks to Bloomberg for > sponsoring. I've just signed up! > > Kind regards, > Fokko > > Op ma 10 mrt 2025 om 16:22 schreef Jean-Baptiste Onofré : > >>
