Re: [VOTE] Merge guidelines for committing PRs

2024-08-29 Thread rdb...@gmail.com
-0 While I appreciate the motivation, I think that this is going to lead to more problems, not fewer. On Wed, Aug 28, 2024 at 10:54 PM Renjie Liu wrote: > +1 (binding) > > On Thu, Aug 29, 2024 at 8:59 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> +1 (binding) >> >> On Wed, Aug 28, 2024 at

Re: [DISCUSS] Row Lineage Proposal

2024-08-29 Thread rdb...@gmail.com
+1 for making row lineage and equality deletes mutually exclusive. The idea behind equality deletes is to avoid needing to read existing data in order to delete records. That doesn't fit with row lineage because the purpose of lineage is to be able to identify when a row changes by maintaining an

Re: [VOTE] Merge REST Spec Change To Add New Scan Planning APIs

2024-09-03 Thread rdb...@gmail.com
+1 I think it would be good to give an overview of the current proposal since it has evolved quite a bit from the original like Jack said. On Tue, Sep 3, 2024 at 9:09 AM Jack Ye wrote: > Thanks for keeping pushing for this Rahil. Personally I am +1 (binding) > for this, with just some minor com

Re: Time-based partitioning on long column type

2024-09-10 Thread rdb...@gmail.com
Maybe we could update the time-based partition functions to be applied to a long column directly. It would treat that column like a timestamp in milliseconds. Would that work? I need to think more about the implications of doing that, but I don't think that we currently have an issue with extending

[DISCUSS] September board report

2024-09-10 Thread rdb...@gmail.com
Hi everyone, It’s time for another ASF board report! Here’s my current draft. Please reply if you think there is something that I should add or change. Thanks! Ryan Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. Pro

Re: [DISCUSS] September board report

2024-09-11 Thread rdb...@gmail.com
Thanks for the updates! I'll add those. On Wed, Sep 11, 2024 at 8:02 AM Jean-Baptiste Onofré wrote: > Hi Ryan, > > It looks good to me. Thanks ! > > Regards > JB > > On Tue, Sep 10, 2024 at 11:43 PM rdb...@gmail.com > wrote: > > > > Hi everyone,

Re: [DISCUSS] Define calendar used in specification?

2024-09-12 Thread rdb...@gmail.com
The spec purposely avoids timestamp conversion. Iceberg returns values as they are passed from the engine and it is the engine's responsibility to do any date/time conversion. I don't think that we should change this and take responsibility in Iceberg. On Thu, Sep 12, 2024 at 12:32 AM Bart Samwel

Re: Time-based partitioning on long column type

2024-09-12 Thread rdb...@gmail.com
ues and treat them as milliseconds. The former seems more reasonable to > me. The latter I think has many of the same draw-backs raised on the other > thread. IMO, both aren't super pleasing from a long term maintainability > perspective. > > Cheers, > Micah > > On Tue

Re: [Discuss] test logging is broken and Avro 1.12.0 upgraded slf4j-api dep to 2.x

2024-09-16 Thread rdb...@gmail.com
If I understand the SLF4J announcement correctly, it sounds like the best option is to rely on binary compatibility between the 1.x and 2.x clients. As long as we don't use the newer API, then the compiled code can use either a 1.7.x or 2.0.x API Jar. The API Jar needs to match the provider versio

Re: Spec changes for deletion vectors

2024-10-14 Thread rdb...@gmail.com
e Apache Impala). > > I like the proposal, I just hope we won't "surprise" some query > engines with extra work :) > > Regards > JB > > On Thu, Oct 10, 2024 at 11:41 PM rdb...@gmail.com > wrote: > > > > Hi everyone, > > > > There s

Re: Spec changes for deletion vectors

2024-10-15 Thread rdb...@gmail.com
ciency, so I do feel the field is not in > the normal direction of the project. Also I'm not clear on the plan for old > Delta readers, they can't read Puffin anyway, if Delta adopts Puffin, then > new readers could adopt? Anyway great work again, thanks for raising the > issue on devl

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-09 Thread rdb...@gmail.com
+1 Thanks for shepherding this, Russell! On Tue, Oct 8, 2024 at 7:07 PM Russell Spitzer wrote: > Hi Y'all! > > I think we are more or less in agreement on adding Row Lineage to the spec > apart from a few details which may change a bit during implementation. > Because of this, I'd like to call

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-09 Thread rdb...@gmail.com
I think it's important for a project to remain focused on its core purpose, and I've always advocated for Iceberg to remain a library that is easy to plug into other projects. I think that should be the guide here as well. Aren't projects like Spark and Trino responsible for producing easy to use D

Re: Clarification on DayTransform Result Type

2024-10-07 Thread rdb...@gmail.com
TypeToSparkType` > <https://github.com/apache/iceberg/blob/09370ddbc39fc3920fb8cbd3dff11b377dd37e40/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L103-L104> > ? > > Best, > Kevin > > > > On Fri, Sep 27, 2024 at 1:52 PM rdb...@gmail.co

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-19 Thread rdb...@gmail.com
+1 On Thu, Oct 17, 2024 at 11:56 PM Steve Zhang wrote: > +1 > > Thanks, > Steve Zhang > > > > On Oct 17, 2024, at 11:16 PM, roryqi wrote: > > +1. > > Péter Váry 于2024年10月18日周五 13:44写道: > >> +1 >> >> On Fri, Oct 18, 2024, 04:50 Manu Zhang wrote: >> >>> +1 >>> >>> On Fri, Oct 18, 2024 at 8:50 A

Re: Spec changes for deletion vectors

2024-10-19 Thread rdb...@gmail.com
>> >>>>> On Thu, Oct 17, 2024 at 11:02 AM Jean-Baptiste Onofré >>>>> wrote: >>>>> >>>>>> Hi folks, >>>>>> >>>>>> As Daniel said, I think we have actually two proposals in one: >>>>

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-19 Thread rdb...@gmail.com
I can provide some historical context here about how the table spec evolved and how the REST spec works with respect to table versions. We initially did not have the snapshot summary or operation. When I added the summary, the operation was intended to be required in cases where the summary is pre

Re: Overwrite old properties on table replace with REST catalog

2024-10-20 Thread rdb...@gmail.com
Hi Vladimir, This isn't a bug. The behavior of CREATE OR REPLACE is to replace the data of a table, but to maintain things like other refs, snapshot history, permissions (if supported by the catalog), and table properties. Table properties are replaced if they are set in the operation like `b` in

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-20 Thread rdb...@gmail.com
as, and how the Table Spec >> and the REST Catalog Spec should each be referenced in the sub-communities >> (like in PyIceberg). I'll keep those motivations in mind as we discuss >> those Specs in the future. >> >> Also, here's a small PR to specify more explicitl

Re: Spec changes for deletion vectors

2024-10-16 Thread rdb...@gmail.com
> Iceberg on DVs is a great thing to have. The additions for cross-compat >>> seem a minor thing to me that is vastly outweighed by a future where Delta >>> tables with DVs were supported in Delta Uniform and could be read by any >>> Iceberg V3 compliant engine. >>

Re: Iceberg View Spec Improvements

2024-10-09 Thread rdb...@gmail.com
+1 for Steven's comment. There is already an implicit assumption that the catalog names are consistent across engines. The best practice is to not reference identifiers across catalogs, but there isn't much we can do about the assumption here without rewriting SQL to fully qualify identifiers. On

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Thread rdb...@gmail.com
with Ryan that Iceberg should not provide any docker image or >>>> runtime things (we had the same discussion about REST server). >>>> >>>> However, my understanding is that this discussion is also related to >>>> the REST TCK. The TCK validation run needs

Spec changes for deletion vectors

2024-10-10 Thread rdb...@gmail.com
Hi everyone, There seems to be broad agreement around Anton's proposal to use deletion vectors in Iceberg v3, so I've opened two PRs that update the spec with the proposed changes. The first, PR #11238, adds a new Puffin blob type, delete-vector

Re: [EXTERNAL] Re: [DISCUSS] Column to Column filtering

2024-10-04 Thread rdb...@gmail.com
e: [DISCUSS] Column to Column filtering >> >> [CAUTION: External Email] >> >> >> >> I have similar concerns to Ryan although I could see that if we were >> writing smaller and better correlated files that this could be a big help. >> Specifically with

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-22 Thread rdb...@gmail.com
06c67e1eab3/open-api/rest-catalog-open-api.yaml#L2414 > [3] https://iceberg.apache.org/spec/#table-metadata-fields > [4] > https://github.com/apache/iceberg/blob/8e743a5b5209569f84b6bace36e1106c67e1eab3/open-api/rest-catalog-open-api.yaml#L2325 > [5] https://iceberg.apache.org/spec/#sn

Re: Overwrite old properties on table replace with REST catalog

2024-10-22 Thread rdb...@gmail.com
in the first place because I observed a discrepancy in Trino: all >> catalogs except for REST completely overrides table properties on REPLACE, >> and REST catalog merges them, which might be confusing to end users. >> Perhaps some clarification at the spec level might be useful, b

Re: Spec changes for deletion vectors

2024-10-22 Thread rdb...@gmail.com
of their adoption in other formats. >> >> So with that said, I'm in support of any of the above solutions but I >> think just going with full compatibility with Delta (down to storage format >> details) is the right choice to try to get the two communities working >>

Re: [VOTE] Endpoint for refreshing vended credentials

2024-10-22 Thread rdb...@gmail.com
+1 (binding) Thanks for your work on this! On Tue, Oct 22, 2024 at 2:47 PM Prashant Singh wrote: > +1 (non-binding) > > Regards, > Prashant > > On Tue, Oct 22, 2024 at 10:50 AM John Zhuge wrote: > >> +1 (non-binding) >> >> John Zhuge >> >> >> On Tue, Oct 22, 2024 at 9:45 AM Jack Ye wrote: >>

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-28 Thread rdb...@gmail.com
s("operation"); // This >> fails >> } >> >> Kind regards, >> Fokko >> >> >> Op ma 28 okt 2024 om 00:16 schreef Kevin Liu : >> >>> Hi Ryan, >>> >>> I've created a revert PR [1]. I agree that we should take a mor

Re: [DISCUSS] Apache Iceberg 1.7.0 Release Cutoff

2024-10-24 Thread rdb...@gmail.com
Do I recall our agreement that any V3 spec changes that will be released bear no compatibility guarantees until we close V3 and vote on it as a whole? Yes. We can make changes to v3 until the community votes to adopt and close the version to any new forward-breaking changes. That vote isn’t tied t

Re: [PROPOSAL] Refactore use of Guava Lists.*

2024-10-25 Thread rdb...@gmail.com
It’s correct that these methods aren’t strictly needed. We could translate every case into a slightly different form: Lists.newArrayList() -> new ArrayList<>() Lists.newArrayList(iter) -> new ArrayList<>(); Iterators.addAll(list, iter) Lists.newArrayList(iterable) -> new ArrayList<>(); Iterators.add
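For illustration only, the translations listed in this message can be sketched with plain JDK calls. The class and method names below (`ListFactories`, `empty`, `fromIterator`, `fromIterable`) are hypothetical and are not part of Guava or Iceberg; the sketch uses `forEachRemaining` and `forEach` where the message uses Guava's `Iterators.addAll`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helpers showing JDK-only equivalents of Guava's Lists.newArrayList overloads.
final class ListFactories {
  private ListFactories() {}

  // Equivalent of Lists.newArrayList(): an empty, mutable list.
  static <T> List<T> empty() {
    return new ArrayList<>();
  }

  // Equivalent of Lists.newArrayList(Iterator): drain the iterator into a new list.
  static <T> List<T> fromIterator(Iterator<? extends T> iter) {
    List<T> list = new ArrayList<>();
    iter.forEachRemaining(list::add);
    return list;
  }

  // Equivalent of Lists.newArrayList(Iterable): copy every element into a new list.
  static <T> List<T> fromIterable(Iterable<? extends T> iterable) {
    List<T> list = new ArrayList<>();
    iterable.forEach(list::add);
    return list;
  }

  public static void main(String[] args) {
    List<String> copied = fromIterable(Arrays.asList("a", "b"));
    copied.add("c"); // the copy is mutable and independent of the source
    System.out.println(copied);
  }
}
```

Each translation takes two statements instead of one, which is the kind of "slightly different form" the message describes.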

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-26 Thread rdb...@gmail.com
>> >> Best, >> Kevin Liu >> >> [1] https://github.com/apache/iceberg/pull/11354 >> >> >> >> On Tue, Oct 22, 2024 at 4:06 PM rdb...@gmail.com >> wrote: >> >>> > For example, the `Snapshot` `summary` field is optional in V1 but >&

Re: REST catalog removes void transform

2024-10-31 Thread rdb...@gmail.com
Vladimir, what is the context in which you want to maintain a partition spec with only void transforms? Is this in a v2 table? In a v2 table, the catalog should be free to remove void transforms. They are required for v1. On Wed, Oct 30, 2024 at 5:00 AM Vladimir Ozerov wrote: > Hi, > > When a us

Re: [VOTE] Deletion Vectors in V3

2024-10-31 Thread rdb...@gmail.com
+1 Thanks, Anton! On Wed, Oct 30, 2024 at 11:58 PM Fokko Driesprong wrote: > +1 > > I had to read up a bit, thanks for driving this Anton. > > Kind regards, > Fokko > > Op do 31 okt 2024 om 07:53 schreef Piotr Findeisen < > piotr.findei...@gmail.com>: > >> Thank you Anton, >> >> +1 (non-binding

Re: [VOTE] Drop Python3.8 Support in PyIceberg 0.8.0

2024-09-23 Thread rdb...@gmail.com
+1 On Mon, Sep 23, 2024 at 10:31 AM Steven Wu wrote: > +1 (binding). makes sense. > > On Mon, Sep 23, 2024 at 9:38 AM Yufei Gu wrote: > >> +1 Thanks for bringing this up. >> >> Yufei >> >> >> On Mon, Sep 23, 2024 at 9:27 AM Kevin Liu wrote: >> >>> +1 non-binding. Thanks for starting this conve

[VOTE] Table v3 spec: Add unknown and new type promotion

2024-09-27 Thread rdb...@gmail.com
Hi everyone, I'd like to vote on PR #10955 that has been open for a while with the changes to add new type promotion cases. After discussion, the PR has been scoped down to keep complexity low. It now adds: * An `unknown` type for cases when only `nu

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-27 Thread rdb...@gmail.com
ic ExecutorService newNonExitingWorkerPool(String > namePrefix, int poolSize) { > >> return Executors.newFixedThreadPool( > >> poolSize, > >> new > ThreadFactoryBuilder().setDaemon(true).setNameFormat(namePrefix + > "-%d").build()); >

Re: Clarification on DayTransform Result Type

2024-09-27 Thread rdb...@gmail.com
The background is that the result of the day function and dates are basically the same: the number of days from the Unix epoch. When we started using metadata tables, we realized that a lot of people use the day function but then get a weird ordinal value out, but if we just change the type to `dat
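As a rough illustration of why the day ordinal and a date carry the same information, the sketch below converts a timestamp to a day count from the Unix epoch and then reads that count back as a calendar date. It assumes timestamps in microseconds since the epoch in UTC; `DayOrdinal`, `dayOrdinal`, and `asDate` are hypothetical names, not Iceberg APIs.

```java
import java.time.LocalDate;

// Hypothetical sketch: a day ordinal is the number of days from 1970-01-01.
final class DayOrdinal {
  private static final long MICROS_PER_DAY = 86_400_000_000L;

  private DayOrdinal() {}

  // Days from the Unix epoch for a timestamp in microseconds (assumed UTC).
  // floorDiv keeps pre-1970 timestamps on the correct (earlier) day.
  static int dayOrdinal(long timestampMicros) {
    return (int) Math.floorDiv(timestampMicros, MICROS_PER_DAY);
  }

  // The same integer interpreted as a calendar date.
  static LocalDate asDate(int dayOrdinal) {
    return LocalDate.ofEpochDay(dayOrdinal);
  }

  public static void main(String[] args) {
    int ordinal = dayOrdinal(1_727_438_400_000_000L);
    System.out.println(ordinal + " reads as " + asDate(ordinal));
  }
}
```

The integer and the date are two renderings of one value, so exposing the result as a date changes how it is displayed rather than what is stored.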

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-30 Thread rdb...@gmail.com
eprecate the current > `newWorkerPool` with `newExitingWorkerPool`. This way, when people calls > `newExitingWorkerPool`, the intended behavior is clear from the method name. > > On Fri, Sep 27, 2024 at 1:58 PM rdb...@gmail.com wrote: > >> I'm okay with adding newFixedThreadPool as Ste

Re: [Discuss] Geospatial Support

2024-09-30 Thread rdb...@gmail.com
I have a couple of comments that I'd like to see addressed. First, I think that the definition of the bounding box needs to be more clear: the bounding box must include all points that lie on an object's edges or within an object. If that isn't required then we can't use the bounding box for filte

Re: [VOTE] Table v3 spec: Add unknown and new type promotion

2024-09-30 Thread rdb...@gmail.com
+1 (binding) On Mon, Sep 30, 2024 at 12:32 PM Daniel Weeks wrote: > +1 (binding) > > On Fri, Sep 27, 2024 at 2:41 PM Russell Spitzer > wrote: > >> +1 (binding) >> >> On Fri, Sep 27, 2024 at 4:37 PM rdb...@gmail.com >> wrote: >> >>> Hi ever

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread rdb...@gmail.com
Since we're using standard interfaces, maybe we should just document this behavior and you can control it by creating your own worker pool instead? On Tue, Sep 17, 2024 at 2:20 AM Péter Váry wrote: > Bumping this thread a bit. > > Cleaning up the pool in non-static cases should be a responsibili

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread rdb...@gmail.com
eate a "newExitingWorkerPool", and change the > callers to use the correct one. > If this is a feature, then we create a "newNotExitingWorkerPool" (which is > gross IMHO, but we should consider API compatibility), and change the > callers to use the correct one. >

Re: [DISCUSS] Column to Column filtering

2024-09-18 Thread rdb...@gmail.com
I'm curious to learn more about this feature. Is there a driving use case that you're implementing it for? Are there common situations in which these filters are helpful and selective? My initial impression is that this kind of expression would have limited utility at the table format level. Icebe

Re: [VOTE] Table v3 spec: Add unknown and new type promotion

2024-10-03 Thread rdb...@gmail.com
fré wrote: > +1 (non binding) > > Regards > JB > > On Fri, Sep 27, 2024 at 11:36 PM rdb...@gmail.com > wrote: > > > > Hi everyone, > > > > I'd like to vote on PR #10955 that has been open for a while with the > changes to add new type promotion

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread rdb...@gmail.com
Substrait is one of the reasons why we designed views with the ability to have different representations. I think that SQL translation is not a great solution. I'd like to see more focus on a portable intermediate representation like Substrait. That would solve a lot of the limitations with the SQL

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread rdb...@gmail.com
Gabor, The reason why the write format is a "default" is that I intended for it to be something that engines could override. For cases where it doesn't make sense to use the default because of memory pressure (as you might see in ingestion processes) you could choose to override and use a format t

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
hat it is not necessarily part of this scope of HiveCatalog's > tableExists(). > > At least this is my understanding. > Thanks, > Szehon > > On Wed, Nov 27, 2024 at 10:56 AM rdb...@gmail.com > wrote: > >> What kind of corruption are you referring to? I would expect

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
e original proposal to return true when >> the table exists but the metadata is somehow corrupted? Note: this is the >> proposed change of behavior why the thread was originally started. >> >> On Tue, Nov 26, 2024, 21:30 rdb...@gmail.com wrote: >> >>> I'd a

Re: Storing catalog directly on object store

2024-11-27 Thread rdb...@gmail.com
> We deprecated this recently and we don't have to deprecate it if object stores support atomic operations like this. I disagree because this misses many of the reasons for deprecation. It isn't just that S3 didn't support a `putIfAbsent` operation. Other object stores did and there are still seve

Re: [DISCUSS] Enforce table properties at catalog level

2024-11-27 Thread rdb...@gmail.com
Manu, this is something that you can easily build into a REST catalog implementation. I think that's probably the best way to solve it, rather than trying to implement this behavior across all of the catalogs in the project, right? On Wed, Nov 27, 2024 at 8:47 AM Pucheng Yang wrote: > I think th

Re: [DISCUSS] Hive Support

2024-11-27 Thread rdb...@gmail.com
I think that we should remove Hive 2 and Hive 3. We already agreed to remove Hive 2, but Hive 3 is not compatible with the project anymore and is already EOL and will not see a release to update it so that it can be compatible. Anyone using the existing Hive 3 support should be able to continue usi

Re: [DISCUSS] Deprecate embedded manifests

2024-11-27 Thread rdb...@gmail.com
s for a longer time. My suggestion would be to > mark the field as deprecated and revisit the actual removal. I've marked it > up for removal in Java 2.0 for now to give it enough time. > > Kind regards, > Fokko > > > > Op do 21 nov 2024 om 20:52 schreef rdb...@gmail.

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-26 Thread rdb...@gmail.com
I'd argue against changing this. The current behavior's intent is not to check whether the metadata is valid, it is to detect whether the table is an Iceberg table. It ignores non-Iceberg tables. Changing that behavior would be surprising, especially if we started throwing exceptions. On Fri, Nov

Re: [VOTE] Add Variant type to Iceberg Spec

2024-11-26 Thread rdb...@gmail.com
+1 and I agree with Russell. v3 is still under development, so I think it's reasonable to include Variant based on the current Parquet spec. On Mon, Nov 25, 2024 at 10:35 PM Jean-Baptiste Onofré wrote: > I second Russell here. I think it makes sense to add variant type to > V3 spec, even if the

Re: [DISCUSS] Apache Iceberg Summit 2025 - Selection Committee

2024-11-26 Thread rdb...@gmail.com
I'd like to volunteer. Glad to see Iceberg Summit 2025 coming together! On Tue, Nov 26, 2024 at 1:42 AM Jean-Baptiste Onofré wrote: > Hi everyone, > > As you probably know, we've been having discussions about the Iceberg > Summit 2025. > > The PMC pre-approved the Iceberg Summit proposal, and on

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread rdb...@gmail.com
Thanks for proposing this. My main concern is that this doesn't seem to be aimed at standardizing this metadata, but rather a way to pass existing Hive structures in a different way. I commented on the PR, but I'll carry it over here for this discussion. Iceberg already supports tracking column l

Re: [VOTE] Update partition stats spec for V3

2025-02-04 Thread rdb...@gmail.com
+1 On Tue, Feb 4, 2025 at 12:46 AM Honah J. wrote: > +1 > > On Mon, Feb 3, 2025 at 11:42 PM Ajantha Bhat > wrote: > >> +1 >> >> On Tue, Feb 4, 2025 at 11:30 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrote: >> >>> +1 >>> >>> On Mon, Feb 3, 2025 at 8:33 PM Dongjoon Hyun >>> wrote: >>>

Re: Iceberg handling of Parquet "2-level" lists

2025-02-06 Thread rdb...@gmail.com
Hi Matt, If you want to work on getting this change in, I'd be happy to review it. I think it is fine to support older, incorrectly written data. I looked briefly at the PR and I think it needs to be updated to at least add tests and to justify why the changes are correct. It looks like the repeti

Re: [VOTE] Add Geometry and Geography types for V3

2025-02-06 Thread rdb...@gmail.com
+1 Awesome to see this ready to go! On Thu, Feb 6, 2025 at 12:01 PM Szehon Ho wrote: > Hi everyone > > We would like to add Geometry and Geography types to the Iceberg V3 spec: > > https://github.com/apache/iceberg/pull/10981 > > This is proposed together with Apache Parquet format change to su

Re: [DISCUSS] Table name in table metadata

2025-02-10 Thread rdb...@gmail.com
I don't think it is a good idea to add the table name to metadata because it can easily get stale and would be misleading. Table name is a catalog concern and we typically try to keep catalog concerns out of the table space. Instead, I'd suggest updating the error that your users see so that the er

Re: Table metadata swap not work for REST Catalog (#12134)

2025-02-10 Thread rdb...@gmail.com
Yeah, it sounds like a "register table force" is the right concept here. I think we want to make sure that table updates remain change-based as the best practice in the REST API. But there are some irregular use cases that justify having some mechanism to completely replace the state (like push-bas

Re: [VOTE] Release Apache Iceberg 1.8.0 RC0

2025-02-11 Thread rdb...@gmail.com
+1 * Validated signature and checksum * Ran RAT checks * Ran tests that didn't require Docker in Java 17 As a follow up, I think that we should move any tests that require Docker to integrationTest rather than test. We should try not to rely on Docker containers in normal unit tests because conta

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread rdb...@gmail.com
+1 On Tue, Feb 11, 2025 at 10:50 AM Steve Zhang wrote: > +1 nb > > Thanks, > Steve Zhang > > > > On Feb 11, 2025, at 10:26 AM, Honah J. wrote: > > +1 > > On Tue, Feb 11, 2025 at 10:16 AM Christian Thiel < > christian.t.b...@gmail.com> wrote: > >> +1 (non-binding) >> Thanks Gabor! >> >> On Tue,

Re: Contributor guidelines for becoming a committer

2024-12-11 Thread rdb...@gmail.com
I want to make a clarification. I did comment on that PR that we are describing how the community is operating today, but that was in response to suggestions to reference the comdev project, lower the requirements, and add a requirement for loving the project and helping the community. My intent is

[DISCUSS] December board report

2024-12-11 Thread rdb...@gmail.com
Hi everyone, It’s time to report to the board again. Great to see all the progress here, and awesome to have our first go release this quarter! My draft is below. Please reply if there’s anything you’d like to add or change. Thanks! Ryan Description: Apache Iceberg is a table format for huge an

Re: Contributor guidelines for becoming a committer

2024-12-11 Thread rdb...@gmail.com
ertise? > ... > * Have the candidate’s contributions been stable and maintainable? > ... > " > etc > > It sounds like "code contributions and code reviews". I think it's > what Justin meant. > > Regards > JB > > On Wed, Dec 11, 2024 a

Re: [DISCUSS] December board report

2024-12-11 Thread rdb...@gmail.com
m of Arrow record batches. > > On Wed, Dec 11, 2024, 4:22 PM Walaa Eldin Moustafa > wrote: > >> Hi Ryan, >> >> For Table Format V3, we could point out that the default value support >> for Avro has been merged and support for other formats is ongoing. >> >>

Re: [DISCUSS] Spark Catalog - Drop vs Drop with Purge

2024-12-11 Thread rdb...@gmail.com
That plan sounds good to me. Thanks, Russell! On Wed, Dec 11, 2024 at 1:43 PM Yufei Gu wrote: > +1 on adding a flag to support the Spark REST client behavior change > between v1.8 and v2.0. > > At the same time, we may clarify further more on the behavior of DropTable > REST API, > https://githu

Contributor guidelines for becoming a committer

2024-12-10 Thread rdb...@gmail.com
Hi everyone, Earlier this year, there were a few threads on this list that highlighted that it wasn’t clear enough how contributors become committers, so the PMC put together a doc to explain some of the common questions, including: - What are the responsibilities of a committer? - How are

Re: [DISCUSS] Hive Support

2024-12-16 Thread rdb...@gmail.com
0[1] and there will be conflicts > between Spark's hive 2.3.9 and our hive 4.0 dependencies. > I'm not sure there's an upgrade path before Spark 4.0. Any ideas? > > 1. https://issues.apache.org/jira/browse/SPARK-45265 > > Thanks, > Manu > > > On Sat, Dec 14

Re: [VOTE] Drop Hive runtime

2024-12-18 Thread rdb...@gmail.com
The PR looks good to me. +1 On Tue, Dec 17, 2024 at 6:00 PM Manu Zhang wrote: > Hi all, > > Thanks for sharing your ideas in the discussion of Hive support[1]. We > have a consensus to drop Hive runtime and upgrade Hive metastore connector > to Hive 4. However, it looks like we can't upgrade met

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 Thread rdb...@gmail.com
I was the person that originally suggested that we not move iceberg-parquet into core, so it would probably help if I gave some context for my rationale as I remember it and what's changed since then. I pushed back on the original suggestion to move Parquet classes into core because it wasn't clea

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore support? When I read the thread, I thought that we weren't going to change the metastore. That seems reasonable to me. Sorry for the confusion. On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com wrote: > Sorry, I m

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
ceberg-core/, >>> which might help when working on features that are not released yet (eg >>> Nanosecond timestamps). Besides that, we should run RCs against Hive to >>> check if everything works as expected. >>> > >>> > I'm leaning toward removing Hiv

Re: [DISCUSS] Standardizing Error Handling in the Iceberg Spark Module

2024-12-19 Thread rdb...@gmail.com
This looks like a good improvement to me. Thanks, Huaxin! On Wed, Dec 18, 2024 at 11:37 PM huaxin gao wrote: > Hi everyone, > > While working on integrating Spark 4.0 with Iceberg, I noticed that error > conditions in the Spark module are primarily validated through the content > of error messag

Re: [Discuss] Proposal to Adjust Catalog Sync Schedule & Cancel Next Wednesday’s Meeting

2024-11-21 Thread rdb...@gmail.com
+1 for every 3 weeks instead of 2 out of 3. On Thu, Nov 21, 2024 at 10:57 AM Dmitri Bourlatchkov wrote: > Thanks for keeping track of this, Honah! > > +1 to keep the Wednesday 9 AM Pacific Time meeting every 3 weeks > > I'm ok to pause the 8 PM PST meeting - this time does not work for me > pers

Re: [VOTE] Deprecate and remove last-column-id

2024-11-21 Thread rdb...@gmail.com
+1 On Thu, Nov 21, 2024 at 5:22 AM Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > On Tue, Nov 19, 2024 at 9:18 AM Fokko Driesprong wrote: > > > > Hi everyone, > > > > Based on the positive feedback on the [DISCUSS] thread and the > pull-request on GitHub, I would like to raise a vote to

Re: [DISCUSS] Deprecate embedded manifests

2024-11-21 Thread rdb...@gmail.com
Can we safely deprecate and remove this? The manifest list is required in v2, but the spec has stated for a long time that v1 tables can use manifests rather than a manifest list. It’s unlikely, but it would be valid for other implementations to produce it. I would understand if other implementati

Re: [DISCUSS, VOTE] OpenAPI Metadata Update for EnableRowLineage

2025-01-22 Thread rdb...@gmail.com
+1 On Wed, Jan 22, 2025 at 2:51 PM Russell Spitzer wrote: > Hey Y'all > > Yet another Row Lineage Spec update. This adds a MetadataUpdate > EnableRowLineage to the REST Spec. We briefly talked today > about an alternative EnableFeature(Feature Name) API instead but in the > absence of other feat

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-21 Thread rdb...@gmail.com
I think you could achieve what you're looking for by setting the age to 1 ms and the minimum number of snapshots to keep. Then snapshot expiration would always expire all snapshots other than the min number, getting you what you want. It probably wouldn't make sense to set a maximum as well. Right
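A simplified model of the suggestion above, assuming a snapshot survives expiration if it is either among the newest N or younger than the maximum age. This is only a sketch: `RetentionSketch` and its parameters are hypothetical, it ignores branches and tags, and it is not Iceberg's expiration code. In Iceberg itself, the corresponding settings are, to my understanding, the table properties `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of age-based expiration with a minimum number of snapshots to keep.
final class RetentionSketch {
  private RetentionSketch() {}

  // Returns the timestamps (ms) of snapshots that survive expiration, newest first.
  static List<Long> retained(List<Long> snapshotTimestampsMs, long nowMs, long maxAgeMs, int minToKeep) {
    List<Long> newestFirst = new ArrayList<>(snapshotTimestampsMs);
    newestFirst.sort(Comparator.reverseOrder());

    List<Long> kept = new ArrayList<>();
    for (int i = 0; i < newestFirst.size(); i++) {
      long ts = newestFirst.get(i);
      boolean withinMinimum = i < minToKeep;        // always keep the newest N
      boolean youngEnough = nowMs - ts < maxAgeMs;  // otherwise keep only if under the max age
      if (withinMinimum || youngEnough) {
        kept.add(ts);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // With a 1 ms max age, only the minimum count survives.
    System.out.println(retained(Arrays.asList(100L, 200L, 300L, 400L), 1_000L, 1L, 2));
  }
}
```

With the age set to 1 ms, the age condition almost never holds, so the minimum count effectively acts as a maximum.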

Re: [VOTE] Document Snapshot Summary Optional Fields as Subsection of Appendix F in Spec

2025-01-21 Thread rdb...@gmail.com
+1 On Tue, Jan 21, 2025 at 12:20 PM Honah J. wrote: > Hi everyone, > > In the last VOTE > thread > on documenting snapshot summary optional fields, we decided to move the > documentation to a subsection of Appendix F – Implementa

Re: [VOTE] REST API changes for freshness-aware table loading

2025-01-24 Thread rdb...@gmail.com
+1 Thanks, Gabor! On Fri, Jan 24, 2025 at 9:25 AM Christian Thiel wrote: > +1 (non binding). Thanks Gabor! > > Daniel Weeks schrieb am Fr. 24. Jan. 2025 um 17:15: > >> +1 >> >> On Wed, Jan 22, 2025 at 1:19 PM Yufei Gu wrote: >> >>> +1. Thanks, Gabor! A bit more context, we synced on this spec

Re: [VOTE] Add initial/write defaults to REST spec

2025-01-24 Thread rdb...@gmail.com
+1 On Fri, Jan 24, 2025 at 2:25 PM Yufei Gu wrote: > +1 > Yufei > > > On Fri, Jan 24, 2025 at 2:15 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> +1 (binding) >> >> On Fri, Jan 24, 2025 at 2:02 PM Jean-Baptiste Onofré >> wrote: >> >>> +1 (non binding) >>> >>> It corresponds to the spec (ini

Re: [DISCUSS/VOTE] Add in ChangeLog Reserved Field IDs to Spec and Decrement Row Lineage Reserved IDs

2025-01-28 Thread rdb...@gmail.com
+1 Thanks for catching this, Russell! On Mon, Jan 27, 2025 at 12:51 PM Russell Spitzer wrote: > Thanks everyone, I'll be merging that fix ASAP > > On Mon, Jan 27, 2025 at 6:01 AM Fokko Driesprong wrote: > >> +1 >> >> Op ma 27 jan 2025 om 10:54 schreef Honah J. : >> >>> +1, thanks for driving t

Re: [VOTE] Document Snapshot Summary Optional Fields as Appendix in Spec

2025-01-16 Thread rdb...@gmail.com
> >> -Dan >> >> On Wed, Jan 15, 2025 at 8:07 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> @Daniel Weeks what do you think? >>> >>> I know both you and I had the opposite feeling here. >>> >>> On Tue,

Re: [VOTE] Document Snapshot Summary Optional Fields as Appendix in Spec

2025-01-14 Thread rdb...@gmail.com
The content looks correct to me, but because this states a requirement ("Metrics must be accurate if written") I would rather move this content into the section on the snapshot summary instead of an appendix. On Tue, Jan 14, 2025 at 1:30 PM huaxin gao wrote: > +1 non-binding > > On Tue, Jan 14,

Re: pre-proposal: schema_id on DataFile

2025-02-14 Thread rdb...@gmail.com
We've considered this in the past and I'm undecided on it. There is some benefit, like being able to prune files during planning if the file didn't contain a column that is used in a non-null filter (i.e. `new_data_column IN ("a", "b")`). On the other hand, we don't want data files that were writt

Re: [VOTE] Add overwriteRequested to RegisterTableRequest in REST spec

2025-02-13 Thread rdb...@gmail.com
+1 On Thu, Feb 13, 2025 at 9:56 AM Huang-Hsiang Cheng wrote: > +1 (non-binding) > > On Feb 13, 2025, at 9:36 AM, Daniel Weeks wrote: > > +1 > > On Thu, Feb 13, 2025 at 9:07 AM Fokko Driesprong wrote: > >> +1 >> >> Op do 13 feb 2025 om 18:06 schreef Steven Wu : >> >>> +1 here. >>> >>> already a

Re: [ANNOUNCE] Apache Iceberg release 1.8.0

2025-02-18 Thread rdb...@gmail.com
+1 to reverting PR 11560 in main and 1.8.1. That avoids unnecessary incompatibility with older readers. I also agree that we should update the spec to say what Russell suggests: > that -1 has meant "no current snapshot" in the past and is equivalent to missing/null. That's a correct description o

Re: [DISCUSS] Apache Iceberg 1.8.1 release

2025-02-19 Thread rdb...@gmail.com
+1 for rolling back the AWS SDK update for the patch release. We should get the better fix into main though. On Wed, Feb 19, 2025 at 8:56 AM Eduard Tudenhöfner wrote: > @Yuya maybe it makes sense to revert the AWS SDK to a version that didn't > introduce those integrity protection changes (added

Re: [DISCUSS] Rest Catalog 419 Response Code

2025-02-24 Thread rdb...@gmail.com
Yeah, I don't think that this response was used. We thought that it was needed but it can probably be safely removed as I'm not aware of any implementation that sent or handled it. If that's the right thing to do because there are other more standard mechanisms for sending more information about a

Re: [VOTE] Java implementation notes around current-snapshot-id

2025-02-24 Thread rdb...@gmail.com
+1 On Mon, Feb 24, 2025 at 12:26 PM Daniel Weeks wrote: > +1 > > On Mon, Feb 24, 2025, 11:00 AM Russell Spitzer > wrote: > >> +1 >> >> On Mon, Feb 24, 2025 at 12:55 PM Fokko Driesprong >> wrote: >> >>> Hi everyone, >>> >>> Recently, there was confusion >>>

Re: [VOTE] Deprecate or remove distinct_count

2025-02-24 Thread rdb...@gmail.com
I can provide some context here. The field is very old and when we realized that it was not only unused but also difficult to produce and use in practice (can't be combined) we deprecated the field. However, some folks from Dremio wanted to bring it back because they said they could store values th

Re: [VOTE] Allow Row-Lineage with Equality Deletes

2025-02-20 Thread rdb...@gmail.com
+1 On Thu, Feb 20, 2025 at 10:01 AM Aihua Xu wrote: > +1 (non-binding). > > On Thu, Feb 20, 2025 at 9:41 AM Huang-Hsiang Cheng > wrote: > >> +1 (non-binding) >> >> Thanks, >> Huang-Hsiang >> >> On Feb 20, 2025, at 9:37 AM, huaxin gao wrote: >> >> +1 (non-binding) >> >> Thanks Russell! >> >> On