Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-17 Thread Jean-Baptiste Onofré
Hi Kevin It sounds reasonable to me. I would just mention that the REST catalog is the preferred one. Regards JB On Wed, Oct 16, 2024 at 8:40 PM Kevin Liu wrote: > > Hey folks, > > > Thanks for the discussions. > > > It seems everyone is in favor of replacing the Hadoop catalog example, and >

Re: Spec changes for deletion vectors

2024-10-17 Thread Jean-Baptiste Onofré
Hi folks, As Daniel said, I think we have actually two proposals in one: 1. The first proposal is "improvement of positional delete files", using delete vectors stored in Puffin files. I like this proposal, it makes a lot of sense. I think with a kind of consensus here (we discussed about how to p

[DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Jean-Baptiste Onofré
Hi folks, Even though the project seems to be pretty close to a 0.18 release, Apache Pig is a "dormant" project. I would like to discuss here whether it would make sense to remove the iceberg-pig module. Thoughts? Regards JB

Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-17 Thread Marc Cenac
Hey Kevin, This approach sounds good to me and thanks for your work to improve the getting started docs! I would consider using the file-based sqlite rather than in-memory since I've seen some users surprised when they realize their tables disappear from the catalog upon restart, but either way i
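The restart behaviour Marc describes can be reproduced with plain SQLite, independent of Iceberg. The sketch below is only an illustration: it uses Python's standard `sqlite3` module, and the `catalog_tables` table and the `create_and_reopen` helper are hypothetical names, not part of any Iceberg catalog schema or API.

```python
import os
import sqlite3
import tempfile


def create_and_reopen(database: str) -> int:
    """Create a table, close the connection, reconnect, and report
    how many tables named 'catalog_tables' the new connection sees."""
    conn = sqlite3.connect(database)
    conn.execute("CREATE TABLE IF NOT EXISTS catalog_tables (name TEXT)")
    conn.execute("INSERT INTO catalog_tables VALUES ('db.events')")
    conn.commit()
    conn.close()  # stands in for a process restart

    conn = sqlite3.connect(database)
    try:
        row = conn.execute(
            "SELECT count(*) FROM sqlite_master "
            "WHERE type = 'table' AND name = 'catalog_tables'"
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Every connection to ":memory:" starts with a fresh, empty database.
    print(create_and_reopen(":memory:"))
    # A file-backed database keeps its contents across connections.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_and_reopen(os.path.join(tmp, "catalog.db")))
```

With the in-memory database the second connection finds no table, while the file-backed database still has it, which is the surprise users run into when a getting-started example relies on in-memory SQLite.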

[Discuss] Iceberg View Interoperability

2024-10-17 Thread Ajantha Bhat
Hi everyone, It’s been over six months since Iceberg views were introduced (in version 1.5.0), and while we’ve seen some benefits—such as versioning and cross-engine recognition—there’s still a critical gap in terms of true interoperability. Although views created by one engine are recognized by a

Re: [VOTE] Standardize vended credentials in OpenAPI spec

2024-10-17 Thread Amogh Jahagirdar
+1 On Tue, Oct 15, 2024 at 3:03 PM Yufei Gu wrote: > +1 > Yufei > > > On Tue, Oct 15, 2024 at 12:09 PM Daniel Weeks wrote: > >> +1 >> >> On Tue, Oct 15, 2024 at 10:42 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +1 >>> >>> On Tue, Oct 15, 2024 at 12:28 PM Bryan Keller wrote

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Bryan Keller
+1 > On Oct 17, 2024, at 1:51 PM, Anton Okolnychyi wrote: > > +1 > > On Thu, Oct 17, 2024 at 13:42, Steven Wu wrote: >> +1 >> >> On Thu, Oct 17, 2024 at 10:44 AM John Zhuge wrote: >>> +1 (non-binding) >>> >>> On Thu, Oct 17, 2024 at 1

Re: Spec changes for deletion vectors

2024-10-17 Thread Anton Okolnychyi
We would want to have magic bytes + checksum as part of the blob in Iceberg, as discussed in the spec PRs. If we chose something other than CRC and/or use little endian for all parts of the blob, this would break the compatibility in either direction and would prevent the use case that Scott was me
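To make the compatibility concern concrete, here is a minimal sketch of a blob framed with a length prefix, magic bytes, and a CRC-32 checksum. It is not the layout from the spec PRs: the `MAGIC` value, the field order, and the `encode_blob`/`decode_blob` helpers are assumptions chosen for illustration. It only shows why a reader and writer that disagree on byte order cannot exchange blobs.

```python
import struct
import zlib

# Hypothetical magic bytes, chosen for this example only.
MAGIC = b"\xDE\xAD\xBE\xEF"


def encode_blob(payload: bytes, byte_order: str = ">") -> bytes:
    """Frame payload as: length | magic | payload | CRC-32(magic + payload).

    byte_order is a struct prefix: ">" for big endian, "<" for little endian.
    """
    body = MAGIC + payload
    length = struct.pack(byte_order + "I", len(body))
    checksum = struct.pack(byte_order + "I", zlib.crc32(body))
    return length + body + checksum


def decode_blob(blob: bytes, byte_order: str = ">") -> bytes:
    """Validate length, magic, and checksum, then return the payload."""
    (length,) = struct.unpack_from(byte_order + "I", blob, 0)
    if length + 8 != len(blob):
        raise ValueError("length mismatch")
    body = blob[4:4 + length]
    if body[:4] != MAGIC:
        raise ValueError("bad magic bytes")
    (expected,) = struct.unpack_from(byte_order + "I", blob, 4 + length)
    if zlib.crc32(body) != expected:
        raise ValueError("checksum mismatch")
    return body[4:]
```

A blob written with `encode_blob(data, ">")` round-trips through `decode_blob(blob, ">")`, but `decode_blob(blob, "<")` raises `ValueError`, so existing blobs would have to be rewritten if the two formats picked different conventions.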

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Steven Wu
+1 On Thu, Oct 17, 2024 at 10:44 AM John Zhuge wrote: > +1 (non-binding) > > On Thu, Oct 17, 2024 at 10:21 AM Yufei Gu wrote: > >> +1 for deprecating it in 1.7 >> Yufei >> >> >> On Thu, Oct 17, 2024 at 9:51 AM Ajantha Bhat >> wrote: >> >>> +1 for dropping it. >>> >>> On Thu, Oct 17, 2024 at 8:

Re: Spec changes for deletion vectors

2024-10-17 Thread Russell Spitzer
For the conversion from Delta to Iceberg, wouldn't we need to scan all of the Delta Vectors if we choose a different CRC or other endian-ness? Does delta mandate that writers also include this information in their metadata files? On Thu, Oct 17, 2024 at 4:26 PM Anton Okolnychyi wrote: > We would

Re: [VOTE] Standardize vended credentials in OpenAPI spec

2024-10-17 Thread Dmitri Bourlatchkov
+1 (non-binding) Cheers, Dmitri. On Tue, Oct 15, 2024 at 1:15 PM Eduard Tudenhöfner wrote: > Hey everyone, > > I'd like to vote on #10722, > which has been open for quite a while now. > I believe we're in agreement on how we want to standardize cre

Re: Spec changes for deletion vectors

2024-10-17 Thread Anton Okolnychyi
> > For the conversion from Delta to Iceberg, wouldn't we need to scan all of > the Delta Vectors if we choose a different CRC or other endian-ness? Exactly, we would not be able to expose Delta as Iceberg if we choose a different checksum type or byte order. Does delta mandate that writers also

Re: [VOTE] Standardize vended credentials in OpenAPI spec

2024-10-17 Thread Jack Ye
+1 (binding) Best, Jack Ye On Thu, Oct 17, 2024 at 3:05 PM Dmitri Bourlatchkov wrote: > +1 (non-binding) > > Cheers, > Dmitri. > > On Tue, Oct 15, 2024 at 1:15 PM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> Hey everyone, >> >> I'd like to vote on #10722

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Yufei Gu
Hi Sung, It seems we are running into issues related to a mismatch between the REST spec and table specifications. Currently, there's no clear definition of how the REST spec is meant to support different table specs. The closest reference I found is this statement

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Anton Okolnychyi
+1 On Thu, Oct 17, 2024 at 13:42, Steven Wu wrote: > +1 > > On Thu, Oct 17, 2024 at 10:44 AM John Zhuge wrote: > >> +1 (non-binding) >> >> On Thu, Oct 17, 2024 at 10:21 AM Yufei Gu wrote: >> >>> +1 for deprecating it in 1.7 >>> Yufei >>> >>> >>> On Thu, Oct 17, 2024 at 9:51 AM Ajantha Bhat >>> w

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Anton Okolnychyi
Well, the spec says nothing about a top-level `operation` field in JSON [1]. Yet the Java implementation produces it [2] and removes the operation from the summary map. This seems inconsistent? - Anton [1] - https://iceberg.apache.org/spec/#snapshots [2] - https://github.com/apache/iceberg/blob/1

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Rodrigo Meneses
+1 On Thu, Oct 17, 2024 at 4:38 PM Bryan Keller wrote: > +1 > > > On Oct 17, 2024, at 1:51 PM, Anton Okolnychyi > wrote: > > +1 > > On Thu, Oct 17, 2024 at 13:42, Steven Wu wrote: > >> +1 >> >> On Thu, Oct 17, 2024 at 10:44 AM John Zhuge wrote: >> >>> +1 (non-binding) >>> >>> On Thu, Oct 17, 202

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Steve Zhang
+1 Thanks, Steve Zhang > On Oct 17, 2024, at 11:16 PM, roryqi wrote: > > +1. > > On Fri, Oct 18, 2024 at 13:44, Péter Váry wrote: >> +1 >> >> On Fri, Oct 18, 2024, 04:50 Manu Zhang wrote: >>> +1 >>> >>> On Fri, Oct 18, 2024 at

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Manu Zhang
+1 On Fri, Oct 18, 2024 at 8:50 AM Rodrigo Meneses wrote: > +1 > > On Thu, Oct 17, 2024 at 4:38 PM Bryan Keller wrote: > >> +1 >> >> >> On Oct 17, 2024, at 1:51 PM, Anton Okolnychyi >> wrote: >> >> +1 >> >> On Thu, Oct 17, 2024 at 13:42, Steven Wu wrote: >> >>> +1 >>> >>> On Thu, Oct 17, 2024 at

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Péter Váry
Hi Team, Apart from fixing this current issue by relaxing the current spec constraints, to support both v1 and v2 specifications, we should think about how to handle table spec evolution for the long term. What are the base factors we can start from (please add your own ideas if I have missed some

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread roryqi
+1. On Fri, Oct 18, 2024 at 13:44, Péter Váry wrote: > +1 > > On Fri, Oct 18, 2024, 04:50 Manu Zhang wrote: > >> +1 >> >> On Fri, Oct 18, 2024 at 8:50 AM Rodrigo Meneses >> wrote: >> >>> +1 >>> >>> On Thu, Oct 17, 2024 at 4:38 PM Bryan Keller wrote: >>> +1 On Oct 17, 2024, at 1:51 PM, A

Re: Spec changes for deletion vectors

2024-10-17 Thread Szehon Ho
So based on Micah's original goals, switch 2 and 3: 1. The best possible implementation of DVs (limited redundancy, no extraneous fields, CPU efficiency, minimal space, etc). 2. The ability for Iceberg readers to read Delta Lake DVs 3. The ability for Delta Lake readers to read Iceberg DVs The

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Péter Váry
+1 On Fri, Oct 18, 2024, 04:50 Manu Zhang wrote: > +1 > > On Fri, Oct 18, 2024 at 8:50 AM Rodrigo Meneses > wrote: > >> +1 >> >> On Thu, Oct 17, 2024 at 4:38 PM Bryan Keller wrote: >> >>> +1 >>> >>> >>> On Oct 17, 2024, at 1:51 PM, Anton Okolnychyi >>> wrote: >>> >>> +1 >>> >>> чт, 17 жовт. 2

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Sung Yun
Thank you for the clarification Daniel, and thank you Kevin for raising this issue! Does that mean that we are creating component schemas that are the superset of the V1 and V2 schemas? And if so, should we remove summary and manifest-list from the required properties, and add manifests optiona

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Daniel Weeks
Sung, I was thinking of v1, so you're right that manifest-list and summary are required as of v2. The REST Spec seems to follow the v2 definition, so I think we're somewhat implicitly requiring those fields via REST. Kevin, Based on the example metadata, that looks like it is not to spec, so it

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Ajantha Bhat
+1 for dropping it. On Thu, Oct 17, 2024 at 8:55 PM Daniel Weeks wrote: > +1 for deprecating and dropping > > On Thu, Oct 17, 2024 at 7:46 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> +1 for marking the project deprecated (in 1.7.0) and dropping it in the >> next release (1.8.

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Kevin Liu
> Based on the example metadata, that looks like it is not to spec, so it's reasonable that python would reject it. If the java implementation is allowing for that, it's likely that we're being too relaxed (possibly a holdover from v1 parsing). I believe the Java implementation is relaxing the con

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Russell Spitzer
+1 (oink) If anyone really cares please chime in but seriously we should drop it On Thu, Oct 17, 2024 at 8:07 AM Jean-Baptiste Onofré wrote: > Hi folks, > > Even if it seems the project is pretty close to 0.18 release, Apache > Pig is a "dormant" project. > > I would like to discuss here if it

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Eduard Tudenhöfner
+1 for marking the project deprecated (in 1.7.0) and dropping it in the next release (1.8.0) On Thu, Oct 17, 2024 at 4:36 PM Russell Spitzer wrote: > +1 (oink) > > If anyone really cares please chime in but seriously we should drop it > > On Thu, Oct 17, 2024 at 8:07 AM Jean-Baptiste Onofré > w

Re: [VOTE] Standardize vended credentials in OpenAPI spec

2024-10-17 Thread Prashant Singh
+1 (non-binding) On Thu, Oct 17, 2024 at 6:53 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 > > On Tue, Oct 15, 2024 at 3:03 PM Yufei Gu wrote: > >> +1 >> Yufei >> >> >> On Tue, Oct 15, 2024 at 12:09 PM Daniel Weeks wrote: >> >>> +1 >>> >>> On Tue, Oct 15, 2024 at 10:42 AM Russell Spitzer

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Daniel Weeks
I'm not convinced this is incorrect behavior (table spec or implementation), but it does lend to some confusion. The 'summary' field is optional, which means that if a summary is not provided, you do not have an associated 'operation' field. The 'operation' field is only required in the context o

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Daniel Weeks
+1 for deprecating and dropping On Thu, Oct 17, 2024 at 7:46 AM Eduard Tudenhöfner wrote: > +1 for marking the project deprecated (in 1.7.0) and dropping it in the > next release (1.8.0) > > On Thu, Oct 17, 2024 at 4:36 PM Russell Spitzer > wrote: > >> +1 (oink) >> >> If anyone really cares ple

Re: Spec changes for deletion vectors

2024-10-17 Thread Bart Samwel
I hope it's OK if I chime in. I'm one of the people responsible for the format for position deletes that is used in Delta Lake and I've been reading along with the discussion. Given that the main sticking point is whether this compatibility is worth the associated "not pure" spec, I figured that ma

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Kevin Liu
Thanks for the additional context. My understanding is that if a Snapshot has a `summary` field, it must also have a corresponding `operation` key in the summary map. Is that correct? Based on the `SnapshotParser`, this is not enforced [1]. The underlying issue in #1106 [2] is the missing `operat
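The rule Kevin asks about (a snapshot that carries a `summary` must have an `operation` key in that map, while a snapshot without a `summary` has nothing to check) can be written as a small predicate. This is a sketch of that reading only, not the behaviour of `SnapshotParser` or of any Python implementation; the `summary_missing_operation` helper and the sample snapshots are hypothetical.

```python
from typing import Any, Mapping


def summary_missing_operation(snapshot: Mapping[str, Any]) -> bool:
    """Return True when a snapshot has a summary map without an 'operation' key.

    A snapshot with no summary at all is not flagged, following the reading
    that 'operation' is only required in the context of a summary.
    """
    summary = snapshot.get("summary")
    return summary is not None and "operation" not in summary


if __name__ == "__main__":
    # Hypothetical snapshot fragments for illustration.
    samples = [
        {"snapshot-id": 1},                                     # no summary
        {"snapshot-id": 2, "summary": {"operation": "append"}},  # well formed
        {"snapshot-id": 3, "summary": {"added-data-files": "1"}},  # flagged
    ]
    for snap in samples:
        print(snap["snapshot-id"], summary_missing_operation(snap))
```

A strict reader would reject the third sample, while a lenient one would accept it, which is the kind of divergence between implementations discussed in this thread.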

Re: [Discuss] Iceberg View Interoperability

2024-10-17 Thread Daniel Weeks
Hey Ajantha, I think it's good to figure out a path forward for extending view support, but I'm not convinced using a procedure is a good idea or really moves things forward in that direction. As you already indicated, there are a number of different libraries to translate views, but of the vario

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-17 Thread Sung Yun
> As a side note, the `rest-catalog-open-api.yaml` file [2] in the Iceberg repo > contains the latest version of the spec. I think more clarity on this would be helpful. Is it really the case that the Open API spec contains the latest version of the spec? For example, I'm noticing a discrepancy

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread Yufei Gu
+1 for deprecating it in 1.7 Yufei On Thu, Oct 17, 2024 at 9:51 AM Ajantha Bhat wrote: > +1 for dropping it. > > On Thu, Oct 17, 2024 at 8:55 PM Daniel Weeks wrote: > >> +1 for deprecating and dropping >> >> On Thu, Oct 17, 2024 at 7:46 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrot

Re: [DISCUSS] Remove iceberg-pig module ?

2024-10-17 Thread John Zhuge
+1 (non-binding) On Thu, Oct 17, 2024 at 10:21 AM Yufei Gu wrote: > +1 for deprecating it in 1.7 > Yufei > > > On Thu, Oct 17, 2024 at 9:51 AM Ajantha Bhat > wrote: > >> +1 for dropping it. >> >> On Thu, Oct 17, 2024 at 8:55 PM Daniel Weeks wrote: >> >>> +1 for deprecating and dropping >>> >>>