Re: [VOTE] Release Apache Iceberg 1.3.0 RC0

2023-05-28 Thread Jean-Baptiste Onofré
+1 (non-binding)

Regards
JB

On Tue, May 23, 2023 at 10:21 PM Anton Okolnychyi wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg 
> 1.3.0 release.
>
> The commit ID is 7dbdfd33a667a721fbb21c7c7d06fec9daa30b88
> * This corresponds to the tag: apache-iceberg-1.3.0-rc0
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.3.0-rc0
> * https://github.com/apache/iceberg/tree/7dbdfd33a667a721fbb21c7c7d06fec9daa30b88
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.3.0-rc0
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
> * https://repository.apache.org/content/repositories/orgapacheiceberg-1134/
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours (weekends excluded).
>
> [ ] +1 Release this as Apache Iceberg 1.3.0
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are
> encouraged to cast non-binding votes. This vote will pass if there are 3
> binding +1 votes and more binding +1 votes than -1 votes.
>
> - Anton
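
A minimal Python sketch of the two voting rules quoted in the email above: the pass condition (at least 3 binding +1 votes and more binding +1 votes than -1 votes) and the 72-hour window with weekends excluded. The function names are hypothetical, and two interpretations are assumptions rather than anything the thread states: only binding votes are counted on the -1 side, and "weekends excluded" means hours that start on a Saturday or Sunday do not count toward the 72 (approximated at one-hour granularity).

```python
from datetime import datetime, timedelta


def vote_passes(binding_plus_one: int, binding_minus_one: int) -> bool:
    """Pass rule from the vote email: at least 3 binding +1 votes and more
    binding +1 votes than -1 votes (assumes only binding -1 votes count)."""
    return binding_plus_one >= 3 and binding_plus_one > binding_minus_one


def vote_deadline(start: datetime, hours: int = 72) -> datetime:
    """Hypothetical helper: step forward one hour at a time from `start`,
    counting only hours that begin on a weekday (Monday=0 ... Friday=4)."""
    current = start
    remaining = hours
    while remaining > 0:
        if current.weekday() < 5:  # this hour starts on a weekday, so it counts
            remaining -= 1
        current += timedelta(hours=1)
    return current


if __name__ == "__main__":
    # The RC0 vote opened on Tue, May 23, 2023 at 10:21 PM (from the email above).
    print(vote_deadline(datetime(2023, 5, 23, 22, 21)))  # 2023-05-26 22:21:00
    print(vote_passes(binding_plus_one=3, binding_minus_one=0))  # True
```

For the RC0 timestamp no weekend falls inside the window, so the deadline is simply 72 hours later; a vote opened on a Friday evening would instead be pushed into the following week.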


Re: 👋 Intro and question for the community

2023-06-06 Thread Jean-Baptiste Onofré
Hi Brian,

Can you please describe in a bit more detail what you mean by Common Room?

At first glance, it looks like a good idea. However, from an Apache
standpoint, it has to be approved by the PMC members. Did you raise this
on the private mailing list?

Regards
JB

On Thu, May 18, 2023 at 11:44 PM Brian Olsen  wrote:
>
> Hey all,
> My name is Brian and I'm the new Head of Developer Relations at
> Tabular. I'd like to set up Common Room so that we have a bit of a pulse on
> the community. I would like to see if the community is interested in enabling
> read-only permissions on the apache/iceberg and apache/iceberg-docs repositories
> for the GitHub integration. Here's how the information would be used:
>
> * Triage issues and PRs
> * Learn ways to improve the developer/contributor experience in the community
> * Understand which PRs and issues are not getting attention and why
> * Set alerts and notifications for the Developer Relations team to follow up on
>   issues to help drive changes in Iceberg
> * Report metrics that showcase Iceberg usage to drive further adoption of and
>   interest in Iceberg
> * Gain a better understanding of the ways people use Iceberg and the
>   features they are interested in
> * Showcase the diversity of contributions to the Iceberg project
>
> Is everyone okay with me setting this up so I can help the community with 
> things like roadmap updates and making sure we follow up on reviews?


Re: 👋 Intro and question for the community

2023-06-06 Thread Jean-Baptiste Onofré
Hi Brian,

Thanks, CommonRoom looks interesting, I will take a look :)

If we have consensus from the PMC members, it's OK; there's no need for a formal
vote. (As an Apache member, I have access to the Iceberg private mailing
list, and I was surprised not to have seen this discussion here
first.)

I agree with Ryan and you: having this discussion public makes sense.
As we say at Apache: "if it didn't happen on the mailing list, it
never happened" :)

Thanks again,
Regards
JB

On Tue, Jun 6, 2023 at 1:22 PM Brian Olsen  wrote:
>
> Hi Jean-Baptiste,
>
> Common Room (https://www.commonroom.io/) is an application used to
> comprehensively understand activities happening across a community so that a 
> team focusing on developer relations can better respond to issues, understand 
> where bottlenecks exist, and many other potential applications around 
> optimizing releases and developer experience.
>
> Three PMC members have already replied to this list, and when I talked it over
> with Ryan Blue, he said this discussion would make more sense in a public
> forum for all to see. That said, I’m more than happy to hold a formal PMC vote
> on adding this capability if any PMC member feels that is
> necessary.
>
> On Tue, Jun 6, 2023 at 4:30 AM Jean-Baptiste Onofré  wrote:
>> [...]


Re: [DISCUSS] June board report

2023-06-15 Thread Jean-Baptiste Onofré
Hi Ryan,

It looks good to me. Thanks!

NB: maybe for the next report, we can provide some input about
community activity (talks/meetups at the CommunityOverCode conference, community
events, ...).

Thanks,
Regards
JB

On Wed, Jun 14, 2023 at 3:51 AM Ryan Blue  wrote:
>
> Hi everyone,
>
> Here’s our draft for the June board report. Please comment on this thread if 
> you’d like to add anything!
>
> Ryan
>
> Description:
>
> Apache Iceberg is a table format for huge analytic datasets that is designed
> for high performance and ease of use.
>
> Project Status:
>
> Current project status: Ongoing
> Issues for the board: none
>
> Membership Data:
>
> Apache Iceberg was founded 2020-05-19 (3 years ago)
> There are currently 24 committers and 16 PMC members in this project.
> The Committer-to-PMC ratio is 3:2.
>
> Community changes, past quarter:
>
> * Fokko Driesprong was added to the PMC on 2023-04-06
> * Steven Wu was added to the PMC on 2023-04-06
> * Szehon Ho was added to the PMC on 2023-04-20
> * Yufei Gu was added to the PMC on 2023-04-06
> * Amogh Jahagirdar was added as committer on 2023-04-25
> * Eduard Tudenhoefner was added as committer on 2023-04-25
>
> Project Activity:
>
> * 1.3.0 was released on 2023-05-26
> * 1.2.1 was released on 2023-04-01
> * 1.2.0 was released on 2023-03-20
>
> The 1.3.0 release added support for Spark 3.4 and Flink 1.17. It also included
> several updates and fixes, including:
>
> * Better Spark file distribution for row-level plans like MERGE
> * Improved bit density in the object storage layout
> * Readable metrics in metadata tables
> * Optimized vectorized reads for decimal types
> * Spark timestamp_ntz and UUID support
>
> The Python implementation is nearing an 0.4.0 release that will include:
>
> * Delete file support
> * Metadata updates for tables
> * Improved compatibility
>
> The community is also continuing to build a view specification, expand REST
> catalog support, and add encryption to the table spec.
>
> Community Health:
>
> The community continues to be healthy, with most metrics steady this quarter.
>
> --
> Ryan Blue
> Tabular


Re: [DISCUSS] June board report

2023-06-19 Thread Jean-Baptiste Onofré
I guess we will have a presence (talks and people) at CommunityOverCode
NA (I will be there, at least), so mentioning this to the board
could be interesting.

We will share more about community activity and updates when we have
concrete things to share with the board.

Regards
JB

On Thu, Jun 15, 2023 at 6:26 PM Ryan Blue  wrote:
>
> Thanks JB. Do you have any community activity to highlight this time?
>
> On Thu, Jun 15, 2023 at 9:10 AM Jean-Baptiste Onofré  
> wrote:
>> [...]
>
>
>
> --
> Ryan Blue
> Tabular


Re: [VOTE] Release PyIceberg 0.4.0 RC1

2023-06-26 Thread Jean-Baptiste Onofré
+1 (non-binding)

Regards
JB

On Mon, Jun 26, 2023 at 11:27 AM Fokko Driesprong  wrote:
>
> Hi Everyone,
>
>
> Excited to start the 0.4.0 PyIceberg release process. The 0.4.0 release is 
> packed with cool features:
>
> * Support for converting Parquet schemas into Iceberg ones.
> * Support for reading data using FSSpec.
> * Support for fetching a limited number of rows to quickly peek into a dataset.
> * Reduced the number of calls to the object store with PyArrow>=12.0.0.
> * Speed up queries using the Iceberg metrics.
> * Ability to do SQL-style filters: row_filter='passengers >= 3'.
> * SigV4 support for the REST catalog.
> * A complete makeover of the docs site.
> * Support for positional deletes.
> * Ability to set table properties.
> * And many bugs have been fixed!
>
>  I propose that we release the following RC as the official PyIceberg 0.4.0 
> release. The commit ID is e85ec9447c08c1a21e9ef21278f3237811f3f67f
>
>
> * This corresponds to the tag: pyiceberg-0.4.0rc1 
> (c3579a11b4bfa5387e313185e714c40a0ed1ccfe)
>
> * https://github.com/apache/iceberg/releases/tag/pyiceberg-0.4.0rc1
>
> * https://github.com/apache/iceberg/tree/e85ec9447c08c1a21e9ef21278f3237811f3f67f
>
>
> The release tarball, signature, and checksums are here:
>
>
> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.4.0rc1/
>
>
> You can find the KEYS file here:
>
>
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
>
> Convenience binary artifacts are staged on pypi:
>
>
> https://pypi.org/project/pyiceberg/0.4.0rc1/
>
>
> And can be installed using: pip3 install pyiceberg==0.4.0rc1
>
>
> Please download, verify, and test.
>
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as PyIceberg 0.4.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Please consider this email a +1 from my side:
>
> * Ran some basic table scans, including tables with positional deletes
> * Checked to see if everything still works when PyArrow is not installed
> * Set some table properties
>
> Kind regards,
>
> Fokko


Re: [VOTE] Release PyIceberg 0.4.0 RC2

2023-06-27 Thread Jean-Baptiste Onofré
+1 (non-binding)

I did quick tests and it looks good. Thanks!

Regards
JB

On Tue, Jun 27, 2023 at 10:37 PM Fokko Driesprong  wrote:
>
> All,
>
>
> Excited to start the 0.4.0 PyIceberg release process. The 0.4.0 release is 
> packed with awesome features:
>
> * Support for converting Parquet schemas into Iceberg ones.
> * Support for reading data using FSSpec.
> * Support for fetching a limited number of rows to quickly peek into a dataset.
> * Reduced the number of calls to the object store with PyArrow>=12.0.0.
> * Speed up queries using the Iceberg metrics.
> * Ability to do SQL-style filters: row_filter='passengers >= 3'.
> * SigV4 support for the REST catalog.
> * A complete makeover of the docs site.
> * Support for positional deletes.
> * Ability to set table properties.
> * And many bugs have been fixed!
>
> I propose that we release the following RC as the official PyIceberg 0.4.0 
> release. The commit ID is 51eaf6806361e6e0a5cd163071dce684ec05350b
>
>
> * This corresponds to the tag: pyiceberg-0.4.0rc2 
> (f81c759835672e956c71280394f432463d25463c)
>
> * https://github.com/apache/iceberg/releases/tag/pyiceberg-0.4.0rc2
>
> * https://github.com/apache/iceberg/tree/51eaf6806361e6e0a5cd163071dce684ec05350b
>
>
> The release tarball, signature, and checksums are here:
>
>
> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.4.0rc2/
>
>
> You can find the KEYS file here:
>
>
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
>
> Convenience binary artifacts are staged on pypi:
>
>
> https://pypi.org/project/pyiceberg/0.4.0rc2/
>
>
> And can be installed using: pip3 install pyiceberg==0.4.0rc2
>
>
> Please download, verify, and test.
>
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as PyIceberg 0.4.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Please consider this email a +1 from my side:
>
> * Ran some basic table scans, including tables with positional deletes
> * Checked to see if everything still works when PyArrow is not installed
> * Set some table properties
>
> Kind regards,
>
> Fokko


[PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-05 Thread Jean-Baptiste Onofré
Hi everyone,

I started a discussion on the private mailing list, and, as there are
no objections from the PMC members, I'm moving the thread to the dev
mailing list.

I propose to organize the first Apache Iceberg Summit \o/

For the format, I think the best option is a virtual event with a mix of:
1. Dev community talks: architecture, roadmap, features, use in "products", ...
2. User community talks: companies could present their use cases, best
practices, ...

In terms of organization:
1. There are no objections so far from the PMC members to using the Apache
Iceberg Summit name. If it works for everyone, I will send a message to
Apache Publicity & Marketing to get their OK for the event.
2. Create two committees:
  2.1. the Sponsoring Committee, gathering companies/organizations
wanting to sponsor the event
  2.2. the Program Committee, gathering folks from the Iceberg community
(PMC/committers/contributors) to select talks.

My company (Dremio) will “host” the event, i.e., provide funding, a
conference platform, sponsor logistics, speaker training, slide
design, etc.

In terms of dates, as CommunityOverCode Con NA will be in October, I
think January 2024 would work: it gives us time to organize smoothly
and promote the event without rushing.

I propose that:
1. we create the #summit channel on Iceberg Slack.
2. I share a preparation document with a plan proposal.

Thoughts?

Regards
JB


Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-06 Thread Jean-Baptiste Onofré
Hi,

Having a 1.3.1 release sounds good to me.

Thanks!
Regards
JB

On Fri, Jul 7, 2023 at 12:53 AM Szehon Ho  wrote:
>
> Hi
>
> I wanted to start a discussion on whether it's the right time for 1.3.1, a
> patch release of 1.3.0. It was prompted by the issue found by Xiangyang
> (@ConeyLiu):
> https://github.com/apache/iceberg/pull/7931#pullrequestreview-1507935277.
>
> Do people have any other bug fixes that should be included? Also, let me
> know if anyone wants to be the release manager. If not, I can give it a shot
> as well.
>
> Thanks,
> Szehon


Re: [VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-19 Thread Jean-Baptiste Onofré
+1 (non-binding)

I checked:
- hash & signature are OK
- LICENSE, NOTICE look good
- maven-rat-plugin checks are OK
- build worked

Thanks!
Regards
JB

On Mon, Jul 17, 2023 at 8:00 PM Szehon Ho  wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg 
> 1.3.1 release.
>
> The commit ID is 62c34711c3f22e520db65c51255512f6cfe622c4
> * This corresponds to the tag: apache-iceberg-1.3.1-rc1
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.3.1-rc1
> * https://github.com/apache/iceberg/tree/62c34711c3f22e520db65c51255512f6cfe622c4
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.3.1-rc1
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
> * https://repository.apache.org/content/repositories/orgapacheiceberg-1141/
>
> This release includes several important bug fixes over 1.3.0, including:
> * Fix Spark RewritePositionDeleteFiles failure for certain partition types 
> (#8059)
> * Fix Spark RewriteDataFiles concurrency edge-case on commit timeouts (#7933)
> * Table Metadata parser now accepts null current-snapshot-id, properties, 
> snapshots fields (#8064)
> * FlinkCatalog creation no longer creates the default database (#8039)
> * Fix loading certain V1 table branch snapshots using snapshot references 
> (#7621)
> * Fix Spark partition-level DELETE operations for WAP branches (#7900)
> * Fix HiveCatalog deleting metadata on failures in checking lock status 
> (#7931)
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours (weekends excluded).
>
> [ ] +1 Release this as Apache Iceberg 1.3.1
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are
> encouraged to cast non-binding votes. This vote will pass if there are 3
> binding +1 votes and more binding +1 votes than -1 votes.
>
> Thanks
> Szehon
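
JB's RC1 checklist above starts with "hash & signature are OK". A minimal Python sketch of the hash half, assuming the staged tarball has a companion `.sha512` file laid out like `sha512sum` / `shasum -a 512` output (hex digest first, then the file name); that layout is an assumption, so check the actual file in the dist directory. The function names are hypothetical. The signature half is usually done with GnuPG instead (`gpg --import KEYS`, then `gpg --verify <tarball>.asc <tarball>`) and is not shown here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a large tarball need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's SHA-512 with the first token of the checksum file.

    Assumes the `sha512sum`-style layout "<hex digest>  <file name>"; the
    expected digest is lower-cased because hexdigest() returns lowercase hex.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

For example, `verify_sha512(Path("apache-iceberg-1.3.1.tar.gz"), Path("apache-iceberg-1.3.1.tar.gz.sha512"))` should return `True` for an intact download; those file names are illustrative only. Digests printed in grouped form (as `gpg --print-md` does) would need extra normalization that this sketch does not attempt.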


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-19 Thread Jean-Baptiste Onofré
Hi guys,

Following the previous email about Apache Iceberg Summit, please find
a document introducing the summit organization:

https://docs.google.com/presentation/d/1iy2-WdVQYTwJOrwi7pFYh_x9xHNuGV5lT1yK1g194To/edit?usp=sharing

This is a call to action: anyone interested in helping with the
organization and participating in the committees, please let me know.
I would like to schedule a meeting with all interested parties.

Thanks!

Regards
JB

On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré  wrote:
> [...]


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-19 Thread Jean-Baptiste Onofré
Awesome! Let's wait a few days to see who wants to participate; then I
will schedule the meeting (probably the first week of August).

Thanks, all!

Regards
JB

On Wed, Jul 19, 2023 at 6:13 PM Jack Ye  wrote:
>
> +1. Let me know when the meeting is!
>
> -Jack
>
> On Wed, Jul 19, 2023 at 8:54 AM Russell Spitzer  
> wrote:
>>
>> I would love to be involved if possible. I'm a bit short on time, but I
>> can definitely contribute async time to planning.
>>
>> On Wed, Jul 19, 2023 at 9:35 AM Jean-Baptiste Onofré  
>> wrote:
>>> [...]


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-24 Thread Jean-Baptiste Onofré
Ack. Thanks again all for your interest and willingness to help!

I will schedule a first preparation meeting next week (you will
receive an invite). I will send the meeting notes on the dev mailing
list anyway.

Stay tuned!

Regards
JB

On Thu, Jul 20, 2023 at 11:05 AM Ashok Krishna  wrote:
>
> Hello,
>
> I'm interested in helping in whatever way I can. Count me in.
>
> Thanks,
> Ashok
>
> On Thu, Jul 20, 2023 at 2:01 PM Eduard Tudenhoefner  wrote:
>>
>> Having an Iceberg conference sounds great. I'd love to help out here as 
>> well, so count me in!
>>
>> Eduard
>>
>> On Thu, Jul 20, 2023 at 12:18 AM Nan Zhu  wrote:
>>>
>>> Hello!
>>>
>>> Glad to help here and also present our use case on Iceberg
>>>
>>> Thanks!
>>>
>>> Nan
>>>
>>> On Wed, Jul 19, 2023 at 3:00 PM Jay Dave  wrote:
>>>>
>>>> Hello JB:
>>>>
>>>> I am interested and will help in whatever way I can.
>>>>
>>>> Thanks
>>>> JD
>>>> 
>>>> From: Brian Olsen 
>>>> Sent: Wednesday, July 19, 2023 4:27 PM
>>>> To: dev@iceberg.apache.org 
>>>> Cc: Jean-Baptiste Onofré 
>>>> Subject: Re: [PROPOSAL] Preparing first Apache Iceberg Summit
>>>>
>>>> Hey JB,
>>>>
>>>> I would love to hop on a call and discuss how I can help as well. I've 
>>>> planned a couple of these before. :)
>>>>
>>>> On Wed, Jul 19, 2023 at 10:54 AM Russell Spitzer 
>>>>  wrote:
>>>> [...]


Re: Thoughts on EOL Strategy for Our Releases?

2023-07-25 Thread Jean-Baptiste Onofré
Hi,

At Apache, a strong EOL/LTS policy doesn't really exist: anyone can
cut a new release on a very old branch as long as the release vote
passes with at least three binding +1 votes.

That said, many Apache projects define an EOL/LTS policy of their own:
- for instance, Apache Camel has LTS branches
(https://camel.apache.org/download/)
- Apache Karaf instead describes active/non-active branches, defining
which branches are still "maintained/active"
(https://karaf.apache.org/download.html)

I think it would be good to have EOL/LTS details for Iceberg in
https://iceberg.apache.org/releases/.

Before that, we should reach consensus on the policy :)
Correct me if I'm wrong, but we have only "one active branch", which is
basically main (where we cut releases). Do we plan to have multiple
active branches (for instance 1.3.x and 1.4.x)?
Do we plan to flag some releases as LTS?

Regards
JB

On Mon, Jul 24, 2023 at 8:08 PM Yufei Gu  wrote:
>
> Hi folks,
>
> I was thinking about how we handle the Iceberg releases, and it seems that we 
> don't currently have a clear EOL (End of Life) strategy in place. At least, 
> we don't specify an EOL timeline for our releases on our official page 
> (https://iceberg.apache.org/releases). I believe it would be helpful for our 
> users if we could indicate when a particular release will no longer be 
> supported or receive updates.
>
> What do you think about setting up an EOL policy? We could go for a 
> vote-based approach or have a fixed lifecycle for each release. Either way, 
> this could help our users plan their upgrades and keep their systems updated 
> more effectively.
>
> Looking forward to hearing your thoughts!
>
> Best,
>
> Yufei


Re: Thoughts on EOL Strategy for Our Releases?

2023-08-01 Thread Jean-Baptiste Onofré
Hi Ryan,

I agree about EOL; actually, it doesn't really exist for Apache
projects (anyone can trigger a release on an old branch if needed).
That's why, in Karaf for instance (and a few other Apache projects), I
prefer to mark branches as active/non-active, just to indicate to the
community which branches we are focusing on (for bug
fixes or dependency updates).
The "forks" made by organizations are the responsibility of
those organizations; the Iceberg community decides the
active/non-active branches :) So I agree with you, Ryan, about
"community patch releases".

Regards
JB

On Wed, Jul 26, 2023 at 9:19 PM Ryan Blue  wrote:
>
> My main question is whether an EOL policy provides additional value. Most 
> projects that provide EOL guidance (say Spark and Java) aren't like Iceberg. 
> They are software projects that are directly deployed by people using them 
> and they are generally hard to upgrade (or downgrade) incrementally. Iceberg, 
> on the other hand, is a library that makes strong guarantees about the format 
> it produces and consumes:
> * Iceberg is deployed bundled with services like EMR or in releases of Trino 
> that manage updates
> * When Iceberg is depended upon directly, it is usually easy to upgrade 
> incrementally across jobs, rather than all at once
> * Iceberg provides forward- and backward-compatibility guarantees to ensure 
> that older versions continue to work alongside newer ones
>
> As a result, most organizations I know of have several versions of Iceberg 
> deployed at the same time and people are generally able to upgrade if there's 
> something in a newer release that they need. Given that flexibility, I think
> upgrading is fairly easy, and staying on a particular minor release for a
> long time isn't nearly as important as it is for Java or Spark. That's why I'm
> wondering if it would actually be helpful for us to provide an EOL policy.
>
> I think the counter-argument to this is that some organizations branch 
> Iceberg and maintain those branches for a long time. But in those cases, do 
> we need the community to continue maintaining the base of that branch? I did 
> this for a long time at Netflix and I'm not convinced that community patch 
> releases are needed.
>
> JB, we do have multiple branches and release older versions from time to time 
> if there is a serious enough bug to warrant a patch release.
>
> Ryan
>
> On Tue, Jul 25, 2023 at 10:37 PM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi,
>>
>> At Apache, a strict EOL/LTS policy doesn't really exist: anyone can
>> cut a new release on a very old branch as soon as the release vote
>> passes with at least three binding +1 votes.
>>
>> That said, a lot of Apache projects have EOL/LTS policy defined in the 
>> project:
>> - for instance Apache Camel has LTS branches
>> (https://camel.apache.org/download/)
>> - Apache Karaf talks more about active/non active branches defining
>> which branches are still "maintained/active"
>> (https://karaf.apache.org/download.html)
>>
>> I think it would be good to have EOL/LTS details for Iceberg in
>> https://iceberg.apache.org/releases/.
>>
>> Before that, we should have a consensus about the policy :)
>> Correct me if I'm wrong, but we have only "one active branch", which is
>> basically main (where we cut releases). Do we plan to have multiple
>> active branches (for instance 1.3.x and 1.4.x)?
>> Do we plan to flag some releases as LTS?
>>
>> Regards
>> JB
>>
>> On Mon, Jul 24, 2023 at 8:08 PM Yufei Gu  wrote:
>> >
>> > Hi folks,
>> >
>> > I was thinking about how we handle the Iceberg releases, and it seems that 
>> > we don't currently have a clear EOL (End of Life) strategy in place. At 
>> > least, we don't specify an EOL timeline for our releases on our official 
>> > page (https://iceberg.apache.org/releases). I believe it would be helpful 
>> > for our users if we could indicate when a particular release will no 
>> > longer be supported or receive updates.
>> >
>> > What do you think about setting up an EOL policy? We could go for a 
>> > vote-based approach or have a fixed lifecycle for each release. Either 
>> > way, this could help our users plan their upgrades and keep their systems 
>> > updated more effectively.
>> >
>> > Looking forward to hearing your thoughts!
>> >
>> > Best,
>> >
>> > Yufei
>
>
>
> --
> Ryan Blue
> Tabular
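
The release-vote rule behind JB's point (a release, even from an old branch, needs at least three binding +1 votes, and ASF release policy also requires more binding +1 than -1 votes) can be sketched as a small check. This is a minimal illustration under stated assumptions, not project tooling: `Vote` and `release_vote_passes` are hypothetical names, and only PMC votes are treated as binding.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # assumption: True only for PMC members


def release_vote_passes(votes: list[Vote]) -> bool:
    """Hypothetical helper: at least 3 binding +1 votes and
    more binding +1 votes than binding -1 votes."""
    binding_plus = sum(1 for v in votes if v.binding and v.value == 1)
    binding_minus = sum(1 for v in votes if v.binding and v.value == -1)
    # Non-binding votes are recorded but never change the outcome.
    return binding_plus >= 3 and binding_plus > binding_minus
```

Non-binding votes, such as the "+1 (non binding)" ones in this archive, are still useful feedback for the release manager, but in this sketch they do not affect the result.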


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-08-02 Thread Jean-Baptiste Onofré
Hi guys,

Just wanted to give you an update about the Iceberg Summit.

I'm working on the document with a more concrete and complete
proposal. The purpose is to review and discuss the proposal during the
first meeting.
I will be on vacation for two weeks but I will work on this during break time.
As soon as I have the first proposal completed, I will send an invite
to all interested parties.

Thanks !
Regards
JB

On Mon, Jul 24, 2023 at 3:26 PM Jean-Baptiste Onofré  wrote:
>
> Ack. Thanks again all for your interest and willingness to help!
>
> I will schedule a first preparation meeting next week (you will
> receive an invite). I will send the meeting notes on the dev mailing
> list anyway.
>
> Stay tuned!
>
> Regards
> JB
>
> On Thu, Jul 20, 2023 at 11:05 AM Ashok Krishna  
> wrote:
> >
> > Hello,
> >
> > I'm interested in helping in whatever way I can. Count me in.
> >
> > Thanks,
> > Ashok
> >
> > On Thu, Jul 20, 2023 at 2:01 PM Eduard Tudenhoefner  
> > wrote:
> >>
> >> Having an Iceberg conference sounds great. I'd love to help out here as 
> >> well, so count me in!
> >>
> >> Eduard
> >>
> >> On Thu, Jul 20, 2023 at 12:18 AM Nan Zhu  wrote:
> >>>
> >>> Hello!
> >>>
> >>> Glad to help here and also present our use case on Iceberg
> >>>
> >>> Thanks!
> >>>
> >>> Nan
> >>>
> >>> On Wed, Jul 19, 2023 at 3:00 PM Jay Dave  wrote:
> >>>>
> >>>> Hello JB:
> >>>>
> >>>> I am interested and will help in whatever way I can.
> >>>>
> >>>> Thanks
> >>>> JD
> >>>> 
> >>>> From: Brian Olsen 
> >>>> Sent: Wednesday, July 19, 2023 4:27 PM
> >>>> To: dev@iceberg.apache.org 
> >>>> Cc: Jean-Baptiste Onofré 
> >>>> Subject: Re: [PROPOSAL] Preparing first Apache Iceberg Summit
> >>>>
> >>>> Hey JB,
> >>>>
> >>>> I would love to hop on a call and discuss how I can help as well. I've 
> >>>> planned a couple of these before. :)
> >>>>
> >>>> On Wed, Jul 19, 2023 at 10:54 AM Russell Spitzer 
> >>>>  wrote:
> >>>>
> >>>> I would love to be involved if possible. I'm a bit short on time though 
> >>>> but can definitely contribute async time to planning.
> >>>>
> >>>> On Wed, Jul 19, 2023 at 9:35 AM Jean-Baptiste Onofré  
> >>>> wrote:
> >>>>
> >>>> Hi guys,
> >>>>
> >>>> Following the previous email about Apache Iceberg Summit, please find
> >>>> a document introducing the summit organization:
> >>>>
> >>>> https://docs.google.com/presentation/d/1iy2-WdVQYTwJOrwi7pFYh_x9xHNuGV5lT1yK1g194To/edit?usp=sharing
> >>>>
> >>>> This is a Call for Action: anyone interested in helping with the
> >>>> organization and participating in the committees, please let me know.
> >>>> I would like to schedule a meeting with all interested parties.
> >>>>
> >>>> Thanks !
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré  
> >>>> wrote:
> >>>> >
> >>>> > Hi everyone,
> >>>> >
> >>>> > I started a discussion on the private mailing list, and, as there are
> >>>> > no objections from the PMC members, I'm moving the thread to the dev
> >>>> > mailing list.
> >>>> >
> >>>> > I propose to organize the first Apache Iceberg Summit \o/
> >>>> >
> >>>> > For the format, I think the best option is a virtual event with a mix 
> >>>> > of:
> >>>> > 1. Dev community talks: architecture, roadmap, features, use in 
> >>>> > "products", ...
> >>>> > 2. User community talks: companies could present their use cases, best
> >>>> > practices, ...
> >>>> >
> >>>> > In terms of organization:
> >>>> > 1. no objection so far from the PMC members to using the Apache Iceberg
> >>>> > Summit name. If it works for everyone, I will send a message to the
> >>>> > Apache Publicity & Marketing to get their OK for the event.
> >>>> >  2. create two committees:
> >>>> >   2.1. the Sponsoring Committee gathering companies/organizations
> >>>> > wanting to sponsor the event
> >>>> >   2.2. the Program Committee gathers folks from the Iceberg community
> >>>> > (PMC/committers/contributors) to select talks.
> >>>> >
> >>>> > My company (Dremio) will “host” the event - i.e., provide funding, a
> >>>> > conference platform, sponsor logistics, speaker training, slide
> >>>> > design, etc.
> >>>> >
> >>>> > In terms of dates, as CommunityOverCode Con NA will be in October, I
> >>>> > think January 2024 would work: it gives us time to organize smoothly
> >>>> > and promote the event without rushing.
> >>>> >
> >>>> > I propose:
> >>>> > 1. to create the #summit channel on Iceberg Slack.
> >>>> > 2. I will share a preparation document with a plan proposal.
> >>>> >
> >>>> > Thoughts ?
> >>>> >
> >>>> > Regards
> >>>> > JB


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-08-03 Thread Jean-Baptiste Onofré
Hi Chen,

ACK, I will include you in the invite for the first meeting.

Thanks !
Regards
JB

On Wed, Aug 2, 2023 at 8:31 PM Chen Qin  wrote:
>
> Love to get involved and present our large scale CDC work.
>
> Chen
>
> On Wed, Aug 2, 2023 at 7:00 AM Jean-Baptiste Onofré  wrote:
>>
>> Hi guys,
>>
>> Just wanted to give you an update about the Iceberg Summit.
>>
>> I'm working on the document with a more concrete and complete
>> proposal. The purpose is to review and discuss the proposal during the
>> first meeting.
>> I will be on vacation for two weeks but I will work on this during break 
>> time.
>> As soon as I have the first proposal completed, I will send an invite
>> to all interested parties.
>>
>> Thanks !
>> Regards
>> JB
>>
>> On Mon, Jul 24, 2023 at 3:26 PM Jean-Baptiste Onofré  
>> wrote:
>> >
>> > Ack. Thanks again all for your interest and willingness to help!
>> >
>> > I will schedule a first preparation meeting next week (you will
>> > receive an invite). I will send the meeting notes on the dev mailing
>> > list anyway.
>> >
>> > Stay tuned!
>> >
>> > Regards
>> > JB
>> >
>> > On Thu, Jul 20, 2023 at 11:05 AM Ashok Krishna  
>> > wrote:
>> > >
>> > > Hello,
>> > >
>> > > I'm interested in helping in whatever way I can. Count me in.
>> > >
>> > > Thanks,
>> > > Ashok
>> > >
>> > > On Thu, Jul 20, 2023 at 2:01 PM Eduard Tudenhoefner  
>> > > wrote:
>> > >>
>> > >> Having an Iceberg conference sounds great. I'd love to help out here as 
>> > >> well, so count me in!
>> > >>
>> > >> Eduard
>> > >>
>> > >> On Thu, Jul 20, 2023 at 12:18 AM Nan Zhu  wrote:
>> > >>>
>> > >>> Hello!
>> > >>>
>> > >>> Glad to help here and also present our use case on Iceberg
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>> Nan
>> > >>>
>> > >>> On Wed, Jul 19, 2023 at 3:00 PM Jay Dave  
>> > >>> wrote:
>> > >>>>
>> > >>>> Hello JB:
>> > >>>>
>> > >>>> I am interested and will help in whatever way I can.
>> > >>>>
>> > >>>> Thanks
>> > >>>> JD
>> > >>>> 
>> > >>>> From: Brian Olsen 
>> > >>>> Sent: Wednesday, July 19, 2023 4:27 PM
>> > >>>> To: dev@iceberg.apache.org 
>> > >>>> Cc: Jean-Baptiste Onofré 
>> > >>>> Subject: Re: [PROPOSAL] Preparing first Apache Iceberg Summit
>> > >>>>
>> > >>>> Hey JB,
>> > >>>>
>> > >>>> I would love to hop on a call and discuss how I can help as well. 
>> > >>>> I've planned a couple of these before. :)
>> > >>>>
>> > >>>> On Wed, Jul 19, 2023 at 10:54 AM Russell Spitzer 
>> > >>>>  wrote:
>> > >>>>
>> > >>>> I would love to be involved if possible. I'm a bit short on time 
>> > >>>> though but can definitely contribute async time to planning.
>> > >>>>
>> > >>>> On Wed, Jul 19, 2023 at 9:35 AM Jean-Baptiste Onofré 
>> > >>>>  wrote:
>> > >>>>
>> > >>>> Hi guys,
>> > >>>>
>> > >>>> Following the previous email about Apache Iceberg Summit, please find
>> > >>>> a document introducing the summit organization:
>> > >>>>
>> > >>>> https://docs.google.com/presentation/d/1iy2-WdVQYTwJOrwi7pFYh_x9xHNuGV5lT1yK1g194To/edit?usp=sharing
>> > >>>>
>> > >>>> This is a Call for Action: anyone interested in helping with the
>> > >>>> organization and participating in the committees, please let me know.
>> > >>>> I would like to schedule a meeting with all interested parties.
>> > >>>>
>> > >>>> Thanks !
>> > >>>>
>> > >>>> Regards
>> > >>>> JB
>> > >>>>

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-08-20 Thread Jean-Baptiste Onofré
Hi guys,

I'm back from vacation and I'm resuming the work on the Iceberg Summit
proposal doc. I will share the doc asap.

Regards
JB

On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré  wrote:
>
> Hi everyone,
>
> I started a discussion on the private mailing list, and, as there are
> no objections from the PMC members, I'm moving the thread to the dev
> mailing list.
>
> I propose to organize the first Apache Iceberg Summit \o/
>
> For the format, I think the best option is a virtual event with a mix of:
> 1. Dev community talks: architecture, roadmap, features, use in "products", 
> ...
> 2. User community talks: companies could present their use cases, best
> practices, ...
>
> In terms of organization:
> 1. no objection so far from the PMC members to using the Apache Iceberg
> Summit name. If it works for everyone, I will send a message to the
> Apache Publicity & Marketing to get their OK for the event.
>  2. create two committees:
>   2.1. the Sponsoring Committee gathering companies/organizations
> wanting to sponsor the event
>   2.2. the Program Committee gathers folks from the Iceberg community
> (PMC/committers/contributors) to select talks.
>
> My company (Dremio) will “host” the event - i.e., provide funding, a
> conference platform, sponsor logistics, speaker training, slide
> design, etc.
>
> In terms of dates, as CommunityOverCode Con NA will be in October, I
> think January 2024 would work: it gives us time to organize smoothly
> and promote the event without rushing.
>
> I propose:
> 1. to create the #summit channel on Iceberg Slack.
> 2. I will share a preparation document with a plan proposal.
>
> Thoughts ?
>
> Regards
> JB


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-08-23 Thread Jean-Baptiste Onofré
It sounds great :)

I will include a note in the proposal doc about that.

Regards
JB

On Wed, Aug 23, 2023 at 14:14, Brian Olsen wrote:

> Out of curiosity, is anyone strongly opposed to doing antics like this for
> summits?
>
> https://youtube.com/playlist?list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr
>
>
> On Mon, Aug 21, 2023 at 6:58 PM Matt Topol  wrote:
>
>> I don't think I'll have much time to contribute, but I would
>> absolutely help if possible.
>>
>> That said, I'll definitely want to give a talk / speak at this summit
>> when it happens :)
>>
>> On Mon, Aug 21, 2023 at 1:38 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I'm back from vacation and I'm resuming the work on the Iceberg Summit
>>> proposal doc. I will share the doc asap.
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré 
>>> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > I started a discussion on the private mailing list, and, as there are
>>> > no objections from the PMC members, I'm moving the thread to the dev
>>> > mailing list.
>>> >
>>> > I propose to organize the first Apache Iceberg Summit \o/
>>> >
>>> > For the format, I think the best option is a virtual event with a mix of:
>>> > 1. Dev community talks: architecture, roadmap, features, use in "products", ...
>>> > 2. User community talks: companies could present their use cases, best
>>> > practices, ...
>>> >
>>> > In terms of organization:
>>> > 1. no objection so far from the PMC members to using the Apache Iceberg
>>> > Summit name. If it works for everyone, I will send a message to the
>>> > Apache Publicity & Marketing to get their OK for the event.
>>> >  2. create two committees:
>>> >   2.1. the Sponsoring Committee gathering companies/organizations
>>> > wanting to sponsor the event
>>> >   2.2. the Program Committee gathers folks from the Iceberg community
>>> > (PMC/committers/contributors) to select talks.
>>> >
>>> > My company (Dremio) will “host” the event - i.e., provide funding, a
>>> > conference platform, sponsor logistics, speaker training, slide
>>> > design, etc.
>>> >
>>> > In terms of dates, as CommunityOverCode Con NA will be in October, I
>>> > think January 2024 would work: it gives us time to organize smoothly
>>> > and promote the event without rushing.
>>> >
>>> > I propose:
>>> > 1. to create the #summit channel on Iceberg Slack.
>>> > 2. I will share a preparation document with a plan proposal.
>>> >
>>> > Thoughts ?
>>> >
>>> > Regards
>>> > JB
>>>
>>


Re: two proposed spec changes

2023-08-28 Thread Jean-Baptiste Onofré
Hi Jacob

I agree with 1: it makes sense to use nanosecond precision. IMHO, it
should be available for both timestamp and timestamptz.

For 2, I'm not sure. Let's see what the others think.

Regards
JB

On Wed, Aug 23, 2023 at 10:17 PM Jacob Marble
 wrote:
>
> Good afternoon,
>
> I would like to propose two changes to the Iceberg spec:
>
> 1) Primitive types time, timestamp, timestamptz gain property "precision", 
> with three possible values: millis, micros, nanos (borrowing the list from 
> Parquet). The stringified type names would be extended to time[nanos], 
> timestamp[millis], timestamptz[micros], allowing for easy fallback to micros 
> whenever the suffix is not present.
>
> For this proposal, here is a diff demonstrating the idea just a bit.
>
> 2) Identifier fields allowed to be optional. From the spec "it is the 
> responsibility of processing engines or data providers to enforce" which 
> means that any such provider could limit the use of optional identifiers, 
> just as they may limit particular data types or file formats.
>
> To be clear, the spec currently reads "Float, double, and optional fields 
> cannot be used as identifier fields and a nested field cannot be used as an 
> identifier field if it is nested in an optional struct, to avoid null values 
> in identifiers." and I propose "Float and double fields cannot be used as 
> identifier fields."
>
> - What do people think of these two proposed changes?
> - What can I do next?
>
> The spec mentions v3; is there a plan for a v3 release yet? I saw a 
> conversation about enabling v2 by default, so I assume v3 is a ways off yet.
>
> --
> Jacob Marble
> 🇺🇸 🇺🇦
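
Proposal 1 means a reader has to accept both the suffixed and the bare type names, with a bare name meaning micros (the precision these types have in the spec today). A minimal sketch of that fallback, assuming the bracket syntax from the proposal; `parse_temporal_type` is a hypothetical helper, not part of any Iceberg library.

```python
from __future__ import annotations

import re

# Precisions named in the proposal (borrowed from Parquet).
_PRECISIONS = ("millis", "micros", "nanos")
# Hypothetical grammar: base type, optionally followed by "[precision]".
_TYPE_RE = re.compile(r"^(time|timestamp|timestamptz)(?:\[(\w+)\])?$")


def parse_temporal_type(type_str: str) -> tuple[str, str]:
    """Return (base_type, precision); a missing suffix falls back to micros."""
    match = _TYPE_RE.match(type_str.strip())
    if match is None:
        raise ValueError(f"not a time/timestamp type: {type_str!r}")
    base, precision = match.group(1), match.group(2) or "micros"
    if precision not in _PRECISIONS:
        raise ValueError(f"unknown precision {precision!r} in {type_str!r}")
    return base, precision
```

Keeping the bare names valid is what would make such a change backward compatible for existing metadata: every type string written today still parses, to the same precision.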

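The difference between the current identifier-field wording and proposal 2 fits in a single eligibility check. A sketch under simplifying assumptions: `Field` and `can_be_identifier` are hypothetical names, and the flattened model only records the type name, whether the field is required, and whether it sits inside an optional struct; other identifier-field rules in the spec are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical, simplified field model (not the Iceberg API).
    name: str
    type_name: str                    # e.g. "long", "string", "double"
    required: bool                    # False means the field is optional
    in_optional_struct: bool = False  # True if any enclosing struct is optional


def can_be_identifier(field: Field, proposed: bool = False) -> bool:
    # Float and double stay excluded under both wordings.
    if field.type_name in ("float", "double"):
        return False
    if proposed:
        # Proposal 2: nullability is left to engines/data providers to enforce.
        return True
    # Current wording: no optional fields, nothing nested in an optional struct.
    return field.required and not field.in_optional_struct
```

Under the proposal, the null check moves out of the spec and into engines or data providers, which the quoted spec text already makes responsible for enforcement.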

Re: September board report

2023-09-07 Thread Jean-Baptiste Onofré
Hi Ryan

It looks good to me. About the conference (summit/meetup), I will add it
as soon as we have concrete plans (still working on the Summit
proposal doc).

Thanks !

Regards
JB

On Thu, Sep 7, 2023 at 6:30 PM Ryan Blue  wrote:
>
> Hi everyone,
>
> Here’s my draft for the September Iceberg board report. Let me know if you’d 
> like to add anything!
>
> I know that JB wanted to add conference talks last time, but I’m not aware of 
> any that have happened this quarter. If you’ve given a talk recently, please 
> let me know!
>
> Ryan
>
> Description:
>
> Apache Iceberg is a table format for huge analytic datasets that is designed
> for high performance and ease of use.
>
> Project Status:
>
> Current project status: Ongoing
> Issues for the board: none
>
> Membership Data:
>
> Apache Iceberg was founded 2020-05-19 (3 years ago)
> There are currently 24 committers and 16 PMC members in this project.
> The Committer-to-PMC ratio is 3:2.
>
> Community changes, past quarter:
>
> No new PMC members. Last addition was Szehon Ho on 2023-04-20.
> No new committers. Last addition was Amogh Jahagirdar on 2023-04-25.
>
> Project Activity:
>
> Releases:
>
> PyIceberg 0.4.0 was released on 2023-07-23
> 1.3.1 was released on 2023-07-25
>
> Java:
>
> Preparing for a 1.4.0 release in Sept/Oct
> Added dependency bundles for AWS, GCP, and Azure
> Added Azure FileIO implementation
> Added API for multi-table commits
> Performance optimizations for delete file scan planning
> Spark: Implemented adaptive split sizing
> Spark: Implemented function pushdown in v2 expressions
> Flink: Added bucketing only key-by strategy
> Build: Updated to Gradle version catalog
> Making progress on the reference implementation of common views
> Continuing work on table encryption
>
> Python:
>
> 0.5.0 rc1 vote is under way
> Added support for serverless environments
> Implemented schema evolution
> Moved to Pydantic v2
> Added support for positional deletes
> Substantially improved Avro read performance
> Added conversion from Parquet to Iceberg schemas
> Added support for FSSpec and HDFS data
> Added SQL filter parsing
>
> Rust:
>
> Created a repository for the Rust implementation, iceberg-rust
> 25 PRs merged
> Implemented base table metadata (e.g., types, transforms)
> Implemented visitors for working with nested structures
> Added Avro/Iceberg schema conversion
> Added build tooling
>
> Go:
>
> Created a repository for the Go implementation, iceberg-go
> Added schema and types
>
> Community Health:
>
> The largest development in the community is the addition of the Rust and Go
> repositories, which is reflected in the increase in code contributors this
> quarter. The new implementations will also lead to new committers and PMC
> members. The community has had good discussions about how to manage
> contributions, to build confidence in the implementations as well as to help
> new contributors become familiar with the way the Apache community operates
> (along with ASF requirements like license documentation).
>
> Two community metrics show decreases. Dev list traffic tends to vary because
> of how the community uses the dev list, that is, mostly for large design
> discussions. The number of issues closed was also lower than normal, although
> it is not a metric that is expected to fluctuate. We will take a look and see
> what the difference is.
>
> --
> Ryan Blue
> Tabular


Re: [VOTE] Release Apache PyIceberg 0.5.0

2023-09-13 Thread Jean-Baptiste Onofré
Hi Fokko,

I agree: let's move forward on 0.5.0 release and we can submit 0.5.1
very soon after 0.5.0.

Thanks !
Regards
JB

On Tue, Sep 12, 2023 at 7:15 PM Driesprong, Fokko  wrote:
>
> Hey everyone,
>
> After an issue on GitHub, I noticed a bug in PyIceberg where the filesystem
> isn't being reused. I think there is more room for improvement (both in the
> long and short term), but I don't think we should block the release on that,
> since 0.5.0 is already much faster due to improved Avro parsing, improved IO,
> and the previously mentioned bugfix (and one that was merged earlier today).
>
> I'll cut another RC as soon as #8549 is in. Thanks, everyone, for your patience!
>
> Cheers, Fokko
>
> Op ma 11 sep 2023 om 14:22 schreef Fokko Driesprong :
>>
>> Hi Everyone,
>>
>> I propose that we release the following RC as the official PyIceberg 0.5.0 
>> release. A summary of what's included in 0.5.0:
>>
>> Add gzip metadata support
>> PyArrow HDFS support
>> Support serverless environments (AWS Lambda)
>> Many fixes around Avro performance (PRs 1, 2, 3, 4)
>> Remove the upper bound of PyParsing dependency (blocking a PR in Airflow)
>> Moving the reading of Avro to Cython (10x speed improvement(!))
>> Support for the SQLCatalog (JDBC in Java)
>> Fix support for UUID columns
>> Support for adding columns
>> Optimize concurrency (follow-up on the Support serverless environments)
>> Bump Pydantic to v2 (improved performance of the JSON (de)serialization)
>> A lot of bugfixes!
>>
>> The commit ID is 3323281045a72f1156d58c261067469e383fb26d
>>
>> * This corresponds to the tag: pyiceberg-0.5.0rc2 
>> (92600935834bdf77ba37ac361338712713549a77)
>> * https://github.com/apache/iceberg/releases/tag/pyiceberg-0.5.0rc2
>> * 
>> https://github.com/apache/iceberg/tree/3323281045a72f1156d58c261067469e383fb26d
>>
>> The release tarball, signature, and checksums are here:
>>
>> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc2/
>>
>> You can find the KEYS file here:
>>
>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>
>> Convenience binary artifacts are staged on pypi:
>>
>> https://pypi.org/project/pyiceberg/0.5.0rc2/
>>
>> And can be installed using: pip3 install pyiceberg==0.5.0rc2
>>
>> Since a lot has changed due to the release of the wheels (binary Python 
>> libraries), I've included the following steps to verify the release:
>>
>> curl https://dist.apache.org/repos/dist/dev/iceberg/KEYS -o KEYS
>> gpg --import KEYS
>>
>> svn checkout https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc2/ /tmp/pyiceberg/
>>
>> for name in /tmp/pyiceberg/pyiceberg-*.whl /tmp/pyiceberg/pyiceberg-*.tar.gz
>> do
>>     gpg --verify "${name}.asc" "${name}"
>> done
>>
>> cd /tmp/pyiceberg/
>> for name in /tmp/pyiceberg/pyiceberg-*.whl.asc.sha512 /tmp/pyiceberg/pyiceberg-*.tar.gz.asc.sha512
>> do
>>     shasum -a 512 --check "${name}"
>> done
>>
>> tar xzf pyiceberg-0.5.0.tar.gz
>> cd pyiceberg-0.5.0
>>
>> ./dev/check-license
>>
>> Please download, verify, and test.
>>
>> Please vote in the next 72 hours.
>> [ ] +1 Release this as PyIceberg 0.5.0
>> [ ] +0
>> [ ] -1 Do not release this because...
>>
>> Please consider this my +1. I've checked against the docker-spark-iceberg
>> notebook and did some checks.
>>
>> Kind regards,
>> Fokko Driesprong
>>
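
For voters without `shasum` at hand, the checksum step of the verification script above can be approximated in Python. A sketch, assuming each `.sha512` file uses the `shasum` output format (`<hex digest>  <file name>`, as implied by `shasum -a 512 --check`) and that file names resolve relative to the checksum file, mirroring the `cd /tmp/pyiceberg/` before the loop; `verify_sha512` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def verify_sha512(checksum_file: Path) -> bool:
    """Check every "<hex digest>  <file name>" line in a shasum-style file.

    Assumption: file names are relative to the checksum file's directory.
    """
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # shasum marks binary mode with a leading "*" on the file name.
        target = checksum_file.parent / name.strip().lstrip("*")
        digest = hashlib.sha512()
        with target.open("rb") as handle:
            # Hash in 1 MiB chunks so large wheels/tarballs are not read at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            return False
    return True
```

Like `shasum --check`, this only proves the artifact matches the published digest; the `gpg --verify` step is still what ties the artifact to a key in the KEYS file.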


Re: [DISCUSS] Include Spark 3.5 support in 1.4?

2023-09-13 Thread Jean-Baptiste Onofré
Hi Anton,

I think we can increase our release pace (I volunteer to handle
releases if it helps). For 1.4, I'm working on removing the deprecated
API in AwsProperties (the PR should be there pretty soon).

Imho, it would be great to have a regular release pace clearly defined
on the website for the community (I do that on several Apache
projects), but nothing prevents us from doing a quick release when needed.
So we could submit a 1.4.1 release to vote, including Spark
3.5.

Thoughts ?

Regards
JB

On Thu, Aug 31, 2023 at 1:28 AM Anton Okolnychyi
 wrote:
>
> Hey everyone,
>
> I have been one of the supporters of releasing Java 1.4 at the end of Aug or in early
> Sep. I think we are getting close and most of the targeted items will be
> wrapped up within one or two weeks.
>
> I just realized that Spark 3.5 is almost out. A few release candidates failed
> but it is converging. How would everyone feel about targeting Spark 3.5
> support with our Java 1.4 release? I can add that support fairly quickly
> because we already consume the 3.5 APIs and adapted the tests internally. I
> think that would be a good idea because there are no critical items to deliver
> in 1.4 as of now, and it would be really unfortunate to wait 2-3 more
> months for Spark 3.5 support if we don’t include it now.
>
> Any thoughts? Does anyone need 1.4 asap?
>
> - Anton


Re: September board report

2023-09-13 Thread Jean-Baptiste Onofré
Hi Brian,

sorry for the late reply (I'm back from vacation and catching up on my Apache
emails :)).

The doc is almost ready now; I will share it on the mailing list soon
(with a quick summary directly in the email).

Thanks !
Regards
JB

On Fri, Sep 8, 2023 at 1:14 PM Brian Olsen  wrote:
>
> Hey JB, let me know if you need any help here.
>
> On Thu, Sep 7, 2023 at 11:01 PM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Ryan
>>
>> It looks good to me. About the conference (summit/meetup), I will add it
>> as soon as we have concrete plans (still working on the Summit
>> proposal doc).
>>
>> Thanks !
>>
>> Regards
>> JB
>>
>> On Thu, Sep 7, 2023 at 6:30 PM Ryan Blue  wrote:
>> >
>> > Hi everyone,
>> >
>> > Here’s my draft for the September Iceberg board report. Let me know if 
>> > you’d like to add anything!
>> >
>> > I know that JB wanted to add conference talks last time, but I’m not aware 
>> > of any that have happened this quarter. If you’ve given a talk recently, 
>> > please let me know!
>> >
>> > Ryan
>> >
>> > Description:
>> >
>> > Apache Iceberg is a table format for huge analytic datasets that is designed
>> > for high performance and ease of use.
>> >
>> > Project Status:
>> >
>> > Current project status: Ongoing
>> > Issues for the board: none
>> >
>> > Membership Data:
>> >
>> > Apache Iceberg was founded 2020-05-19 (3 years ago)
>> > There are currently 24 committers and 16 PMC members in this project.
>> > The Committer-to-PMC ratio is 3:2.
>> >
>> > Community changes, past quarter:
>> >
>> > No new PMC members. Last addition was Szehon Ho on 2023-04-20.
>> > No new committers. Last addition was Amogh Jahagirdar on 2023-04-25.
>> >
>> > Project Activity:
>> >
>> > Releases:
>> >
>> > PyIceberg 0.4.0 was released on 2023-07-23
>> > 1.3.1 was released on 2023-07-25
>> >
>> > Java:
>> >
>> > Preparing for a 1.4.0 release in Sept/Oct
>> > Added dependency bundles for AWS, GCP, and Azure
>> > Added Azure FileIO implementation
>> > Added API for multi-table commits
>> > Performance optimizations for delete file scan planning
>> > Spark: Implemented adaptive split sizing
>> > Spark: Implemented function pushdown in v2 expressions
>> > Flink: Added bucketing only key-by strategy
>> > Build: Updated to Gradle version catalog
>> > Making progress on the reference implementation of common views
>> > Continuing work on table encryption
>> >
>> > Python:
>> >
>> > 0.5.0 rc1 vote is under way
>> > Added support for serverless environments
>> > Implemented schema evolution
>> > Moved to Pydantic v2
>> > Added support for positional deletes
>> > Substantially improved Avro read performance
>> > Added conversion from Parquet to Iceberg schemas
>> > Added support for FSSpec and HDFS data
>> > Added SQL filter parsing
>> >
>> > Rust:
>> >
>> > Created a repository for the Rust implementation, iceberg-rust
>> > 25 PRs merged
>> > Implemented base table metadata (e.g., types, transforms)
>> > Implemented visitors for working with nested structures
>> > Added Avro/Iceberg schema conversion
>> > Added build tooling
>> >
>> > Go:
>> >
>> > Created a repository for the Go implementation, iceberg-go
>> > Added schema and types
>> >
>> > Community Health:
>> >
>> > The largest development in the community is the addition of the Rust and Go
>> > repositories, which is reflected in the increase in code contributors this
>> > quarter. The new implementations will also lead to new committers and PMC
>> > members. The community has had good discussions about how to manage
>> > contributions, to build confidence in the implementations as well as to help
>> > new contributors become familiar with the way the Apache community operates
>> > (along with ASF requirements like license documentation).
>> >
>> > Two community metrics show decreases. Dev list traffic tends to vary because
>> > of how the community uses the dev list, that is, mostly for large design
>> > discussions. The number of issues closed was also lower than normal, although
>> > it is not a metric that is expected to fluctuate. We will take a look and see
>> > what the difference is.
>> >
>> > --
>> > Ryan Blue
>> > Tabular


Re: [VOTE] Release Apache PyIceberg 0.5.0 RC3

2023-09-18 Thread Jean-Baptiste Onofré
+1 (non binding)

quickly tested the "legal" part:
- signatures/hashes have been fixed, thanks for that!
- ASF header looks ok
- no binaries found in the pyiceberg distribution which is good

Thanks !
Regards
JB

On Wed, Sep 13, 2023 at 2:18 PM Fokko Driesprong  wrote:
>
> Hi Everyone,
>
>
> I propose that we release the following RC as the official PyIceberg 0.5.0
> release. This includes the fix for the performance issue that was discovered
> in RC2. A summary of what's included in 0.5.0:
>
> Add gzip metadata support
> PyArrow HDFS support
> Support serverless environments (AWS Lambda)
> Many fixes around Avro performance (PRs 1, 2, 3, 4)
> Remove the upper bound of PyParsing dependency (blocking a PR in Airflow)
> Moving the reading of Avro to Cython (10x speed improvement(!))
> Support for the SQLCatalog (JDBC in Java)
> Fix support for UUID columns
> Support for adding columns
> Optimize concurrency (follow up on the Support serverless environments)
> Bump Pydantic to v2 (improved performance of the JSON (de)serialization)
> A lot of bugfixes!
>
> The commit ID is f798b06246e67131d413dfceece5ccaf269e01fe
>
> This corresponds to the tag: pyiceberg-0.5.0rc3 
> (37fa779b0957644590a03754a733a5b3e3f589d0)
> https://github.com/apache/iceberg/releases/tag/pyiceberg-0.5.0rc3
> https://github.com/apache/iceberg/tree/f798b06246e67131d413dfceece5ccaf269e01fe
>
> The release tarball, signature, and checksums are here:
>
> https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc3/
>
> You can find the KEYS file here:
>
> https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on pypi:
>
>
> https://pypi.org/project/pyiceberg/0.5.0rc3/
>
>
> And can be installed using: pip3 install pyiceberg==0.5.0rc3
>
>
> Please download, verify, and test.
>
>
> Please vote in the next 72 hours.
>
>
> [ ] +1 Release this as PyIceberg 0.5.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Cheers, Fokko
>
>


Re: [DISCUSS] Spark 3.1 support?

2023-09-21 Thread Jean-Baptiste Onofré
Hi Anton,

imho for 1.4.0, we can deprecate/remove Spark 3.1 support. As it's a new
feature release, we can drop support for old versions. Spark 3.1 users can
still use Iceberg 1.3.x.

That's why I suggested an LTS policy for our users; I will come up with a
proposal about that.

Thanks !
Regards
JB

On Wed, Sep 20, 2023 at 11:53 PM Anton Okolnychyi
 wrote:
>
> Just checking in to see how we feel about Spark 3.1 now. We support 5 
> different Spark versions at this point: 3.1, 3.2, 3.3, 3.4 and 3.5. Any 
> thoughts on making Iceberg 1.4 the last release with Spark 3.1?
>
> On 2023/04/27 04:58:06 Walaa Eldin Moustafa wrote:
> > Yes, that sounds like a good compromise. Initially I was looking at
> > deprecation guidelines in [1], but I see you are referring to [2].
> >
> > [1] https://iceberg.apache.org/contribute/
> > [2]
> > https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status
> >
> > On Wed, Apr 26, 2023 at 8:10 AM Anton Okolnychyi
> >  wrote:
> >
> > > Got it. Given that quite a few folks still use 3.1, I don’t think we
> > > would remove it unless the branch becomes inactive. Marking it as
> > > deprecated would allow us to indicate that it may not be as up-to-date and
> > > complete as other versions and some performance enhancements or even minor
> > > bug fixes may not be there. That would solve one of the concerns I raised
> > > earlier. We should discourage users from onboarding new use cases on 3.1.
> > >
> > > I believe our doc summarizes the message pretty well.
> > >
> > >
> > >1. *Deprecated*: an engine version is no longer actively maintained.
> > >People who are still interested in the version can backport any necessary
> > >feature or bug fix from newer versions, but the community will not spend
> > >effort in achieving feature parity. Iceberg recommends users to move
> > >towards a newer version. Contributions to a deprecated version is expected
> > >to diminish over time, so that eventually no change is added to a
> > >deprecated version.
> > >
> > >
> > > https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status
> > >
> > > Let me know if that seems like a good compromise.
> > >
> > > - Anton
> > >
> > >
> > > On Apr 25, 2023, at 8:01 PM, Walaa Eldin Moustafa 
> > > wrote:
> > >
> > > To elaborate on LinkedIn's use case:
> > >
> > > * LinkedIn maintains its own fork, but we would like to keep it as close
> > > to upstream as possible.
> > > * +1 to Manu that migrations in large companies can take well beyond 18
> > > months, and such companies are unlikely to migrate/upgrade more frequently.
> > > * One important use case for the Spark 3.1 module is not necessarily
> > > fixing issues in the module itself, but fixing issues in other core
> > > modules and having a release that contains those core fixes as well as
> > > Spark 3.1.
> > > * That said, in the last 6 months, there have been 29 commits to the Spark
> > > 3.1 module, 50 commits to the Spark 3.2 module, and 90 to the Spark 3.3
> > > module in the Iceberg master branch. It seems that Spark 3.1 is reasonably
> > > active.
> > >
> > > What does marking as deprecated entail in terms of deleting the code?
> > > Would the guideline be to use 3.2 or 3.3 as an alternative?
> > >
> > > Thanks,
> > > Walaa.
> > >
> > >
> > >
> > > On Tue, Apr 25, 2023 at 12:08 PM Anton Okolnychyi <
> > > aokolnyc...@apple.com.invalid> wrote:
> > >
> > >> Ok, seems like we are in agreement to deprecate 3.1. I’ll fire a PR
> > >> shortly.
> > >>
> > >> Does anyone want to go through changes in 3.3 and 3.2 and find what we
> > >> missed to cherry-pick so that we have that list in one place (e.g. create
> > >> an issue)?
> > >>
> > >> Any thoughts on how to mark changes as candidates for cherry-picking?
> > >> Creating an issue?
> > >>
> > >> - Anton
> > >>
> > >> On Apr 24, 2023, at 10:01 AM, Edgar Rodriguez <
> > >> edgar.rodrig...@airbnb.com.INVALID> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Thanks for the discussion. Similarly to Manu, we're on Spark 3.1.1 and
> > >> Iceberg 1.1.0; we backport Spark 3.1.1 fixes internally as well. It's a
> > >> bit more complicated to move fast on Spark versions internally, mainly due
> > >> to the number of Scala customers that we have.
> > >>
> > >> I understand maintaining yet another Spark version is burdensome, so
> > >> I'm +1 on marking 3.1 deprecated, and I'd be happy to contribute
> > >> backports if needed on a community-maintained branch; we'd just need to
> > >> tag changes that may need a backport.
> > >>
> > >> Cheers,
> > >>
> > >> On Sun, Apr 23, 2023 at 4:40 PM Ryan Blue  wrote:
> > >>
> > >>> Thank you for stepping up and offering to help, Manu. I'm glad that
> > >>> you're willing to help with backports.
> > >>>
> > >>> On Sun, Apr 23, 2023 at 2:05 AM Manu Zhang 
> > >>> wrote:
> > >>>
> >    You would just end up backporting twice.
> > 
> > 
> >  That's why I said a community maintained branch benef

Re: [DISCUSS] Deprecate Spark 3.2 support?

2023-09-21 Thread Jean-Baptiste Onofré
+1

Regards
JB

On Thu, Sep 21, 2023 at 12:01 AM Anton Okolnychyi
 wrote:
>
> Shall we consider deprecating our Spark 3.2 support? That Spark version is no 
> longer being maintained by the Spark community and is not under active 
> development in Iceberg. It was released in October 2021 and passed the 18-month
> maintenance mark in Spark.
>
> - Anton


Re: [DISCUSS] Deprecate Spark 3.2 support?

2023-09-21 Thread Jean-Baptiste Onofré
Just to elaborate a bit :)

- As Iceberg 1.4.0 is a new "major" release, it's a good time to
deprecate/remove support for old versions (of Spark and other things)
- Spark 3.2 users can still use previous Iceberg versions
- I will start a discussion about an LTS policy with a clear "target"
support for our users (something like the table you can see at
https://karaf.apache.org/download.html), where we can list supported Java,
Python, Spark, Flink,  support

Regards
JB

On Thu, Sep 21, 2023 at 12:01 AM Anton Okolnychyi
 wrote:
>
> Shall we consider deprecating our Spark 3.2 support? That Spark version is no 
> longer being maintained by the Spark community and is not under active 
> development in Iceberg. It was released in October 2021 and passed the 18-month
> maintenance mark in Spark.
>
> - Anton


Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-09-21 Thread Jean-Baptiste Onofré
Hi Ryan,

Thanks for the feedback. Unfortunately, I was not able to join the
Iceberg community sync meeting yesterday; I promise I will join the
next ones.

I think the proposal is very interesting, as are the
discussion/comments in the document. I agree that some points should
be discussed further. I propose to update the document with your
points/questions.

Thanks!

Regards
JB

On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue  wrote:
>
> Renjie, thanks for the proposal.
>
> We talked about this today in the Iceberg community sync and the general 
> feedback was that we're excited to work on this, but the proposal left a few
> areas unclear. There are a few decisions about how to manage the delete 
> vectors that need to be added to the design. For example:
> 1. Would there be only one delete vector per data file?
> 2. Would this require merge of existing vectors and new deletes at write time?
> 3. How would the data file for a vector be identified?
> 4. If multiple vectors are allowed, what is the plan for keeping the number 
> of delete vectors small?
> 5. Would we allow writing multiple delete vectors into the same file?
> 6. How would we track which files are affected by a combined file of delete 
> vectors?
> 7. What are the details of the proposed file format?
>
> In short, we just want to better understand how all this would work.
>
> Thanks!
>
> Ryan
>
>
> On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu  wrote:
>>
>> Hi, all:
>>
>>
>>
>> I have a proposal to introduce a deletion vector file to reduce write
>> amplification in Iceberg tables:
>>
>> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
>>
>>
>>
>> Comments are welcome, and I look forward to hearing your advice.
>
>
>
> --
> Ryan Blue
> Tabular


Re: [DISCUSS] Deprecate Spark 3.2 support?

2023-09-21 Thread Jean-Baptiste Onofré
Hi Ryan,

Yes, it makes sense. The way we discuss and decide the Spark versions
is totally fine.

My proposal was more about clearly announcing the Spark/Flink/Java/Python
versions supported by Iceberg releases. I know that this is already visible
in the artifact names (which contain the Spark/Flink versions) that we share
on https://iceberg.apache.org/releases/.
The idea is just to inform our users/community a bit in advance,
for instance with a clear table of the supported layers (a bit
like on https://karaf.apache.org/download.html or
https://kafka.apache.org/downloads).

Thanks!
Regards
JB

On Thu, Sep 21, 2023 at 5:40 PM Ryan Blue  wrote:
>
> JB, I don't think that we need a policy on which Spark versions we intend to 
> keep. Having discussions like this is more effective. Just look at the 
> support for Spark 2.4, which we kept for a lot longer to help people 
> transition. Policy is a way of making decisions by algorithm and I don't 
> think we want to do that here.
>
> On Thu, Sep 21, 2023 at 1:48 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Just to elaborate a bit :)
>>
>> - As Iceberg 1.4.0 is a new "major" release, it's a good time to
>> deprecate/remove support for old versions (of Spark and other things)
>> - Spark 3.2 users can still use previous Iceberg versions
>> - I will start a discussion about an LTS policy with a clear "target"
>> support for our users (something like the table you can see at
>> https://karaf.apache.org/download.html), where we can list supported Java,
>> Python, Spark, Flink,  support
>>
>> Regards
>> JB
>>
>> On Thu, Sep 21, 2023 at 12:01 AM Anton Okolnychyi
>>  wrote:
>> >
>> > Shall we consider deprecating our Spark 3.2 support? That Spark version is 
>> > no longer being maintained by the Spark community and is not under active 
>> > development in Iceberg. It was released in October 2021 and passed the 18-month
>> > maintenance mark in Spark.
>> >
>> > - Anton
>
>
>
> --
> Ryan Blue
> Tabular


Re: [DISCUSS] Deprecate Spark 3.2 support?

2023-09-21 Thread Jean-Baptiste Onofré
Exactly! Thanks for sharing!

I hadn't found this page.

Thanks, so we already have it :)

Regards
JB

On Thu, Sep 21, 2023 at 6:42 PM Ryan Blue  wrote:
>
> Do you mean like this page? 
> https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status
>
> On Thu, Sep 21, 2023 at 8:52 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Ryan,
>>
>> Yes, it makes sense. The way we discuss and decide the Spark versions
>> is totally fine.
>>
>> My proposal was more about clearly announcing the Spark/Flink/Java/Python
>> versions supported by Iceberg releases. I know that this is already visible
>> in the artifact names (which contain the Spark/Flink versions) that we share
>> on https://iceberg.apache.org/releases/.
>> The idea is just to inform our users/community a bit in advance,
>> for instance with a clear table of the supported layers (a bit
>> like on https://karaf.apache.org/download.html or
>> https://kafka.apache.org/downloads).
>>
>> Thanks!
>> Regards
>> JB
>>
>> On Thu, Sep 21, 2023 at 5:40 PM Ryan Blue  wrote:
>> >
>> > JB, I don't think that we need a policy on which Spark versions we intend 
>> > to keep. Having discussions like this is more effective. Just look at the 
>> > support for Spark 2.4, which we kept for a lot longer to help people 
>> > transition. Policy is a way of making decisions by algorithm and I don't 
>> > think we want to do that here.
>> >
>> > On Thu, Sep 21, 2023 at 1:48 AM Jean-Baptiste Onofré  
>> > wrote:
>> >>
>> >> Just to elaborate a bit :)
>> >>
>> >> - As Iceberg 1.4.0 is a new "major" release, it's a good time to
>> >> deprecate/remove support for old versions (of Spark and other things)
>> >> - Spark 3.2 users can still use previous Iceberg versions
>> >> - I will start a discussion about an LTS policy with a clear "target"
>> >> support for our users (something like the table you can see at
>> >> https://karaf.apache.org/download.html), where we can list supported Java,
>> >> Python, Spark, Flink,  support
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Thu, Sep 21, 2023 at 12:01 AM Anton Okolnychyi
>> >>  wrote:
>> >> >
>> >> > Shall we consider deprecating our Spark 3.2 support? That Spark version 
>> >> > is no longer being maintained by the Spark community and is not under 
>> >> > active development in Iceberg. It was released in October 2021 and
>> >> > passed the 18-month maintenance mark in Spark.
>> >> >
>> >> > - Anton
>> >
>> >
>> >
>> > --
>> > Ryan Blue
>> > Tabular
>
>
>
> --
> Ryan Blue
> Tabular


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-22 Thread Jean-Baptiste Onofré
Hi guys,

Finally (sorry for the long wait :)), a first formal Iceberg Summit
proposal doc is ready to be populated/reviewed:

https://docs.google.com/document/d/1Uy9-qRxLtjMWJkRXsjj94Vq3VO1Mc0wGz_bnisevNh8/edit?usp=sharing

Anyone can edit the document, so feel free to add to it or ask
questions via comments.

Thanks!
Regards
JB

On Wed, Aug 23, 2023 at 11:25 PM Jean-Baptiste Onofré  wrote:
>
> It sounds great :)
>
> I will include a note in the proposal doc about that.
>
> Regards
> JB
>
> On Wed, Aug 23, 2023 at 2:14 PM Brian Olsen  wrote:
>>
>> Out of curiosity, is anyone strongly opposed to doing antics like this for 
>> summits?
>>
>> https://youtube.com/playlist?list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr
>>
>>
>> On Mon, Aug 21, 2023 at 6:58 PM Matt Topol  wrote:
>>>
>>> I don't think I'll have much time to contribute, but I would
>>> absolutely help if possible.
>>>
>>> That said, I'll definitely want to give a talk / speak at this summit when 
>>> it happens :)
>>>
>>> On Mon, Aug 21, 2023 at 1:38 AM Jean-Baptiste Onofré  
>>> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> I'm back from vacation and I'm resuming the work on the Iceberg Summit
>>>> proposal doc. I will share the doc asap.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré  
>>>> wrote:
>>>> >
>>>> > Hi everyone,
>>>> >
>>>> > I started a discussion on the private mailing list, and, as there are
>>>> > no objections from the PMC members, I'm moving the thread to the dev
>>>> > mailing list.
>>>> >
>>>> > I propose to organize the first Apache Iceberg Summit \o/
>>>> >
>>>> > For the format, I think the best option is a virtual event with a mix of:
>>>> > 1. Dev community talks: architecture, roadmap, features, use in 
>>>> > "products", ...
>>>> > 2. User community talks: companies could present their use cases, best
>>>> > practices, ...
>>>> >
>>>> > In terms of organization:
>>>> > 1. no objection so far from the PMC members to use Apache Iceberg
>>>> > Summit name. If it works for everyone, I will send a message to
>>>> > Apache Publicity & Marketing to get their OK for the event.
>>>> > 2. create two committees:
>>>> >   2.1. the Sponsoring Committee, gathering companies/organizations
>>>> > wanting to sponsor the event
>>>> >   2.2. the Program Committee, gathering folks from the Iceberg community
>>>> > (PMC/committers/contributors) to select talks.
>>>> >
>>>> > My company (Dremio) will “host” the event, i.e., provide funding, a
>>>> > conference platform, sponsor logistics, speaker training, slide
>>>> > design, etc.
>>>> >
>>>> > In terms of dates, as CommunityOverCode Con NA will be in October, I
>>>> > think January 2024 would work: it gives us time to organize smoothly
>>>> > and promote the event without being in a rush.
>>>> >
>>>> > I propose:
>>>> > 1. to create the #summit channel on Iceberg Slack.
>>>> > 2. that I share a preparation document with a plan proposal.
>>>> >
>>>> > Thoughts?
>>>> >
>>>> > Regards
>>>> > JB


Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-26 Thread Jean-Baptiste Onofré
Hi Ryan,

My bad: I understood from the previous discussion about the summit that
we wanted something more concrete in terms of format, length,
organisation, and how we can set this up.

Of course, no problem to take a step back, and the document is
definitely where we can all work together. I'm also in the same
situation as you: I'm helping with my Apache/community hat on, but I
included proposals from my company (Dremio) in the doc. For instance,
I think it's important to "split" the committees for sponsoring and track
content, just to avoid "sales talk", as it's a community event.
As a few of us have helped with summits in the past, our experience of how
they went is valuable.

Special thanks to you, Brian, and Ed for the comments.

I propose to work on the doc: anyone from the community can edit the doc.

Thanks,
Regards
JB





On Fri, Sep 22, 2023 at 8:18 PM Ryan Blue  wrote:
>
> To me, this proposal is getting a bit ahead of where I'm comfortable. I was 
> expecting this to address some of the big questions about how to run an event 
> like this from an open source community, but it seems to be assuming that the 
> event will happen and addresses logistics.
>
> Here's an example of what I mean: the doc suggests the different levels of 
> sponsors, slots at those levels, and prices. But I think the main question 
> the community has to think through before we get there --- and what I'd 
> expect in a proposal --- is how to ensure that such an event remains 
> commercially neutral. Because you're asking for this to be an "official" 
> event from the community and using its trademarks, we need to think through 
> how we want to strike that balance.
>
> I'd like to see more of a proposal around:
> 1. Should the Iceberg community put on an event? Clearly, we like the idea of 
> exchanging ideas, but it goes beyond that.
> 2. How would we balance the interests of different parts of the community and 
> why should we take that approach?
>
> We have a lot of different companies contributing and building around 
> Iceberg. The last thing that we want to do is give the impression that the 
> community is in any way "pay-to-play" --- that's one of this community's 
> distinguishing features.
>
> Also, I want to disclose that I'm also associated with a vendor (Tabular). 
> With that hat on, I think we'd happily sponsor an event like this. But with 
> my community hat on I want to make sure we plan it out and think carefully.
>
> Ryan
>
> On Fri, Sep 22, 2023 at 2:09 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi guys,
>>
>> Finally (sorry for the long wait :)), a first formal Iceberg Summit
>> proposal doc is ready to be populated/reviewed:
>>
>> https://docs.google.com/document/d/1Uy9-qRxLtjMWJkRXsjj94Vq3VO1Mc0wGz_bnisevNh8/edit?usp=sharing
>>
>> Anyone can edit the document, so feel free to add to it or ask
>> questions via comments.
>>
>> Thanks!
>> Regards
>> JB
>>
>> On Wed, Aug 23, 2023 at 11:25 PM Jean-Baptiste Onofré  
>> wrote:
>> >
>> > It sounds great :)
>> >
>> > I will include a note in the proposal doc about that.
>> >
>> > Regards
>> > JB
>> >
>> > On Wed, Aug 23, 2023 at 2:14 PM Brian Olsen  wrote:
>> >>
>> >> Out of curiosity, is anyone strongly opposed to doing antics like this 
>> >> for summits?
>> >>
>> >> https://youtube.com/playlist?list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr
>> >>
>> >>
>> >> On Mon, Aug 21, 2023 at 6:58 PM Matt Topol  wrote:
>> >>>
>> >>> I don't think I'll have much time to contribute, but I would
>> >>> absolutely help if possible.
>> >>>
>> >>> That said, I'll definitely want to give a talk / speak at this summit 
>> >>> when it happens :)
>> >>>
>> >>> On Mon, Aug 21, 2023 at 1:38 AM Jean-Baptiste Onofré  
>> >>> wrote:
>> >>>>
>> >>>> Hi guys,
>> >>>>
>> >>>> I'm back from vacation and I'm resuming the work on the Iceberg Summit
>> >>>> proposal doc. I will share the doc asap.
>> >>>>
>> >>>> Regards
>> >>>> JB
>> >>>>
>> >>>> On Wed, Jul 5, 2023 at 4:37 PM Jean-Baptiste Onofré  
>> >>>> wrote:
>> >>>> >
>> >>>> > Hi everyone,
>> >>>> >
>> >>>> > I started a

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-26 Thread Jean-Baptiste Onofré
I mean that anyone interested can add it to the doc.
I think we have different motivations and ideas. It’s great to share them all
together.
Personally, my motivation is to promote the project (because it’s super
interesting) and connect the dots with other Apache projects. I also think it’s
a good time to grow/expand our community.
I will also work on the document based on your comment, as anyone can :)

Regards
JB

On Wed, Sep 27, 2023 at 12:47 AM Ryan Blue  wrote:

> JB,
>
> What do you mean --- are you going to put together the proposal for how
> this would work? Or are you saying that anyone interested in that topic
> could add it to the doc?
>
> Ryan
>
> On Tue, Sep 26, 2023 at 12:26 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Ryan,
>>
>> My bad: I understood from the previous discussion about the summit that
>> we wanted something more concrete in terms of format, length,
>> organisation, and how we can set this up.
>>
>> Of course, no problem to take a step back, and the document is
>> definitely where we can all work together. I'm also in the same
>> situation as you: I'm helping with my Apache/community hat on, but I
>> included proposals from my company (Dremio) in the doc. For instance,
>> I think it's important to "split" the committees for sponsoring and track
>> content, just to avoid "sales talk", as it's a community event.
>> As a few of us have helped with summits in the past, our experience of how
>> they went is valuable.
>>
>> Special thanks to you, Brian, and Ed for the comments.
>>
>> I propose to work on the doc: anyone from the community can edit the doc.
>>
>> Thanks,
>> Regards
>> JB
>>
>>
>>
>>
>>
>> On Fri, Sep 22, 2023 at 8:18 PM Ryan Blue  wrote:
>> >
>> > To me, this proposal is getting a bit ahead of where I'm comfortable. I
>> was expecting this to address some of the big questions about how to run an
>> event like this from an open source community, but it seems to be assuming
>> that the event will happen and addresses logistics.
>> >
>> > Here's an example of what I mean: the doc suggests the different levels
>> of sponsors, slots at those levels, and prices. But I think the main
>> question the community has to think through before we get there --- and
>> what I'd expect in a proposal --- is how to ensure that such an event
>> remains commercially neutral. Because you're asking for this to be an
>> "official" event from the community and using its trademarks, we need to
>> think through how we want to strike that balance.
>> >
>> > I'd like to see more of a proposal around:
>> > 1. Should the Iceberg community put on an event? Clearly, we like the
>> idea of exchanging ideas, but it goes beyond that.
>> > 2. How would we balance the interests of different parts of the
>> community and why should we take that approach?
>> >
>> > We have a lot of different companies contributing and building around
>> Iceberg. The last thing that we want to do is give the impression that the
>> community is in any way "pay-to-play" --- that's one of this community's
>> distinguishing features.
>> >
>> > Also, I want to disclose that I'm also associated with a vendor
>> (Tabular). With that hat on, I think we'd happily sponsor an event like
>> this. But with my community hat on I want to make sure we plan it out and
>> think carefully.
>> >
>> > Ryan
>> >
>> > On Fri, Sep 22, 2023 at 2:09 AM Jean-Baptiste Onofré 
>> wrote:
>> >>
>> >> Hi guys,
>> >>
>> >> Finally (sorry for the long wait :)), a first formal Iceberg Summit
>> >> proposal doc is ready to be populated/reviewed:
>> >>
>> >>
>> https://docs.google.com/document/d/1Uy9-qRxLtjMWJkRXsjj94Vq3VO1Mc0wGz_bnisevNh8/edit?usp=sharing
>> >>
>> >> Anyone can edit the document, so feel free to add to it or ask
>> >> questions via comments.
>> >>
>> >> Thanks!
>> >> Regards
>> >> JB
>> >>
>> >> On Wed, Aug 23, 2023 at 11:25 PM Jean-Baptiste Onofré 
>> wrote:
>> >> >
>> >> > It sounds great :)
>> >> >
>> >> > I will include a note in the proposal doc about that.
>> >> >
>> >> > Regards
>> >> > JB
>> >> >
>> >> > On Wed, Aug 23, 2023 at 2:14 PM Brian Olsen  wrote:
>> >

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-27 Thread Jean-Baptiste Onofré
Hi Ryan

Yes, I will update the proposal to address the concerns.

Thanks
Regards
JB

On Wed, Sep 27, 2023 at 11:11 PM Ryan Blue  wrote:

> JB,
>
> I think we're agreed that it would be great to promote the project and put
> together talks to help expand the community. What isn't clear is how to
> make this happen. Are you planning on updating the proposal to address the
> concerns I raised or do you want someone else to pick that up?
>
> Ryan
>
> On Tue, Sep 26, 2023 at 11:28 PM Jean-Baptiste Onofré 
> wrote:
>
>> I mean that anyone interested can add it to the doc.
>> I think we have different motivations and ideas. It’s great to share them
>> all together.
>> Personally, my motivation is to promote the project (because it’s super
>> interesting) and connect the dots with other Apache projects. I also think it’s
>> a good time to grow/expand our community.
>> I will also work on the document based on your comment, as anyone can :)
>>
>> Regards
>> JB
>>
>> On Wed, Sep 27, 2023 at 12:47 AM Ryan Blue  wrote:
>>
>>> JB,
>>>
>>> What do you mean --- are you going to put together the proposal for how
>>> this would work? Or are you saying that anyone interested in that topic
>>> could add it to the doc?
>>>
>>> Ryan
>>>
>>> On Tue, Sep 26, 2023 at 12:26 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> My bad: I understood from the previous discussion about the summit that
>>>> we wanted something more concrete in terms of format, length,
>>>> organisation, and how we can set this up.
>>>>
>>>> Of course, no problem to take a step back, and the document is
>>>> definitely where we can all work together. I'm also in the same
>>>> situation as you: I'm helping with my Apache/community hat on, but I
>>>> included proposals from my company (Dremio) in the doc. For instance,
>>>> I think it's important to "split" the committees for sponsoring and track
>>>> content, just to avoid "sales talk", as it's a community event.
>>>> As a few of us have helped with summits in the past, our experience of how
>>>> they went is valuable.
>>>>
>>>> Special thanks to you, Brian, and Ed for the comments.
>>>>
>>>> I propose to work on the doc: anyone from the community can edit the
>>>> doc.
>>>>
>>>> Thanks,
>>>> Regards
>>>> JB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 22, 2023 at 8:18 PM Ryan Blue  wrote:
>>>> >
>>>> > To me, this proposal is getting a bit ahead of where I'm comfortable.
>>>> I was expecting this to address some of the big questions about how to run
>>>> an event like this from an open source community, but it seems to be
>>>> assuming that the event will happen and addresses logistics.
>>>> >
>>>> > Here's an example of what I mean: the doc suggests the different
>>>> levels of sponsors, slots at those levels, and prices. But I think the main
>>>> question the community has to think through before we get there --- and
>>>> what I'd expect in a proposal --- is how to ensure that such an event
>>>> remains commercially neutral. Because you're asking for this to be an
>>>> "official" event from the community and using its trademarks, we need to
>>>> think through how we want to strike that balance.
>>>> >
>>>> > I'd like to see more of a proposal around:
>>>> > 1. Should the Iceberg community put on an event? Clearly, we like the
>>>> idea of exchanging ideas, but it goes beyond that.
>>>> > 2. How would we balance the interests of different parts of the
>>>> community and why should we take that approach?
>>>> >
>>>> > We have a lot of different companies contributing and building around
>>>> Iceberg. The last thing that we want to do is give the impression that the
>>>> community is in any way "pay-to-play" --- that's one of this community's
>>>> distinguishing features.
>>>> >
>>>> > Also, I want to disclose that I'm also associated with a vendor
>>>> (Tabular). With that hat on, I think we'd happily sponsor an event like
>>>> this. But with my community ha

Re: Proposal to fix the docs - this time it'll be different

2023-09-28 Thread Jean-Baptiste Onofré
Hi Brian

Thanks for the update. I will take a look.

Regards
JB

On Fri, Sep 29, 2023 at 7:05 AM Brian Olsen  wrote:

> Hey All,
>
> I know it's been a while, but the first phase of the docs refactor has
> landed. I think it's at a decent point for everyone to take a look. To be
> clear, this is not going to replace the existing website yet; it lands the
> first large batch of new docs to provide an initial proof of concept for
> the build, and we can make incremental changes until we are comfortable
> making the swap. Once this is in and 1.4.0 goes out, I'll have to
> retroactively create tags for each prior version of the documentation.
> While that's happening, we can have someone else work on the look and feel
> of the website so it looks closer to our current site.
>
> https://github.com/apache/iceberg/pull/8659
>
> Thanks! Let me know if you have any questions!
>
> - Bits
>
> On Thu, Jul 27, 2023 at 4:10 PM Szehon Ho  wrote:
>
>> Hi
>>
>> I'm ok with putting things back in the Iceberg repo; it gets more visibility
>> on PRs.  I guess it used to be a bit distracting, but now with more
>> projects in Iceberg (pyiceberg, rust) we have to use tags to filter
>> through all the mail anyway.
>>
>> Just wanted to +1 Fokko/Ryan's suggestion to avoid versioned doc
>> directories. I had a lot of difficulties with this part during the last
>> release: https://github.com/apache/iceberg/issues/8151 , as did Anton
>> when I consulted him offline.
>>
>> For me, replacing the 'latest' branch with a tag would be the biggest win
>> as it caused me the most trouble.  If we can avoid versioned docs and use
>> tags across the board, that would be even better. I do think all the
>> versions are already tagged in GitHub on every release, if that is your
>> question.
>>
>> Thanks,
>> Szehon
>>
>> On Thu, Jul 27, 2023 at 2:31 AM Brian Olsen 
>> wrote:
>>
>>> Thanks Fokko,
>>>
>>> Yeah, I think to address that we would need to switch to a tagging scheme
>>> that prefixes each project name as a namespace within the tag space
>>> (e.g. pyIceberg-0.4.0, rust-0.0.1, etc…). But certainly this would result
>>> in an explosion of tags as we continue to introduce more projects. I’m not
>>> sure this makes it difficult to find things; as long as you search by
>>> the prefix in GitHub, it should be easy enough to find. Has anyone
>>> else worked on a project where this type of tagging is applied? Are there
>>> any performance, searching, or other implications we are missing?
>>>
>>> Bits
>>>
>>> On Thu, Jul 27, 2023 at 4:18 AM Fokko Driesprong 
>>> wrote:
>>>
 Hey Brian,

 Thanks for raising this. As a release manager, I can confirm that the
 current structure is confusing, and I can also see the community
 struggling with this because they are willing to contribute to the docs,
 but cannot always find where to do this. I think the complexity
 of the current website mostly comes from the versioned docs. It would be
 great if we can find a way to make this easier. Instead of using the
 branches, we could also use the release tags and build the docs for those
 versions.

 I think switching to mkdocs-material is a great idea. We currently also
 use this for PyIceberg, and it works really well. My main concern is around
 merging everything together. Should we combine Java and Python in the same
 documentation? They have different versioning schemes, so that would
 create a matrix of versions. Go and Rust are also in the
 making, so that would explode at some point.

 Cheers, Fokko

 PS: Currently, PyIceberg uses the gh-pages branch for publishing the
 docs.


 On Thu, Jul 27, 2023 at 12:04 AM Brian Olsen  wrote:

> Hey all,
>
> I have some proposals I'd like to make for fixing the docs. I would
> want to do this in two phases.
>
> In the first phase, I'm proposing that we move all the documentation
> (reference docs, website, and pyIceberg) back into the apache/iceberg
> repository. I explain my reasoning in the attached document. This phase
> would also migrate us from Hugo to MkDocs but keep all the content the
> same.
>
> The second phase is focused on iteratively building out the content
> that we've marked as missing in the proposal that Sam R. created along
> with a recent community member, Mahfuza. We will also restructure the
> content to follow the Diátaxis method (https://diataxis.fr/).
>
>
> https://docs.google.com/document/d/1WJXzcwC6isfoywcLY2lZ9gZN6i0JU1I2SIZmCkumbZc/edit#heading=h.gli9mc2ghfz1
>
> Let me know what you think and bring on the questions and criticisms
> please! :)
>
> Bits
>
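
Brian's idea of namespacing tags by project (e.g. `pyIceberg-0.4.0`, `rust-0.0.1`) can be made concrete with a small Python sketch that groups tags by prefix, which is roughly what a docs build would need in order to pick the versions for each project. It assumes a `<project>-<version>` convention where the version starts with a digit; the pattern and function names are hypothetical, and in practice the tag list could come from `git tag --list`.

```python
import re
from collections import defaultdict

# Assumed convention: "<project>-<version>", where the version starts with a digit.
# The lazy project group stops at the first "-<digit>", so
# "apache-iceberg-1.4.0-rc1" splits into ("apache-iceberg", "1.4.0-rc1").
TAG_PATTERN = re.compile(r"^(?P<project>[A-Za-z][A-Za-z0-9-]*?)-(?P<version>\d\S*)$")


def parse_tag(tag):
    """Split a namespaced tag into (project, version), or return None."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    # Lower-case the project so "pyIceberg" and "pyiceberg" share a namespace.
    return match.group("project").lower(), match.group("version")


def group_tags_by_project(tags):
    """Map each project prefix to its tagged versions, in input order."""
    grouped = defaultdict(list)
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed is not None:  # tags such as "latest" are skipped
            project, version = parsed
            grouped[project].append(version)
    return dict(grouped)
```

Lower-casing the prefix is a design choice of this sketch; a project that enforces one canonical spelling for its tags would not need it.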



Re: [VOTE] Release Apache Iceberg 1.4.0 RC1

2023-09-28 Thread Jean-Baptiste Onofré
+1 (non binding)

I checked:
- signatures and hash are ok
- asf headers are present
- no binary in the source distribution
- build is ok
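
The hash half of the "signatures and hash are ok" check can be scripted. A minimal Python sketch, assuming the source tarball and its .sha512 file have been downloaded from the RC directory on dist.apache.org and that the .sha512 file begins with the hex digest (as `shasum -a 512` output does); `sha512_of` and `verify_sha512` are hypothetical helper names. Signature checking still needs gpg and the KEYS file and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball is never fully loaded in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(tarball: Path, checksum_file: Path) -> bool:
    """Compare the tarball's SHA-512 with the first token of the checksum file.

    Assumption: the .sha512 file holds the hex digest as its first
    whitespace-separated token, e.g. "<digest>  <file name>".
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(tarball) == expected
```

Call `verify_sha512(tarball_path, checksum_path)` with the local paths of the downloaded artifacts; it returns False on a mismatch rather than raising.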

NB: I’m working on a set of use cases with different data sets, but it’s not
yet complete. I should have it for the next release and be able to compare
query times and behavior between releases.

Thanks !
Regards
JB

On Thu, Sep 28, 2023 at 04:02, Anton Okolnychyi
 wrote:

> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg
> 1.4.0 release.
>
> The commit ID is 8f37faa6a21e863551b17992370edc0f8706465d
> * This corresponds to the tag: apache-iceberg-1.4.0-rc1
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.4.0-rc1
> * https://github.com/apache/iceberg/tree/8f37faa6a21e863551b17992370edc0f8706465d
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.0-rc1
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
> * https://repository.apache.org/content/repositories/orgapacheiceberg-1145/
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours. (Weekends excluded)
>
> [ ] +1 Release this as Apache Iceberg 1.4.0
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are
> encouraged to cast non-binding votes. This vote will pass if there are 3
> binding +1 votes and more binding +1 votes than -1 votes.
>
> - Anton
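
The passing rule quoted in the vote email reduces to a two-part condition on binding votes. As an illustration only, here is a Python sketch with a hypothetical `vote_passes` helper; non-binding votes are assumed to be filtered out before calling it, and the 72-hour (weekends excluded) window is not modeled.

```python
from typing import Iterable


def vote_passes(binding_votes: Iterable[int]) -> bool:
    """Apply the quoted rule to binding votes only (each is +1, 0 or -1).

    Passes with at least three binding +1 votes AND more binding +1
    votes than binding -1 votes.
    """
    votes = list(binding_votes)
    plus = votes.count(1)
    minus = votes.count(-1)
    return plus >= 3 and plus > minus
```

Both conditions matter: three +1 votes are not enough on their own if three binding -1 votes are also cast.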


[DISCUSSION] Rename master branch as main for the main repository

2023-09-29 Thread Jean-Baptiste Onofré
Hi guys,

The Apache CoC (https://www.apache.org/foundation/policies/conduct)
notably contains section 5 about the wording we use. Several Apache
projects have renamed the master branch to main (Apache Karaf,
ActiveMQ, Airflow, ...).
As we already use main for the go, rust, and python repositories, I wonder
whether, for consistency, we should also rename master to main on the "main"
repository.

Apache INFRA can do this "smoothly", but we would have to make some changes:
- update build.gradle
- update README.md
- update GH Actions (in .github/workflows/*)
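
To scope these changes before opening a PR, one could list every line that still mentions master. A rough Python sketch; `find_master_refs` is a hypothetical helper, its default file patterns simply mirror the list above, and not every hit necessarily needs changing (a link to another project's master branch, for instance).

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Whole-word match, so names such as "webmaster" are not reported.
MASTER_REF = re.compile(r"\bmaster\b")


def find_master_refs(
    repo_root: Path,
    patterns: Iterable[str] = ("build.gradle", "README.md", ".github/workflows/*"),
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for lines mentioning 'master'."""
    hits = []
    for pattern in patterns:
        for path in sorted(repo_root.glob(pattern)):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if MASTER_REF.search(line):
                    rel = path.relative_to(repo_root).as_posix()
                    hits.append((rel, lineno, line.strip()))
    return hits
```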

Thoughts ?

Regards
JB


Re: Migration of PyIceberg to iceberg-python repository

2023-09-29 Thread Jean-Baptiste Onofré
Hi Fokko

+1 to move PyIceberg to the iceberg-python repo.
Can we keep the commit history? For me, it's not a blocker; we can
move even if we lose the history.

Regards
JB

On Fri, Sep 29, 2023 at 1:25 PM Fokko Driesprong  wrote:
>
> Hey everyone 👋
>
> A while ago we discussed that Rust and Go are going into a separate 
> repository: https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>
> Since we just did the PyIceberg 0.5.0 release, I think it is a good moment to
> migrate PyIceberg to iceberg-python as well: 
> https://github.com/apache/iceberg-python/pull/2 I went over the PRs that are 
> ready to merge and got them in. If there is anything missing, please let me 
> know.
>
> I would suggest merging the PR and leaving the source code in the main 
> repository for another week or so to make sure that we didn't miss anything.
>
> Since PyIceberg now hosts its docs on the GitHub Pages of the Iceberg
> repository, moving PyIceberg will also free up GitHub Pages for the
> migration of the docs back into the main repository.
>
> Let me know if there are any concerns.
>
> Kind regards,
> Fokko Driesprong


Re: Migration of PyIceberg to iceberg-python repository

2023-09-29 Thread Jean-Baptiste Onofré
Awesome, it looks even better ;)

Thanks !
Regards
JB

On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong  wrote:
>
> Hey Ajantha,
>
> That's a great suggestion. I've followed the steps and created a new PR here: 
> https://github.com/apache/iceberg-python/pull/3
>
> The subdirectory-filter command moves a subdirectory to the root directory,
> so I still had to add some files afterward (.github/*, .gitignore,
> etc.); these are in a separate commit. Please take a look.
>
> Thanks,
>
> Fokko
>
> On Fri, Sep 29, 2023 at 13:39, Ajantha Bhat wrote:
>>
>> I think we are gonna lose the history of commits if we merge the above PR.
>>
>> There are ways to move the subfolder into a new repo while retaining commit
>> history.
>> For example:
>> - https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>> - https://gist.github.com/trongthanh/2779392
>>
>> Please give it a try.
>>
>> Thanks,
>> Ajantha
>>
>> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong  wrote:
>>>
>>> Hey everyone 👋
>>>
>>> A while ago we discussed that Rust and Go are going into a separate 
>>> repository: https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>
>>> Since we just did the PyIceberg 0.5.0 release, I think it is a good moment to
>>> migrate PyIceberg to iceberg-python as well: 
>>> https://github.com/apache/iceberg-python/pull/2 I went over the PRs that 
>>> are ready to merge and got them in. If there is anything missing, please 
>>> let me know.
>>>
>>> I would suggest merging the PR and leaving the source code in the main 
>>> repository for another week or so to make sure that we didn't miss anything.
>>>
>>> Since PyIceberg now hosts its docs on the GitHub Pages of the Iceberg
>>> repository, moving PyIceberg will also free up GitHub Pages for the
>>> migration of the docs back into the main repository.
>>>
>>> Let me know if there are any concerns.
>>>
>>> Kind regards,
>>> Fokko Driesprong
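
The history-preserving split that Ajantha suggests and Fokko describes (via subdirectory-filter) can be sketched as a small Python helper that builds the git commands. This is a sketch, not what was actually run for iceberg-python: the helper names are hypothetical, the URL, subdirectory and work directory are caller-supplied, and git's own documentation now recommends git filter-repo over filter-branch for this kind of rewrite.

```python
import shlex
import subprocess
from typing import List


def split_subdirectory_commands(source_url: str, subdir: str, workdir: str) -> List[List[str]]:
    """Build git commands that turn `subdir` of a repository into its own history.

    `workdir` must be a throwaway location: filter-branch rewrites every
    ref passed after `--`, so it should never run in a working clone.
    """
    return [
        # Fresh clone so the original checkout is left untouched.
        ["git", "clone", source_url, workdir],
        # Keep only commits touching `subdir`, move its contents to the root,
        # and drop commits that become empty after the rewrite.
        ["git", "-C", workdir, "filter-branch", "--prune-empty",
         "--subdirectory-filter", subdir, "--", "--all"],
    ]


def run(commands: List[List[str]]) -> None:
    """Echo and execute each command, stopping at the first failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Files that live outside the subdirectory (such as .github/* or .gitignore) are not carried over by the rewrite and still have to be added afterward, as Fokko notes.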


Re: [VOTE] Release Apache Iceberg 1.4.0 RC1

2023-09-29 Thread Jean-Baptiste Onofré
Thanks Anton,

Do we have unit tests for filter pushdown? It may be worth adding
something around that, right?

Anyway, I'm adding filter test cases in my "samples" repo.

Thanks !
Regards
JB

On Fri, Sep 29, 2023 at 6:30 PM Anton Okolnychyi  wrote:
>
> Ugh, it looks like the filter pushdown issue is a regression. I tested 1.3.1 
> and it worked. I guess it is because we migrated to V2 filters and their 
> behavior is different. I need to take a closer look.
>
> On 2023/09/29 11:36:09 Eduard Tudenhoefner wrote:
> > +1 (non-binding)
> >
> > * validated checksum and signature
> > * checked license docs & ran RAT checks
> > * ran build and tests with JDK17
> > * ran some tests with Spark 3.4 + 3.5 and the new iceberg-aws-bundle.jar
> > * ran some internal tests
> >
> > I found two test issues with Flink (#8680
> > <https://github.com/apache/iceberg/issues/8680> and #8679
> > <https://github.com/apache/iceberg/issues/8679>) while running tests
> > locally, but that shouldn't block the release.
> >
> > Also thanks to Anton for doing the release and everyone else who
> > contributed!
> >
> > On Fri, Sep 29, 2023 at 11:44 AM Fokko Driesprong  wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks Anton for running the release and everyone who contributed! Checks
> > > I did:
> > >
> > >- Updated the docker-spark-iceberg repo
> > ><https://github.com/tabular-io/docker-spark-iceberg/pull/93>, and
> > >everything runs fine (still with Spark 3.4 since there were some problems
> > >with Jupyter's Scala 2.13 kernel). This includes the new aws-bundle 🥳
> > >- Tested against Trino <https://github.com/trinodb/trino/pull/19188>,
> > >and found three differences, all expected:
> > >   - More defensive cleaning up of files on a failed commit, to make
> > >   table recovery easier when needed.
> > >   - A new property that's set on the table, indicating zstd
> > >   compression.
> > >   - Changes in the exceptions when binding a transform to a column
> > >   type that is not allowed.
> > >
> > > Kind regards, Fokko
> > >
> > >
> > > On Fri, Sep 29, 2023 at 07:35, Jean-Baptiste Onofré
> > > wrote:
> > >
> > >> +1 (non binding)
> > >>
> > >> I checked:
> > >> - signatures and hash are ok
> > >> - asf headers are present
> > >> - no binary in the source distribution
> > >> - build is ok
> > >>
> > >> NB: I’m working on a set of use cases with different data sets, but it’s
> > >> not yet complete. I should have it for the next release and be able to
> > >> compare query times and behavior between releases.
> > >>
> > >> Thanks !
> > >> Regards
> > >> JB
> > >>
> > >> On Thu, Sep 28, 2023 at 04:02, Anton Okolnychyi
> > >>  wrote:
> > >>
> > >>> Hi Everyone,
> > >>>
> > >>> I propose that we release the following RC as the official Apache
> > >>> Iceberg 1.4.0 release.
> > >>>
> > >>> The commit ID is 8f37faa6a21e863551b17992370edc0f8706465d
> > >>> * This corresponds to the tag: apache-iceberg-1.4.0-rc1
> > >>> * https://github.com/apache/iceberg/commits/apache-iceberg-1.4.0-rc1
> > >>> *
> > >>> https://github.com/apache/iceberg/tree/8f37faa6a21e863551b17992370edc0f8706465d
> > >>>
> > >>> The release tarball, signature, and checksums are here:
> > >>> *
> > >>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.0-rc1
> > >>>
> > >>> You can find the KEYS file here:
> > >>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
> > >>>
> > >>> Convenience binary artifacts are staged on Nexus. The Maven repository
> > >>> URL is:
> > >>> *
> > >>> https://repository.apache.org/content/repositories/orgapacheiceberg-1145/
> > >>>
> > >>> Please download, verify, and test.
> > >>>
> > >>> Please vote in the next 72 hours. (Weekends excluded)
> > >>>
> > >>> [ ] +1 Release this as Apache Iceberg 1.4.0
> > >>> [ ] +0
> > >>> [ ] -1 Do not release this because...
> > >>>
> > >>> Only PMC members have binding votes, but other community members are
> > >>> encouraged to cast non-binding votes. This vote will pass if there are 3
> > >>> binding +1 votes and more binding +1 votes than -1 votes.
> > >>>
> > >>> - Anton
> > >>
> > >>
> >


Re: [VOTE] Release Apache Iceberg 1.4.0 RC2

2023-09-29 Thread Jean-Baptiste Onofré
+1 (non binding)

As for RC1, I checked:
- signature and hash are OK
- ASF headers are there
- source distribution doesn't contain binary
- build is OK

Thanks,
Regards
JB

On Sat, Sep 30, 2023 at 1:25 AM Anton Okolnychyi
 wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg 
> 1.4.0 release.
>
> The commit ID is 10367c380098c2e06a49521a33681ac7f6c64b2c
> * This corresponds to the tag: apache-iceberg-1.4.0-rc2
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.4.0-rc2
> * https://github.com/apache/iceberg/tree/10367c380098c2e06a49521a33681ac7f6c64b2c
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.0-rc2
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
> * https://repository.apache.org/content/repositories/orgapacheiceberg-1146/
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours. (Weekends excluded)
>
> [ ] +1 Release this as Apache Iceberg 1.4.0
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are 
> encouraged to cast non-binding votes. This vote will pass if there are 3 
> binding +1 votes and more binding +1 votes than -1 votes.
>
> - Anton
>


Re: [VOTE] Release Apache Iceberg 1.4.0 RC1

2023-09-30 Thread Jean-Baptiste Onofré
Hi Anton,

Yeah, I saw the addition in PR 8682. Thanks for that !

Sure, I will check and also add them to my "manual tests" (the iceberg-samples
repo I'm working on, preparing Icekube as a manual test platform).

Thanks !
Regards
JB

On Sat, Sep 30, 2023 at 7:42 AM Anton Okolnychyi  wrote:
>
> JB, we do have tests for converting filters as well as for checking actual 
> pushdown. Looks like we initially missed decimals but I've added them in PR 
> 8682.
>
> The more tests we have the better. If you have a bit of time, it would be 
> nice to go back and check what else we missed. I'd start by looking at 
> TestSparkV2Filter and TestFilterPushdown classes.
>
> On 2023/09/30 04:56:07 Jean-Baptiste Onofré wrote:
> > Thanks Anton,
> >
> > Do we have unit tests for filter pushdown? It may be worth adding
> > something around that, right?
> >
> > Anyway, I'm adding filter test cases in my "samples" repo.
> >
> > Thanks !
> > Regards
> > JB
> >
> > On Fri, Sep 29, 2023 at 6:30 PM Anton Okolnychyi  
> > wrote:
> > >
> > > Ugh, it looks like the filter pushdown issue is a regression. I tested 
> > > 1.3.1 and it worked. I guess it is because we migrated to V2 filters and 
> > > their behavior is different. I need to take a closer look.
> > >
> > > On 2023/09/29 11:36:09 Eduard Tudenhoefner wrote:
> > > > +1 (non-binding)
> > > >
> > > > * validated checksum and signature
> > > > * checked license docs & ran RAT checks
> > > > * ran build and tests with JDK17
> > > > * ran some tests with Spark 3.4 + 3.5 and the new iceberg-aws-bundle.jar
> > > > * ran some internal tests
> > > >
> > > > I found two test issues with Flink (#8680
> > > > <https://github.com/apache/iceberg/issues/8680> and #8679
> > > > <https://github.com/apache/iceberg/issues/8679>) while running tests
> > > > locally, but that shouldn't block the release.
> > > >
> > > > Also thanks to Anton for doing the release and everyone else who
> > > > contributed!
> > > >
> > > > On Fri, Sep 29, 2023 at 11:44 AM Fokko Driesprong  
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Thanks Anton for running the release and everyone who contributed!
> > > > > Checks I did:
> > > > >
> > > > >- Updated the docker-spark-iceberg repo
> > > > ><https://github.com/tabular-io/docker-spark-iceberg/pull/93>, and
> > > > >everything runs fine (still with Spark 3.4 since there were some problems
> > > > >with Jupyter's Scala 2.13 kernel). This includes the new aws-bundle 🥳
> > > > >- Tested against Trino <https://github.com/trinodb/trino/pull/19188>,
> > > > >and found three differences, all expected:
> > > > >   - More defensive cleaning up of files on a failed commit, to make
> > > > >   table recovery easier when needed.
> > > > >   - A new property that's set on the table, indicating zstd
> > > > >   compression.
> > > > >   - Changes in the exceptions when binding a transform to a column
> > > > >   type that is not allowed.
> > > > >
> > > > > Kind regards, Fokko
> > > > >
> > > > >
> > > > > On Fri, Sep 29, 2023 at 07:35, Jean-Baptiste Onofré
> > > > > wrote:
> > > > >
> > > > >> +1 (non binding)
> > > > >>
> > > > >> I checked:
> > > > >> - signatures and hash are ok
> > > > >> - asf headers are present
> > > > >> - no binary in the source distribution
> > > > >> - build is ok
> > > > >>
> > > > >> NB: I’m working on a set of use cases with different data sets, but
> > > > >> it’s not yet complete. I should have it for the next release and be able
> > > > >> to compare query times and behavior between releases.
> > > > >>
> > > > >> Thanks !
> > > > >> Regards
> > > > >> JB
> > > > >>
> > > > >> Le jeu. 28 sep

Re: Kafka Connect sink

2023-10-02 Thread Jean-Baptiste Onofré
Hi Bryan

That’s great news! Thanks a lot for the proposal.

I will take a look at the PR and the existing connector.
I’m sure the Iceberg community will be very happy to see this, and we will
be able to add new features and improvements thanks to community feedback.
I would be more than happy to help with the donation (I know that the connector
is already under the Apache license, but we have to double-check the ICLAs for
the initial contributors, etc., just to be sure we are good there).

Thanks again !

Let’s see what the others are thinking.

Regards
JB

On Mon, Oct 2, 2023 at 19:39, Bryan Keller wrote:

> Hi all,
>
> We at Tabular would like to contribute our Kafka Connect Iceberg sink to
> the Iceberg project. It would be great to give Iceberg users another option
> for landing data from Kafka into Iceberg tables that is supported by the
> Iceberg community. Kafka Connect is a part of systems from AWS, Confluent,
> Redpanda, and so on, so it can make landing data from Kafka into Iceberg
> much easier for those without a Flink or Spark infrastructure.
>
> There are a few Iceberg sink implementations out there for Kafka Connect,
> but we feel this one covers most of the features users have requested, such
> as exactly-once processing, schema evolution, and multi-table fanout. And
> having the sink backed by the Iceberg community will help it to evolve and
> improve over time.
>
> If this sounds like something everyone would like to see added to Iceberg,
> I've opened a PR that includes some initial pieces of the sink. The thought
> was to break up the submission into parts so each could be reviewed more
> easily. Some design docs and notes can be found in the original repo here:
> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>
> We'd like to get feedback on whether others approve of moving forward with
> this or not.
>
> Thanks,
> Bryan
>
>


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-02 Thread Jean-Baptiste Onofré
Thanks all for your feedback.

I will prepare the renaming then, I will keep you posted.

Regards
JB

On Tue, Oct 3, 2023 at 2:36 AM Renjie Liu  wrote:
>
> +1
>
> Sent from my iPhone
>
> On Oct 3, 2023, at 08:18, John Zhuge  wrote:
>
> 
> +1
>
> On Mon, Oct 2, 2023 at 2:48 PM Brian Olsen  wrote:
>>
>> As with any of these changes, the one and only inescapable side effect is
>> that users' local environments will not be updated automatically. GitHub has
>> otherwise made it very simple to rename branches to accommodate this use
>> case: https://github.com/github/renaming. Any old references to master
>> on the GitHub site itself will reroute to main.
>>
>> It's a small annoyance to make the Iceberg community more inclusive. For 
>> those that aren't aware of the why: 
>> https://en.wikipedia.org/wiki/Master/slave_(technology)#Terminology_concerns.
>>
>> On Mon, Oct 2, 2023 at 4:34 PM Hussein Awala  wrote:
>>>
>>> +1
>>>
>>> On Mon, Oct 2, 2023 at 11:27 PM Anton Okolnychyi  
>>> wrote:
>>>>
>>>> +1
>>>>
>>>> On 2023/10/02 20:12:37 Bryan Keller wrote:
>>>> > Hearty +1 from me
>>>> >
>>>> >
>>>> >
>>>> > > On Sep 29, 2023, at 5:37 AM, Brian Olsen wrote:
>>>> > >
>>>> > > +1000
>>>> > >
>>>> > > Let me know how I can help!
>>>> > >
>>>> > > On Fri, Sep 29, 2023 at 7:35 AM Jean-Baptiste Onofré
>>>> > > <j...@nanthrax.net> wrote:
>>>> > >
>>>> > >> Hi guys,
>>>> > >
>>>> > >  The Apache CoC (<https://www.apache.org/foundation/policies/conduct>)
>>>> > >  notably contains section 5 about the wording we use. Several Apache
>>>> > >  projects have renamed the master branch to main (Apache Karaf,
>>>> > >  ActiveMQ, Airflow, ...).
>>>> > >  As we already use main for the go, rust, and python repositories, I wonder
>>>> > >  whether, for consistency, we should also rename master to main on the "main"
>>>> > >  repository.
>>>> > >
>>>> > >  Apache INFRA can do this "smoothly", but we would have to make some changes:
>>>> > >  - update build.gradle
>>>> > >  - update README.md
>>>> > >  - update GH Actions (in .github/workflows/*)
>>>> > >
>>>> > >  Thoughts ?
>>>> > >
>>>> > >  Regards
>>>> > >  JB
>>>> > >
>>>> >
>>>> >
>
>
>
> --
> John Zhuge


Re: Kafka Connect sink

2023-10-02 Thread Jean-Baptiste Onofré
From my standpoint, Kafka Connect is also interesting for addressing
processing logic without a Spark or Flink runtime. It's definitely
interesting to have Kafka integration/processing (even if, for me, Kafka
and Kafka Connect are two different things ;)).

For the pure data ingestion part, I think it would make sense to have an
"ingestion layer" in Iceberg with pluggable IOs, where we can both
implement our own IOs (specific to Iceberg, like Apache Beam IOs for
instance) and leverage existing integration frameworks (like Apache
Camel).
Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
integration? I think having such a layer would be very interesting for
the community and would bring more users (that's what happened at Apache
Beam: the first IOs were only Google "centric" (bigtable, bigquery,
gfs, ...); we added new IOs (JMS, Kafka, JDBC, ...) and saw a great
benefit for adoption :)).
DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)

I will investigate this and draft a proposal.

Regards
JB

On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat  wrote:
>
> Hi Bryan,
>
> I am very happy to see this contribution.
> I have recently tested this project with Nessie catalog and very much liked 
> it.
>
> However, I still don't know the benefits of using Kafka Connect instead of
> consuming directly from Kafka, like Delta Lake's implementation:
> https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>
> I am not an expert in this ingestion domain and only recently got started.
> I hope someone will chime in so we can have a detailed analysis of the
> design.
>
> Looking forward to this feature.
>
> Thanks,
> Ajantha
>
> On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Bryan
>>
>> That’s great news! Thanks a lot for the proposal.
>>
>> I will take a look at the PR and the existing connector.
>> I’m sure the Iceberg community will be very happy to see this, and we will
>> be able to add new features and improvements thanks to community feedback.
>> I would be more than happy to help with the donation (I know that the connector
>> is already under the Apache license, but we have to double-check the ICLAs for the
>> initial contributors, etc., just to be sure we are good there).
>>
>> Thanks again !
>>
>> Let’s see what the others are thinking.
>>
>> Regards
>> JB
>>
>> On Mon, Oct 2, 2023 at 19:39, Bryan Keller wrote:
>>>
>>> Hi all,
>>>
>>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
>>> the Iceberg project. It would be great to give Iceberg users another option 
>>> for landing data from Kafka into Iceberg tables that is supported by the 
>>> Iceberg community. Kafka Connect is a part of systems from AWS, Confluent, 
>>> Redpanda, and so on, so it can make landing data from Kafka into Iceberg 
>>> much easier for those without a Flink or Spark infrastructure.
>>>
>>> There are a few Iceberg sink implementations out there for Kafka Connect, 
>>> but we feel this one covers most of the features users have requested, such 
>>> as exactly-once processing, schema evolution, and multi-table fanout. And 
>>> having the sink backed by the Iceberg community will help it to evolve and 
>>> improve over time.
>>>
>>> If this sounds like something everyone would like to see added to Iceberg, 
>>> I've opened a PR that includes some initial pieces of the sink. The thought 
>>> was to break up the submission into parts so each could be reviewed more 
>>> easily. Some design docs and notes can be found in the original repo here: 
>>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>>>
>>> We'd like to get feedback on whether others approve of moving forward with
>>> this or not.
>>>
>>> Thanks,
>>> Bryan
>>>


Re: Kafka Connect sink

2023-10-03 Thread Jean-Baptiste Onofré
Hi Bryan

Yes, very good point, especially since Kafka Connect is part of the
"Kafka ecosystem": basically anyone using Kafka has Kafka Connect
available. The only requirement is to "enable" Kafka Connect
(standalone or distributed), and then the Kafka Connect API is
available.

It's a good idea to leverage Kafka Connect as a data ingestion solution
(especially without an engine). I was also thinking about Apache
Camel, but it's a framework, not a runtime, meaning that we have to
build everything.

So, yeah, definitely a good idea to leverage Kafka Connect.

Thanks
Regards
JB

On Tue, Oct 3, 2023 at 5:30 PM Bryan Keller  wrote:
>
> Thanks for the feedback. Many people use Kafka Connect today, for both 
> loading data into Kafka and writing data from Kafka, so having an Iceberg 
> sink allows someone to make use of their existing infrastructure to write to 
> Iceberg. There are sink connectors for Delta/Databricks, Snowflake, Hudi, etc.,
> so it brings Iceberg up to par in that regard and allows someone to more 
> easily switch to using Iceberg as well.
>
> I'll definitely be interested in reading any proposals on data ingestion 
> solutions.
>
> -Bryan
>
> > On Oct 2, 2023, at 11:03 PM, Jean-Baptiste Onofré  wrote:
> >
> > From my standpoint, Kafka Connect is also interesting for addressing
> > processing logic without a Spark or Flink runtime. It's definitely
> > interesting to have Kafka integration/processing (even if, for me, Kafka
> > and Kafka Connect are two different things ;)).
> >
> > For the pure data ingestion part, I think it would make sense to have an
> > "ingestion layer" in Iceberg with pluggable IOs, where we can both
> > implement our own IOs (specific to Iceberg, like Apache Beam IOs for
> > instance) and leverage existing integration frameworks (like Apache
> > Camel).
> > Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
> > integration? I think having such a layer would be very interesting for
> > the community and would bring more users (that's what happened at Apache
> > Beam: the first IOs were only Google "centric" (bigtable, bigquery,
> > gfs, ...); we added new IOs (JMS, Kafka, JDBC, ...) and saw a great
> > benefit for adoption :)).
> > DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
> >
> > I will investigate this and draft a proposal.
> >
> > Regards
> > JB
> >
> > On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat  wrote:
> >>
> >> Hi Bryan,
> >>
> >> I am very happy to see this contribution.
> >> I have recently tested this project with Nessie catalog and very much 
> >> liked it.
> >>
> >> However, I still don't know the benefits of using Kafka Connect instead of
> >> consuming directly from Kafka, like Delta Lake's implementation:
> >> https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
> >>
> >> I am not an expert in this ingestion domain and only recently got started.
> >> I hope someone will chime in so we can have a detailed analysis of the
> >> design.
> >>
> >> Looking forward to this feature.
> >>
> >> Thanks,
> >> Ajantha
> >>
> >> On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré  
> >> wrote:
> >>>
> >>> Hi Bryan
> >>>
> >>> That’s great news! Thanks a lot for the proposal.
> >>>
> >>> I will take a look at the PR and the existing connector.
> >>> I’m sure the Iceberg community will be very happy to see this, and we will
> >>> be able to add new features and improvements thanks to community
> >>> feedback.
> >>> I would be more than happy to help with the donation (I know that the
> >>> connector is already under the Apache license, but we have to double-check the
> >>> ICLAs for the initial contributors, etc., just to be sure we are good
> >>> there).
> >>>
> >>> Thanks again !
> >>>
> >>> Let’s see what the others are thinking.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On Mon, Oct 2, 2023 at 19:39, Bryan Keller wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
> >>>> the Iceberg project. It would be great to give Iceberg users another 
> >>>> option for landing data from Kafka into Iceberg tables t

Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-04 Thread Jean-Baptiste Onofré
Thanks again for your feedback. As we have a consensus, I'm moving forward:

1. I will create a PR to update resources to use main instead of
master (mainly the .github/workflows/* files)
2. I will do a pass on the website/doc repository to create PRs there
if needed as well (renaming master to main)
3. I will create an INFRA ticket to do the rename

I will do that today (my time).

Regards
JB

On Thu, Oct 5, 2023 at 3:02 AM Daniel Weeks  wrote:
>
> +1
>
> On Wed, Oct 4, 2023, 3:08 PM Julien Le Dem  
> wrote:
>>
>> +1
>>
>> On Mon, Oct 2, 2023 at 11:09 PM Fokko Driesprong  wrote:
>>>
>>> Big +1!
>>>
>>> Thanks for raising this JB!
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Tue, Oct 3, 2023 at 07:56, Jean-Baptiste Onofré wrote:
>>>>
>>>> Thanks all for your feedback.
>>>>
>>>> I will prepare the renaming then, I will keep you posted.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Tue, Oct 3, 2023 at 2:36 AM Renjie Liu  wrote:
>>>> >
>>>> > +1
>>>> >
>>>> > Sent from my iPhone
>>>> >
>>>> > On Oct 3, 2023, at 08:18, John Zhuge  wrote:
>>>> >
>>>> > 
>>>> > +1
>>>> >
>>>> > On Mon, Oct 2, 2023 at 2:48 PM Brian Olsen  
>>>> > wrote:
>>>> >>
>>>> >> As with any of these changes, the one and only inescapable side effect
>>>> >> is that users' local environments will not be updated automatically.
>>>> >> GitHub has otherwise made it very simple to rename branches to
>>>> >> accommodate this use case: https://github.com/github/renaming. Any old
>>>> >> references to master on the GitHub site itself will reroute to
>>>> >> main.
>>>> >>
>>>> >> It's a small annoyance to make the Iceberg community more inclusive. 
>>>> >> For those that aren't aware of the why: 
>>>> >> https://en.wikipedia.org/wiki/Master/slave_(technology)#Terminology_concerns.
>>>> >>
>>>> >> On Mon, Oct 2, 2023 at 4:34 PM Hussein Awala  wrote:
>>>> >>>
>>>> >>> +1
>>>> >>>
>>>> >>> On Mon, Oct 2, 2023 at 11:27 PM Anton Okolnychyi 
>>>> >>>  wrote:
>>>> >>>>
>>>> >>>> +1
>>>> >>>>
>>>> >>>> On 2023/10/02 20:12:37 Bryan Keller wrote:
>>>> >>>> > Hearty +1 from me
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > > On Sep 29, 2023, at 5:37 AM, Brian Olsen wrote:
>>>> >>>> > >
>>>> >>>> > > +1000
>>>> >>>> > >
>>>> >>>> > > Let me know how I can help!
>>>> >>>> > >
>>>> >>>> > > On Fri, Sep 29, 2023 at 7:35 AM Jean-Baptiste Onofré
>>>> >>>> > > <j...@nanthrax.net> wrote:
>>>> >>>> > >
>>>> >>>> > >> Hi guys,
>>>> >>>> > >
>>>> >>>> > >  The Apache CoC (<https://www.apache.org/foundation/policies/conduct>)
>>>> >>>> > >  notably contains section 5 about the wording we use. Several Apache
>>>> >>>> > >  projects have renamed the master branch to main (Apache Karaf,
>>>> >>>> > >  ActiveMQ, Airflow, ...).
>>>> >>>> > >  As we already use main for the go, rust, and python repositories, I wonder
>>>> >>>> > >  whether, for consistency, we should also rename master to main on the "main"
>>>> >>>> > >  repository.
>>>> >>>> > >
>>>> >>>> > >  Apache INFRA can do this "smoothly", but we would have to make some changes:
>>>> >>>> > >  - update build.gradle
>>>> >>>> > >  - update README.md
>>>> >>>> > >  - update GH Actions (in .github/workflows/*)
>>>> >>>> > >
>>>> >>>> > >  Thoughts ?
>>>> >>>> > >
>>>> >>>> > >  Regards
>>>> >>>> > >  JB
>>>> >>>> > >
>>>> >>>> >
>>>> >>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > John Zhuge


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-05 Thread Jean-Baptiste Onofré
Hi guys

Here's the PR using main instead of master:

https://github.com/apache/iceberg/pull/8722

I will create the INFRA ticket to actually rename master to main when
the PR is approved. The PR will be merged just after the actual rename
by Apache infra.

It should have no impact on existing PRs, etc. The only impact is on
users/devs: they have to update their local configuration to track main
instead of master.
I propose sending a message to the mailing list announcing that the rename
is effective and explaining the changes users/devs should make.
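
For the announcement mentioned above, the local-clone update after a master-to-main rename is usually the four git commands GitHub suggests when a branch is renamed. A minimal Python sketch that builds and runs them; `local_rename_commands` and `run_in_repo` are hypothetical helper names, and `origin` is assumed to point at apache/iceberg (contributors working from a fork often call that remote `upstream` instead).

```python
import subprocess
from typing import List


def local_rename_commands(old: str = "master", new: str = "main",
                          remote: str = "origin") -> List[List[str]]:
    """Commands that update an existing clone after the remote branch rename."""
    return [
        ["git", "branch", "-m", old, new],                # rename the local branch
        ["git", "fetch", remote],                         # fetch the renamed remote branch
        ["git", "branch", "-u", f"{remote}/{new}", new],  # track <remote>/<new>
        ["git", "remote", "set-head", remote, "-a"],      # refresh <remote>/HEAD
    ]


def run_in_repo(repo_dir: str, commands: List[List[str]]) -> None:
    """Run the commands inside `repo_dir`, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `run_in_repo("iceberg", local_rename_commands(remote="upstream"))` for a clone in ./iceberg whose apache/iceberg remote is named `upstream`.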

Thanks !
Regards
JB

On Thu, Oct 5, 2023 at 7:32 AM Jean-Baptiste Onofré  wrote:
>
> Thanks again for your feedback. As we have a consensus, I'm moving forward:
>
> 1. I will create a PR to update resources to use main instead of
> master (mainly the .github/workflows/* files)
> 2. I will do a pass on the website/doc repository to create PRs there
> if needed as well (renaming master to main)
> 3. I will create an INFRA ticket to do the rename
>
> I will do that today (my time).
>
> Regards
> JB
>
> On Thu, Oct 5, 2023 at 3:02 AM Daniel Weeks  wrote:
> >
> > +1
> >
> > On Wed, Oct 4, 2023, 3:08 PM Julien Le Dem  
> > wrote:
> >>
> >> +1
> >>
> >> On Mon, Oct 2, 2023 at 11:09 PM Fokko Driesprong  wrote:
> >>>
> >>> Big +1!
> >>>
> >>> Thanks for raising this JB!
> >>>
> >>> Kind regards,
> >>> Fokko
> >>>
> >>> On Tue, Oct 3, 2023 at 07:56 Jean-Baptiste Onofré wrote:
> >>>>
> >>>> Thanks all for your feedback.
> >>>>
> >>>> I will prepare the renaming then, I will keep you posted.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On Tue, Oct 3, 2023 at 2:36 AM Renjie Liu  
> >>>> wrote:
> >>>> >
> >>>> > +1
> >>>> >
> >>>> > On Oct 3, 2023, at 08:18, John Zhuge  wrote:
> >>>> >
> >>>> > 
> >>>> > +1
> >>>> >
> >>>> > On Mon, Oct 2, 2023 at 2:48 PM Brian Olsen  
> >>>> > wrote:
> >>>> >>
> >>>> >> As with any of these changes, the one and only inescapable
> >>>> >> side-effect is that users' local environments will not be updated
> >>>> >> automatically. GitHub has otherwise made it very simple to rename
> >>>> >> branches to accommodate this use case: https://github.com/github/renaming
> >>>> >> Any old references to master on the GitHub site itself will reroute
> >>>> >> to main.
> >>>> >>
> >>>> >> It's a small annoyance to make the Iceberg community more inclusive. 
> >>>> >> For those that aren't aware of the why: 
> >>>> >> https://en.wikipedia.org/wiki/Master/slave_(technology)#Terminology_concerns.
> >>>> >>
> >>>> >> On Mon, Oct 2, 2023 at 4:34 PM Hussein Awala  wrote:
> >>>> >>>
> >>>> >>> +1
> >>>> >>>
> >>>> >>> On Mon, Oct 2, 2023 at 11:27 PM Anton Okolnychyi 
> >>>> >>>  wrote:
> >>>> >>>>
> >>>> >>>> +1
> >>>> >>>>
> >>>> >>>> On 2023/10/02 20:12:37 Bryan Keller wrote:
> >>>> >>>> > Hearty +1 from me
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > > On Sep 29, 2023, at 5:37 AM, Brian Olsen 
> >>>> >>>> > >  wrote:
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> >
> >>>> >>>> > > 
> >>>> >>>> > >
> >>>> >>>> > > +1000
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > > Let me know how I can help!
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> > > On Fri, Sep 29, 2023 at 7:35 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>>> >>>> > >
> >>>> >>>> > >
> >>>> >>>> >
> >>>> >>>> > >> Hi guys,
> >>>> >>>> > >
> >>>> >>>> > >  The Apache CoC (https://www.apache.org/foundation/policies/conduct)
> >>>> >>>> > >  especially contains section 5 about the wording we use. Several Apache
> >>>> >>>> > >  projects renamed the master branch to the main branch (Apache Karaf,
> >>>> >>>> > >  ActiveMQ, Airflow, ...).
> >>>> >>>> > >  As we already use main for go, rust, and python repositories, I wonder
> >>>> >>>> > >  (for consistency) if we should not rename master to main on the "main"
> >>>> >>>> > >  repository.
> >>>> >>>> > >
> >>>> >>>> > >  Apache INFRA can do this "smoothly" but we would have to do some changes:
> >>>> >>>> > >  - update build.gradle
> >>>> >>>> > >  - update README.md
> >>>> >>>> > >  - update to GH Actions (in .github/workflows/*)
> >>>> >>>> > >
> >>>> >>>> > >  Thoughts ?
> >>>> >>>> > >
> >>>> >>>> > >  Regards
> >>>> >>>> > >  JB
> >>>> >>>> > >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > John Zhuge


[PROPOSAL] Regular release pace & some post release actions

2023-10-06 Thread Jean-Baptiste Onofré
Hi guys,

I would like to propose some improvements to our release process.

1. Predictable & regular release pace

We briefly started this discussion on the 1.4.0 vote thread: I think
it would be valuable for the community (both our users and
companies leveraging Iceberg in their products) to have a regular &
predictable release pace.
It would give our users a kind of roadmap with expected dates.
Based on our previous release dates, I propose to target one release
per quarter.
It doesn't mean that we won't be able to release faster (if we want to
quickly include a fix, a CVE fix, or whatever, we can always cut a
release), but we would have a minimum of one release per quarter. As
today, versioning will be discussed on the mailing list.
The website (https://iceberg.apache.org/releases/) should contain this
information and the next release target date.

2. Post release actions

I propose some new actions post release:

  2.1. Prepare an announcement email that will be sent to
dev@iceberg.apache.org and (also important) to annou...@apache.org.
The last "official" announcement we did was for Iceberg 1.0.0. As
annou...@apache.org is "processed" by the ASF communication team,
Iceberg releases would then be included in the ASF updates.
The release guide (https://iceberg.apache.org/how-to-release/) already
contains this point, but it seems we missed annou...@apache.org (it's
not obvious in the guide). I propose to make the guide clearer on this point.

  2.2. Write a blog post with highlights of the release. This "release
highlights blog post" should be on https://iceberg.apache.org/blogs/.
For instance, for 1.4.0, we could mention Spark 3.5 support, push down
function, distributed planning, etc.

   2.3. When we upload a new source distribution to dist.apache.org, we
should clean up the previous release. Right now,
https://dist.apache.org/repos/dist/release/iceberg/ contains all
releases. The ASF automatically copies all releases from dist.apache.org
to archive.apache.org (see https://archive.apache.org/dist/iceberg/).
Only the latest release should be on dist.apache.org: we should remove
previous releases and use archive.apache.org for them instead. I will
propose a PR for the website to use archive.apache.org for previous
releases and clean up previous releases.
So, the action proposed here is to remove previous release artifacts
after a new release is uploaded.
Again, the release guide (https://iceberg.apache.org/how-to-release/)
mentions the upload of artifacts, but not the cleanup.

   2.4. We have a PR to add the Iceberg DOAP file to the iceberg
repo (https://github.com/apache/iceberg/pull/8586). I propose to
update the DOAP file with each new release (I will update the PR with
all releases).

If you agree with these topics, I volunteer to update the Release
Guide (https://iceberg.apache.org/how-to-release/) with these points.
I'm also volunteering for the next release to test/validate this process.

Thoughts ?

Thanks,
Regards
JB
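A minimal sketch of the cleanup step proposed in 2.3, assuming release directories on dist.apache.org are named `apache-iceberg-<version>` and that only the newest one should stay there. The helper name `stale_release_cmds` is hypothetical. Because dist.apache.org is a Subversion repository, the sketch only prints `svn rm` commands so they can be reviewed before being run with committer credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the svn commands that would remove
# every release directory except the newest one. Assumes directory names of the
# form apache-iceberg-<major>.<minor>.<patch>; check the real listing first.
DIST_URL="https://dist.apache.org/repos/dist/release/iceberg"

# Reads directory names (one per line, optional trailing "/") on stdin.
stale_release_cmds() {
  local dirs latest d
  # Keep only release directories, strip the trailing slash, sort by version.
  dirs=$(grep -E '^apache-iceberg-[0-9]+\.[0-9]+\.[0-9]+/?$' | sed 's#/$##' | sort -V)
  [ -n "$dirs" ] || return 0
  latest=$(printf '%s\n' "$dirs" | tail -n 1)
  while read -r d; do
    if [ "$d" != "$latest" ]; then
      echo "svn rm -m \"Remove $d (kept on archive.apache.org)\" $DIST_URL/$d"
    fi
  done <<< "$dirs"
}

# Usage (requires an svn client and network access):
#   svn ls "$DIST_URL" | stale_release_cmds
```

Printing instead of executing keeps the destructive step explicit; `sort -V` (GNU coreutils) orders `0.14.1` before `1.3.1` before `1.4.0`, which a plain lexical sort would not guarantee once minor versions reach two digits.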


Re: [PROPOSAL] Regular release pace & some post release actions

2023-10-06 Thread Jean-Baptiste Onofré
Hi Ryan,

For the pace, yes, it's what I saw with the previous release dates. My
proposal is to clearly state that on the website (on the releases page),
something like "We target a release per quarter". Just to inform the
community.

About the other points:
2.1. Great, thanks !
2.2. Yes, release notes on the releases page are fine. The proposal is
more to have some details about specific highlights, with
examples for instance. Something like
http://nanthrax.blogspot.com/2022/04/apache-karaf-runtime-440-has-been.html.
It's a bit long for a release notes page, so it could be linked from the
release notes page. About your point, I agree, but we already have
https://iceberg.apache.org/blogs/ with posts from different people.
How do we choose the blog posts there? I guess these blog posts have
been submitted as PRs and reviewed/merged. Maybe we can use the same
process for release highlights?
2.3. The cleanup should be done as soon as a new release is uploaded
to dist.apache.org (for instance, we still have Iceberg 0.14.1 on
https://dist.apache.org/repos/dist/release/iceberg/). The tags cleanup
is up to us, but for dist, ASF INFRA asks for cleanup (we should have
only the latest release on dist.apache.org) to limit space usage.
2.4. Cool, thanks ! I'm updating the PR with the DOAP.

Thanks again ! Much appreciated :)

Regards
JB

On Fri, Oct 6, 2023 at 8:34 PM Ryan Blue  wrote:
>
> The Iceberg community has already established a regular release cadence, 
> which is once per quarter. Here's the recent release history, minus patch 
> releases:
>
> - 1.4.0: 2023-10-04
> - 1.3.0: 2023-05-26
> - 1.2.0: 2023-03-20
> - 1.1.0: 2022-11-29
> - 1.0.0: 2022-10-14
> - 0.14.0: 2022-07-16
>
> As you can see, we've generally met the target, so I'm not sure why you're 
> suggesting a change.
>
> If your aim is for more strict adherence to the quarterly release target, I 
> don't think that's a good idea. I think I've mentioned this before, but I 
> think we want to avoid strict policies that inhibit our ability to make 
> reasonable decisions as a community, as was the case here to get Spark 3.5 
> out as soon as possible.
>
> For your other suggestions:
> 2.1. Sure, let's send announcements to the announce list. Note that this has 
> to happen after the website is updated, which causes delays right now. We're 
> working on fixing this.
> 2.2. I don't think it is a good idea for the project to host blog posts 
> because it puts the community in a very awkward position of choosing who can 
> post and what content can be there. And I think what you're asking for is 
> release notes, which we do post on the releases page. If you'd like to help 
> make these better, please do! We always need help translating from PR 
> descriptions to release notes that help people understand what is changing.
> 2.3. Yes, we do this periodically. We also need to clean up tags.
> 2.4. Go for it.
>
> As for the release guide, anyone is welcome to submit a pull request and we'd 
> love to have you contributing.
>
> Ryan
>
> On Fri, Oct 6, 2023 at 2:00 AM Jean-Baptiste Onofré  wrote:
>>
>> Hi guys,
>>
>> I would like to propose some improvements on our release.
>>
>> 1. Predictable & regular release pace
>>
>> We started this discussion quickly on the 1.4.0 vote thread: I think
>> it would be interesting for the community (both our users and also
>> companies leveraging Iceberg in their products) to have a regular &
>> predictable release pace.
>> It would give a kind of roadmap/expected dates for our users.
>> According to our previous release dates, I propose to target a release
>> per quarter.
>> It doesn't mean that we won't be able to release faster (if we want to
>> quickly include a fix or CVE or whatever, we can always cut a
>> release), but we would have a minimum of one release per quarter. As
>> today, the versioning will be discussed on the mailing list.
>> Website (https://iceberg.apache.org/releases/) should contain this
>> information and the next release target date.
>>
>> 2. Post release actions
>>
>> I propose some new actions post release:
>>
>>   2.1. Prepare an announcement email that will be sent to
>> dev@iceberg.apache.org and (also important) to annou...@apache.org.
>> The last "official" announcement we did was for Iceberg 1.0.0. As
>> annou...@apache.org is "processed" by the ASF communication team, and
>> Iceberg releases will be included in the ASF updates.
>> The release guide (https://iceberg.apache.org/how-to-release/) already
>> contains this point, but it seems we missed annou...@apache.org (
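To put numbers on the cadence discussion above, here is a small sketch that computes the gap in days between the feature releases listed in Ryan's message. The helper `release_gaps` is hypothetical and assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of days between consecutive dates
# (oldest first). Assumes GNU date for the -d option.
release_gaps() {
  local prev="" d
  for d in "$@"; do
    if [ -n "$prev" ]; then
      # Dates are parsed as midnight UTC, so the difference is a whole number of days.
      echo $(( ( $(date -u -d "$d" +%s) - $(date -u -d "$prev" +%s) ) / 86400 ))
    fi
    prev=$d
  done
}

# Feature releases from the list above: 0.14.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0
release_gaps 2022-07-16 2022-10-14 2022-11-29 2023-03-20 2023-05-26 2023-10-04
```

Run as written, it prints 90, 46, 111, 67 and 131, one per line: 445 days over five intervals, an average of 89 days, with individual gaps between 46 and 131 days.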

Re: [PROPOSAL] Regular release pace & some post release actions

2023-10-07 Thread Jean-Baptiste Onofré
Just to be concrete about a "regular & predictable release pace", the
proposal is to have one line on https://iceberg.apache.org/releases/
like this:

"Apache Iceberg releases are expected every quarter. Next target
release is 1.4.1 planned on Jan 24."

To be honest, only a few Apache projects do that (Karaf, Camel,
ActiveMQ, Subversion, ...), but I like it because it gives the
community some "vision" :)

Regards
JB

On Sat, Oct 7, 2023 at 6:59 AM Jean-Baptiste Onofré  wrote:
>
> Hi Ryan,
>
> For the pace, yes, it's what I saw with the previous release date. My
> proposal is to clearly state that on website (on release page),
> something like "We target a release per quarter". Just to inform the
> community.
>
> About the other points:
> 2.1. Great, thanks !
> 2.2. Yes, release notes on releases page are fine. The proposal is
> more to have some details about specific highlight points, with
> examples for instance. Something like
> http://nanthrax.blogspot.com/2022/04/apache-karaf-runtime-440-has-been.html.
> It's a bit long for a release notes page, so it could be "linked" on
> release notes page. About your point, I agree, but we already have
> https://iceberg.apache.org/blogs/ with posts from different people.
> How do we choose the blog posts here ? I guess these blog posts have
> been submitted as PR and reviewed/merged. Maybe we can use the same
> for release highlights ?
> 2.3. The cleanup should be done as soon as a new release is uploaded
> to dist.apache.org (for instance, we still have Iceberg 0.14.1 on
> https://dist.apache.org/repos/dist/release/iceberg/). The tags cleanup
> is up to us, but for dist, ASF INFRA asks for cleanup (we should have
> only the latest release on dist.apache.org) to limit the space use.
> 2.4. Cool, thanks ! I'm updating the PR with the DOAP.
>
> Thanks again ! Much appreciated :)
>
> Regards
> JB
>
> On Fri, Oct 6, 2023 at 8:34 PM Ryan Blue  wrote:
> >
> > The Iceberg community has already established a regular release cadence, 
> > which is once per quarter. Here's the recent release history, minus patch 
> > releases:
> >
> > - 1.4.0: 2023-10-04
> > - 1.3.0: 2023-05-26
> > - 1.2.0: 2023-03-20
> > - 1.1.0: 2022-11-29
> > - 1.0.0: 2022-10-14
> > - 0.14.0: 2022-07-16
> >
> > As you can see, we've generally met the target, so I'm not sure why you're 
> > suggesting a change.
> >
> > If your aim is for more strict adherence to the quarterly release target, I 
> > don't think that's a good idea. I think I've mentioned this before, but I 
> > think we want to avoid strict policies that inhibit our ability to make 
> > reasonable decisions as a community, as was the case here to get Spark 3.5 
> > out as soon as possible.
> >
> > For your other suggestions:
> > 2.1. Sure, let's send announcements to the announce list. Note that this 
> > has to happen after the website is updated, which causes delays right now. 
> > We're working on fixing this.
> > 2.2. I don't think it is a good idea for the project to host blog posts 
> > because it puts the community in a very awkward position of choosing who 
> > can post and what content can be there. And I think what you're asking for 
> > is release notes, which we do post on the releases page. If you'd like to 
> > help make these better, please do! We always need help translating from PR 
> > descriptions to release notes that help people understand what is changing.
> > 2.3. Yes, we do this periodically. We also need to clean up tags.
> > 2.4. Go for it.
> >
> > As for the release guide, anyone is welcome to submit a pull request and 
> > we'd love to have you contributing.
> >
> > Ryan
> >
> > On Fri, Oct 6, 2023 at 2:00 AM Jean-Baptiste Onofré  
> > wrote:
> >>
> >> Hi guys,
> >>
> >> I would like to propose some improvements on our release.
> >>
> >> 1. Predictable & regular release pace
> >>
> >> We started this discussion quickly on the 1.4.0 vote thread: I think
> >> it would be interesting for the community (both our users and also
> >> companies leveraging Iceberg in their products) to have a regular &
> >> predictable release pace.
> >> It would give a kind of roadmap/expected dates for our users.
> >> According to our previous release dates, I propose to target a release
> >> per quarter.
> >> It doesn't mean that we won't be able to release faster (if we want to
> >> quickly include a fix or CVE or

Re: [PROPOSAL] Regular release pace & some post release actions

2023-10-07 Thread Jean-Baptiste Onofré
Yes, I agree. Patch releases happen whenever needed.
The pace is more about "feature releases", and also about the
information on the website.

Regards
JB

On Sat, Oct 7, 2023 at 11:41 Renjie Liu wrote:

> I think there are two kinds of releases:
> 1. Feature release. That means bumping the minor part of the version
> number, e.g. 1.4.0, 1.5.0, etc.
> 2. Patch release. That's bug fixes to minor releases, which bumps
> the last part of the version, e.g. 1.4.1, 1.4.2.
>
> I think the quarterly release should apply to feature releases, while
> patch releases should be more frequent to fix bugs.
>
> On Sat, Oct 7, 2023 at 3:20 PM Jean-Baptiste Onofré 
> wrote:
>
>> Just to be concrete about "regular & predictable releases pace", the
>> proposal is to have one line on https://iceberg.apache.org/releases/
>> like this:
>>
>> "Apache Iceberg releases are expected every quarter. Next target
>> release is 1.4.1 planned on Jan 24."
>>
>> To be honest, only a few Apache projects do that (Karaf, Camel,
>> ActiveMQ, Subversion, ...), I like this to give "vision" to the
>> community :)
>>
>> Regards
>> JB
>>
>> On Sat, Oct 7, 2023 at 6:59 AM Jean-Baptiste Onofré 
>> wrote:
>> >
>> > Hi Ryan,
>> >
>> > For the pace, yes, it's what I saw with the previous release date. My
>> > proposal is to clearly state that on website (on release page),
>> > something like "We target a release per quarter". Just to inform the
>> > community.
>> >
>> > About the other points:
>> > 2.1. Great, thanks !
>> > 2.2. Yes, release notes on releases page are fine. The proposal is
>> > more to have some details about specific highlight points, with
>> > examples for instance. Something like
>> > http://nanthrax.blogspot.com/2022/04/apache-karaf-runtime-440-has-been.html.
>> > It's a bit long for a release notes page, so it could be "linked" on
>> > release notes page. About your point, I agree, but we already have
>> > https://iceberg.apache.org/blogs/ with posts from different people.
>> > How do we choose the blog posts here ? I guess these blog posts have
>> > been submitted as PR and reviewed/merged. Maybe we can use the same
>> > for release highlights ?
>> > 2.3. The cleanup should be done as soon as a new release is uploaded
>> > to dist.apache.org (for instance, we still have Iceberg 0.14.1 on
>> > https://dist.apache.org/repos/dist/release/iceberg/). The tags cleanup
>> > is up to us, but for dist, ASF INFRA asks for cleanup (we should have
>> > only the latest release on dist.apache.org) to limit the space use.
>> > 2.4. Cool, thanks ! I'm updating the PR with the DOAP.
>> >
>> > Thanks again ! Much appreciated :)
>> >
>> > Regards
>> > JB
>> >
>> > On Fri, Oct 6, 2023 at 8:34 PM Ryan Blue  wrote:
>> > >
>> > > The Iceberg community has already established a regular release
>> > > cadence, which is once per quarter. Here's the recent release history,
>> > > minus patch releases:
>> > >
>> > > - 1.4.0: 2023-10-04
>> > > - 1.3.0: 2023-05-26
>> > > - 1.2.0: 2023-03-20
>> > > - 1.1.0: 2022-11-29
>> > > - 1.0.0: 2022-10-14
>> > > - 0.14.0: 2022-07-16
>> > >
>> > > As you can see, we've generally met the target, so I'm not sure why
>> > > you're suggesting a change.
>> > >
>> > > If your aim is for more strict adherence to the quarterly release
>> > > target, I don't think that's a good idea. I think I've mentioned this
>> > > before, but I think we want to avoid strict policies that inhibit our
>> > > ability to make reasonable decisions as a community, as was the case here
>> > > to get Spark 3.5 out as soon as possible.
>> > >
>> > > For your other suggestions:
>> > > 2.1. Sure, let's send announcements to the announce list. Note that
>> > > this has to happen after the website is updated, which causes delays right
>> > > now. We're working on fixing this.
>> > > 2.2. I don't think it is a good idea for the project to host blog
>> > > posts because it puts the community in a very awkward position of choosing
>> > > who can post and what content can be there. And I think what you're asking
>> > > for is release notes, which we do post on the releases page. If you'd like
>> > > to help make these better, please do! We always need h
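Renjie's distinction between feature and patch releases maps directly onto the version number. As a minimal sketch (the helper `release_kind` is hypothetical, not part of any Iceberg tooling), a `MAJOR.MINOR.PATCH` version with a patch component of 0 is a feature release and anything else is a patch release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a MAJOR.MINOR.PATCH version following the
# feature/patch distinction above. A patch component of 0 marks a feature
# release (1.0.0, a major release, is also reported as "feature" here).
release_kind() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  # Reject anything that is not exactly three numeric components.
  if ! [[ $major =~ ^[0-9]+$ && $minor =~ ^[0-9]+$ && $patch =~ ^[0-9]+$ ]]; then
    echo invalid
    return 1
  fi
  if (( 10#$patch == 0 )); then
    echo feature
  else
    echo patch
  fi
}

release_kind 1.4.0   # feature
release_kind 1.4.1   # patch
```

Under this reading, the quarterly target would apply only to versions classified as `feature`, while `patch` releases stay on demand.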

Re: [PROPOSAL] Regular release pace & some post release actions

2023-10-07 Thread Jean-Baptiste Onofré
It makes sense. A statement on the website with "release every quarter"
would be great!

+1

Regards
JB

On Sat, Oct 7, 2023 at 20:09 Fokko Driesprong wrote:

> My 2ct,
>
> There is no harm in stating it explicitly, however, I'm not in favor of
> making it so explicit by pinning a date onto it (Jan 24). I would rather
> say that releases can be expected at least every quarter (so it doesn't
> need to be updated :)
>
> I noticed that the releases of Iceberg are also driven by the release
> cadence of query engines. Once there is a new Flink or Spark release, an
> Iceberg release follows. I'd like to add *at least* because I see an uptake
> in activity, and I think we want to release as often as possible, without
> introducing too much pressure on testing.
>
> Cheers, Fokko
>
>
>
> On Sat, Oct 7, 2023 at 18:52 Jean-Baptiste Onofré wrote:
>
>> Yes, agree. Patch release is whenever needed.
>> The pace is more for « feature releases » and also the information on
>> website.
>>
>> Regards
>> JB
>>
>> On Sat, Oct 7, 2023 at 11:41 Renjie Liu wrote:
>>
>>> I think there are two kinds of releases:
>>> 1. Feature release. That means to upgrade the minor part of the version
>>> number, e.g. 1.4.0, 1.5.0, etc.
>>> 2. Patch release. That's bug fixes to minor releases, which upgrades to
>>> the last part of each release version, e.g. 1.4.1, 1.4.2.
>>>
>>> I think the quarterly release should be applied to feature release,
>>> while patch release should be more frequent to fix bugs.
>>>
>>> On Sat, Oct 7, 2023 at 3:20 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Just to be concrete about "regular & predictable releases pace", the
>>>> proposal is to have one line on https://iceberg.apache.org/releases/
>>>> like this:
>>>>
>>>> "Apache Iceberg releases are expected every quarter. Next target
>>>> release is 1.4.1 planned on Jan 24."
>>>>
>>>> To be honest, only a few Apache projects do that (Karaf, Camel,
>>>> ActiveMQ, Subversion, ...), I like this to give "vision" to the
>>>> community :)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Sat, Oct 7, 2023 at 6:59 AM Jean-Baptiste Onofré 
>>>> wrote:
>>>> >
>>>> > Hi Ryan,
>>>> >
>>>> > For the pace, yes, it's what I saw with the previous release date. My
>>>> > proposal is to clearly state that on website (on release page),
>>>> > something like "We target a release per quarter". Just to inform the
>>>> > community.
>>>> >
>>>> > About the other points:
>>>> > 2.1. Great, thanks !
>>>> > 2.2. Yes, release notes on releases page are fine. The proposal is
>>>> > more to have some details about specific highlight points, with
>>>> > examples for instance. Something like
>>>> > http://nanthrax.blogspot.com/2022/04/apache-karaf-runtime-440-has-been.html.
>>>> > It's a bit long for a release notes page, so it could be "linked" on
>>>> > release notes page. About your point, I agree, but we already have
>>>> > https://iceberg.apache.org/blogs/ with posts from different people.
>>>> > How do we choose the blog posts here ? I guess these blog posts have
>>>> > been submitted as PR and reviewed/merged. Maybe we can use the same
>>>> > for release highlights ?
>>>> > 2.3. The cleanup should be done as soon as a new release is uploaded
>>>> > to dist.apache.org (for instance, we still have Iceberg 0.14.1 on
>>>> > https://dist.apache.org/repos/dist/release/iceberg/). The tags cleanup
>>>> > is up to us, but for dist, ASF INFRA asks for cleanup (we should have
>>>> > only the latest release on dist.apache.org) to limit the space use.
>>>> > 2.4. Cool, thanks ! I'm updating the PR with the DOAP.
>>>> >
>>>> > Thanks again ! Much appreciated :)
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On Fri, Oct 6, 2023 at 8:34 PM Ryan Blue  wrote:
>>>> > >
>>>> > > The Iceberg community has already established a regular release
>>>> > > cadence, which is once per quarter. Here's the recent release history,
>>>>

Re: Migration of PyIceberg to iceberg-python repository

2023-10-09 Thread Jean-Baptiste Onofré
>>>> Apache Infra to get it fixed.
>>>
>>> On Sat, Sep 30, 2023 at 20:29 Daniel Weeks wrote:
>>>>
>>>> +1 to relocate with history.
>>>>
>>>> On Sat, Sep 30, 2023, 10:24 AM Brian Olsen  wrote:
>>>>>
>>>>> This shouldn’t be too hard and can likely be a nightly build that occurs 
>>>>> with each client repository.
>>>>>
>>>>> We’re already planning on doing the documentation using git submodule to 
>>>>> pull all the documentation under a single build in the central repo. We 
>>>>> can likely go the other direction to run client-core integration tests. I 
>>>>> prefer these go on the client end to avoid too much CI running on the
>>>>> core repo. We also have to consider that whatever we choose to do with the
>>>>> Python client, we will also apply to Go, Rust, and any future client. Happy
>>>>> to hear alternatives though!
>>>>>
>>>>> WDYT Fokko?
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Sep 30, 2023 at 7:12 AM Hussein Awala  wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> I checked the discussion thread, and one of the motivations for this
>>>>>> separation was to avoid triggering unrelated CI jobs after each change.
>>>>>> However, I wonder whether it is (or will become) necessary to check
>>>>>> compatibility between the main repository and the client after each
>>>>>> change. If so, we will need to trigger the CI across the different
>>>>>> repositories using the GHA API, not necessarily to block the PR, but
>>>>>> just to give quick feedback and notification that something needs to be
>>>>>> changed on the client side.
>>>>>>
>>>>>> On Fri, Sep 29, 2023 at 9:39 PM Brian Olsen  
>>>>>> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Great work Fokko!
>>>>>>>
>>>>>>> Pucheng,
>>>>>>>
>>>>>>> We still want to maintain all of the issues in the Python repository. 
>>>>>>> The one thing we will lose is pull requests, but I assume there are 
>>>>>>> very few.
>>>>>>>
>>>>>>> On Fri, Sep 29, 2023 at 10:34 AM Pucheng Yang 
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> Thanks for doing this. I wonder how we should deal with all the issues
>>>>>>>> filed for the Python module that are still open in the iceberg repo?
>>>>>>>>
>>>>>>>> On Fri, Sep 29, 2023 at 7:55 AM Eduard Tudenhoefner 
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> +1 on moving to a separate repo and maintaining git history
>>>>>>>>>
>>>>>>>>> On Fri, Sep 29, 2023 at 3:30 PM Jean-Baptiste Onofré 
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> Awesome, it looks even better ;)
>>>>>>>>>>
>>>>>>>>>> Thanks !
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 29, 2023 at 2:31 PM Fokko Driesprong  
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hey Ajantha,
>>>>>>>>>> >
>>>>>>>>>> > That's a great suggestion. I've followed the steps and created a 
>>>>>>>>>> > new PR here: https://github.com/apache/iceberg-python/pull/3
>>>>>>>>>> >
>>>>>>>>>> > The subdirectory-filter command moves a subdirectory to the root
>>>>>>>>>> > directory. Because of this, I still had to add some files afterward
>>>>>>>>>> > (.github/*, .gitignore, etc.); these are in a separate commit.
>>>>>>>>>> > Please take a look.
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> >
>>>>>>>>>> > Fokko
>>>>>>>>>> >
>>>>>>>>>> > On Fri, Sep 29, 2023 at 13:39 Ajantha Bhat wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> I think we are gonna lose the history of commits if we merge the 
>>>>>>>>>> >> above PR.
>>>>>>>>>> >>
>>>>>>>>>> >> There are ways to move the subfolder into a new repo by retaining 
>>>>>>>>>> >> commit history.
>>>>>>>>>> >> For example:
>>>>>>>>>> >> - 
>>>>>>>>>> >> https://medium.com/@ayushya/move-directory-from-one-repository-to-another-preserving-git-history-d210fa049d4b
>>>>>>>>>> >> - https://gist.github.com/trongthanh/2779392
>>>>>>>>>> >>
>>>>>>>>>> >> Please give it a try.
>>>>>>>>>> >>
>>>>>>>>>> >> Thanks,
>>>>>>>>>> >> Ajantha
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Sep 29, 2023 at 4:55 PM Fokko Driesprong 
>>>>>>>>>> >>  wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Hey everyone 👋
>>>>>>>>>> >>>
>>>>>>>>>> >>> A while ago we discussed that Rust and Go are going into a 
>>>>>>>>>> >>> separate repository: 
>>>>>>>>>> >>> https://lists.apache.org/thread/4s02lmwf1kyrxxdpj3q9w2fqnxq2llbn
>>>>>>>>>> >>>
>>>>>>>>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a
>>>>>>>>>> >>> good moment to migrate PyIceberg to iceberg-python as well: 
>>>>>>>>>> >>> https://github.com/apache/iceberg-python/pull/2 I went over the 
>>>>>>>>>> >>> PRs that are ready to merge and got them in. If there is 
>>>>>>>>>> >>> anything missing, please let me know.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I would suggest merging the PR and leaving the source code in 
>>>>>>>>>> >>> the main repository for another week or so to make sure that we 
>>>>>>>>>> >>> didn't miss anything.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Since PyIceberg now also hosts the docs on the GitHub pages of
>>>>>>>>>> >>> the Iceberg repository, moving PyIceberg will also free up the
>>>>>>>>>> >>> GitHub pages for the migration of the docs back into the main
>>>>>>>>>> >>> repository.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Let me know if there are any concerns.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Kind regards,
>>>>>>>>>> >>> Fokko Driesprong
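A minimal sketch of the history-preserving move Ajantha suggests, using the `--subdirectory-filter` option of `git filter-branch` that Fokko mentions. The function name `extract_subdir` and its arguments are hypothetical, and the subdirectory name (for PyIceberg, presumably `python`) is an assumption. Git's own documentation now recommends the separate git-filter-repo tool over filter-branch; filter-branch is used here only because it ships with Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a repository and rewrite the copy so that SUBDIR
# becomes the root, keeping only the commits that touched SUBDIR.
extract_subdir() {
  local src=$1 subdir=$2 dest=$3
  # Work on a fresh clone so the source repository is never rewritten.
  git clone --quiet --no-local "$src" "$dest" || return 1
  (
    cd "$dest" || exit 1
    # With no revision arguments, filter-branch rewrites the current branch.
    # The variable skips filter-branch's warning and its pause before running.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
      git filter-branch --subdirectory-filter "$subdir" >/dev/null 2>&1
  )
}

# Usage (paths are illustrative):
#   extract_subdir /path/to/iceberg python /path/to/iceberg-python
```

As Fokko notes, files that lived outside the subdirectory (`.github/*`, `.gitignore`, etc.) are not carried over and have to be added in a follow-up commit.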


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-12 Thread Jean-Baptiste Onofré
Hi guys

I'm pleased to announce that the master branch has been renamed to main.

https://github.com/apache/iceberg

You can see that all PRs are now based on main (it's completely transparent).

We can now merge the corresponding PR
(https://github.com/apache/iceberg/pull/8722). I will work with the
team to merge it asap.

Thanks !
Regards
JB

On Fri, Sep 29, 2023 at 2:35 PM Jean-Baptiste Onofré  wrote:
>
> Hi guys,
>
> The Apache CoC (https://www.apache.org/foundation/policies/conduct)
> especially contains section 5 about the wording we use. Several Apache
> projects renamed the master branch to the main branch (Apache Karaf,
> ActiveMQ, Airflow, ...).
> As we already use main for go, rust, and python repositories, I wonder
> (for consistency) if we should not rename master to main on the "main"
> repository.
>
> Apache INFRA can do this "smoothly" but we would have to do some changes:
> - update build.gradle
> - update README.md
> - update to GH Actions (in .github/workflows/*)
>
> Thoughts ?
>
> Regards
> JB


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-12 Thread Jean-Baptiste Onofré
By the way, don't forget to update your local git repo:

git fetch --all
git checkout main
git branch -D master

And you are good to go :)

Regards
JB

On Thu, Oct 12, 2023 at 3:37 PM Jean-Baptiste Onofré  wrote:
>
> Hi guys
>
> I'm pleased to announce that the master branch has been renamed to main.
>
> https://github.com/apache/iceberg
>
> You can see that all PRs are now based on main (it's completely transparent).
>
> We can now merge the corresponding PR
> (https://github.com/apache/iceberg/pull/8722). I will work with the
> team to merge it asap.
>
> Thanks !
> Regards
> JB


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-12 Thread Jean-Baptiste Onofré
If you have any issue (with your PR, local repo, etc), please don't
hesitate to contact me.

Thanks !
Regards
JB

On Thu, Oct 12, 2023 at 3:41 PM Jean-Baptiste Onofré  wrote:
>
> By the way, don't forget to update your local git repo:
>
> git fetch --all
> git checkout main
> git branch -D master
>
> And you are good to go :)
>
> Regards
> JB


Re: [DISCUSSION] Rename master branch as main for the main repository

2023-10-12 Thread Jean-Baptiste Onofré
Oh yes, of course. Thanks for the reminder Brian :)

Regards
JB

On Thu, Oct 12, 2023 at 3:43 PM Brian Olsen  wrote:
>
> But also, don’t forget to save any commits you have on local or your fork for 
> master! :)
>
> Thanks for coordinating this JB!!
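The `git branch -D master` step in JB's instructions deletes the local master branch even if it holds commits that exist nowhere else, which is the risk Brian raises. A minimal sketch of a gentler migration, assuming the upstream remote is named `origin` and `git` is on the PATH: it renames the local branch instead of deleting it, then points it at `origin/main`. The four git commands follow the sequence GitHub suggests after a default-branch rename (with `--prune` added to drop the stale `origin/master` ref); the `migrate_master_to_main` and `demo` helpers are hypothetical and only show the steps end to end in throwaway repositories.

```python
import subprocess
from pathlib import Path

# Throwaway identity used only for the demo commits (hypothetical values).
DEMO_IDENTITY = (
    "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
)


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def migrate_master_to_main(repo: Path, remote: str = "origin") -> None:
    """Rename local master to main (keeping local-only commits) and track <remote>/main."""
    git(repo, "branch", "-m", "master", "main")
    git(repo, "fetch", "--prune", remote)  # also drops the stale <remote>/master ref
    git(repo, "branch", "-u", f"{remote}/main", "main")
    git(repo, "remote", "set-head", remote, "-a")


def demo(workdir: Path) -> dict:
    """Simulate the upstream rename with two throwaway repositories in `workdir`."""
    upstream, local = workdir / "upstream", workdir / "local"
    upstream.mkdir()
    git(upstream, "init", "-q")
    git(upstream, "symbolic-ref", "HEAD", "refs/heads/master")
    git(upstream, *DEMO_IDENTITY, "commit", "-q", "--allow-empty", "-m", "initial")
    git(workdir, "clone", "-q", str(upstream), str(local))
    # A commit that exists only in the local clone, like unpushed work on master.
    git(local, *DEMO_IDENTITY, "commit", "-q", "--allow-empty", "-m", "local work")
    git(upstream, "branch", "-m", "master", "main")  # the rename done upstream
    migrate_master_to_main(local)
    return {
        "branch": git(local, "rev-parse", "--abbrev-ref", "HEAD"),
        "tracking": git(local, "rev-parse", "--abbrev-ref", "main@{upstream}"),
        "last_commit": git(local, "log", "-1", "--format=%s"),
    }
```

Against an existing clone, only `migrate_master_to_main(Path("/path/to/iceberg"))` would be called; `demo` just reproduces the scenario so the outcome can be checked.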


Community Meeting Minutes ?

2023-10-12 Thread Jean-Baptiste Onofré
Hi guys,

Thanks for the community meeting yesterday, it was super interesting
and motivating :)

As we say at Apache: "If it didn't happen on the mailing list, it
never happened" :)
In order to give a chance to anyone in the community to see the topics
and participate, it would be great to share the meeting minutes on the
mailing list.

I know Brian did that in July. It would be great to do it "systematically".

@Brian do you mind sharing the meeting minutes on the mailing list ?
Do you need my help to complete/review ?
Maybe we can add it to the website too ?

Thanks !
Regards
JB


Re: [DISCUSSION] Should we have a place to host improvement proposals?

2023-10-13 Thread Jean-Baptiste Onofré
Hi Renjie,

I like the idea. I would propose to rename
https://iceberg.apache.org/roadmap/ to
https://iceberg.apache.org/technology/ and put the proposals there, with
their current status and, where relevant, links to the related PRs.
I propose to rework the roadmap page, as it doesn't seem up to date, and
it should contain the proposals.

+1 for me.

The same goes for the email I sent yesterday about the Community Meeting
Notes: I think it would be great to have the notes in a section of
https://iceberg.apache.org/community/.

Regards
JB

On Fri, Oct 13, 2023 at 8:57 AM Renjie Liu  wrote:
>
> Hi:
>
> I notice that currently the iceberg community discusses new ideas and 
> improvements in different places, e.g. dev mail list, github issues. I feel 
> that there are some drawbacks with this approach:
>
> 1. Not friendly for new contributors. Though we already have docs, it's not 
> easy for new contributors to learn the background and design philosophy.
> 2. Not easy to track the major improvements/roadmaps to the project.
>
> Some projects already have a process for improvement proposals, for example, 
> Apache Flink (FLIP), Apache Spark (SPIP). I'm wondering if we should have a 
> central place to host improvement proposals.
>
> --
> Thanks!
> Renjie Liu


Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-13 Thread Jean-Baptiste Onofré
hat stored?
>> > > > >
>> > > > >
>> > > > > Sorry for the confusion, by file format I mean roaring bitmap's file
>> > > > > format <https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>.
>> > > > > I checked that it has been implemented in several languages, such as
>> > > > > java, go, rust, c. Metadata will be stored in manifest file as other
>> > > > > entries such as datafile, deletion file. The starting position doesn't
>> > > > > need to be stored since it's used by the file reader. I think your
>> > > > > suggestion to provide an interface in design will make things clearer,
>> > > > > and I will add it to the design doc.
>> > > > >
>> > > > >> 2. How would DML operations work? Just a sketch would be great. I don't
>> > > > >> think it is a good idea to leave the implications for DML fuzzy.
>> > > > >
>> > > > >
>> > > > > I'll add sketches for other DML operations.
>> > > > >
>> > > > >> 3. The comparison appears to be between rewriting data files and using
>> > > > >> delete vectors. I think it needs to compare the existing delete file
>> > > > >> formats to delete vectors so that we know why there is a benefit to doing
>> > > > >> this beyond using the current positional delete files. The doc states that
>> > > > >> there aren't measurements here, which I think we need. Otherwise, should we
>> > > > >> just have a version of DML that produces one position delete per data file?
>> > > > >
>> > > > >
>> > > > > I think deletion vector files are quite similar to position delete files,
>> > > > > e.g. you can think of a deletion vector file as one position delete per
>> > > > > data file. But this change brings new chances for optimization, and there
>> > > > > is one section talking about it in the design doc. As with the
>> > > > > measurements, I'll try to design some experiments for it.
>> > > > >
>> > > > >> 4. I think this is missing some justification for how you're changing data
>> > > > >> file metadata.
>> > > > >
>> > > > >
>> > > > > I agree with your comment that if we associate one deletion vector with a
>> > > > > data file, maybe it's better to extend the DataFile struct rather than
>> > > > > introducing new entries.
>> > > > >
>> > > > > I'll update the doc to address the comments.
>> > > > >
>> > > > > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue  wrote:
>> > > > >
>> > > > >> Thanks, Renjie. I went through and made some comments about what is
>> > > > >> still not clear. Here's a summary:
>> > > > >>
>> > > > >> 1. What is the exact file format for these on disk that you're
>> > > > >> proposing? Even if you're saying that it is what is produced by
>> > > > >> roaring bitmap, we need more information. Is that a portable format?
>> > > > >> Do you wrap it at all in the file to carry extra metadata? For
>> > > > >> example, the proposal says that a starting position for a bitmap
>> > > > >> would be used. Where is that stored?
>> > > > >> 2. How would DML operations work? Just a sketch would be great. I
>> > > > >> don't think it is a good idea to leave the implications for DML fuzzy.
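Renjie describes a deletion vector file as "one position delete per data file". A minimal sketch of that equivalence, assuming position deletes are (data file path, row position) pairs: it folds them into one bitmap per data file and uses the bitmap to filter a file's rows. A plain Python int stands in for the roaring bitmap, so this illustrates the idea only; it does not implement the RoaringFormatSpec layout, the manifest metadata, or any Iceberg API, and the function names and file paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_deletion_vectors(position_deletes: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group (data_file_path, row_position) deletes into one bitmap per data file.

    Bit i set in a file's bitmap means row i of that file is deleted.
    """
    vectors: Dict[str, int] = defaultdict(int)
    for path, pos in position_deletes:
        vectors[path] |= 1 << pos
    return dict(vectors)


def is_deleted(vector: int, pos: int) -> bool:
    """Check whether bit `pos` is set in the bitmap."""
    return (vector >> pos) & 1 == 1


def live_rows(path: str, rows: List[str], vectors: Dict[str, int]) -> List[str]:
    """Return the rows of one data file that its deletion vector does not mark."""
    vector = vectors.get(path, 0)  # no vector means nothing deleted
    return [row for pos, row in enumerate(rows) if not is_deleted(vector, pos)]
```

Each data file ends up with exactly one bitmap, which is what makes a lookup a single probe per file instead of a merge across several position delete files.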

[HEADS UP] Apache Iceberg DOAP file registered and update on the release process

2023-10-13 Thread Jean-Baptiste Onofré
Hi guys,

We added the Apache Iceberg DOAP file to the repo:

https://github.com/apache/iceberg/blob/main/doap.rdf

I registered the DOAP file with ASF ComDev, and now we are all set:

https://projects.apache.org/project.html?iceberg

I will submit a PR for the website to update the release process so
that it includes updating the release section of the DOAP file.

Thanks all for your help,
Regards
JB


Re: [DISCUSS] Apache Iceberg 1.4.1

2023-10-16 Thread Jean-Baptiste Onofré
Hi Eduard

Sounds good to me. There are some dependency upgrades I think would be
good to include (I'm working on testing/benchmarking them). Thanks
to you, we already merged the most important ones. I'm doing a pass now.

In terms of pure bug fixes, I don't have anything in mind.

Regards
JB

On Mon, Oct 16, 2023 at 12:49 PM Eduard Tudenhoefner  wrote:
>
> Hi everyone,
>
> I wanted to start a discussion to have a 1.4.1 patch release.
>
> We discovered a critical issue with split offsets in manifests that can 
> result in corrupted metadata when those invalid offsets are read and then 
> written back. Details with a proposed fix are described in #8834.
>
> I've created a 1.4.1 milestone in GH to track any additional fixes we'd like 
> to get out. One particular issue that quite a few people are running into is 
> #8677.
>
> Do people have any other bug fixes that should be included?
>
> Thanks
> Eduard


Re: [VOTE] Release Apache PyIceberg 0.5.1 (RC1)

2023-10-16 Thread Jean-Baptiste Onofré
+1 (non binding)

I checked:
- hash and signature are good
- source distribution is good
- ran a quick test locally
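The hash part of "hash and signature are good" can be scripted. A minimal sketch, assuming the `.sha512` file starts with the hex digest as written by `sha512sum` or `shasum -a 512` ("<digest>  <filename>"); checksum files in the multi-line `gpg --print-md` style would need a different parser. The signature is a separate step (for example `gpg --import KEYS` followed by `gpg --verify` on the `.asc` file) and is not covered here. Function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-512 so large tarballs are not read into memory at once."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of the checksum file."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```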

Thanks,
Regards
JB

On Mon, Oct 16, 2023 at 11:28 PM Fokko Driesprong  wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official PyIceberg 0.5.1
> release.
>
> This is a patch release due to a bug that has been found. Smaller bugs also
> have been backported.
>
> The commit ID is ea9da8856a686eaeda0d5c2be78d5e3102b67c44
> * This corresponds to the tag: pyiceberg-0.5.1rc1
> (320b0f499d14178210c3b9cb7d94dab1e1b149e6)
> * https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.5.1rc1
> * https://github.com/apache/iceberg-python/tree/ea9da8856a686eaeda0d5c2be78d5e3102b67c44
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.1rc1/
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on PyPI:
> https://pypi.org/project/pyiceberg/0.5.1rc1/
>
> And can be installed using: pip3 install pyiceberg==0.5.1rc1
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as PyIceberg 0.5.1
> [ ] +0
> [ ] -1 Do not release this because...
>
> Kind regards,
> Fokko


Re: [DISCUSSION] Should we have a place to host improvement proposals?

2023-10-17 Thread Jean-Baptiste Onofré
Hi

OK, so +1 for me to have a "proposals space". The details can be
discussed later.

Regards
JB

On Tue, Oct 17, 2023 at 9:17 AM Renjie Liu  wrote:
>
> Hi, JB:
>
> Thanks for replying.
>
> I want to use this thread to discuss whether we need a central place for 
> improvement proposals first. We can talk about the exact format if the 
> community is interested in it later.


Re: [VOTE] Release Apache Iceberg 1.4.1 RC0

2023-10-18 Thread Jean-Baptiste Onofré
+1 (non binding)

I checked:
* hashes and signatures are OK
* I ran quick tests using Spark 3.5

I found the following issues that we should fix:
* the source distribution contains two binary files (used for
tests, empty-puffin-uncompressed.bin
and sample-metric-data-uncompressed.bin). Binary files should not be
included in the source distribution.
* some files don't contain the ASF license header
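Spotting binary files in an unpacked source distribution can be approximated with a simple heuristic. A minimal sketch, assuming a file counts as binary when its first few thousand bytes contain a NUL byte (a common heuristic, not the rule any ASF tool is known to apply); the function names are hypothetical. `Path.rglob` does not skip dot-directories, so files under `.baseline` are scanned too.

```python
from pathlib import Path
from typing import List


def looks_binary(path: Path, probe_size: int = 8000) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as f:
        return b"\0" in f.read(probe_size)


def find_binary_files(root: Path) -> List[str]:
    """Return sorted POSIX-style paths, relative to root, of files that look binary."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and looks_binary(p)
    )
```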

I will work on fixing these issues, and I will also propose running Apache
RAT to check our distribution.

Regards
JB


On Wed, Oct 18, 2023 at 11:15 AM Eduard Tudenhoefner 
wrote:

> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg
> 1.4.1 release.
>
> The commit ID is 445664fb8d82950215872cbfec91e37c5fa0920f
> * This corresponds to the tag: apache-iceberg-1.4.1-rc0
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.4.1-rc0
> *
> https://github.com/apache/iceberg/tree/445664fb8d82950215872cbfec91e37c5fa0920f
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.1-rc0
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL
> is:
> *
> https://repository.apache.org/content/repositories/orgapacheiceberg-1147/
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as Apache Iceberg 1.4.1
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are
> encouraged to cast
> non-binding votes. This vote will pass if there are 3 binding +1 votes and
> more binding
> +1 votes than -1 votes.
>
>


Re: Kafka Connect sink

2023-10-18 Thread Jean-Baptiste Onofré
Hi Bryan,

Any update on this thread ? Can I help somehow ?

Thanks,
Regards
JB

On Mon, Oct 2, 2023 at 7:39 PM Bryan Keller  wrote:
>
> Hi all,
>
> We at Tabular would like to contribute our Kafka Connect Iceberg sink to the 
> Iceberg project. It would be great to give Iceberg users another option for 
> landing data from Kafka into Iceberg tables that is supported by the Iceberg 
> community. Kafka Connect is a part of systems from AWS, Confluent, Redpanda, 
> and so on, so it can make landing data from Kafka into Iceberg much easier 
> for those without a Flink or Spark infrastructure.
>
> There are a few Iceberg sink implementations out there for Kafka Connect, but 
> we feel this one covers most of the features users have requested, such as 
> exactly-once processing, schema evolution, and multi-table fanout. And having 
> the sink backed by the Iceberg community will help it to evolve and improve 
> over time.
>
> If this sounds like something everyone would like to see added to Iceberg, 
> I've opened a PR that includes some initial pieces of the sink. The thought 
> was to break up the submission into parts so each could be reviewed more 
> easily. Some design docs and notes can be found in the original repo here: 
> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>
> We'd like to get feedback if others approve of moving forward with this or 
> not.
>
> Thanks,
> Bryan
>


Re: [VOTE] Release Apache Iceberg 1.4.1 RC0

2023-10-18 Thread Jean-Baptiste Onofré
Hi

As you can see, it’s what I mentioned in my vote email. However, as it’s
been like this for a while, I voted +1, and I have PRs ready to be
submitted (including running RAT).

So do you think it’s blocking ?

Regards
JB

On Wed, Oct 18, 2023 at 16:27, Xuanwo wrote:

> -1 (non-binding)
>
> - checksum and signature are good
>
> - the following files do not have a license header
>   - .baseline/idea/intellij-java-palantir-style.xml
>   - .baseline/checkstyle/checkstyle.xml
>   - gradle/libs.versions.toml
>   - .baseline/checkstyle/checkstyle-suppressions.xml
>
> - release contains binary files
>   - core/src/test/resources/org/apache/iceberg/puffin/v1/empty-puffin-uncompressed.bin
>   - core/src/test/resources/org/apache/iceberg/puffin/v1/sample-metric-data-compressed-zstd.bin
>   - core/src/test/resources/org/apache/iceberg/puffin/v1/sample-metric-data-uncompressed.bin
>
> On Wed, Oct 18, 2023, at 21:55, Eduard Tudenhoefner wrote:
>
> +1 (non-binding)
>
> * validated checksum and signature
> * checked license docs & ran RAT checks
> * ran build and tests with JDK8
> * ran into one test failure, which is reported in
> https://github.com/apache/iceberg/issues/8824, but this shouldn't block
> the release
> * tested with Trino in https://github.com/trinodb/trino/pull/19434
>
> Xuanwo
>
>


Re: [VOTE] Release Apache Iceberg 1.4.1 RC0

2023-10-19 Thread Jean-Baptiste Onofré
By the way, at Apache it's not possible to veto a release: a release vote
passes with at least three binding +1 votes and more binding +1 votes than
-1 votes, so even if there is a fourth binding vote with -1, the release
can pass.
That said, from a community standpoint, it's good to take any -1 (binding
or non binding) into account.
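The passing rule quoted in the vote email ("3 binding +1 votes and more binding +1 votes than -1 votes") is simple arithmetic. A minimal sketch that encodes only that rule, assuming votes are recorded as (vote, is_binding) pairs; the function names are hypothetical, and +0/-0 votes are simply ignored.

```python
from typing import Iterable, Tuple


def release_vote_passes(binding_plus: int, binding_minus: int) -> bool:
    """At least three binding +1 votes, and more binding +1 than binding -1 votes."""
    return binding_plus >= 3 and binding_plus > binding_minus


def tally(votes: Iterable[Tuple[str, bool]]) -> bool:
    """Count (vote, is_binding) pairs; only binding +1/-1 votes affect the outcome."""
    votes = list(votes)  # allow any iterable, we traverse it twice
    plus = sum(1 for vote, binding in votes if binding and vote == "+1")
    minus = sum(1 for vote, binding in votes if binding and vote == "-1")
    return release_vote_passes(plus, minus)
```

For the scenario described above, `tally([("+1", True)] * 3 + [("-1", True)])` returns `True`, and a non-binding -1 never changes the result.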

In your case, I would have voted -0 (to avoid confusion).

I voted +1 because:
- the release is no different from the previous ones in this respect
- the issues have been identified, so we can fix them

Regards
JB

On Thu, Oct 19, 2023 at 10:15 AM Xuanwo  wrote:

> As you can see, it’s what I mentioned in my vote email. However, as it’s
> been like this for a while, I voted +1, and I have PRs ready to be
> submitted (including running RAT).
>
> So do you think it’s blocking ?
>
>
> Thanks for the clarification.
>
> I'm voting -1 due to the reasons mentioned, but it doesn't block this
> release (especially since it's non-binding). This release can proceed once
> it garners enough +1 votes. My -1 vote is simply to highlight areas we
> could improve in future releases.
>
> Xuanwo
>


Re: Kafka Connect sink

2023-10-19 Thread Jean-Baptiste Onofré
Awesome ! Thanks for the update Bryan.

I'm looking forward to the PR !

Regards
JB

On Thu, Oct 19, 2023 at 11:13 AM Bryan Keller  wrote:
>
> Hi JB,
>
> The plan is to move forward, unless there are concerns from anyone. I got a 
> little bit sidetracked but will be working on addressing comments in the 
> first PR then following up with more PRs after that one is merged, so stay 
> tuned for those.
>
> Thanks,
> Bryan


Re: Feedback on Iceberg Materialized View Spec

2023-10-24 Thread Jean-Baptiste Onofré
Hi Jan

Thanks for the reminder. I will take a look.

As proposed by Renjie a few days ago, it would be great to
gather/store all proposal documents in a central place.

If there are no objections, I will prepare a PR for the website about
that (with a space listing/linking all proposals).

Regards
JB



On Tue, Oct 24, 2023 at 9:22 AM Jan Kaul  wrote:
>
> Hi all,
>
> I've created an issue to propose a design for a Materialized View Spec a 
> while ago. After further discussion we reached a first draft for the spec. It 
> would be great if you could have another look at the design and share your 
> feedback.
>
> Here is the google doc: 
> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing
>
> Thanks in advance,
>
> Jan


Re: Feedback on Iceberg Materialized View Spec

2023-10-26 Thread Jean-Baptiste Onofré
Hi Brian

I like the GitHub idea. Why not enable GitHub Discussions (in
.asf.yaml) ? A GitHub Discussion could be a good place to share the
doc and discuss it both in the doc and in the discussion comments.

Regards
JB

On Thu, Oct 26, 2023 at 1:13 PM Brian Olsen  wrote:
>
> Hey JB,
>
> I totally agree we need a place to centralize this, but I'm not a huge fan of
> all the lists we currently have going on the site. SSGs are just not an
> accessible method of storing lists (roadmap, blogs, videos, etc.).
>
> The roadmap is barely touched for this reason. I want to propose we move
> the roadmap to GitHub Projects.
>
> Likewise, I feel like somewhere on GitHub might be a better location for this 
> type of thing.
>
> Maybe posting these in GitHub issues and adding a proposal label?


Re: Feedback on Iceberg Materialized View Spec

2023-10-26 Thread Jean-Baptiste Onofré
Just to be clear: we can configure the GitHub Discussions subject templates
via .asf.yaml, but we have to open a ticket with INFRA to enable the feature.

Regards
JB

On Thu, Oct 26, 2023 at 1:56 PM Jean-Baptiste Onofré  wrote:
>
> Hi Brian
>
> I like the GitHub idea. Why not enable GitHub Discussions (in
> .asf.yaml) ? A GitHub Discussion could be a good place to share the
> doc and discuss it both in the doc and in the discussion comments.
>
> Regards
> JB


[PROPOSAL] Improve dev/check-license

2023-10-26 Thread Jean-Baptiste Onofré
Hi guys,

During the 1.4.1 vote, we identified some files without ASF headers,
specifically in hidden directories (like .baseline). These files
were not detected by the dev/check-license script.

The reason is that the script runs apache-rat via java -jar (the
Apache RAT CLI), which executes the RAT Report class ()

When providing a directory to scan, the Report class uses a
DirectoryWalker to traverse the directories, looking for files to
check.
Unfortunately, by default, the DirectoryWalker ignores hidden
directories (all directories whose names start with a dot):

https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71

https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53

In our case, this means it ignores the .baseline, .github, .git, and
.palantir directories. This is a problem because .baseline, .github, and
.palantir are included in our source distribution, so the files in them
must have proper ASF headers.
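To see which files the default walker never visits, the dot-directory rule can be approximated with a small JDK-only sketch (Java 11+) that lists every regular file below a directory whose name starts with a dot. This is not RAT code: the class and method names are hypothetical, RAT's actual checks may differ in details (for example, how top-level hidden files such as .asf.yaml are treated), and .git is left out on the assumption that it is not shipped in the source distribution.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: approximates the files a walker that skips dot-directories would miss.
public final class HiddenDirFiles {

  // True if any directory between root (exclusive) and the file has a name starting with ".".
  static boolean underHiddenDir(Path root, Path file) {
    Path rel = root.relativize(file);
    // Only the parent segments are checked; the file name itself is ignored.
    for (int i = 0; i < rel.getNameCount() - 1; i++) {
      if (rel.getName(i).toString().startsWith(".")) {
        return true;
      }
    }
    return false;
  }

  // Returns, sorted, the regular files under root that live below a dot-directory,
  // excluding .git (assumed not to be part of the source distribution).
  static List<Path> filesMissed(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> !root.relativize(p).startsWith(".git"))
          .filter(p -> underHiddenDir(root, p))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    for (Path p : filesMissed(root)) {
      System.out.println(root.relativize(p));
    }
  }
}
```

Running it from the repository root (for example, java HiddenDirFiles.java .) should print the files under .baseline, .github and .palantir that the RAT 0.15 CLI would not check.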

FYI, I will propose a change to RAT so that, at a minimum, a
ReportConfiguration can define whether hidden directories are skipped
(basically, making the Walkers configurable). I will try to get it
into RAT 0.17 (I discussed this with Claude).
NB: the RAT Gradle and Maven plugins define their own
Walkers/IReportable implementations to avoid this issue.

I propose three options to improve this:
1. We keep dev/check-license as it is today (using RAT 0.15) and
exclude the .baseline, .github, and .palantir directories from our source
distribution. This means we will probably have issues when building
from the source distribution. We can revisit dev/check-license
when we upgrade to RAT 0.17.

2. We keep dev/check-license but create our own RAT scanner with a
custom IReportable class that considers all files and directories.
Something like:

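// Assumes the Apache RAT 0.15 API (apache-rat-core on the classpath).
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;

import org.apache.rat.Defaults;
import org.apache.rat.ReportConfiguration;
import org.apache.rat.api.Document;
import org.apache.rat.api.RatException;
import org.apache.rat.document.impl.FileDocument;
import org.apache.rat.report.IReportable;
import org.apache.rat.report.RatReport;
import org.apache.rat.report.claim.ClaimStatistic;
import org.apache.rat.report.xml.XmlReportFactory;
import org.apache.rat.report.xml.writer.impl.base.XmlWriter;

// Walks every directory, including hidden ones such as .baseline, .github
// and .palantir, and feeds each file to the RAT report.
public class IncludeHiddenDirectoryWalker implements IReportable {

  private final File file;

  public IncludeHiddenDirectoryWalker(File file) {
    this.file = file;
  }

  public static void main(String[] args) throws Exception {
    ReportConfiguration reportConfiguration = new ReportConfiguration();
    reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher());

    File root = new File(args.length > 0 ? args[0] : "/path/to/iceberg");
    try (Writer out = new FileWriter("report.xml")) {
      XmlWriter writer = new XmlWriter(out);
      ClaimStatistic claimStatistic = new ClaimStatistic();
      RatReport ratReport = XmlReportFactory.createStandardReport(
          writer, claimStatistic, reportConfiguration);
      ratReport.startReport();
      new IncludeHiddenDirectoryWalker(root).run(ratReport);
      ratReport.endReport();
      writer.closeDocument();
    }
  }

  @Override
  public void run(RatReport report) throws RatException {
    process(report, file);
  }

  private void process(RatReport report, File dir) {
    File[] files = dir.listFiles();
    if (files == null) {
      return; // not a directory, or not readable
    }
    for (File current : files) {
      if (current.isDirectory()) {
        // Unlike RAT's DirectoryWalker, do not skip names starting with ".";
        // only .git is skipped, as it is not part of the source distribution.
        if (!".git".equals(current.getName())) {
          process(report, current);
        }
      } else {
        try {
          Document document = new FileDocument(current);
          report.report(document);
        } catch (RatException e) {
          System.err.println("Can't report file " + current.getAbsolutePath() + ": " + e);
        }
      }
    }
  }
}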

So I could contribute this under dev/src, for example.

3. Instead of using dev/check-license, we can use the RAT Gradle
plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it,
and it works fine because it uses a custom IReportable:

https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135

https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189

We can hook this plugin into the Gradle check phase, meaning that we can
verify headers for each PR.

My preference would be for 3, mainly because:
1. it integrates smoothly into our Gradle ecosystem, as just another plugin
alongside gradle-baseline-java, gradle-errorprone-plugin,
spotless-plugin-gradle, etc.
2. since we can hook the RAT Gradle plugin into the Gradle check task,
the license check will be performed at build time, including on PRs
via GitHub Actions.

If you are OK with 3, I will work on:
1. a PR to use it
2. a PR for the website to update the release check procedure

Thoughts?

Regards
JB


Re: Feedback on Iceberg Materialized View Spec

2023-10-26 Thread Jean-Baptiste Onofré
The idea is really to restrict GH Discussions to roadmap/design proposals only.

For "user support", in addition to Slack, I would love to see
u...@iceberg.apache.org.

So I would distinguish:
- design/spec proposals, where we could use GH Discussions. If
people use GH Discussions for support questions, we can move them to GH
Issues or redirect them to the mailing list/Slack.
- user "support", which should be on the user mailing list and/or Slack

You have a valid point: GH Discussions could be hard to manage because
most users will use it as a "support forum".

My point is really:
- we need a central space for design/spec proposals
- it has to be within the Iceberg community and visible to all

Regards
JB

On Thu, Oct 26, 2023 at 5:30 PM Brian Olsen  wrote:
>
> GitHub Discussions could be a solution that we should consider. We used it on 
> the Trino side but still have mixed results with it. On one hand, there's a 
> lot of overlap between creating Issues and Discussions. In fact, GitHub 
> allows you to convert Issues that only involve discussing a topic, or 
> that can't immediately be tied to any upcoming work, into Discussions. 
> This keeps the Issue backlog focused on actionable requests.
>
> That said, Discussions can become difficult to maintain if no person or body 
> of people drives them. Of course, the community will drive them to some 
> degree, especially while they're new and shiny, but GitHub Discussions, much 
> like Slack, become a support channel that encourages the messy human 
> interactions that help us arrive at a solution. So the question is: do we 
> want to open Discussions knowing they may become a second support channel 
> alongside Slack? Or would we want to use Discussions in place of Slack so 
> that there's still a single triage channel?
>
> I personally lean towards keeping a single real-time "support-like" channel 
> in the community; otherwise, we fragment the attention of the 
> community. Most of what we need to support the centralization of 
> proposals can be accomplished with Issues. Slack still seems to be the 
> dominant interactive system of choice, and it's where we are now, so I 
> wouldn't suggest moving away from it. I do think this is worth a discussion 
> at the next sync, so I'll add it.
>
> In full transparency, Tabular is building an Iceberg-focused Discourse forum 
> instance (not to be confused with Discord) to solve the problem of 
> centralizing community discussions into wiki-style answers we can link 
> to, with dedicated content curators for those answers. Think of it as 
> an Iceberg-specific Stack Overflow with relaxed rules to allow more open 
> discussion. Adding GitHub Discussions wouldn't collide with our goals, as it 
> would become another signal we could use to inform the answers on our 
> forum. It still comes down to the value relative to the cost for the 
> community to manage it.
>
> I know I have a lot of thoughts around this, and it's because I've been down 
> this road before, but perhaps there's a nuance I'm not seeing yet.

Re: Feedback on Iceberg Materialized View Spec

2023-10-26 Thread Jean-Baptiste Onofré
Oh, I'm not saying we have to provide a user mailing list. Personally, I
like mailing lists mainly because we have https://lists.apache.org/
where we can browse and search them.
A lot of Apache projects use Slack or Zulip, but alongside
mailing lists. As we say at Apache: "if it doesn't happen on the
mailing list, it never happens".
That said, I would distinguish:
- for dev, obviously we can use Slack for discussions, community
meetings, etc., but the main topics/discussions have to be sent to the dev
mailing list.
- for users, I think Slack is good, but I like the user mailing list
for tracking/search/async communication as well.

That's another discussion anyway; let's focus on the design proposals
space. My understanding is that we want a space listing all
proposals for review, tagged as "done" or "in progress". Right?
I don't think a forum or Stack Overflow-like site would help here (it helps
users, not dev/technical/design proposals).

At Apache Beam, we have a page similar to Iceberg's:
https://beam.apache.org/roadmap/ where you can click on roadmap items
for details (https://beam.apache.org/roadmap/portability/).
So, initially, I proposed to update
https://iceberg.apache.org/roadmap/ with the proposals (status
"discussion"). As most (all?) of the proposals come as Google Docs links,
we can adjust the look and feel of this page to include the list of
proposals.

That could be a first step; we can refine it later.

Regards
JB

On Thu, Oct 26, 2023 at 5:54 PM Brian Olsen  wrote:
>
> Yeah, unfortunately there's no way to limit the functionality to only 
> facilitate this. In fact, the product that gets closest to it is GitHub 
> Issues.
>
> I believe putting the onus on developers deeply involved in the project makes 
> sense. It's unlikely that users, especially newer users of a newer generation, 
> will use an email list, especially if they're in discovery mode and 
> figuring out how to solve an issue. Much of gaining adoption from users comes 
> down to lowering every barrier to entry and shortening the time to that first 
> hello world dopamine hit.
>
> I'm a middle millennial, and even I find using email for discussion outside of 
> my mental model/preference, but I also see the benefits.

Re: Feedback on Iceberg Materialized View Spec

2023-10-26 Thread Jean-Baptiste Onofré
Daniel is right, we went off topic :)

OK Brian, let's do that.

Apologies.

Regards
JB

On Thu, Oct 26, 2023 at 8:40 PM Brian Olsen  wrote:
>
> Agreed, apologies to Jan :). JB, let's discuss this at the sync this Wed, and 
> after that we can create a new thread if needed.
>
> On Thu, Oct 26, 2023 at 1:38 PM Daniel Weeks  wrote:
>>
>> JB and Brian,
>>
>> I think we should probably move this discussion to a discuss thread 
>> specifically for the topics you want to address.
>>
>> We've had a few instances now where a thread was redirected from its 
>> original intent to talk about other subjects. I don't feel this is a good 
>> approach because, while it is on the Apache mailing list, the topic of the 
>> thread doesn't reflect the content, so you don't get the right 
>> audience/level of engagement or buy-in.
>>
>> I'm not disagreeing with trying to improve how we communicate and track 
>> improvements/proposals/etc, but I think we should try to keep the thread on 
>> topic.
>>
>> Thanks,
>> -Dan

Re: Community Meeting Minutes ?

2023-10-27 Thread Jean-Baptiste Onofré
Thanks Brian, much appreciated!

Regards
JB

On Thu, Oct 26, 2023 at 10:29 PM Brian Olsen  wrote:
>
> Thanks for the reminder here, JB. I just created a checklist to follow for 
> this process so I don't forget. At some point, I'll add it to the 
> documentation so that anyone can run this over time. I will share the last 
> few meeting minutes in their own threads now.
>
> On Thu, Oct 12, 2023 at 9:03 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi guys,
>>
>> Thanks for the community meeting yesterday, it was super interesting
>> and motivating :)
>>
>> As we say at Apache: "If it didn't happen on the mailing list, it
>> never happened" :)
>> In order to give a chance to anyone in the community to see the topics
>> and participate, it would be great to share the meeting minutes on the
>> mailing list.
>>
>> I know Brian did that in July. It would be great to do it "systematically".
>>
>> @Brian, would you mind sharing the meeting minutes on the mailing list?
>> Do you need my help to complete/review them?
>> Maybe we can add them to the website too?
>>
>> Thanks !
>> Regards
>> JB


Re: [VOTE] Release Apache PyIceberg 0.5.1 RC2

2023-10-27 Thread Jean-Baptiste Onofré
+1 (non binding)

I checked:
- hash and signatures are good
- I will check NOTICE (the copyright year is 2022 and I think some deps are
missing there); not a release blocker
- ASF headers are present
- no binary files detected
- ran a very quick test
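For the hash part of such a check, a minimal JDK-only sketch (Java 11+) can recompute the SHA-512 digest of a downloaded artifact and compare it with its checksum file. It assumes the checksum file starts with the hex digest, as in shasum -a 512 output, and that it sits next to the artifact with a .sha512 suffix; both are assumptions, and the class name is hypothetical. Signature verification still needs gpg and the KEYS file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for the "hash" part of release verification (signatures need gpg).
public final class VerifySha512 {

  // Streams the file through SHA-512 and returns the lowercase hex digest.
  static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-512");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        digest.update(buffer, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  // Assumes the checksum file starts with the hex digest ("<digest>  <file name>");
  // other layouts are not handled.
  static boolean matches(Path file, Path checksumFile)
      throws IOException, NoSuchAlgorithmException {
    String expected = Files.readString(checksumFile).trim().split("\\s+")[0];
    return expected.equalsIgnoreCase(sha512Hex(file));
  }

  public static void main(String[] args) throws Exception {
    Path artifact = Path.of(args[0]);
    // Assumes the checksum sits next to the artifact with a ".sha512" suffix.
    Path checksum = Path.of(args[0] + ".sha512");
    System.out.println(matches(artifact, checksum) ? "OK" : "MISMATCH");
  }
}
```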

Regards
JB

On Tue, Oct 24, 2023 at 8:48 PM Fokko Driesprong  wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official PyIceberg 0.5.1 
> release.
>
> This is a patch release to fix the following bugs that have been found:
>
> - Part of the expression is ignored when multiple and/or expressions are 
> specified
> - Update LIKE statements to reflect SQL behavior
>
> Some smaller bug fixes have also been backported.
>
> The commit ID is 891b4c7f4214fb9118080ce2215a210a770a5019
>
> * This corresponds to the tag: pyiceberg-0.5.1rc2 
> (c5085159079fe100b7fbd38b5037d1408525dc46)
> * https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.5.1rc2
> * 
> https://github.com/apache/iceberg-python/tree/891b4c7f4214fb9118080ce2215a210a770a5019
>
> The release tarball, signature, and checksums are here:
>
> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.1rc2/
>
> You can find the KEYS file here:
>
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on pypi:
>
> https://pypi.org/project/pyiceberg/0.5.1rc2/
>
> They can be installed using: pip3 install pyiceberg==0.5.1rc2
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
> [ ] +1 Release this as PyIceberg 0.5.1
> [ ] +0
> [ ] -1 Do not release this because...
>
> Consider this email my +1 vote (binding), after running it against our 
> example notebooks.
>
> Kind regards, Fokko


Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
By the way, as dev/check-license is also used in the iceberg-python and
iceberg-go repositories (iceberg-rust doesn't have it), maybe I can
move forward with a new RAT release containing the hidden-directory fix
and update those repositories as well.

Regards
JB


Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Thanks for the heads-up, Xuanwo.

It's the fourth option :) I will make a comparison with RAT.

Regards
JB

On Fri, Oct 27, 2023 at 12:15 PM Xuanwo  wrote:
>
> iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now.
>
> BTW, we found skywalking-eyes works really well. It's fast, correct, and 
> well-maintained.
>
> It may be worth a look.

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Thanks for the details!

To be honest, I still prefer the "light build" approach with Gradle,
because it makes it easy for contributors to check the license headers in
the files they contribute (with the Gradle plugin, the check is included in
the check phase).
I think it's good to have it in the regular "local" contributor build
instead of requiring a Docker/workflow run.

Just my $0.01

Regards
JB

On Fri, Oct 27, 2023 at 5:16 PM Xuanwo  wrote:
>
> Here are some quick notes on skywalking-eyes; I hope they are helpful.
>
> Before using skywalking-eyes, we need to set up the config as described in 
> [1]. Take iceberg-rust as an example [2].
>
> For checking in CI:
>
> Add the following content to the workflow [3]:
>
> - name: Check License Header
>   uses: apache/skywalking-eyes/header@v0.5.0
>
> For local usage:
>
> docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header check
> docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
>
> [1]: 
> https://github.com/apache/skywalking-eyes?tab=readme-ov-file#configurations
> [2]: https://github.com/apache/iceberg-rust/blob/main/.licenserc.yaml
> [3]: 
> https://github.com/apache/iceberg-rust/blob/94a1c5d7742bc3b2a9ac7c8da20711a5e2578b89/.github/workflows/ci.yml#L38C1-L39C51

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Correct, we run check-license on each PR thanks to the license_check.yml
GH workflow.
However, contributors have to run dev/check-license manually
(it's not part of the Gradle build).

I agree that checking at the PR level is good enough. So, I propose to
move forward with a new RAT version without the dot-directory
limitation. I'm working on it now, and I will update dev/check-license
as soon as the new Apache RAT version is released.

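A minimal Python sketch of the dot-directory behavior discussed in this thread (not RAT's actual Java code): a walker that prunes directories whose name starts with "." unless told otherwise, plus a naive ASF header check built on top of it. `walk_files`, `missing_asf_header`, and the marker string are hypothetical names chosen for illustration; RAT's real license matching is more involved.

```python
import os
from typing import Iterator, List

# Hypothetical marker: a substring of the standard ASF license header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def walk_files(root: str, include_hidden: bool = False) -> Iterator[str]:
    """Yield file paths under root.

    With include_hidden=False, directories whose name starts with "." are
    pruned, mirroring the default behavior described for RAT's DirectoryWalker.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Pruning dirnames in place stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            yield os.path.join(dirpath, name)


def missing_asf_header(root: str, include_hidden: bool = True,
                       head_lines: int = 20) -> List[str]:
    """Return relative paths of files without the marker in their first lines."""
    missing = []
    for path in walk_files(root, include_hidden):
        with open(path, encoding="utf-8", errors="replace") as f:
            head = "".join(line for _, line in zip(range(head_lines), f))
        if ASF_MARKER not in head:
            missing.append(os.path.relpath(path, root))
    return sorted(missing)
```

With `include_hidden=False`, a header-less file under `.baseline` is silently skipped, which is the gap found during the 1.4.1 vote. Scanning hidden directories blindly would also pick up `.git`, so a real check would still want an explicit exclude list.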
Thanks,
Regards
JB

On Fri, Oct 27, 2023 at 5:48 PM Ryan Blue  wrote:
>
> We already run RAT checks on every PR, so I'm not sure there's a lot of value 
> in moving the checks to gradle. That just means that we would need to use a 
> different framework across the implementations. If there's a way to run 
> license checks in CI that doesn't have the dot-file limitation, that seems 
> ideal to me.
>
> On Fri, Oct 27, 2023 at 8:46 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Thanks for the details!
>>
>> To be honest, I still prefer the "light build" approach with gradle,
>> because it's pretty easy for contributors to check license headers in
>> their contributed file (as with gradle plugin, it will be included in
>> the check phase).
>> I think it's good to have it in the regular "local" contributor build
>> instead of need of docker/workflow execution.
>>
>> Just my $0.01
>>
>> Regards
>> JB
>>
>> On Fri, Oct 27, 2023 at 5:16 PM Xuanwo  wrote:
>> >
>> > Here are some quick notes for skywalking-eyes, hoping them will be helpful.
>> >
>> > Before using skywalking-eyes, we need to setup config as said in [1]. Take 
>> > iceberg-rust as an example [2].
>> >
>> > For checking in CI:
>> >
>> > Adding following content in workflow [3]
>> >
>> > - name: Check License Header
>> >   uses: apache/skywalking-eyes/header@v0.5.0
>> >
>> > For local usage:
>> >
>> > docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes 
>> > header check
>> > docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes 
>> > header fix
>> >
>> > [1]: 
>> > https://github.com/apache/skywalking-eyes?tab=readme-ov-file#configurations
>> > [2]: https://github.com/apache/iceberg-rust/blob/main/.licenserc.yaml
>> > [3]: 
>> > https://github.com/apache/iceberg-rust/blob/94a1c5d7742bc3b2a9ac7c8da20711a5e2578b89/.github/workflows/ci.yml#L38C1-L39C51
>> >
>> > On Fri, Oct 27, 2023, at 22:17, Jean-Baptiste Onofré wrote:
>> > > Thanks for the heads up Xuanwo.
>> > >
>> > > It's the fourth option :) I will make a comparison with RAT.
>> > >
>> > > Regards
>> > > JB
>> > >
>> > > On Fri, Oct 27, 2023 at 12:15 PM Xuanwo  wrote:
>> > >>
>> > >> iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now.
>> > >>
>> > >> BTW, we found skywalking-eyes works really well. It's fast, correct and 
>> > >> well-maintained.
>> > >>
>> > >> Maybe worth take a look.
>> > >>
>> > >> On Fri, Oct 27, 2023, at 17:48, Jean-Baptiste Onofré wrote:
>> > >> > By the way, as dev/check-license is also used in iceberg-python and
>> > >> > iceberg-go repositories (iceberg-rust doesn't have it), maybe I can
>> > >> > move forward on new rat release with the fix on hidden directories and
>> > >> > update there as well.
>> > >> >
>> > >> > Regards
>> > >> > JB
>> > >> >
>> > >> > On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré 
>> > >> >  wrote:
>> > >> >>
>> > >> >> Hi guys,
>> > >> >>
>> > >> >> During the 1.4.1 vote, we identified some files without ASF headers,
>> > >> >> more specifically in hidden directories (like .baseline). These files
>> > >> >> have not been detected by dev/check-license script.
>> > >> >>
>> > >> >> The reason is because the script uses apache-rat via java -jar (the
>> > >> >> Apache RAT CLI), executing the RAT Report class ()
>> > >> >>
>> > >> >> When providing a directory to scan, the Report class uses a
>> > >> >> DirectoryWalker to traverse the directories, looking for files to

Re: [DISCUSS] Apache Iceberg 1.4.2

2023-10-28 Thread Jean-Baptiste Onofré
Hi Amogh

It sounds good to me.
I just saw that you already started the release vote. The window to
discuss in this thread was a bit short, especially over the weekend :)

Since it's a regression, you did well to start the vote. We can always
do a 1.4.3 including other fixes if needed.

Thanks,
Regards
JB

On Sat, Oct 28, 2023 at 7:28 PM Amogh Jahagirdar  wrote:
>
> Hi all,
>
> I wanted to start a discussion to have a 1.4.2 patch release.
>
> For context, invalid split offsets were written in metadata due to a bug in 
> 1.4.0.
> 1.4.1 included a patch for ignoring split offsets in case it is known they 
> are invalid. The goal of the patch was to prevent engines from failing when 
> reading these invalid offsets.
>
> However, recently it was discovered that the patch doesn't cover another code 
> path so engines can still fail when reading the invalid split offset 
> metadata. The patch for handling this is here .
>
> Starting this thread to see if folks have any other bug fixes that should be 
> included in a 1.4.2 patch release? I've created a milestone here .
>
> Thanks,
>
> Amogh Jahagirdar
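The 1.4.1 and 1.4.2 patches are about not trusting split offsets once they are known to be invalid. The sketch below is a hypothetical Python illustration of that idea, not Iceberg's code: it accepts a file's split offsets only if they are strictly increasing and fall inside the file, and otherwise returns `None` so a reader would plan splits without them. The validity rule and the function name are assumptions; see the PRs linked in these threads for the real change.

```python
from typing import List, Optional, Sequence


def usable_split_offsets(offsets: Optional[Sequence[int]],
                         file_size_in_bytes: int) -> Optional[List[int]]:
    """Return offsets if they look valid, else None (hypothetical rule).

    Assumed validity rule for this sketch: every offset lies in
    [0, file_size_in_bytes) and offsets are strictly increasing.
    """
    if not offsets:
        return None
    prev = -1  # so the first offset must be >= 0
    for off in offsets:
        if off <= prev or off >= file_size_in_bytes:
            # Corrupt metadata: ignore the offsets instead of failing the read.
            return None
        prev = off
    return list(offsets)
```

Falling back to "no offsets" rather than raising keeps engines able to read the table; they just lose the split hints for the affected files.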


Re: [VOTE] Release Apache Iceberg 1.4.2 RC0

2023-10-28 Thread Jean-Baptiste Onofré
+1 (non binding)

I checked:
- hash and signature are good
- ASF headers are still missing on some files (.baseline, etc.): this has
been fixed on main but not cherry-picked to the 1.4.x branch
- Puffin binary files are still in the source distribution (I will work
on a fix for that)
- build OK from the source distribution
- NB: the dist area (both release and dev) should be cleaned of old
releases. I will tackle that.

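For anyone verifying an RC the same way, the checksum step can be sketched with Python's `hashlib`. This assumes the `.sha512` file starts with the hex digest (the layout `shasum -a 512` produces); the function names are illustrative. Signature checking is a separate step, typically `gpg --verify <file>.asc <file>` after importing the KEYS file.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs don't need to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact: str, checksum_file: str) -> bool:
    """Compare against the first whitespace-separated token of the .sha512 file."""
    with open(checksum_file, encoding="ascii") as f:
        tokens = f.read().split()
    expected = tokens[0].lower() if tokens else ""
    return sha512_of(artifact) == expected
```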
Thanks,
Regards
JB

On Sat, Oct 28, 2023 at 11:09 PM Amogh Jahagirdar  wrote:
>
> Hi Everyone,
>
> I propose that we release the following RC as the official Apache Iceberg 
> 1.4.2 release.
>
> The commit ID is f6bb9173b13424d77e7ad8439b5ef9627e530cb2
> * This corresponds to the tag: apache-iceberg-1.4.2-rc0
> * https://github.com/apache/iceberg/commits/apache-iceberg-1.4.2-rc0
> * 
> https://github.com/apache/iceberg/tree/f6bb9173b13424d77e7ad8439b5ef9627e530cb2
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.2-rc0
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
> * https://repository.apache.org/content/repositories/orgapacheiceberg-1148
>
> This release includes a patch for ensuring engines can successfully read 
> tables when their split offset metadata was corrupted due to a bug in 1.4.0. 
> See https://github.com/apache/iceberg/pull/8925 for more details.
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
> [ ] +1 Release this as Apache Iceberg 1.4.2
> [ ] +0
> [ ] -1 Do not release this because...
>
> Only PMC members have binding votes, but other community members are 
> encouraged to cast
> non-binding votes. This vote will pass if there are 3 binding +1 votes and 
> more binding
> +1 votes than -1 votes.
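The pass condition stated above fits in a small function. This is an illustrative Python sketch; the `Vote` type and its fields are hypothetical, and only binding votes count toward the result.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    value: int      # +1, 0, or -1
    binding: bool   # True for PMC members


def vote_passes(votes: Iterable[Vote]) -> bool:
    """At least 3 binding +1 votes, and more binding +1 than binding -1."""
    binding = [v.value for v in votes if v.binding]
    plus = binding.count(1)
    minus = binding.count(-1)
    return plus >= 3 and plus > minus
```

Non-binding votes are still valuable feedback, but in this rule they never change the outcome.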


Re: Iceberg Logo Fix and Iceberg Swag Shop

2023-11-01 Thread Jean-Baptiste Onofré
Hi Brian,

Good catch.

We need to get approval from the PMC and notify ASF VP Brand Management
(Mark Thomas) by sending a message to tradema...@apache.org.
We can also get in touch with the ASF ComDev and marketing teams to help
update the logo and so on.

I can help with this; don't hesitate to ping me!

Regards
JB

On Wed, Nov 1, 2023 at 7:56 AM Brian Olsen  wrote:

> Hey Iceberg Nation,
>
> I wanted to address an issue with the Iceberg Logo used by the ASF.
> Somewhere along the way, a hole was added to the Iceberg logo (global
> warming? 😬). I first noticed it when uploading the logo to the Wikipedia
> Commons ,
> but thought it was perhaps intentional at the time.
>
> This came up again when I was looking for options to buy an Iceberg shirt
> on RedBubble from the ASF Official store
> . However, when looking at
> the shirts I remembered the holey Iceberg seeing the logo on a non-white
> shirt.
>
> [image: image.png]
>
> I followed up with Ryan, and he said this hole wasn't originally there and
> isn't supposed to be there. I want to add the RedBubble shop to the Iceberg
> site. I believe having a way for all of us to show our ❤️ for Iceberg is
> one of the best ways to build not only awareness but a common identity
> around the project.
>
> The Tabular design team has created a fixed SVG file, I just wanted to
> better understand the steps necessary to get this approved by the PMC, and
> where we should submit this to update the ASF logo and get them to
> ultimately update the redbubble site. Following that, I will add a PR to
> add the Redbubble site with the fixed logo to our site.
>
> Thanks all,
> Bits
>
>
>


Re: [PROPOSAL] Use Microsoft Style Guide for documentation

2023-11-01 Thread Jean-Baptiste Onofré
Hi Brian

I like the proposal; it sounds like a good way to "align" our documentation.

Thanks !
Regards
JB

On Wed, Nov 1, 2023 at 8:20 AM Brian Olsen  wrote:
>
> Hey Iceberg Nation, As I've gone through the Iceberg docs, I've noticed a lot 
> of inconsistencies with terminology, grammar, and style. As a distributed 
> community, we have a lot of non-native English speakers reading and writing 
> our documentation. I propose we adopt the Microsoft Style Guide to improve 
> the communication and consistency of the docs. Common rules like defaulting 
> to use present tense not only make the documentation consistent but also more 
> accessible for those who struggle to understand complex conjugations. Then 
> there are examples like making sure to capitalize proper nouns like (Spark, 
> Flink, Trino, Apache Software Foundation, etc...). You may think, that's 
> great Brian, but good luck getting everyone reading the project and following 
> that. I also want to propose adding a prose linter called Vale, that will 
> enable us to add the existing rules for the Microsoft Style Guide, and our 
> own custom rules to ensure consistent style with documentation changes.
> Let's discuss this in the sync tomorrow! Bits


Re: PR: implementation of timestamp_ns and timestamptz_ns

2023-11-07 Thread Jean-Baptiste Onofré
Hi Jacob,

Thanks for the update. I will take a look.

Regards
JB

NB: sorry guys, I have been offline since last Thursday as we were hit
by Storm Ciaran. We still don't have electricity, and there is damage at
home. Things should be better in the coming days.

On Tue, Nov 7, 2023 at 6:00 PM Jacob Marble  wrote:
>
> Good morning Iceberg Dev-ers,
>
> I've worked through most of a first-pass review in this PR. Please take a 
> look!
> https://github.com/apache/iceberg/pull/8971
>
> --
> Jacob Marble
> 🇺🇸 🇺🇦


Re: Updating the Iceberg table architecture diagram

2023-11-09 Thread Jean-Baptiste Onofré
Hi

I think we have to keep it as "clear and simple" as possible.
I would prefer to have one diagram per spec version (to be clear about
the scope).

So, I would rather keep the current diagram (which works for v1) and add
a new v2-centric one.
It would be great to have a "side by side" presentation, something like:

   v1 | v2

When v3 is out, we can add a new v3-centric diagram.

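As a rough aid for the v1/v2 split discussed here, the sketch below models the hierarchy the diagram depicts: snapshots reaching manifests through the manifest list, manifests pointing at data files, v2 adding delete manifests and delete files, and statistics files (e.g. Puffin) tied to a snapshot. It's a simplified, hypothetical Python model, not the spec's field layout; in particular, the spec tracks statistics files in table metadata keyed by snapshot ID, while this model folds them into the snapshot for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    content: str       # "data" or "deletes" (delete manifests are v2 only)
    files: List[str]   # data files, or delete files for a delete manifest


@dataclass
class Snapshot:
    snapshot_id: int
    manifests: List[Manifest]  # reached via the snapshot's manifest list
    statistics_files: List[str] = field(default_factory=list)


@dataclass
class TableMetadata:
    format_version: int
    snapshots: List[Snapshot]

    def validate(self) -> None:
        # In this model, v1 tables must not reference delete manifests.
        if self.format_version == 1:
            for s in self.snapshots:
                if any(m.content == "deletes" for m in s.manifests):
                    raise ValueError(f"v1 snapshot {s.snapshot_id} has delete manifests")


def files_reachable(meta: TableMetadata, snapshot_id: int) -> List[str]:
    """All data, delete, and statistics files a given snapshot points to."""
    for s in meta.snapshots:
        if s.snapshot_id == snapshot_id:
            return [f for m in s.manifests for f in m.files] + s.statistics_files
    raise KeyError(snapshot_id)
```

Everything above the "deletes" branch is shared by both versions, which is why a side-by-side layout can reuse most boxes.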
Regards
JB

On Wed, Nov 8, 2023 at 6:33 PM Jason Hughes 
wrote:

> since v2 has been out for a while and most tools that support iceberg
> support v2 (not to mention some only support v2), I think having a single
> diagram and using dotted lines for the delete manifests and delete files
> will cause more confusion than benefit. also because of the support and
> adoption of v2, personally I'm in favor of replacing the arch diagram with
> this one that's for v2. that said, if folks are in favor of it, I can also
> edit the v1 table diagram to include stats files too and have them coexist
> on the spec page, noting which is v1 and which is v2
>
> what does everyone think?
>
>
> Jason Hughes
>
>
> Dremio | Director of Technical Advocacy
>
>
>
>
>
>
> On Mon, Nov 6, 2023 at 12:47 AM Ajantha Bhat 
> wrote:
>
>> However, there are a lot of boxes and new terms. What do you think of
>>> keeping both files, and indicating that the old applies to V1 tables, and
>>> the new one to V2 tables.
>>
>>
>> Statistics are common for both V1 and V2. So, we can't say old applies to
>> V1 and new applies to V2.
>> For delete, we are using existing boxes.
>> So, I think we can keep only one image with dotted delete manifest and
>> delete files mentioning it is specific to V2 merge-on-read condition.
>>
>> Suggestions are welcome.
>>
>> On Mon, Nov 6, 2023 at 1:54 PM Eduard Tudenhoefner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Thanks for updating the diagram and +1 to Fokko's suggestion.
>>>
>>> On Fri, Nov 3, 2023 at 3:43 PM Fokko Driesprong 
>>> wrote:
>>>
 Hey Jason, thanks for updating the chart.

 I like it a lot. However, there are a lot of boxes and new terms. What
 do you think of keeping both files, and indicating that the old applies to
 V1 tables, and the new one to V2 tables.

 Kind regards,
 Fokko

 On Fri, Nov 3, 2023 at 2:37 PM Aaron Niskode-Dossett  wrote:

> An update would be greatly appreciated, thank you!
>
> On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes 
> wrote:
>
>> Hey all,
>>
>> The current architecture diagram
>>  for an iceberg
>> table hasn't been updated in over 3 years, and there's are some aspects 
>> to
>> the architecture of an iceberg table that have changed, most notably 
>> delete
>> files and puffin files. since this diagram gets a lot of use in 
>> enablement
>> content around the community and isn't totally accurate anymore, @Ajantha
>> Bhat U  and I discussed updating it to be
>> more accurate
>>
>> here's an updated version of the diagram
>> 
>> we put together
>>
>> a few points for discussion that we're interested in others' thoughts
>> on:
>>
>>1. the diagram is obviously somewhat more visually complicated
>>than the current one, but IMO the benefit of being more accurate for 
>> people
>>learning iceberg outweighs the additional complexity
>>2. since the partition stats spec PR
>> just got merged, we
>>thought it'd be good to include that too while we're updating it, and
>>combine puffin files with partition stats files into one category of 
>> files
>>in the diagram labeled "statistics files". we combined them in the 
>> diagram,
>>rather than splitting them up, because 1. it provides a simpler 
>> diagram, 2.
>>gets the primary point across, and 3. they both serve the purpose of
>>providing statistics for tools to leverage (albeit for different use 
>> cases)
>>3. we put statistics files in place in the diagram for both s0
>>and s1, though we could only have statistics files for s1, which 
>> would 1.
>>make the diagram simpler, and 2. show a simple example of the use 
>> case of
>>not needing stats files initially, but then as data grows and/or query
>>patterns change, now stats files are needed
>>
>> if folks are on board with updating the diagram, and after we come to
>> a conclusion on the above discussion points and any others that come up, 
>> I
>> can export it to a png and create a PR to update the arch diagram image 
>> on
>> the site
>>
>> thanks!
>>
>>
>> Jason Hughes
>>
>>
>> Dremio | Director of Technical Advocacy
>>
>>
>>
>>
>>

Re: Kafka Connect sink

2023-11-09 Thread Jean-Baptiste Onofré
Hi Bryan

I would like to follow up on the Kafka Connect sink donation.

If you don't mind, I would like to propose a PR (on the tabular repo)
with the changes we would need for the donation into the Apache
Iceberg repo.
I should have time to work on this next week.

Thoughts ?

Thanks
Regards
JB

On Thu, Oct 19, 2023 at 11:13 AM Bryan Keller  wrote:
>
> Hi JB,
>
> The plan is to move forward, unless there are concerns from anyone. I got a 
> little bit sidetracked but will be working on addressing comments in the 
> first PR then following up with more PRs after that one is merged, so stay 
> tuned for those.
>
> Thanks,
> Bryan
>
>
>
> > On Oct 18, 2023, at 7:13 AM, Jean-Baptiste Onofré  wrote:
> >
> > Hi Bryan,
> >
> > Any update on this thread ? Can I help somehow ?
> >
> > Thanks,
> > Regards
> > JB
> >
> >> On Mon, Oct 2, 2023 at 7:39 PM Bryan Keller  wrote:
> >>
> >> Hi all,
> >>
> >> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
> >> the Iceberg project. It would be great to give Iceberg users another 
> >> option for landing data from Kafka into Iceberg tables that is supported 
> >> by the Iceberg community. Kafka Connect is a part of systems from AWS, 
> >> Confluent, Redpanda, and so on, so it can make landing data from Kafka 
> >> into Iceberg much easier for those without a Flink or Spark infrastructure.
> >>
> >> There are a few Iceberg sink implementations out there for Kafka Connect, 
> >> but we feel this one covers most of the features users have requested, 
> >> such as exactly-once processing, schema evolution, and multi-table fanout. 
> >> And having the sink backed by the Iceberg community will help it to evolve 
> >> and improve over time.
> >>
> >> If this sounds like something everyone would like to see added to Iceberg, 
> >> I've opened a PR that includes some initial pieces of the sink. The 
> >> thought was to break up the submission into parts so each could be 
> >> reviewed more easily. Some design docs and notes can be found in the 
> >> original repo here: 
> >> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
> >>
> >> We'd like to get feedback if others approve of moving forward with this or 
> >> not.
> >>
> >> Thanks,
> >> Bryan
> >>


Re: Ingestion Layer?

2023-11-09 Thread Jean-Baptiste Onofré
Hi Austin

I agree. The idea is to have an ingestion layer, a kind of mix of Apache
Camel (for EIPs), Apache Beam-like IOs, etc.

I started a PoC powered by an Apache Karaf Minho-like runtime.

I should be able to share a first, more concrete proposal by next week
(sorry, still no electricity at home after Storm Ciaran, so I'm not
moving forward as fast as I would like).

Regards
JB

On Thu, Nov 9, 2023 at 6:53 PM Austin Bennett  wrote:

> Just a little comment that I imagine there would be massive value from an
> ingestion layer.
>
> Making it easier to add more integrations will be a great benefit for the
> ecosystem, adoption.
>
>
> Concretely, FWIW, I'm evaluating Iceberg [ and alternatives ] for an
> enterprise adoption, and existing integrations [ both for reads from and
> ingesting into iceberg ] and ease-of-contributing lacking integrations are
> TOP of mind.
>
>
>
> On Mon, Oct 2, 2023 at 11:03 PM Jean-Baptiste Onofré 
> wrote:
>
>> From my standpoint, Kafka Connect is interesting to also address
>> processing logic without Spark or Flink runtime. Definitely
>> interesting to have Kafka integration/processing (even for me Kafka
>> and Kafka Connect are two different things ;)).
>>
>> For pure data ingestion part, I think it would make sense to have a
>> "ingestion layer" in Iceberg where we can have pluggable IO and where
>> we can both implement our own IO (specifically for Iceberg as Apache
>> Beam IOs for instance) and where we can leverage existing integration
>> framework (like Apache Camel).
>> Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
>> integration ? I think having such layer would be very interesting for
>> the community and we can have more users (it's what happened at Apache
>> Beam, the first IOs were only Google "centric" (bigtable, bigquery,
>> gfs, ...), we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
>> benefit for adoption :)).
>> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
>>
>> I will do some investigation about that. I will draft a proposal.
>>
>> Regards
>> JB
>>
>> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat 
>> wrote:
>> >
>> > Hi Bryan,
>> >
>> > I am very happy to see this contribution.
>> > I have recently tested this project with Nessie catalog and very much
>> liked it.
>> >
>> > However, I still don't know the benefits of using kafka-connect instead
>> of directly consuming
>> > from the kafka like Delta-lake's implementation.
>> > https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>> >
>> > I am not an expert in this ingestion domain and recently got started.
>> > I hope someone will chime in and we will have detailed analysis over
>> the design.
>> >
>> > Looking forward to this feature.
>> >
>> > Thanks,
>> > Ajantha
>> >
>> > On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré 
>> wrote:
>> >>
>> >> Hi Bryan
>> >>
>> >> That’s a great news ! Thanks a lot for the proposal.
>> >>
>> >> I will take a look on the PR and existing connector.
>> >> I’m sure the Iceberg community will be very happy to see this and we
>> will able to add new features and improvements thanks to the community
>> feedback.
>> >> I would be more than happy to help for donation (I know that the
>> connector is already under Apache license but we have to double check the
>> ICLA for the initial contributors etc , just to be sure we are good there).
>> >>
>> >> Thanks again !
>> >>
>> >> Let’s see what the others are thinking.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Mon, Oct 2, 2023 at 7:39 PM Bryan Keller  wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We at Tabular would like to contribute our Kafka Connect Iceberg sink
>> to the Iceberg project. It would be great to give Iceberg users another
>> option for landing data from Kafka into Iceberg tables that is supported by
>> the Iceberg community. Kafka Connect is a part of systems from AWS,
>> Confluent, Redpanda, and so on, so it can make landing data from Kafka into
>> Iceberg much easier for those without a Flink or Spark infrastructure.
>> >>>
>> >>> There are a few Iceberg sink implementations out there for Kafka
>> Connect, but we feel this one covers most of the features users have
>> requested, such as exactly-once processing, schema evolution, and
>> multi-table fanout. And having the sink backed by the Iceberg community
>> will help it to evolve and improve over time.
>> >>>
>> >>> If this sounds like something everyone would like to see added to
>> Iceberg, I've opened a PR that includes some initial pieces of the sink.
>> The thought was to break up the submission into parts so each could be
>> reviewed more easily. Some design docs and notes can be found in the
>> original repo here:
>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>> >>>
>> >>> We'd like to get feedback if others approve of moving forward with
>> this or not.
>> >>>
>> >>> Thanks,
>> >>> Bryan
>> >>>
>>
>


Re: Kafka Connect sink

2023-11-13 Thread Jean-Baptiste Onofré
Hi Bryan

Thanks for the update. Please let me know which parts I can help with.
I will do a new pass on the first PR.

Regards
JB

On Mon, Nov 13, 2023 at 3:31 PM Bryan Keller  wrote:
>
> Hey JB,
>
> Smaller PRs to the current repo are welcome, though I’m trying to keep 
> disruptive changes to a minimum during the submission process. I will have 
> the next PR out as soon as the first is merged.
>
> Bryan
>
> > On Nov 9, 2023, at 9:30 AM, Jean-Baptiste Onofré  wrote:
> >
> > Hi Bryan
> >
> > I would like to follow up about kafka connect sink donation.
> >
> > If you don't mind, I would like to propose a PR (on the tabular repo)
> > with the changes we would need for the donation into the Apache
> > Iceberg repo.
> > I should have time to work on this next week.
> >
> > Thoughts ?
> >
> > Thanks
> > Regards
> > JB
> >
> >> On Thu, Oct 19, 2023 at 11:13 AM Bryan Keller  wrote:
> >>
> >> Hi JB,
> >>
> >> The plan is to move forward, unless there are concerns from anyone. I got 
> >> a little bit sidetracked but will be working on addressing comments in the 
> >> first PR then following up with more PRs after that one is merged, so stay 
> >> tuned for those.
> >>
> >> Thanks,
> >> Bryan
> >>
> >>
> >>
> >>>> On Oct 18, 2023, at 7:13 AM, Jean-Baptiste Onofré  
> >>>> wrote:
> >>>
> >>> Hi Bryan,
> >>>
> >>> Any update on this thread ? Can I help somehow ?
> >>>
> >>> Thanks,
> >>> Regards
> >>> JB
> >>>
> >>>> On Mon, Oct 2, 2023 at 7:39 PM Bryan Keller  wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
> >>>> the Iceberg project. It would be great to give Iceberg users another 
> >>>> option for landing data from Kafka into Iceberg tables that is supported 
> >>>> by the Iceberg community. Kafka Connect is a part of systems from AWS, 
> >>>> Confluent, Redpanda, and so on, so it can make landing data from Kafka 
> >>>> into Iceberg much easier for those without a Flink or Spark 
> >>>> infrastructure.
> >>>>
> >>>> There are a few Iceberg sink implementations out there for Kafka 
> >>>> Connect, but we feel this one covers most of the features users have 
> >>>> requested, such as exactly-once processing, schema evolution, and 
> >>>> multi-table fanout. And having the sink backed by the Iceberg community 
> >>>> will help it to evolve and improve over time.
> >>>>
> >>>> If this sounds like something everyone would like to see added to 
> >>>> Iceberg, I've opened a PR that includes some initial pieces of the sink. 
> >>>> The thought was to break up the submission into parts so each could be 
> >>>> reviewed more easily. Some design docs and notes can be found in the 
> >>>> original repo here: 
> >>>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
> >>>>
> >>>> We'd like to get feedback if others approve of moving forward with this 
> >>>> or not.
> >>>>
> >>>> Thanks,
> >>>> Bryan
> >>>>


[PROPOSAL] Apache Iceberg 1.4.3 release

2023-11-14 Thread Jean-Baptiste Onofré
Hi guys,

Avro 1.11.3 has been released, fixing CVE-2023-39410.
We already updated to Avro 1.11.3 on main.

Regarding CVEs, we also already use Guava 32.1.3, which fixes CVE-2023-2976.

As the Avro CVE is rated high severity (see
https://nvd.nist.gov/vuln/detail/CVE-2023-39410), I propose to bump
Avro to 1.11.3 on our 1.4.x branch and release Iceberg 1.4.3 including
this.

Thoughts ?

If there are no objections, I volunteer to drive this release.

Thanks,
Regards
JB


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-11-19 Thread Jean-Baptiste Onofré
Hi

As there are no objections, I will move forward and prepare the release
for a vote.

I will keep you posted.

Thanks,
Regards
JB

On Wed, Nov 15, 2023 at 6:11 AM Jean-Baptiste Onofré  wrote:
>
> Hi guys,
>
> Avro 1.11.3 has been released, fixing CVE-2023-39410.
> We already updated to Avro 1.11.3 on main.
>
> About CVE, we also already use guava 32.1.3, fixing CVE-2023-2976.
>
> As the Avro CVE is classified high (see
> https://nvd.nist.gov/vuln/detail/CVE-2023-39410), I propose to bump to
> Avro 1.11.3 on our 1.4.x branch and release Iceberg 1.4.3 including
> this.
>
> Thoughts ?
>
> If there are no objections, I'm volunteer to drive this release.
>
> Thanks,
> Regards
> JB


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-11-20 Thread Jean-Baptiste Onofré
Thanks Fokko !

I'm doing the local build check and a pass on issues. I plan to start
the release tomorrow.

Regards
JB

On Mon, Nov 20, 2023 at 8:56 AM Driesprong, Fokko  wrote:
>
> I took the liberty and created a 1.4.3 milestone to track any issues that we 
> want to backport.
>
> Kind regards,
> Fokko Driesprong
>
> On Mon, Nov 20, 2023 at 8:50 AM Driesprong, Fokko  wrote:
>>
>> Hey JB,
>>
>> Late to the party here, but 1.4.3 sounds like a great idea. Let me know if 
>> you need any help with any release steps.
>>
>> Kind regards,
>> Fokko Driesprong
>>
>> On Mon, Nov 20, 2023 at 8:16 AM Jean-Baptiste Onofré  wrote:
>>>
>>> Hi
>>>
>>> As there's no objection, I will move forward and prepare the release to 
>>> vote.
>>>
>>> I will keep you posted asap.
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>> On Wed, Nov 15, 2023 at 6:11 AM Jean-Baptiste Onofré  
>>> wrote:
>>> >
>>> > Hi guys,
>>> >
>>> > Avro 1.11.3 has been released, fixing CVE-2023-39410.
>>> > We already updated to Avro 1.11.3 on main.
>>> >
>>> > About CVE, we also already use guava 32.1.3, fixing CVE-2023-2976.
>>> >
>>> > As the Avro CVE is classified high (see
>>> > https://nvd.nist.gov/vuln/detail/CVE-2023-39410), I propose to bump to
>>> > Avro 1.11.3 on our 1.4.x branch and release Iceberg 1.4.3 including
>>> > this.
>>> >
>>> > Thoughts ?
>>> >
>>> > If there are no objections, I'm volunteer to drive this release.
>>> >
>>> > Thanks,
>>> > Regards
>>> > JB


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-11-21 Thread Jean-Baptiste Onofré
Hi

We chatted about the 1.4.3 release with Ed.

We have a few PRs we want to include, and as it's Thanksgiving this
week, I will submit the release for a vote next Tuesday.

Regards
JB

On Mon, Nov 20, 2023 at 5:24 PM Jean-Baptiste Onofré  wrote:

> Thanks Fokko !
>
> I'm on the local build check and issue pass. I plan to start the
> release tomorrow.
>
> Regards
> JB
>
> On Mon, Nov 20, 2023 at 8:56 AM Driesprong, Fokko 
> wrote:
> >
> > I took the liberty and created a 1.4.3 milestone to track any issues
> that we want to backport.
> >
> > Kind regards,
> > Fokko Driesprong
> >
> > On Mon, Nov 20, 2023 at 8:50 AM Driesprong, Fokko  wrote:
> >>
> >> Hey JB,
> >>
> >> Late to the party here, but 1.4.3 sounds like a great idea. Let me know
> if you need any help with any release steps.
> >>
> >> Kind regards,
> >> Fokko Driesprong
> >>
> >> On Mon, Nov 20, 2023 at 8:16 AM Jean-Baptiste Onofré <
> j...@nanthrax.net> wrote:
> >>>
> >>> Hi
> >>>
> >>> As there's no objection, I will move forward and prepare the release
> to vote.
> >>>
> >>> I will keep you posted asap.
> >>>
> >>> Thanks,
> >>> Regards
> >>> JB
> >>>
> >>> On Wed, Nov 15, 2023 at 6:11 AM Jean-Baptiste Onofré 
> wrote:
> >>> >
> >>> > Hi guys,
> >>> >
> >>> > Avro 1.11.3 has been released, fixing CVE-2023-39410.
> >>> > We already updated to Avro 1.11.3 on main.
> >>> >
> >>> > About CVE, we also already use guava 32.1.3, fixing CVE-2023-2976.
> >>> >
> >>> > As the Avro CVE is classified high (see
> >>> > https://nvd.nist.gov/vuln/detail/CVE-2023-39410), I propose to bump
> to
> >>> > Avro 1.11.3 on our 1.4.x branch and release Iceberg 1.4.3 including
> >>> > this.
> >>> >
> >>> > Thoughts ?
> >>> >
> >>> > If there are no objections, I'm volunteer to drive this release.
> >>> >
> >>> > Thanks,
> >>> > Regards
> >>> > JB
>


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-11-22 Thread Jean-Baptiste Onofré
Hi guys

Quick update about that:
1. I took a deeper look today at the Avro CVE issue. I don't think
Iceberg is impacted (the CVE is about deserialization of corrupted
data potentially causing an out-of-memory error). The fix
(https://github.com/apache/avro/commit/a12a7e44d) introduces
SystemLimitException, which uses system properties to define boundaries
and avoid the OOM (the deserialization still won't succeed, though :)).
So, nothing really changes from an Iceberg perspective.
2. As discussed during the community meeting today, since (1) doesn't
really impact Iceberg, there's no urgency to release 1.4.3.
We agreed to wait for new fixes before cutting the 1.4.3 release.

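The mitigation described in (1) boils down to checking a declared length against a limit before allocating. Here is a minimal Python sketch of that idea over Avro's binary encoding, where a bytes value is a zigzag-encoded varint length followed by that many bytes. The function names, the exception class, and the default limit are hypothetical; Avro's Java implementation reads its limits from system properties and throws SystemLimitException.

```python
import io
from typing import BinaryIO


class LengthLimitError(Exception):
    """Hypothetical stand-in for Avro's SystemLimitException."""


def read_long(buf: BinaryIO) -> int:
    """Decode an Avro zigzag varint long."""
    shift = 0
    acc = 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("truncated varint")
        byte = b[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")
    # Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (acc >> 1) ^ -(acc & 1)


def read_bytes(buf: BinaryIO, max_length: int = 10 * 1024 * 1024) -> bytes:
    """Read an Avro bytes value, refusing absurd declared lengths up front."""
    length = read_long(buf)
    if length < 0 or length > max_length:
        # Without this check, a corrupted length could trigger a huge allocation.
        raise LengthLimitError(f"declared length {length} exceeds limit {max_length}")
    data = buf.read(length)
    if len(data) != length:
        raise EOFError("truncated bytes value")
    return data
```

A corrupted file still fails to deserialize; the limit only turns a potential out-of-memory condition into a clear error, which matches the point made above.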
I'm still volunteering to cut the 1.4.3 patch release when ready (I
did all the build checks on my machine :)), and I'm doing a pass on GH
issues.

Thanks !
Regards
JB

On Tue, Nov 21, 2023 at 8:49 PM Jean-Baptiste Onofré  wrote:
>
> Hi
>
> We chatted about the 1.4.3 release with Ed.
>
> We have a few PRs we want to include, and as it's Thanksgiving this week, I will
> submit the release for a vote on Tuesday next week.
>
> Regards
> JB
>
> On Mon, Nov 20, 2023 at 5:24 PM Jean-Baptiste Onofré wrote:
>>
>> Thanks Fokko !
>>
>> I'm doing the local build check and the issue pass. I plan to start the
>> release tomorrow.
>>
>> Regards
>> JB
>>
>> On Mon, Nov 20, 2023 at 8:56 AM Driesprong, Fokko wrote:
>> >
>> > I took the liberty and created a 1.4.3 milestone to track any issues that 
>> > we want to backport.
>> >
>> > Kind regards,
>> > Fokko Driesprong
>> >
>> > On Mon, Nov 20, 2023 at 8:50 AM Driesprong, Fokko wrote:
>> >>
>> >> Hey JB,
>> >>
>> >> Late to the party here, but 1.4.3 sounds like a great idea. Let me know 
>> >> if you need any help with any release steps.
>> >>
>> >> Kind regards,
>> >> Fokko Driesprong
>> >>
>> >> On Mon, Nov 20, 2023 at 8:16 AM Jean-Baptiste Onofré wrote:
>> >>>
>> >>> Hi
>> >>>
>> >>> As there's no objection, I will move forward and prepare the release for
>> >>> a vote.
>> >>>
>> >>> I will keep you posted asap.
>> >>>
>> >>> Thanks,
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On Wed, Nov 15, 2023 at 6:11 AM Jean-Baptiste Onofré wrote:
>> >>> >
>> >>> > Hi guys,
>> >>> >
>> >>> > Avro 1.11.3 has been released, fixing CVE-2023-39410.
>> >>> > We already updated to Avro 1.11.3 on main.
>> >>> >
>> >>> > Regarding CVEs, we also already use Guava 32.1.3, which fixes CVE-2023-2976.
>> >>> >
>> >>> > As the Avro CVE is classified high (see
>> >>> > https://nvd.nist.gov/vuln/detail/CVE-2023-39410), I propose to bump to
>> >>> > Avro 1.11.3 on our 1.4.x branch and release Iceberg 1.4.3 including
>> >>> > this.
>> >>> >
>> >>> > Thoughts?
>> >>> >
>> >>> > If there are no objections, I volunteer to drive this release.
>> >>> >
>> >>> > Thanks,
>> >>> > Regards
>> >>> > JB


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-12-02 Thread Jean-Baptiste Onofré
Hi Fokko,

I know we have some fixes in flight, so let me do a new pass on
issues and backports, and I will send an update to the mailing list
(early next week).

Thanks !
Regards
JB

On Sun, Dec 3, 2023 at 1:40 AM Fokko Driesprong  wrote:
>
> Hey JB,
>
> I think there is no harm in doing a patch release.
>
> There was another request to backport a fix; I've created a PR:
> https://github.com/apache/iceberg/pull/8969#issuecomment-1837286383
>
> Kind regards,
> Fokko


Re: [PROPOSAL] Apache Iceberg 1.4.3 release

2023-12-04 Thread Jean-Baptiste Onofré
Hi Gabor

It sounds reasonable to me.

Let me review #9183.

Thanks !
Regards
JB

On Mon, Dec 4, 2023 at 9:51 AM Gabor Kaszab  wrote:
>
> Hey,
>
> We recently found a regression in the expireSnapshot functionality:
> https://github.com/apache/iceberg/pull/9183 Would it make sense to include
> this fix in the next patch release as well?
>
> Gabor

