Re: [DISCUSS] September report

2019-09-06 Thread Carl Steinbach
+1 to the report, +1 to graduation for the same set of reasons mentioned by Owen. - Carl On Fri, Sep 6, 2019 at 12:04 PM Owen O'Malley wrote: > On Fri, Sep 6, 2019 at 12:19 AM Justin Mclean wrote: > >> So why does the project think it's ready to graduate? Mentors do you >> think the project is

Re: New committer and PPMC member, Anton Okolnychyi

2019-09-06 Thread RD
Congratulations Anton! Regards On Tue, Sep 3, 2019 at 9:03 AM Xabriel Collazo Mojica wrote: > Congrats Anton! > > > > *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | > xcoll...@adobe.com > > > > *From: *Anjali Norwood > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Tuesd

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Fri, Sep 6, 2019 at 12:19 AM Justin Mclean wrote: > So why does the project think it's ready to graduate? Mentors do you think > the project is ready to graduate? > It has to make a release or two, but I agree with Ryan that it is approaching graduation. The project entered Apache with five Apac

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Wed, Sep 4, 2019 at 4:55 PM Ryan Blue wrote: > Hi everyone, > > Here's a draft of this month's report to the IPMC. Please reply with > comments if you'd like to add anything! > > rb > > ## Iceberg > > Iceberg is a table format for large, slow-moving tabular data. > > Iceberg has been incubatin

Re: [DISCUSS] September report

2019-09-06 Thread Ryan Blue
Hi Justin, I checked the box that Iceberg is "nearing graduation", not that it is ready to graduate. I think the numbers show that we've had good community growth and we have added a PPMC member. Adding more and actually getting a release out are the points that I've listed as the unfinished steps

Re: Avoiding sort when writing 1TB+ of data

2019-09-06 Thread Ryan Blue
There are other ways to prepare the data. You just need to make sure that the data for each partition is clustered so that you don’t open more than one file per partition in a task. A global sort is usually the best way to do that, but you can use a local sort as well, by using sortWithinPartition
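The local-sort idea can be illustrated with a small, framework-free sketch. It does not use Spark or Iceberg. The function name `write_task` and the row layout are hypothetical, and plain Python `sorted` stands in for a per-task sort like Spark's sortWithinPartitions. The point it shows is that sorting only the rows inside one task is enough to make each partition value a single contiguous run. As a result, the task opens one output file per partition and never needs a global sort.

```python
from itertools import groupby
from operator import itemgetter


def write_task(rows, partition_key):
    """Hypothetical model of one write task.

    Rows are sorted locally by the partition key (only this task's rows,
    no global shuffle), then grouped so that each partition value maps to
    exactly one simulated output file.
    """
    # Local sort: stands in for a per-task sort such as sortWithinPartitions.
    ordered = sorted(rows, key=itemgetter(partition_key))
    files = {}
    for part, group in groupby(ordered, key=itemgetter(partition_key)):
        # After the local sort each partition value is one contiguous run,
        # so no partition should ever be "opened" twice in this task.
        assert part not in files, f"partition {part!r} opened twice"
        files[part] = list(group)
    return files


rows = [
    {"day": "2019-09-02", "id": 1},
    {"day": "2019-09-01", "id": 2},
    {"day": "2019-09-02", "id": 3},
]
files = write_task(rows, "day")
```

Python's sort is stable, so rows within one partition keep their original relative order (ids 1 then 3 for `2019-09-02`). Whether a local sort is preferable to a global one in practice depends on how many partitions each task touches, which this sketch does not model.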

Avoiding sort when writing 1TB+ of data

2019-09-06 Thread Xabriel Collazo Mojica
Hi folks, We are now consistently hitting this problem: 1) We want to write a data set in the order of 1-5TB of data into a partitioned Iceberg table. 2) Iceberg, on a single write, will fail a partitioned writer if it detects the data is not sorted. This has been discussed before in this list,
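The failure described in point 2 can be modeled with a toy writer. This is a hypothetical sketch, not Iceberg's actual writer class or error message. It keeps one file open at a time and refuses to reopen a partition whose file it has already closed. In this model the input only needs to be clustered by partition (each partition's rows contiguous), not fully sorted.

```python
class ClusteredPartitionWriter:
    """Hypothetical model of a partitioned writer with one open file.

    Switching to a new partition closes the current file. Seeing a
    partition again after its file was closed means the input was not
    clustered, so the writer raises instead of opening a second file.
    """

    def __init__(self):
        self.current = None  # partition whose file is currently open
        self.closed = set()  # partitions whose files were already closed
        self.files = {}      # partition -> rows written to its single file

    def write(self, partition, row):
        if partition != self.current:
            if partition in self.closed:
                raise ValueError(
                    f"already closed file for partition {partition!r}; "
                    "input is not clustered by partition"
                )
            if self.current is not None:
                self.closed.add(self.current)
            self.current = partition
            self.files[partition] = []
        self.files[partition].append(row)
```

Feeding it `("a", 1), ("a", 2), ("b", 3)` succeeds with one file per partition. Feeding it `("a", 1), ("b", 2), ("a", 3)` raises `ValueError` on the third row, because partition `"a"` would need a second file.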

Re: [DISCUSS] September report

2019-09-06 Thread Justin Mclean
Hi, I have a couple of concerns about the report. First off, I don't think any project would be ready to graduate if it has not made a release or added committers/PPMC members. And it would be great to see something other than meaningless stats on community growth in the report. Those numb