+1 (binding)

Julian

> On Nov 13, 2018, at 9:28 AM, Arthur Wiedmer <art...@apache.org> wrote:
>
> +1
>
> (Non-binding)
>
> Best,
> Arthur
>
> On Tue, Nov 13, 2018, 09:24 Hugo Louro <hmclo...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>>> On Nov 13, 2018, at 9:19 AM, Owen O'Malley <owen.omal...@gmail.com> wrote:
>>>
>>> +1 (binding)
>>>
>>>> On Tue, Nov 13, 2018 at 12:12 PM Dave Fisher <dave2w...@comcast.net> wrote:
>>>>
>>>> +1 (binding)
>>>>
>>>>> On Nov 13, 2018, at 9:10 AM, Matt Sicker <boa...@gmail.com> wrote:
>>>>>
>>>>> +1 binding
>>>>>
>>>>>> On Tue, 13 Nov 2018 at 11:09, Ryan Blue <b...@apache.org> wrote:
>>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>>> On Tue, Nov 13, 2018 at 9:06 AM Ryan Blue <b...@apache.org> wrote:
>>>>>>>
>>>>>>> The discuss thread seems to have reached consensus, so I propose
>>>>>>> accepting the Iceberg project for incubation.
>>>>>>>
>>>>>>> The proposal is copied below and in the wiki:
>>>>>>> https://wiki.apache.org/incubator/IcebergProposal
>>>>>>>
>>>>>>> Please vote on whether to accept Iceberg in the next 72 hours:
>>>>>>>
>>>>>>> [ ] +1, accept Iceberg for incubation
>>>>>>> [ ] -1, reject the Iceberg proposal because . . .
>>>>>>>
>>>>>>> Thank you for reviewing the proposal and voting,
>>>>>>>
>>>>>>> rb
>>>>>>> ------------------------------
>>>>>>> Iceberg Proposal
>>>>>>>
>>>>>>> Abstract
>>>>>>>
>>>>>>> Iceberg is a table format for large, slow-moving tabular data.
>>>>>>>
>>>>>>> It is designed to improve on the de-facto standard table layout built
>>>>>>> into Apache Hive, Presto, and Apache Spark.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> The purpose of Iceberg is to provide SQL-like tables that are backed by
>>>>>>> large sets of data files. Iceberg is similar to the Hive table layout,
>>>>>>> the de-facto standard structure used to track files in a table, but
>>>>>>> provides additional guarantees and performance optimizations:
>>>>>>>
>>>>>>> - Atomicity - Each change to the table will be complete or will fail.
>>>>>>>   “Do or do not. There is no try.”
>>>>>>> - Snapshot isolation - Reads use one and only one snapshot of a table
>>>>>>>   at some time without holding a lock.
>>>>>>> - Safe schema evolution - A table’s schema can change in well-defined
>>>>>>>   ways, without breaking older data files.
>>>>>>> - Column projection - An engine may request a subset of the available
>>>>>>>   columns, including nested fields.
>>>>>>> - Predicate pushdown - An engine can push filters into read planning
>>>>>>>   to improve performance using partition data and file-level
>>>>>>>   statistics.
>>>>>>>
>>>>>>> Iceberg does NOT define a new file format. All data is stored in Apache
>>>>>>> Avro, Apache ORC, or Apache Parquet files.
>>>>>>>
>>>>>>> Additionally, Iceberg is designed to work well when data files are
>>>>>>> stored in cloud blob stores, even when those systems provide weaker
>>>>>>> guarantees than a file system, including:
>>>>>>>
>>>>>>> - Eventual consistency in the namespace
>>>>>>> - High latency for directory listings
>>>>>>> - No renames of objects
>>>>>>> - No folder hierarchy
>>>>>>>
>>>>>>> Rationale
>>>>>>>
>>>>>>> Initial benchmarks show dramatic improvements in query planning. For
>>>>>>> example, in Netflix’s Atlas use case, which stores time-series metrics
>>>>>>> from Netflix runtime systems, 1 month of data is stored across 2.7
>>>>>>> million files in 2,688 partitions:
>>>>>>>
>>>>>>> - Hive table using Parquet:
>>>>>>>   - 400k+ splits, not combined
>>>>>>>   - Explain query: 9.6 minutes wall time (planning only)
>>>>>>> - Iceberg table with partition filtering:
>>>>>>>   - 15,218 splits, combined
>>>>>>>   - Planning: 10 seconds
>>>>>>>   - Query wall time: 13 minutes
>>>>>>> - Iceberg table with partition and min/max filtering:
>>>>>>>   - 412 splits
>>>>>>>   - Planning: 25 seconds
>>>>>>>   - Query wall time: 42 seconds
>>>>>>>
>>>>>>> These performance gains combined with the cross-engine compatibility
>>>>>>> are a very compelling story.
>>>>>>>
>>>>>>> Initial Goals
>>>>>>>
>>>>>>> The initial goal will be to move the existing codebase to Apache and
>>>>>>> integrate with the Apache development process and infrastructure. A
>>>>>>> primary goal of incubation will be to grow and diversify the Iceberg
>>>>>>> community. We are well aware that the project community is largely
>>>>>>> comprised of individuals from a single company. We aim to change that
>>>>>>> during incubation.
>>>>>>>
>>>>>>> Current Status
>>>>>>>
>>>>>>> As previously mentioned, Iceberg is under active development at
>>>>>>> Netflix, and is being used in processing large volumes of data in
>>>>>>> Amazon EC2.
>>>>>>>
>>>>>>> Iceberg license documentation is already based on Apache guidelines for
>>>>>>> LICENSE and NOTICE content.
>>>>>>>
>>>>>>> Meritocracy
>>>>>>>
>>>>>>> We value meritocracy and we understand that it is the basis for an open
>>>>>>> community that encourages multiple companies and individuals to
>>>>>>> contribute and be invested in the project’s future. We will encourage
>>>>>>> and monitor participation and make sure to extend privileges and
>>>>>>> responsibilities to all contributors.
>>>>>>>
>>>>>>> Community
>>>>>>>
>>>>>>> Iceberg is currently being used by developers at Netflix and a growing
>>>>>>> number of users are actively using it in production environments.
>>>>>>> Iceberg has received contributions from developers working at
>>>>>>> Hortonworks, WeWork, and Palantir. By bringing Iceberg to Apache we aim
>>>>>>> to assure current and future contributors that the Iceberg community is
>>>>>>> meritocratic and open, in order to broaden and diversify the user and
>>>>>>> developer community.
>>>>>>>
>>>>>>> Core Developers
>>>>>>>
>>>>>>> Iceberg was initially developed at Netflix and is under active
>>>>>>> development.
>>>>>>> We believe Iceberg will be of interest to a broad range of users and
>>>>>>> developers and that incubating the project at the ASF will help us
>>>>>>> build a diverse, sustainable community.
>>>>>>>
>>>>>>> Alignment
>>>>>>>
>>>>>>> Iceberg utilizes other Apache projects such as Avro, Hadoop, Hive, ORC,
>>>>>>> Parquet, Pig, and Spark. We anticipate integration with additional
>>>>>>> Apache projects as the Iceberg community and interest in the project
>>>>>>> grows.
>>>>>>>
>>>>>>> Known Risks
>>>>>>>
>>>>>>> Orphaned Products
>>>>>>>
>>>>>>> Netflix is committed to the future development of Iceberg and
>>>>>>> understands that graduation to a TLP, while preferable, is not the only
>>>>>>> positive outcome of incubation.
>>>>>>>
>>>>>>> Should the Iceberg project be accepted by the Incubator, the
>>>>>>> prospective PPMC would be willing to agree to a target incubation
>>>>>>> period of 2 years or less, knowing that every Incubator project incurs
>>>>>>> a certain cost in terms of ASF infrastructure and volunteer time.
>>>>>>>
>>>>>>> Inexperience with Open Source
>>>>>>>
>>>>>>> Three of the initial committers are Apache members and Incubator PMC
>>>>>>> members. They will work with the other community members to teach them
>>>>>>> the Apache Way.
>>>>>>>
>>>>>>> Homogeneous Developers
>>>>>>>
>>>>>>> The majority of the committers work at Netflix, though we are committed
>>>>>>> to recruiting and developing additional committers from a wide spectrum
>>>>>>> of industries and backgrounds.
>>>>>>>
>>>>>>> Reliance on Salaried Developers
>>>>>>>
>>>>>>> It is expected that Iceberg development will occur on both salaried
>>>>>>> time and on volunteer time, after hours. Most of the initial committers
>>>>>>> are paid by Netflix to contribute to this project.
>>>>>>> However, they are all passionate about the project, and we are both
>>>>>>> confident and hopeful that the project will continue even if no
>>>>>>> salaried developers contribute to the project.
>>>>>>>
>>>>>>> Relationships with Other Apache Products
>>>>>>>
>>>>>>> As mentioned in the Alignment section, Iceberg utilizes a number of
>>>>>>> existing Apache projects (Avro, Hadoop, Hive, ORC, Parquet, Pig, &
>>>>>>> Spark), and we expect that list to expand as the community grows and
>>>>>>> diversifies. Any Apache project in the big data space that needs to
>>>>>>> store or process tabular data would be potentially relevant.
>>>>>>>
>>>>>>> An Excessive Fascination with the Apache Brand
>>>>>>>
>>>>>>> We are applying to the Incubator process because we think it is the
>>>>>>> next logical step for the Iceberg project after open-sourcing the code.
>>>>>>> This proposal is not for the purpose of generating publicity. Rather,
>>>>>>> we want to make sure to create a very inclusive and meritocratic
>>>>>>> community, outside the umbrella of a single company. Netflix has a long
>>>>>>> history of contributing to Apache projects and the Iceberg developers
>>>>>>> and contributors understand the implication of making it an Apache
>>>>>>> project.
>>>>>>>
>>>>>>> Required Resources
>>>>>>>
>>>>>>> Mailing lists
>>>>>>>
>>>>>>> - d...@iceberg.incubator.apache.org
>>>>>>> - comm...@iceberg.incubator.apache.org
>>>>>>> - priv...@iceberg.incubator.apache.org
>>>>>>>
>>>>>>> The podling may also create a user mailing list, if needed.
>>>>>>>
>>>>>>> Source Control and Issue Tracking
>>>>>>>
>>>>>>> The Iceberg podling would use Apache’s gitbox integration to sync
>>>>>>> between GitHub and Apache infrastructure. The podling would use GitHub
>>>>>>> issues and pull requests for community engagement.
>>>>>>>
>>>>>>> Current Resources
>>>>>>>
>>>>>>> - Initial source: https://github.com/Netflix/iceberg
>>>>>>> - Java documentation:
>>>>>>>   https://netflix.github.io/iceberg/current/javadoc/index.html?com/netflix/iceberg/package-summary.html
>>>>>>> - Table specification:
>>>>>>>   https://docs.google.com/document/d/1Q-zL5lSCle6NEEdyfiYsXYzX_Q8Qf0ctMyGBKslOswA/edit
>>>>>>>
>>>>>>> Source and Intellectual Property Submission Plan
>>>>>>>
>>>>>>> The Iceberg source code in GitHub is currently licensed under Apache
>>>>>>> License v2.0 and the copyright is assigned to Netflix. If Iceberg
>>>>>>> becomes an Incubator project at the ASF, Netflix will transfer the
>>>>>>> source code and trademark ownership to the Apache Software Foundation
>>>>>>> via a Software Grant Agreement.
>>>>>>>
>>>>>>> External Dependencies
>>>>>>>
>>>>>>> External dependencies licensed under Apache License 2.0
>>>>>>>
>>>>>>> - Guava https://github.com/google/guava
>>>>>>> - Jackson https://github.com/FasterXML/jackson-core
>>>>>>> - Joda-Time http://www.joda.org/joda-time/
>>>>>>>
>>>>>>> External dependencies licensed under the MIT License
>>>>>>>
>>>>>>> - SLF4J https://www.slf4j.org/
>>>>>>> - Mockito https://github.com/mockito/mockito
>>>>>>>
>>>>>>> ASF Projects
>>>>>>>
>>>>>>> - Apache Avro
>>>>>>> - Apache Hadoop
>>>>>>> - Apache Hive
>>>>>>> - Apache ORC
>>>>>>> - Apache Parquet
>>>>>>> - Apache Pig
>>>>>>> - Apache Spark
>>>>>>>
>>>>>>> Cryptography
>>>>>>>
>>>>>>> We do not expect Iceberg to be a controlled export item due to the use
>>>>>>> of encryption.
>>>>>>>
>>>>>>> Initial Committers and Affiliations
>>>>>>>
>>>>>>> - Ryan Blue b...@apache.org (Netflix)
>>>>>>> - Parth Brahmbhatt pa...@apache.org (Netflix)
>>>>>>> - Julien Le Dem jul...@apache.org (WeWork)
>>>>>>> - Owen O’Malley omal...@apache.org (Hortonworks)
>>>>>>> - Daniel Weeks dwe...@apache.org (Netflix)
>>>>>>>
>>>>>>> Sponsors and Nominated Mentors
>>>>>>>
>>>>>>> - Champion and mentor: Owen O’Malley omal...@apache.org
>>>>>>> - Mentor: Ryan Blue b...@apache.org
>>>>>>> - Mentor: Julien Le Dem jul...@apache.org
>>>>>>>
>>>>>>> Sponsoring Entity
>>>>>>>
>>>>>>> The Apache Incubator
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>>
>>>>>
>>>>> --
>>>>> Matt Sicker <boa...@gmail.com>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>>> For additional commands, e-mail: general-h...@incubator.apache.org