+1 from me binding ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message----- From: Henry Robinson <he...@cloudera.com> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org> Date: Tuesday, November 24, 2015 at 1:03 PM To: "general@incubator.apache.org" <general@incubator.apache.org> Subject: [VOTE] Accept Impala into the Apache Incubator >Hi - > >The [DISCUSS] thread has been quiet for a few days, so I think there's >been >sufficient opportunity for discussion around our proposal to bring Impala >to the ASF Incubator. > >I'd like to call a VOTE on that proposal, which is on the wiki at >https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted >below. > >During the discussion period, the proposal has been amended to add Brock >Noland as a new mentor, to add one missed committer from the list and to >correct some issues with the dependency list. > >Please cast your votes as follows: > >[] +1, accept Impala into the Incubator >[] +/-0, non-counted vote to express a disposition >[] -1, do not accept Impala into the Incubator (please give your >reason(s)) > >As with the concurrent Kudu vote, I propose leaving the vote open for a >full seven days (to close at Tuesday, December 1st at noon PST), due to >the >upcoming US holiday. > >Thanks, >Henry > >-------- > >= Abstract = >Impala is a high-performance C++ and Java SQL query engine for data stored >in Apache Hadoop-based clusters. > >= Proposal = > >We propose to contribute the Impala codebase and associated artifacts >(e.g. >documentation, web-site content etc.) to the Apache Software Foundation >with the intent of forming a productive, meritocratic and open community >around Impala’s continued development, according to the ‘Apache Way’. > >Cloudera owns several trademarks regarding Impala, and proposes to >transfer >ownership of those trademarks in full to the ASF. > >= Background = >Engineers at Cloudera developed Impala and released it as an >Apache-licensed open-source project in Fall 2012. Impala was written as a >brand-new, modern C++ SQL engine targeted from the start for data stored >in >Apache Hadoop clusters. > >Impala’s most important benefit to users is high-performance, making it >extremely appropriate for common enterprise analytic and business >intelligence workloads. This is achieved by a number of software >techniques, including: native support for data stored in HDFS and related >filesystems, just-in-time compilation and optimization of individual query >plans, high-performance C++ codebase and massively-parallel distributed >architecture. In benchmarks, Impala is routinely amongst the very highest >performing SQL query engines. > >= Rationale = > >Despite the exciting innovation in the so-called ‘big-data’ space, SQL >remains by far the most common interface for interacting with data in both >traditional warehouses and modern ‘big-data’ clusters. There is clearly a >need, as evidenced by the eager adoption of Impala and other SQL engines >in >enterprise contexts, for a query engine that offers the familiar SQL >interface, but that has been specifically designed to operate in massive, >distributed clusters rather than in traditional, fixed-hardware, >warehouse-specific deployments. Impala is one such query engine. > >We believe that the ASF is the right venue to foster an open-source >community around Impala’s development. We expect that Impala will benefit >from more productive collaboration with related Apache projects, and under >the auspices of the ASF will attract talented contributors who will push >Impala’s development forward at pace. > >We believe that the timing is right for Impala’s development to move >wholesale to the ASF: Impala is well-established, has been Apache-licensed >open-source for more than three years, and the core project is relatively >stable. We are excited to see where an ASF-based community can take Impala >from this strong starting point. > >= Initial Goals = >Our initial goals are as follows: > > * Establish ASF-compatible engineering practices and workflows > * Refactor and publish existing internal build scripts and test >infrastructure, in order to make them usable by any community member. > * Transfer source code, documentation and associated artifacts to the >ASF. > * Grow the user and developer communities > >= Current Status = > >Impala is developed as an Apache-licensed open-source project. The source >code is available at http://github.com/cloudera/Impala, and developer >documentation is at https://github.com/cloudera/Impala/wiki. The majority >of commits to the project have come from Cloudera-employed developers, but >we have accepted some contributions from individuals from other >organizations. > >All code reviews are done via a public instance of the Gerrit review tool >at http://gerrit.cloudera.org:8080/, and discussed on a public mailing >list. All patches must be reviewed before they are accepted into the >codebase, via a voting mechanism that is similar to that used on Apache >projects such as Hadoop and HBase. > >Before a patch is committed, it must pass a suite of pre-commit tests. >These tests are currently run on Cloudera’s internal infrastructure. One >of >our initial goals will be to work with the ASF Infrastructure team to find >a way to run these tests in an acceptable way on publicly accessible >machines. > >Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA, >in a way that is extremely similar to existing practices at other ASF >projects. > >= Meritocracy = > >We understand the central importance of meritocracy to the Apache Way. We >will work to establish a welcoming, fair and meritocratic community, in >part by expanding the set of committers on the project. Although Impala’s >committer list will initially be dominated by members of the Impala >engineering team at Cloudera, we look forward to growing a rich user and >developer community. > >= Community = >Impala has a strong user community (see >https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user), and a >growing developer community (see >https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev). We >wish >to attract more developers to the project, and we believe that the ASF’s >open and meritocratic philosophy will help us with this. We note the >success of other, similar projects already part of the ASF. > >= Core Developers = >Most - but not all - of Impala’s core developers are not currently >affiliated with the ASF, and will require new ICLAs. > >= Alignment = >Impala is related to several other Apache projects: > > * Data that is read by Impala is very often stored in Apache Hadoop >clusters powered by the HDFS filesystem. > * Impala can also read data stored in Apache HBase > * Metadata for databases, tables and so on is read by Impala from Apache >Hive. > * The preferred data format for HDFS-based tables is Apache Parquet, and >Apache Avro is also a supported data format. > * Impala is closely integrated with Kudu, which is also being proposed to >the Incubator. > * Impala uses Apache Thrift as its RPC and serialization framework of >choice. > >= Known Risks = > >== Orphaned Products == >Impala is used by most of Cloudera’s customers, and Cloudera remains >committed to developing and supporting the project. Cloudera has a strong >track record in standing behind projects that were contributed to the ASF >by its employees, including Apache Flume, Apache Sqoop, and others. Other >companies both ship and support Impala, lending credence to the idea that >Impala is not at risk of being suddenly orphaned. > >== Inexperience with Open Source == >Although all committers on the initial list have significant experience >with at least one open-source project - namely Impala - fewer have much >experience with ASF-based software projects as contributors and community >members. However, with the guidance of our mentors, committers who do have >ASF experience, and time to learn during Incubation, we are confident that >the project can be run in accordance with Apache principles on an ongoing >basis. > >== Homogeneous Developers == > >The initial committers are employees of Cloudera. > >The project has received some contributions from developers outside of >Cloudera, from individuals belonging to organizations such as Intel and >Google, from hobbyists and from students using Impala to advance their >understanding of distributed databases. The project attracted an active >user community as well. We hope to continue to encourage contributions >from >these developers and community members and grow them into committers after >they have had time to continue their contributions. > >== Reliance on Salaried Developers == > >Many of Impala’s initial set of committers work full-time on Impala, and >are paid to do so. However, as mentioned elsewhere, we anticipate growth >in >the developer community which we hope will include hobbyists and academics >who have an interested in distributed data systems. > >== An Excessive Fascination with the Apache Brand == >Although we hope that Impala benefits from the Apache Brand, any reflected >goodwill to Cloudera as the contributing entity is not the goal of >establishing Impala as an Apache project. We will work with the Incubator >PMC and the PRC to ensure that the Apache Brand is respected. > >= Documentation = >Impala: A Modern, Open-Source SQL Engine for Hadoop ( >http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf) > >Impala’s developer wiki (https://github.com/cloudera/Impala/wiki) > >Impala’s auto-generated API documentation ( >http://impala.io/doc/html/index.html) > >= Initial Source = >Impala’s initial source contribution will come from >http://github.com/cloudera/Impala/. > >= External Dependencies = > >Impala depends upon a number of third-party libraries, which we list >below. >We intend to compile a LICENSE.txt file in the very short term (see >https://issues.cloudera.org/browse/IMPALA-2670). > > * Google gflags (BSD) > * Google glog (BSD) > * Apache Thrift (Apache Software License v2.0) > * Apache Commons (Apache Software License v2.0) > * Apache Hadoop (Apache Software License v2.0) > * Apache HBase (Apache Software License v2.0) > * Apache Hive (Apache Software License v2.0) > * Boost (Boost Software License) > * OpenLdap (OpenLDAP Software License) > * rapidjson (MIT) > * Google RE2 (BSD-style) > * lz4 (BSD) > * snappy (BSD) > * cyrus-sasl (CMU License) > * Apache Avro (Apache Software License v2.0) > * Cloudera squeasel (Apache Software License v2.0) > * Apache htrace (Incubating) (Apache Software License v2.0) > * Apache Sentry (Incubating) (Apache Software License v2.0) > * Apache Shiro (Apache Software License v2.0) > * Twitter Bootstrap (Apache Software License v2.0) > * d3 (BSD) > * LLVM (BSD-like) > >Build and test dependencies: > > * ant (Apache Software License v2.0) > * Apache Maven (Apache Software License v2.0) > * cmake (BSD) > * clang (BSD) > * Google gtest (Apache Software License v2.0) > >= Required Resources = > >We request that following resources be created for the project to use: > >== Mailing lists == > > * priv...@impala.incubator.apache.org (moderated subscriptions) > * comm...@impala.incubator.apache.org > * d...@impala.incubator.apache.org > * iss...@impala.incubator.apache.org > * u...@impala.incubator.apache.org > >== Git repository == >https://git.apache.org/impala.git > >== JIRA instance == >JIRA project IMPALA (IMPALA or IMP) > >== Other Resources == >We hope to continue using Gerrit for our code review and commit workflow. >We are involved with discussions that the Kudu team at Cloudera have been >having with Jake Farrell to start discussions on how Gerrit can fit into >the ASF. We know that several other ASF projects or podlings are also >interested in Gerrit. > >If the Infrastructure team does not have the bandwidth to support gerrit, >we will continue to support our own instance of gerrit for Impala, and >make >the necessary integrations such that commits are properly authenticated >and >maintain sufficient provenance to uphold the ASF standards (e.g. via the >solution adopted by the AsterixDB podling). > >= Initial Committers = > > * Tim Armstrong > * Alex Behm > * Taras Bobrovytsky > * Casey Ching > * Martin Grund > * Daniel Hecht > * Michael Ho > * Matthew Jacobs > * Ishaan Joshi > * Lenni Kuff > * Marcel Kornacker > * Sailesh Mukil > * Henry Robinson > * John Russell > * Dimitris Tsirogiannis > * Skye Wanderman-Milne > * Juan Yu > >== Affiliations == >All: Cloudera Inc. > >= Sponsors = > >== Champion == >Tom White > >== Nominated Mentors == > * Tom White (Cloudera) > * Todd Lipcon (Cloudera) > * Carl Steinbach (LinkedIn) > * Brock Noland (StreamSets) > > >= Sponsoring Entity = >We ask that the Incubator PMC sponsor this proposal.