+1 (non-binding)

On Nov 26, 2015 3:04 PM, "Joe Witt" <joe.w...@gmail.com> wrote:
> +1 (non-binding) > > On Wed, Nov 25, 2015 at 5:26 PM, Hitesh Shah <hit...@apache.org> wrote: > > +1 (binding) > > > > — Hitesh > > > > On Nov 24, 2015, at 11:32 AM, Todd Lipcon <t...@apache.org> wrote: > > > >> Hi all, > >> > >> Discussion on the [DISCUSS] thread seems to have wound down, so I'd > like to > >> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal > is > >> pasted below and also available on the wiki at: > >> https://wiki.apache.org/incubator/KuduProposal > >> > >> The proposal is unchanged since the original version, except for the > >> addition of Carl Steinbach as a Mentor. > >> > >> Please cast your votes: > >> > >> [] +1, accept Kudu into the Incubator > >> [] +/-0, positive/negative non-counted expression of feelings > >> [] -1, do not accept Kudu into the incubator (please state reasoning) > >> > >> Given the US holiday this week, I imagine many folks are traveling or > >> otherwise offline. So, let's run the vote for a full week rather than > the > >> traditional 72 hours. Unless the IPMC objects to the extended voting > >> period, the vote will close on Tues, Dec 1st at noon PST. > >> > >> Thanks > >> -Todd > >> ----- > >> > >> = Kudu Proposal = > >> > >> == Abstract == > >> > >> Kudu is a distributed columnar storage engine built for the Apache > Hadoop > >> ecosystem. > >> > >> == Proposal == > >> > >> Kudu is an open source storage engine for structured data which supports > >> low-latency random access together with efficient analytical access > >> patterns. Kudu distributes data using horizontal partitioning and > >> replicates each partition using Raft consensus, providing low > >> mean-time-to-recovery and low tail latencies. Kudu is designed within > the > >> context of the Apache Hadoop ecosystem and supports many integrations > with > >> other data analytics projects both inside and outside of the Apache > >> Software Foundation. 
> >> > >> > >> > >> We propose to incubate Kudu as a project of the Apache Software > Foundation. > >> > >> == Background == > >> > >> In recent years, explosive growth in the amount of data being generated > and > >> captured by enterprises has resulted in the rapid adoption of open > source > >> technology which is able to store massive data sets at scale and at low > >> cost. In particular, the Apache Hadoop ecosystem has become a focal > point > >> for such “big data” workloads, because many traditional open source > >> database systems have lagged in offering a scalable alternative. > >> > >> > >> > >> Structured storage in the Hadoop ecosystem has typically been achieved > in > >> two ways: for static data sets, data is typically stored on Apache HDFS > >> using binary data formats such as Apache Avro or Apache Parquet. > However, > >> neither HDFS nor these formats has any provision for updating individual > >> records, or for efficient random access. Mutable data sets are typically > >> stored in semi-structured stores such as Apache HBase or Apache > Cassandra. > >> These systems allow for low-latency record-level reads and writes, but > lag > >> far behind the static file formats in terms of sequential read > throughput > >> for applications such as SQL-based analytics or machine learning. > >> > >> > >> > >> Kudu is a new storage system designed and implemented from the ground > up to > >> fill this gap between high-throughput sequential-access storage systems > >> such as HDFS and low-latency random-access systems such as HBase or > >> Cassandra. While these existing systems continue to hold advantages in > some > >> situations, Kudu offers a “happy medium” alternative that can > dramatically > >> simplify the architecture of many common workloads. 
In particular, Kudu > >> offers a simple API for row-level inserts, updates, and deletes, while > >> providing table scans at throughputs similar to Parquet, a commonly-used > >> columnar format for static data. > >> > >> > >> > >> More information on Kudu can be found at the existing open source > project > >> website: http://getkudu.io and in particular in the Kudu white-paper > PDF: > >> http://getkudu.io/kudu.pdf from which the above was excerpted. > >> > >> == Rationale == > >> > >> As described above, Kudu fills an important gap in the open source > storage > >> ecosystem. After our initial open source project release in September > 2015, > >> we have seen a great amount of interest across a diverse set of users > and > >> companies. We believe that, as a storage system, it is critical to > build an > >> equally diverse set of contributors in the development community. Our > >> experiences as committers and PMC members on other Apache projects have > >> taught us the value of diverse communities in ensuring both longevity > and > >> high quality for such foundational systems. > >> > >> == Initial Goals == > >> > >> * Move the existing codebase, website, documentation, and mailing lists > to > >> Apache-hosted infrastructure > >> * Work with the infrastructure team to implement and approve our code > >> review, build, and testing workflows in the context of the ASF > >> * Incremental development and releases per Apache guidelines > >> > >> == Current Status == > >> > >> ==== Releases ==== > >> > >> Kudu has undergone one public release, tagged here > >> https://github.com/cloudera/kudu/tree/kudu0.5.0-release > >> > >> This initial release was not performed in the typical ASF fashion -- no > >> source tarball was released, but rather only convenience binaries made > >> available in Cloudera’s repositories. We will adopt the ASF source > release > >> process upon joining the incubator. 
> >> > >> > >> ==== Source ==== > >> > >> Kudu’s source is currently hosted on GitHub at > >> https://github.com/cloudera/kudu > >> > >> This repository will be transitioned to Apache’s git hosting during > >> incubation. > >> > >> > >> > >> ==== Code review ==== > >> > >> Kudu’s code reviews are currently public and hosted on Gerrit at > >> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu > >> > >> The Kudu developer community is very happy with gerrit and hopes to work > >> with the Apache Infrastructure team to figure out how we can continue to > >> use Gerrit within ASF policies. > >> > >> > >> > >> ==== Issue tracking ==== > >> > >> Kudu’s bug and feature tracking is hosted on JIRA at: > >> https://issues.cloudera.org/projects/KUDU/summary > >> > >> This JIRA instance contains bugs and development discussion dating back > 2 > >> years prior to Kudu’s open source release and will provide an initial > seed > >> for the ASF JIRA. > >> > >> > >> > >> ==== Community discussion ==== > >> > >> Kudu has several public discussion forums, linked here: > >> http://getkudu.io/community.html > >> > >> > >> > >> ==== Build Infrastructure ==== > >> > >> The Kudu Gerrit instance is configured to only allow patches to be > >> committed after running them through an extensive set of pre-commit > tests > >> and code lints. The project currently makes use of elastic public cloud > >> resources to perform these tests. Until this point, these resources have > >> been internal to Cloudera, though we are currently investing in moving > to a > >> publicly accessible infrastructure. > >> > >> > >> > >> ==== Development practices ==== > >> > >> Given that Kudu is a persistent storage engine, the community has a high > >> quality bar for contributions to its core. We have a firm belief that > high > >> quality is achieved through automation, not manual inspection, and hence > >> put a focus on thorough testing and build infrastructure to ensure that > >> bar. 
The development community also practices review-then-commit for all > >> changes to ensure that changes are accompanied by appropriate tests, are > >> well commented, etc. > >> > >> Rather than seeing these practices as barriers to contribution, we > believe > >> that a fully automated and standardized review and testing practice > makes > >> it easier for new contributors to have patches accepted. Any new > developer > >> may post a patch to Gerrit using the same workflow as a seasoned > >> contributor, and the same suite of tests will be automatically run. If > the > >> tests pass, a committer can quickly review and commit the contribution > from > >> their web browser. > >> > >> === Meritocracy === > >> > >> We believe strongly in meritocracy in electing committers and PMC > members. > >> We believe that contributions can come in forms other than just code: > for > >> example, one of our initial proposed committers has contributed solely > in > >> the area of project documentation. We will encourage contributions and > >> participation of all types, and ensure that contributors are > appropriately > >> recognized. > >> > >> === Community === > >> > >> Though Kudu is relatively new as an open source project, it has already > >> seen promising growth in its community across several organizations: > >> > >> * '''Cloudera''' is the original development sponsor for Kudu. > >> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new > >> production use case, contributing code, benchmarks, feedback, and > >> conference talks. > >> * '''Intel''' has contributed optimizations related to their hardware > >> technologies. > >> * '''Dropbox''' has been experimenting with Kudu for a machine > monitoring > >> use case, and has been contributing bug reports and product feedback. > >> * '''Dremio''' is working on integration with Apache Drill and exploring > >> using Kudu in a production use case. 
> >> * Several community-built Docker images, tutorials, and blog posts have > >> sprouted up since Kudu’s release. > >> > >> > >> > >> By bringing Kudu to Apache, we hope to encourage further contribution > from > >> the above organizations as well as to engage new users and contributors > in > >> the community. > >> > >> === Core Developers === > >> > >> Kudu was initially developed as a project at Cloudera. Most of the > >> contributions to date have been by developers employed by Cloudera. > >> > >> > >> > >> Many of the developers are committers or PMC members on other Apache > >> projects. > >> > >> === Alignment === > >> > >> As a project in the big data ecosystem, Kudu is aligned with several > other > >> ASF projects. Kudu includes input/output format integration with Apache > >> Hadoop, and this integration can also provide a bridge to Apache Spark. > We > >> are planning to integrate with Apache Hive in the near future. We also > >> integrate closely with Cloudera Impala, which is also currently being > >> proposed for incubation. We have also scheduled a hackathon with the > Apache > >> Drill team to work on integration with that query engine. > >> > >> == Known Risks == > >> > >> === Orphaned Products === > >> > >> The risk of Kudu being abandoned is low. Cloudera has invested a great > deal > >> in the initial development of the project, and intends to grow its > >> investment over time as Kudu becomes a product adopted by its customer > >> base. Several other organizations are also experimenting with Kudu for > >> production use cases which would live for many years. > >> > >> === Inexperience with Open Source === > >> > >> Kudu has been released in the open for less than two months. 
However, > from > >> our very first public announcement we have been committed to open-source > >> style development: > >> > >> * our code reviews are fully public and documented on a mailing list > >> * our daily development chatter is in a public chat room > >> * we send out weekly “community status” reports highlighting news and > >> contributions > >> * we published our entire JIRA history and discuss bugs in the open > >> * we published our entire Git commit history, going back three years (no > >> squashing) > >> > >> > >> > >> Several of the initial committers are experienced open source > developers, > >> several being committers and/or PMC members on other ASF projects > (Hadoop, > >> HBase, Thrift, Flume, et al). Those who are not ASF committers have > >> experience on non-ASF open source projects (Kiji, open-vm-tools, et al). > >> > >> === Homogenous Developers === > >> > >> The initial committers are employees or former employees of Cloudera. > >> However, the committers are spread across multiple offices (Palo Alto, > San > >> Francisco, Melbourne), so the team is familiar with working in a > >> distributed environment across varied time zones. > >> > >> > >> > >> The project has received some contributions from developers outside of > >> Cloudera, and is starting to attract a ''user'' community as well. We > hope > >> to continue to encourage contributions from these developers and > community > >> members and grow them into committers after they have had time to > continue > >> their contributions. > >> > >> === Reliance on Salaried Developers === > >> > >> As mentioned above, the majority of development up to this point has > been > >> sponsored by Cloudera. We have seen several community users participate > in > >> discussions who are hobbyists interested in distributed systems and > >> databases, and hope that they will continue their participation in the > >> project going forward. 
> >> > >> === Relationships with Other Apache Products === > >> > >> Kudu is currently related to the following other Apache projects: > >> > >> * Hadoop: Kudu provides MapReduce input/output formats for integration > >> * Spark: Kudu integrates with Spark via the above-mentioned input > formats, > >> and work is progressing on support for Spark Data Frames and Spark SQL. > >> > >> > >> > >> The Kudu team has reached out to several other Apache projects to start > >> discussing integrations, including Flume, Kafka, Hive, and Drill. > >> > >> > >> > >> Kudu integrates with Impala, which is also being proposed for > incubation. > >> > >> > >> > >> Kudu is already collaborating on ValueVector, a proposed TLP spinning > out > >> from the Apache Drill community. > >> > >> > >> > >> We look forward to continuing to integrate and collaborate with these > >> communities. > >> > >> === An Excessive Fascination with the Apache Brand === > >> > >> Many of the initial committers are already experienced Apache > committers, > >> and understand the true value provided by the Apache Way and the > principles > >> of the ASF. We believe that this development and contribution model is > >> especially appropriate for storage products, where Apache’s > >> community-over-code philosophy ensures long term viability and > >> consensus-based participation. > >> > >> == Documentation == > >> > >> * Documentation is written in AsciiDoc and committed in the Kudu source > >> repository: > >> > >> * https://github.com/cloudera/kudu/tree/master/docs > >> > >> > >> > >> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of > the > >> above repository. > >> > >> * A LaTeX whitepaper is also published, and the source is available > within > >> the same repository. > >> * APIs are documented within the source code as JavaDoc or C++-style > >> documentation comments. 
> >> * Many design documents are stored within the source code repository as > >> text files next to the code being documented. > >> > >> == Source and Intellectual Property Submission Plan == > >> > >> The Kudu codebase and web site is currently hosted on GitHub and will be > >> transitioned to the ASF repositories during incubation. Kudu is already > >> licensed under the Apache 2.0 license. > >> > >> > >> > >> Some portions of the code are imported from other open source projects > >> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by > authors > >> other than the initial committers. These copyright notices are > maintained > >> in those files as well as a top-level NOTICE.txt file. We believe this > to > >> be permissible under the license terms and ASF policies, and confirmed > via > >> a recent thread on general@incubator.apache.org . > >> > >> > >> > >> The “Kudu” name is not a registered trademark, though before the initial > >> release of the project, we performed a trademark search and Cloudera’s > >> legal counsel deemed it acceptable in the context of a data storage > engine. > >> There exists an unrelated open source project by the same name related > to > >> deployments on Microsoft’s Azure cloud service. We have been in contact > >> with legal counsel from Microsoft and have obtained their approval for > the > >> use of the Kudu name. > >> > >> > >> > >> Cloudera currently owns several domain names related to Kudu ( > getkudu.io, > >> kududb.io, et al) which will be transferred to the ASF and redirected > to > >> the official page during incubation. > >> > >> > >> > >> Portions of Kudu are protected by pending or published patents owned by > >> Cloudera. Given the protections already granted by the Apache License, > we > >> do not anticipate any explicit licensing or transfer of this > intellectual > >> property. 
> >> > >> == External Dependencies == > >> > >> The full set of dependencies and licenses are listed in > >> https://github.com/cloudera/kudu/blob/master/LICENSE.txt > >> > >> and summarized here: > >> > >> * '''Twitter Bootstrap''': Apache 2.0 > >> * '''d3''': BSD 3-clause > >> * '''epoch JS library''': MIT > >> * '''lz4''': BSD 2-clause > >> * '''gflags''': BSD 3-clause > >> * '''glog''': BSD 3-clause > >> * '''gperftools''': BSD 3-clause > >> * '''libev''': BSD 2-clause > >> * '''squeasel''':MIT license > >> * '''protobuf''': BSD 3-clause > >> * '''rapidjson''': MIT > >> * '''snappy''': BSD 3-clause > >> * '''trace-viewer''': BSD 3-clause > >> * '''zlib''': zlib license > >> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike) > >> * '''bitshuffle''': MIT > >> * '''boost''': Boost license > >> * '''curl''': MIT > >> * '''libunwind''': MIT > >> * '''nvml''': BSD 3-clause > >> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike) > >> * '''openssl''': OpenSSL License (BSD-alike) > >> > >> * '''Guava''': Apache 2.0 > >> * '''StumbleUpon Async''': BSD > >> * '''Apache Hadoop''': Apache 2.0 > >> * '''Apache log4j''': Apache 2.0 > >> * '''Netty''': Apache 2.0 > >> * '''slf4j''': MIT > >> * '''Apache Commons''': Apache 2.0 > >> * '''murmur''': Apache 2.0 > >> > >> > >> '''Build/test-only dependencies''': > >> > >> * '''CMake''': BSD 3-clause > >> * '''gcovr''': BSD 3-clause > >> * '''gmock''': BSD 3-clause > >> * '''Apache Maven''': Apache 2.0 > >> * '''JUnit''': EPL > >> * '''Mockito''': MIT > >> > >> == Cryptography == > >> > >> Kudu does not currently include any cryptography-related code. 
> >> > >> == Required Resources == > >> > >> === Mailing lists === > >> > >> * priv...@kudu.incubator.apache.org (PMC) > >> * comm...@kudu.incubator.apache.org (git push emails) > >> * iss...@kudu.incubator.apache.org (JIRA issue feed) > >> * d...@kudu.incubator.apache.org (Gerrit code reviews plus dev > discussion) > >> * u...@kudu.incubator.apache.org (User questions) > >> > >> > >> === Repository === > >> > >> * git://git.apache.org/kudu > >> > >> === Gerrit === > >> > >> We hope to continue using Gerrit for our code review and commit > workflow. > >> The Kudu team has already been in contact with Jake Farrell to start > >> discussions on how Gerrit can fit into the ASF. We know that several > other > >> ASF projects and podlings are also interested in Gerrit. > >> > >> > >> > >> If the Infrastructure team does not have the bandwidth to support > Gerrit, > >> we will continue to support our own instance of Gerrit for Kudu, and > make > >> the necessary integrations such that commits are properly authenticated > and > >> maintain sufficient provenance to uphold the ASF standards (e.g. via the > >> solution adopted by the AsterixDB podling). > >> > >> == Issue Tracking == > >> > >> We would like to import our current JIRA project into the ASF JIRA, such > >> that our historical commit messages and code comments continue to > reference > >> the appropriate bug numbers. 
> >> > >> == Initial Committers == > >> > >> * Adar Dembo a...@cloudera.com > >> * Alex Feinberg a...@strlen.net > >> * Andrew Wang w...@apache.org > >> * Dan Burkert d...@cloudera.com > >> * David Alves dral...@apache.org > >> * Jean-Daniel Cryans jdcry...@apache.org > >> * Mike Percy mpe...@apache.org > >> * Misty Stanley-Jones mi...@apache.org > >> * Todd Lipcon t...@apache.org > >> > >> The initial list of committers was seeded by listing those contributors > who > >> have contributed 20 or more patches in the last 12 months, indicating > that > >> they are active and have achieved merit through participation on the > >> project. We chose not to include other contributors who either have not > yet > >> contributed a significant number of patches, or whose contributions are > far > >> in the past and we don’t expect to be active within the ASF. > >> > >> == Affiliations == > >> > >> * Adar Dembo - Cloudera > >> * Alex Feinberg - Forward Networks > >> * Andrew Wang - Cloudera > >> * Dan Burkert - Cloudera > >> * David Alves - Cloudera > >> * Jean-Daniel Cryans - Cloudera > >> * Mike Percy - Cloudera > >> * Misty Stanley-Jones - Cloudera > >> * Todd Lipcon - Cloudera > >> > >> == Sponsors == > >> > >> === Champion === > >> > >> * Todd Lipcon > >> > >> === Nominated Mentors === > >> > >> * Jake Farrell - ASF Member and Infra team member, Acquia > >> * Brock Noland - ASF Member, StreamSets > >> * Michael Stack - ASF Member, Cloudera > >> * Jarek Jarcec Cecho - ASF Member, Cloudera > >> * Chris Mattmann - ASF Member, NASA JPL and USC > >> * Julien Le Dem - Incubator PMC, Dremio > >> * Carl Steinbach - ASF Member, LinkedIn > >> > >> === Sponsoring Entity === > >> > >> The Apache Incubator > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > > For additional commands, e-mail: general-h...@incubator.apache.org > > > > 