+1 (non-binding)
On Nov 26, 2015 3:04 PM, "Joe Witt" <joe.w...@gmail.com> wrote:

> +1 (non-binding)
>
> On Wed, Nov 25, 2015 at 5:26 PM, Hitesh Shah <hit...@apache.org> wrote:
> > +1 (binding)
> >
> > — Hitesh
> >
> > On Nov 24, 2015, at 11:32 AM, Todd Lipcon <t...@apache.org> wrote:
> >
> >> Hi all,
> >>
> >> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> >> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> >> pasted below and also available on the wiki at:
> >> https://wiki.apache.org/incubator/KuduProposal
> >>
> >> The proposal is unchanged since the original version, except for the
> >> addition of Carl Steinbach as a Mentor.
> >>
> >> Please cast your votes:
> >>
> >> [] +1, accept Kudu into the Incubator
> >> [] +/-0, positive/negative non-counted expression of feelings
> >> [] -1, do not accept Kudu into the incubator (please state reasoning)
> >>
> >> Given the US holiday this week, I imagine many folks are traveling or
> >> otherwise offline. So, let's run the vote for a full week rather than the
> >> traditional 72 hours. Unless the IPMC objects to the extended voting
> >> period, the vote will close on Tues, Dec 1st at noon PST.
> >>
> >> Thanks
> >> -Todd
> >> -----
> >>
> >> = Kudu Proposal =
> >>
> >> == Abstract ==
> >>
> >> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> >> ecosystem.
> >>
> >> == Proposal ==
> >>
> >> Kudu is an open source storage engine for structured data which supports
> >> low-latency random access together with efficient analytical access
> >> patterns. Kudu distributes data using horizontal partitioning and
> >> replicates each partition using Raft consensus, providing low
> >> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> >> context of the Apache Hadoop ecosystem and supports many integrations with
> >> other data analytics projects both inside and outside of the Apache
> >> Software Foundation.
> >>
> >>
> >>
> >> We propose to incubate Kudu as a project of the Apache Software Foundation.
> >>
> >> == Background ==
> >>
> >> In recent years, explosive growth in the amount of data being generated and
> >> captured by enterprises has resulted in the rapid adoption of open source
> >> technology which is able to store massive data sets at scale and at low
> >> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> >> for such “big data” workloads, because many traditional open source
> >> database systems have lagged in offering a scalable alternative.
> >>
> >>
> >>
> >> Structured storage in the Hadoop ecosystem has typically been achieved in
> >> two ways: for static data sets, data is typically stored on Apache HDFS
> >> using binary data formats such as Apache Avro or Apache Parquet. However,
> >> neither HDFS nor these formats has any provision for updating individual
> >> records, or for efficient random access. Mutable data sets are typically
> >> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> >> These systems allow for low-latency record-level reads and writes, but lag
> >> far behind the static file formats in terms of sequential read throughput
> >> for applications such as SQL-based analytics or machine learning.
> >>
> >>
> >>
> >> Kudu is a new storage system designed and implemented from the ground up to
> >> fill this gap between high-throughput sequential-access storage systems
> >> such as HDFS and low-latency random-access systems such as HBase or
> >> Cassandra. While these existing systems continue to hold advantages in some
> >> situations, Kudu offers a “happy medium” alternative that can dramatically
> >> simplify the architecture of many common workloads. In particular, Kudu
> >> offers a simple API for row-level inserts, updates, and deletes, while
> >> providing table scans at throughputs similar to Parquet, a commonly-used
> >> columnar format for static data.
> >>
> >>
> >>
> >> More information on Kudu can be found at the existing open source project
> >> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> >> http://getkudu.io/kudu.pdf from which the above was excerpted.
> >>
> >> == Rationale ==
> >>
> >> As described above, Kudu fills an important gap in the open source storage
> >> ecosystem. After our initial open source project release in September 2015,
> >> we have seen a great amount of interest across a diverse set of users and
> >> companies. We believe that, as a storage system, it is critical to build an
> >> equally diverse set of contributors in the development community. Our
> >> experiences as committers and PMC members on other Apache projects have
> >> taught us the value of diverse communities in ensuring both longevity and
> >> high quality for such foundational systems.
> >>
> >> == Initial Goals ==
> >>
> >> * Move the existing codebase, website, documentation, and mailing lists to
> >> Apache-hosted infrastructure
> >> * Work with the infrastructure team to implement and approve our code
> >> review, build, and testing workflows in the context of the ASF
> >> * Incremental development and releases per Apache guidelines
> >>
> >> == Current Status ==
> >>
> >> ==== Releases ====
> >>
> >> Kudu has undergone one public release, tagged here
> >> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
> >>
> >> This initial release was not performed in the typical ASF fashion -- no
> >> source tarball was released, but rather only convenience binaries made
> >> available in Cloudera’s repositories. We will adopt the ASF source release
> >> process upon joining the incubator.
> >>
> >>
> >> ==== Source ====
> >>
> >> Kudu’s source is currently hosted on GitHub at
> >> https://github.com/cloudera/kudu
> >>
> >> This repository will be transitioned to Apache’s git hosting during
> >> incubation.
> >>
> >>
> >>
> >> ==== Code review ====
> >>
> >> Kudu’s code reviews are currently public and hosted on Gerrit at
> >> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
> >>
> >> The Kudu developer community is very happy with Gerrit and hopes to work
> >> with the Apache Infrastructure team to figure out how we can continue to
> >> use Gerrit within ASF policies.
> >>
> >>
> >>
> >> ==== Issue tracking ====
> >>
> >> Kudu’s bug and feature tracking is hosted on JIRA at:
> >> https://issues.cloudera.org/projects/KUDU/summary
> >>
> >> This JIRA instance contains bugs and development discussion dating back 2
> >> years prior to Kudu’s open source release and will provide an initial seed
> >> for the ASF JIRA.
> >>
> >>
> >>
> >> ==== Community discussion ====
> >>
> >> Kudu has several public discussion forums, linked here:
> >> http://getkudu.io/community.html
> >>
> >>
> >>
> >> ==== Build Infrastructure ====
> >>
> >> The Kudu Gerrit instance is configured to only allow patches to be
> >> committed after running them through an extensive set of pre-commit tests
> >> and code lints. The project currently makes use of elastic public cloud
> >> resources to perform these tests. Until this point, these resources have
> >> been internal to Cloudera, though we are currently investing in moving to a
> >> publicly accessible infrastructure.
> >>
> >>
> >>
> >> ==== Development practices ====
> >>
> >> Given that Kudu is a persistent storage engine, the community has a high
> >> quality bar for contributions to its core. We have a firm belief that high
> >> quality is achieved through automation, not manual inspection, and hence
> >> put a focus on thorough testing and build infrastructure to ensure that
> >> bar. The development community also practices review-then-commit for all
> >> changes to ensure that changes are accompanied by appropriate tests, are
> >> well commented, etc.
> >>
> >> Rather than seeing these practices as barriers to contribution, we believe
> >> that a fully automated and standardized review and testing practice makes
> >> it easier for new contributors to have patches accepted. Any new developer
> >> may post a patch to Gerrit using the same workflow as a seasoned
> >> contributor, and the same suite of tests will be automatically run. If the
> >> tests pass, a committer can quickly review and commit the contribution from
> >> their web browser.
> >>
> >> === Meritocracy ===
> >>
> >> We believe strongly in meritocracy in electing committers and PMC members.
> >> We believe that contributions can come in forms other than just code: for
> >> example, one of our initial proposed committers has contributed solely in
> >> the area of project documentation. We will encourage contributions and
> >> participation of all types, and ensure that contributors are appropriately
> >> recognized.
> >>
> >> === Community ===
> >>
> >> Though Kudu is relatively new as an open source project, it has already
> >> seen promising growth in its community across several organizations:
> >>
> >> * '''Cloudera''' is the original development sponsor for Kudu.
> >> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
> >> production use case, contributing code, benchmarks, feedback, and
> >> conference talks.
> >> * '''Intel''' has contributed optimizations related to their hardware
> >> technologies.
> >> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
> >> use case, and has been contributing bug reports and product feedback.
> >> * '''Dremio''' is working on integration with Apache Drill and exploring
> >> using Kudu in a production use case.
> >> * Several community-built Docker images, tutorials, and blog posts have
> >> sprouted up since Kudu’s release.
> >>
> >>
> >>
> >> By bringing Kudu to Apache, we hope to encourage further contribution from
> >> the above organizations as well as to engage new users and contributors in
> >> the community.
> >>
> >> === Core Developers ===
> >>
> >> Kudu was initially developed as a project at Cloudera. Most of the
> >> contributions to date have been by developers employed by Cloudera.
> >>
> >>
> >>
> >> Many of the developers are committers or PMC members on other Apache
> >> projects.
> >>
> >> === Alignment ===
> >>
> >> As a project in the big data ecosystem, Kudu is aligned with several other
> >> ASF projects. Kudu includes input/output format integration with Apache
> >> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> >> are planning to integrate with Apache Hive in the near future. We also
> >> integrate closely with Cloudera Impala, which is also currently being
> >> proposed for incubation. We have also scheduled a hackathon with the Apache
> >> Drill team to work on integration with that query engine.
> >>
> >> == Known Risks ==
> >>
> >> === Orphaned Products ===
> >>
> >> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> >> in the initial development of the project, and intends to grow its
> >> investment over time as Kudu becomes a product adopted by its customer
> >> base. Several other organizations are also experimenting with Kudu for
> >> production use cases which would live for many years.
> >>
> >> === Inexperience with Open Source ===
> >>
> >> Kudu has been released in the open for less than two months. However, from
> >> our very first public announcement we have been committed to open-source
> >> style development:
> >>
> >> * our code reviews are fully public and documented on a mailing list
> >> * our daily development chatter is in a public chat room
> >> * we send out weekly “community status” reports highlighting news and
> >> contributions
> >> * we published our entire JIRA history and discuss bugs in the open
> >> * we published our entire Git commit history, going back three years (no
> >> squashing)
> >>
> >>
> >>
> >> Several of the initial committers are experienced open source developers,
> >> several being committers and/or PMC members on other ASF projects (Hadoop,
> >> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> >> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
> >>
> >> === Homogeneous Developers ===
> >>
> >> The initial committers are employees or former employees of Cloudera.
> >> However, the committers are spread across multiple offices (Palo Alto, San
> >> Francisco, Melbourne), so the team is familiar with working in a
> >> distributed environment across varied time zones.
> >>
> >>
> >>
> >> The project has received some contributions from developers outside of
> >> Cloudera, and is starting to attract a ''user'' community as well. We hope
> >> to continue to encourage contributions from these developers and community
> >> members and grow them into committers after they have had time to continue
> >> their contributions.
> >>
> >> === Reliance on Salaried Developers ===
> >>
> >> As mentioned above, the majority of development up to this point has been
> >> sponsored by Cloudera. We have seen several community users participate in
> >> discussions who are hobbyists interested in distributed systems and
> >> databases, and hope that they will continue their participation in the
> >> project going forward.
> >>
> >> === Relationships with Other Apache Products ===
> >>
> >> Kudu is currently related to the following other Apache projects:
> >>
> >> * Hadoop: Kudu provides MapReduce input/output formats for integration
> >> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
> >> and work is progressing on support for Spark Data Frames and Spark SQL.
> >>
> >>
> >>
> >> The Kudu team has reached out to several other Apache projects to start
> >> discussing integrations, including Flume, Kafka, Hive, and Drill.
> >>
> >>
> >>
> >> Kudu integrates with Impala, which is also being proposed for incubation.
> >>
> >>
> >>
> >> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> >> from the Apache Drill community.
> >>
> >>
> >>
> >> We look forward to continuing to integrate and collaborate with these
> >> communities.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >>
> >> Many of the initial committers are already experienced Apache committers,
> >> and understand the true value provided by the Apache Way and the principles
> >> of the ASF. We believe that this development and contribution model is
> >> especially appropriate for storage products, where Apache’s
> >> community-over-code philosophy ensures long term viability and
> >> consensus-based participation.
> >>
> >> == Documentation ==
> >>
> >> * Documentation is written in AsciiDoc and committed in the Kudu source
> >> repository:
> >>
> >> * https://github.com/cloudera/kudu/tree/master/docs
> >>
> >>
> >>
> >> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
> >> above repository.
> >>
> >> * A LaTeX whitepaper is also published, and the source is available within
> >> the same repository.
> >> * APIs are documented within the source code as JavaDoc or C++-style
> >> documentation comments.
> >> * Many design documents are stored within the source code repository as
> >> text files next to the code being documented.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >>
> >> The Kudu codebase and web site are currently hosted on GitHub and will be
> >> transitioned to the ASF repositories during incubation. Kudu is already
> >> licensed under the Apache 2.0 license.
> >>
> >>
> >>
> >> Some portions of the code are imported from other open source projects
> >> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> >> other than the initial committers. These copyright notices are maintained
> >> in those files as well as a top-level NOTICE.txt file. We believe this to
> >> be permissible under the license terms and ASF policies, and confirmed this
> >> via a recent thread on general@incubator.apache.org .
> >>
> >>
> >>
> >> The “Kudu” name is not a registered trademark, though before the initial
> >> release of the project, we performed a trademark search and Cloudera’s
> >> legal counsel deemed it acceptable in the context of a data storage engine.
> >> There exists an unrelated open source project by the same name related to
> >> deployments on Microsoft’s Azure cloud service. We have been in contact
> >> with legal counsel from Microsoft and have obtained their approval for the
> >> use of the Kudu name.
> >>
> >>
> >>
> >> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> >> kududb.io, et al) which will be transferred to the ASF and redirected to
> >> the official page during incubation.
> >>
> >>
> >>
> >> Portions of Kudu are protected by pending or published patents owned by
> >> Cloudera. Given the protections already granted by the Apache License, we
> >> do not anticipate any explicit licensing or transfer of this intellectual
> >> property.
> >>
> >> == External Dependencies ==
> >>
> >> The full set of dependencies and licenses is listed in
> >> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
> >>
> >> and summarized here:
> >>
> >> * '''Twitter Bootstrap''': Apache 2.0
> >> * '''d3''': BSD 3-clause
> >> * '''epoch JS library''': MIT
> >> * '''lz4''': BSD 2-clause
> >> * '''gflags''': BSD 3-clause
> >> * '''glog''': BSD 3-clause
> >> * '''gperftools''': BSD 3-clause
> >> * '''libev''': BSD 2-clause
> >> * '''squeasel''': MIT
> >> * '''protobuf''': BSD 3-clause
> >> * '''rapidjson''': MIT
> >> * '''snappy''': BSD 3-clause
> >> * '''trace-viewer''': BSD 3-clause
> >> * '''zlib''': zlib license
> >> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
> >> * '''bitshuffle''': MIT
> >> * '''boost''': Boost license
> >> * '''curl''': MIT
> >> * '''libunwind''': MIT
> >> * '''nvml''': BSD 3-clause
> >> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
> >> * '''openssl''': OpenSSL License (BSD-alike)
> >>
> >> * '''Guava''': Apache 2.0
> >> * '''StumbleUpon Async''': BSD
> >> * '''Apache Hadoop''': Apache 2.0
> >> * '''Apache log4j''': Apache 2.0
> >> * '''Netty''': Apache 2.0
> >> * '''slf4j''': MIT
> >> * '''Apache Commons''': Apache 2.0
> >> * '''murmur''': Apache 2.0
> >>
> >>
> >> '''Build/test-only dependencies''':
> >>
> >> * '''CMake''': BSD 3-clause
> >> * '''gcovr''': BSD 3-clause
> >> * '''gmock''': BSD 3-clause
> >> * '''Apache Maven''': Apache 2.0
> >> * '''JUnit''': EPL
> >> * '''Mockito''': MIT
> >>
> >> == Cryptography ==
> >>
> >> Kudu does not currently include any cryptography-related code.
> >>
> >> == Required Resources ==
> >>
> >> === Mailing lists ===
> >>
> >> * priv...@kudu.incubator.apache.org (PMC)
> >> * comm...@kudu.incubator.apache.org (git push emails)
> >> * iss...@kudu.incubator.apache.org (JIRA issue feed)
> >> * d...@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
> >> * u...@kudu.incubator.apache.org (User questions)
> >>
> >>
> >> === Repository ===
> >>
> >> * git://git.apache.org/kudu
> >>
> >> === Gerrit ===
> >>
> >> We hope to continue using Gerrit for our code review and commit workflow.
> >> The Kudu team has already been in contact with Jake Farrell to start
> >> discussions on how Gerrit can fit into the ASF. We know that several other
> >> ASF projects and podlings are also interested in Gerrit.
> >>
> >>
> >>
> >> If the Infrastructure team does not have the bandwidth to support Gerrit,
> >> we will continue to support our own instance of Gerrit for Kudu, and make
> >> the necessary integrations such that commits are properly authenticated and
> >> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> >> solution adopted by the AsterixDB podling).
> >>
> >> == Issue Tracking ==
> >>
> >> We would like to import our current JIRA project into the ASF JIRA, such
> >> that our historical commit messages and code comments continue to reference
> >> the appropriate bug numbers.
> >>
> >> == Initial Committers ==
> >>
> >> * Adar Dembo a...@cloudera.com
> >> * Alex Feinberg a...@strlen.net
> >> * Andrew Wang w...@apache.org
> >> * Dan Burkert d...@cloudera.com
> >> * David Alves dral...@apache.org
> >> * Jean-Daniel Cryans jdcry...@apache.org
> >> * Mike Percy mpe...@apache.org
> >> * Misty Stanley-Jones mi...@apache.org
> >> * Todd Lipcon t...@apache.org
> >>
> >> The initial list of committers was seeded by listing those contributors who
> >> have contributed 20 or more patches in the last 12 months, indicating that
> >> they are active and have achieved merit through participation on the
> >> project. We chose not to include other contributors who either have not yet
> >> contributed a significant number of patches, or whose contributions are far
> >> in the past and whom we don’t expect to be active within the ASF.
> >>
> >> == Affiliations ==
> >>
> >> * Adar Dembo - Cloudera
> >> * Alex Feinberg - Forward Networks
> >> * Andrew Wang - Cloudera
> >> * Dan Burkert - Cloudera
> >> * David Alves - Cloudera
> >> * Jean-Daniel Cryans - Cloudera
> >> * Mike Percy - Cloudera
> >> * Misty Stanley-Jones - Cloudera
> >> * Todd Lipcon - Cloudera
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >>
> >> * Todd Lipcon
> >>
> >> === Nominated Mentors ===
> >>
> >> * Jake Farrell - ASF Member and Infra team member, Acquia
> >> * Brock Noland - ASF Member, StreamSets
> >> * Michael Stack - ASF Member, Cloudera
> >> * Jarek Jarcec Cecho - ASF Member, Cloudera
> >> * Chris Mattmann - ASF Member, NASA JPL and USC
> >> * Julien Le Dem - Incubator PMC, Dremio
> >> * Carl Steinbach - ASF Member, LinkedIn
> >>
> >> === Sponsoring Entity ===
> >>
> >> The Apache Incubator
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
>
