Re: [DISCUSS] HAWQ Incubation Proposal

Ted Dunning Thu, 20 Aug 2015 21:03:57 -0700

Drill is implemented entirely in Java.

This isn't core to the proposal, but it would be better corrected.




On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) <warwit...@gmail.com>
wrote:

> Hi Roman,
>
> Great news!
>
> BTW, it might be a invalid URL for the proposal. Should be
> https://wiki.apache.org/incubator/HAWQProposal ?
>
> Thanks,
> Youngwoo
>
> On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik <r...@apache.org> wrote:
>
> > Hi!
> >
> > I would like to start a discussion on accepting HAWQ
> > into ASF Incubator. The proposal is available at:
> >     https://wiki.apache.org/incubator/ApexProposal
> > and is also attached to the end of this email.
> >
> > Please note, that this proposal is very complementary
> > to the desire of HAWQ's sister project (MADlib) to
> > join ASF Incubator:
> >     http://madlib.net/pipermail/user/2015-August/
> >     http://madlib.net/pipermail/devel/2015-August/
> > I've volunteered to help MADlib community and we're
> > currently working on a separate proposal to be submitted
> > later next week. If you're interested in monitoring progress
> > of that please see updates to:
> >      https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
> > and later:
> >      https://wiki.apache.org/incubator/MADlibProposal
> >
> > Thanks in advance for your time and help.
> >
> > Thanks,
> > Roman.
> >
> > == Abstract ==
> >
> > HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
> > around a robust and high-performance massively-parallel processing
> > (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
> >
> > HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
> > with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
> > Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
> > managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
> > compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
> > extensions) and supports open database connectivity (ODBC) and Java
> > database connectivity (JDBC), as well. Most business intelligence,
> > data analysis and data visualization tools work with HAWQ out of the
> > box without the need for specialized drivers.
> >
> > A unique aspect of HAWQ is its integration of statistical and machine
> > learning capabilities that can be natively invoked from SQL or (in the
> > context of PL/Python, PL/Java or PL/R) in massively parallel modes and
> > applied to large data sets across a Hadoop cluster. These capabilities
> > are provided through MADlib – an existing open source, parallel
> > machine-learning library. Given the close ties between the two
> > development communities, the MADlib community has expressed interest
> > in joining HAWQ on its journey into the ASF Incubator and will be
> > submitting a separate, concurrent proposal.
> >
> > HAWQ will provide more robust and higher performing options for Hadoop
> > environments that demand best-in-class data analytics for business
> > critical purposes. HAWQ is implemented in C and C++.
> >
> > == Proposal ==
> > The goal of this proposal is to bring the core of Pivotal Software,
> > Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
> > Foundation (ASF) in order to build a vibrant, diverse and
> > self-governed open source community around the technology. Pivotal has
> > agreed to transfer the brand name "HAWQ" to Apache Software Foundation
> > and will stop using HAWQ to refer to this software if the project gets
> > accepted into the ASF Incubator under the name of "Apache HAWQ
> > (incubating)". Pivotal will continue to market and sell an analytic
> > engine product that includes Apache HAWQ (incubating). While HAWQ is
> > our primary choice for a name of the project, in anticipation of any
> > potential issues with PODLINGNAMESEARCH we have come up with two
> > alternative names: (1) Hornet; or (2) Grove.
> >
> > Pivotal is submitting this proposal to donate the HAWQ source code and
> > associated artifacts (documentation, web site content, wiki, etc.) to
> > the Apache Software Foundation Incubator under the Apache License,
> > Version 2.0 and is asking Incubator PMC to establish an open source
> > community.
> >
> > == Background ==
> > While the ecosystem of open source SQL-on-Hadoop solutions is fairly
> > developed by now, HAWQ has several unique features that will set it
> > apart from existing ASF and non-ASF projects. HAWQ made its debut in
> > 2013 as a closed source product leveraging a decade's worth of product
> > development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
> > has rapidly gained a solid customer base and became available on
> > non-Pivotal distributions of Hadoop.
> > In 2015 HAWQ still leverages the rock solid foundation of Greenplum
> > Database, while at the same time embracing elasticity and resource
> > management native to Hadoop applications. This allows HAWQ to provide
> > superior SQL on Hadoop performance, scalability and coverage while
> > also providing massively-parallel machine learning capabilities and
> > support for native Hadoop file formats. In addition, HAWQ's advanced
> > features include support for complex joins, rich and compliant SQL
> > dialect and industry-differentiating data federation capabilities.
> > Dynamic pipelining and pluggable query optimizer architecture enable
> > HAWQ to perform queries on Hadoop with the speed and scalability
> > required for enterprise data warehouse (EDW) workloads. HAWQ provides
> > strong support for low-latency analytic SQL queries, coupled with
> > massively parallel machine learning capabilities. This enables
> > discovery-based analysis of large data sets and rapid, iterative
> > development of data analytics applications that apply deep machine
> > learning – significantly shortening data-driven innovation cycles for
> > the enterprise.
> >
> > Hundreds of companies and thousands of servers are running
> > mission-critical applications today on HAWQ managing over PBs of data.
> >
> > == Rationale ==
> > Hadoop and HDFS-based data management architectures continue their
> > expansion into the enterprise. As the amount of data stored on Hadoop
> > clusters grows, unlocking the analytics capabilities and democratizing
> > access to that treasure trove of data becomes one of the key concerns.
> > While Hadoop has no shortage of purposefully designed analytical
> > frameworks, the easiest and most cost-effective way to onboard the
> > largest amount of data consumers is provided by offering SQL APIs for
> > data retrieval at scale. Of course, given the high velocity of
> > innovation happening in the underlying Hadoop ecosystem, any
> > SQL-on-Hadoop solution has to keep up with the community. We strongly
> > believe that in the Big Data space, this can be optimally achieved
> > through a vibrant, diverse, self-governed community collectively
> > innovating around a single codebase while at the same time
> > cross-pollinating with various other data management communities.
> > Apache Software Foundation is the ideal place to meet those ambitious
> > goals. We also believe that our initial experience of bringing Pivotal
> > GemfireⓇ into ASF as Apache Geode (incubating) could be leveraged thus
> > improving the chances of HAWQ becoming a vibrant Apache community.
> >
> > == Initial Goals ==
> > Our initial goals are to bring HAWQ into the ASF, transition internal
> > engineering processes into the open, and foster a collaborative
> > development model according to the "Apache Way." Pivotal and its
> > partners plan to develop new functionality in an open,
> > community-driven way. To get there, the existing internal build, test
> > and release processes will be refactored to support open development.
> >
> > == Current Status ==
> > Currently, the project code base is commercially licensed and is not
> > available to the general public. The documentation and wiki pages are
> > available at FIXME. Although Pivotal HAWQ was developed as a
> > proprietary, closed-source product, its roots are in the PostgreSQL
> > community and the internal engineering practices adopted by the
> > development team lend themselves well to an open, collaborative and
> > meritocratic environment.
> >
> > The Pivotal HAWQ team has always focused on building a robust end user
> > community of paying and non-paying customers. The existing
> > documentation along with StackOverflow and other similar forums are
> > expected to facilitate conversions between our existing users so as to
> > transform them into an active community of HAWQ members, stakeholders
> > and developers.
> >
> > === Meritocracy ===
> > Our proposed list of initial committers include the current HAWQ R&D
> > team, Pivotal Field Engineers, and several existing partners. This
> > group will form a base for the broader community we will invite to
> > collaborate on the codebase. We intend to radically expand the initial
> > developer and user community by running the project in accordance with
> > the "Apache Way". Users and new contributors will be treated with
> > respect and welcomed. By participating in the community and providing
> > quality patches/support that move the project forward, contributors
> > will earn merit. They also will be encouraged to provide non-code
> > contributions (documentation, events, community management, etc.) and
> > will gain merit for doing so. Those with a proven support and quality
> > track record will be encouraged to become committers.
> >
> > === Community ===
> > If HAWQ is accepted for incubation, the primary initial goal will be
> > transitioning the core community towards embracing the Apache Way of
> > project governance. We would solicit major existing contributors to
> > become committers on the project from the start.
> >
> > === Core Developers ===
> >
> > A few of HAWQ's core developers are skilled in working as part of
> > openly governed Apache communities (mainly around Hadoop ecosystem).
> > That said, most of the core developers are currently NOT affiliated
> > with the ASF and would require new ICLAs before committing to the
> > project.
> >
> > === Alignment ===
> > The following existing ASF projects can be considered when reviewing
> > HAWQ proposal:
> >
> > Apache Hadoop is a distributed storage and processing framework for
> > very large datasets, focusing primarily on batch processing for
> > analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
> > YARN and HDFS. HAWQ's community roadmap includes plans for
> > contributing Hadoop around HDFS features and increasing support for C
> > and C++ clients.
> >
> > Apache Spark™ is a fast engine for processing large datasets,
> > typically from a Hadoop cluster, and performing batch, streaming,
> > interactive, or machine learning workloads.  Recently, Apache Spark
> > has embraced SQL-like APIs around DataFrames at its core. Because of
> > that we would expect a level of collaboration between the two projects
> > when it comes to query optimization and exposing HAWQ tables to Spark
> > analytical pipelines.
> >
> > Apache Hive™ is a data warehouse software that facilitates querying
> > and managing large datasets residing in distributed storage. Hive
> > provides a mechanism to project structure onto this data and query the
> > data using a SQL-like language called HiveQL. Hive is also providing
> > HCatalog capabilities as table and storage management layer for
> > Hadoop, enabling users with different data processing tools to more
> > easily define structure for the data on the grid. Currently the core
> > Hive and HAWQ are viewed as complimentary solutions, but we expect
> > close integration with HCatalog given its dominant position for
> > metadata management on the Hadoop clusters.
> >
> > Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL and
> > Cloud Storage. Drill is similar to HAWQ but focuses on slightly
> > different areas (FIXME). Given Drill's implementation based on C and
> > C++ and and overall architecture there could be quite a lot of
> > collaboration focused on lower level building blocks.
> >
> > Apache Phoenix is a high performance relational database layer over
> > HBase for low latency applications. Given Phoenix's exclusive focus on
> > HBase for its data management backend and its overall architecture
> > around HBase's co-processors, it is unlikely that there will be much
> > collaboration between the two projects.
> >
> > == Known Risks ==
> > Development has been sponsored mostly by a single company (or its
> > predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
> > team.
> >
> > For the project to fully transition to the Apache Way governance
> > model, development must shift towards the meritocracy-centric model of
> > growing a community of contributors balanced with the needs for
> > extreme stability and core implementation coherency.
> >
> > The tools and development practices in place for the Pivotal HAWQ
> > product are compatible with the ASF infrastructure and thus we do not
> > anticipate any on-boarding pains.
> >
> > The project currently includes a modified version of PostgreSQL 8.3
> > source code. Given the ASF's position that the PostgreSQL License is
> > compatible with the Apache License version 2.0, we do NOT anticipate
> > any issues with licensing the code base. However, any new capabilities
> > developed by the HAWQ team once part of the ASF would need to be
> > consumed by the PostgreSQL community under the Apache License version
> > 2.0.
> >
> > === Orphaned products ===
> > Pivotal is fully committed to maintaining its position as one of the
> > leading providers of SQL-on-Hadoop solutions and the corresponding
> > Pivotal commercial product will continue to be based on the HAWQ
> > project. Moreover, Pivotal has a vested interest in making HAWQ
> > successful by driving its close integration with both existing
> > projects contributed by Pivotal including Apache Geode (incubating)
> > and MADlib (which is requesting Incubation), and sister ASF projects.
> > We expect this to further reduces the risk of orphaning the product.
> >
> > === Inexperience with Open Source ===
> > Pivotal has embraced open source software since its formation by
> > employing contributors/committers and by shepherding open source
> > projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
> > working at Pivotal have experience with the formation of vibrant
> > communities around open technologies with the Cloud Foundry
> > Foundation, and continuing with the creation of a community around
> > Apache Geode (incubating).  Although some of the initial committers
> > have not had the experience of developing entirely open source,
> > community-driven projects, we expect to bring to bear the open
> > development practices that have proven successful on longstanding
> > Pivotal open source projects to the HAWQ community.  Additionally,
> > several ASF veterans have agreed to mentor the project and are listed
> > in this proposal. The project will rely on their collective guidance
> > and wisdom to quickly transition the entire team of initial committers
> > towards practicing the Apache Way.
> >
> > === Homogeneous Developers ===
> > While most of the initial committers are employed by Pivotal, we have
> > already seen a healthy level of interest from existing customers and
> > partners. We intend to convert that interest directly into
> > participation and will be investing in activities to recruit
> > additional committers from other companies.
> >
> > === Reliance on Salaried Developers ===
> > Most of the contributors are paid to work in the Big Data space. While
> > they might wander from their current employers, they are unlikely to
> > venture far from their core expertise and thus will continue to be
> > engaged with the project regardless of their current employers.
> >
> > === Relationships with Other Apache Products ===
> > As mentioned in the Alignment section, HAWQ may consider various
> > degrees of integration and code exchange with Apache Hadoop, Apache
> > Spark, Apache Hive and Apache Drill projects. We expect integration
> > points to be inside and outside the project. We look forward to
> > collaborating with these communities as well as other communities
> > under the Apache umbrella.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we intend to leverage the Apache ‘branding’ when talking to
> > other projects as testament of our project’s ‘neutrality’, we have no
> > plans for making use of Apache brand in press releases nor posting
> > billboards advertising acceptance of HAWQ into Apache Incubator.
> >
> > == Documentation ==
> > The documentation is currently available at http://hawq.docs.pivotal.io/
> >
> > == Initial Source ==
> > Initial source code will be available immediately after Incubator PMC
> > approves HAWQ joining the Incubator and will be licensed under the
> > Apache License v2.
> >
> > == Source and Intellectual Property Submission Plan ==
> > As soon as HAWQ is approved to join the Incubator, the source code
> > will be transitioned via an exhibit to Pivotal's current Software
> > Grant Agreement onto ASF infrastructure and in turn made available
> > under the Apache License, version 2.0.  We know of no legal
> > encumberments that would inhibit the transfer of source code to the
> > ASF.
> >
> > == External Dependencies ==
> >
> > Runtime dependencies:
> >   * gimli (BSD)
> >   * openldap (The OpenLDAP Public License)
> >   * openssl (OpenSSL License and the Original SSLeay License, BSD style)
> >   * proj (MIT)
> >   * yaml (Creative Commons Attribution 2.0 License)
> >   * python (Python Software Foundation License Version 2)
> >   * apr-util (Apache Version 2.0)
> >   * bzip2 (BSD-style License)
> >   * curl (MIT/X Derivate License)
> >   * gperf (GPL Version 3)
> >   * protobuf (Google)
> >   * libevent (BSD)
> >   * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
> >   * krb5 (MIT)
> >   * pcre (BSD)
> >   * libedit (BSD)
> >   * libxml2 (MIT)
> >   * zlib (Permissive Free Software License)
> >   * libgsasl (LGPL Version 2.1)
> >   * thrift (Apache Version 2.0)
> >   * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
> >   * libuuid-2.26 (LGPL Version 2)
> >   * apache hadoop (Apache Version 2.0)
> >   * apache avro (Apache Version 2.0)
> >   * glog (BSD)
> >   * googlemock (BSD)
> >
> > Build only dependencies:
> >   * ant (Apache Version 2.0)
> >   * maven (Apache Version 2.0)
> >   * cmake (BSD)
> >
> > Test only dependencies:
> >   * googletest (BSD)
> >
> > Cryptography N/A
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >   * priv...@hawq.incubator.apache.org (moderated subscriptions)
> >   * comm...@hawq.incubator.apache.org
> >   * d...@hawq.incubator.apache.org
> >   * iss...@hawq.incubator.apache.org
> >   * u...@hawq.incubator.apache.org
> >
> > === Git Repository ===
> > https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
> >
> > === Issue Tracking ===
> > JIRA Project HAWQ (HAWQ)
> >
> > === Other Resources ===
> >
> > Means of setting up regular builds for HAWQ on builds.apache.org will
> > require integration with Docker support.
> >
> > == Initial Committers ==
> >   * Lirong Jian
> >   * Hubert Huan Zhang
> >   * Radar Da Lei
> >   * Ivan Yanqing Weng
> >   * Zhanwei Wang
> >   * Yi Jin
> >   * Lili Ma
> >   * Jiali Yao
> >   * Zhenglin Tao
> >   * Ruilong Huo
> >   * Ming Li
> >   * Wen Lin
> >   * Lei Chang
> >   * Alexander V Denissov
> >   * Newton Alex
> >   * Oleksandr Diachenko
> >   * Jun Aoki
> >   * Bhuvnesh Chaudhary
> >   * Vineet Goel
> >   * Shivram Mani
> >   * Noa Horn
> >   * Sujeet S Varakhedi
> >   * Junwei (Jimmy) Da
> >   * Ting (Goden) Yao
> >   * Mohammad F (Foyzur) Rahman
> >   * Entong Shen
> >   * George C Caragea
> >   * Amr El-Helw
> >   * Mohamed F Soliman
> >   * Venkatesh (Venky) Raghavan
> >   * Carlos Garcia
> >   * Zixi (Jesse) Zhang
> >   * Michael P Schubert
> >   * C.J. Jameson
> >   * Jacob Frank
> >   * Ben Calegari
> >   * Shoabe Shariff
> >   * Rob Day-Reynolds
> >   * Mel S Kiyama
> >   * Charles Alan Litzell
> >   * David Yozie
> >   * Caleb Welton
> >   * Parham Parvizi
> >   * Dan Baskette
> >   * Christian Tzolov
> >   * Tushar Pednekar
> >   * Greg Chase
> >   * Chloe Jackson
> >   * Michael Nixon
> >   * Roman Shaposhnik
> >   * Alan Gates
> >   * Owen O'Malley
> >   * Thejas Nair
> >   * Don Bosco Durai
> >   * Konstantin Boudnik
> >   * Sergey Soldatov
> >   * Atri Sharma
> >
> > == Affiliations ==
> >   * Barclays:  Atri Sharma
> >   * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
> >   * WANDisco: Konstantin Boudnik, Sergey Soldatov
> >   * Pivotal: everyone else on this proposal
> >
> > == Sponsors ==
> >
> > === Champion ===
> > Roman Shaposhnik
> >
> > === Nominated Mentors ===
> >
> > The initial mentors are listed below:
> >   * Alan Gates - Apache Member, Hortonworks
> >   * Owen O'Malley - Apache Member, Hortonworks
> >   * Thejas Nair - Apache Member, Hortonworks
> >   * Konstantin Boudnik - Apache Member, WANDisco
> >   * Roman Shaposhnik - Apache Member, Pivotal
> >
> > === Sponsoring Entity ===
> > We would like to propose Apache incubator to sponsor this project.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>

Re: [DISCUSS] HAWQ Incubation Proposal

Reply via email to