Hi Ted! this is one of those blind spot things I can't even explain. I was so sure Drill was implemented in C/C++ that I never bothered to check. I'm now curios as to who may have implanted that false memory (did you guys talk about when it was first getting designed?).
But anyway, more to the point: great feedback. That section really doesn't make much sense if Drill is 100% Java. The wiki has been updated. Thanks, Roman. On Thu, Aug 20, 2015 at 9:02 PM, Ted Dunning <ted.dunn...@gmail.com> wrote: > Drill is implemented entirely in Java. > > This isn't core to the proposal, but it would be better corrected. > > > > On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) <warwit...@gmail.com> > wrote: > >> Hi Roman, >> >> Great news! >> >> BTW, it might be a invalid URL for the proposal. Should be >> https://wiki.apache.org/incubator/HAWQProposal ? >> >> Thanks, >> Youngwoo >> >> On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik <r...@apache.org> wrote: >> >> > Hi! >> > >> > I would like to start a discussion on accepting HAWQ >> > into ASF Incubator. The proposal is available at: >> > https://wiki.apache.org/incubator/ApexProposal >> > and is also attached to the end of this email. >> > >> > Please note, that this proposal is very complementary >> > to the desire of HAWQ's sister project (MADlib) to >> > join ASF Incubator: >> > http://madlib.net/pipermail/user/2015-August/ >> > http://madlib.net/pipermail/devel/2015-August/ >> > I've volunteered to help MADlib community and we're >> > currently working on a separate proposal to be submitted >> > later next week. If you're interested in monitoring progress >> > of that please see updates to: >> > https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal >> > and later: >> > https://wiki.apache.org/incubator/MADlibProposal >> > >> > Thanks in advance for your time and help. >> > >> > Thanks, >> > Roman. >> > >> > == Abstract == >> > >> > HAWQ is an advanced enterprise SQL on Hadoop analytic engine built >> > around a robust and high-performance massively-parallel processing >> > (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ. >> > >> > HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating >> > with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as >> > Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and >> > managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL >> > compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP >> > extensions) and supports open database connectivity (ODBC) and Java >> > database connectivity (JDBC), as well. Most business intelligence, >> > data analysis and data visualization tools work with HAWQ out of the >> > box without the need for specialized drivers. >> > >> > A unique aspect of HAWQ is its integration of statistical and machine >> > learning capabilities that can be natively invoked from SQL or (in the >> > context of PL/Python, PL/Java or PL/R) in massively parallel modes and >> > applied to large data sets across a Hadoop cluster. These capabilities >> > are provided through MADlib – an existing open source, parallel >> > machine-learning library. Given the close ties between the two >> > development communities, the MADlib community has expressed interest >> > in joining HAWQ on its journey into the ASF Incubator and will be >> > submitting a separate, concurrent proposal. >> > >> > HAWQ will provide more robust and higher performing options for Hadoop >> > environments that demand best-in-class data analytics for business >> > critical purposes. HAWQ is implemented in C and C++. >> > >> > == Proposal == >> > The goal of this proposal is to bring the core of Pivotal Software, >> > Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software >> > Foundation (ASF) in order to build a vibrant, diverse and >> > self-governed open source community around the technology. Pivotal has >> > agreed to transfer the brand name "HAWQ" to Apache Software Foundation >> > and will stop using HAWQ to refer to this software if the project gets >> > accepted into the ASF Incubator under the name of "Apache HAWQ >> > (incubating)". Pivotal will continue to market and sell an analytic >> > engine product that includes Apache HAWQ (incubating). While HAWQ is >> > our primary choice for a name of the project, in anticipation of any >> > potential issues with PODLINGNAMESEARCH we have come up with two >> > alternative names: (1) Hornet; or (2) Grove. >> > >> > Pivotal is submitting this proposal to donate the HAWQ source code and >> > associated artifacts (documentation, web site content, wiki, etc.) to >> > the Apache Software Foundation Incubator under the Apache License, >> > Version 2.0 and is asking Incubator PMC to establish an open source >> > community. >> > >> > == Background == >> > While the ecosystem of open source SQL-on-Hadoop solutions is fairly >> > developed by now, HAWQ has several unique features that will set it >> > apart from existing ASF and non-ASF projects. HAWQ made its debut in >> > 2013 as a closed source product leveraging a decade's worth of product >> > development effort invested in Greenplum DatabaseⓇ. Since then HAWQ >> > has rapidly gained a solid customer base and became available on >> > non-Pivotal distributions of Hadoop. >> > In 2015 HAWQ still leverages the rock solid foundation of Greenplum >> > Database, while at the same time embracing elasticity and resource >> > management native to Hadoop applications. This allows HAWQ to provide >> > superior SQL on Hadoop performance, scalability and coverage while >> > also providing massively-parallel machine learning capabilities and >> > support for native Hadoop file formats. In addition, HAWQ's advanced >> > features include support for complex joins, rich and compliant SQL >> > dialect and industry-differentiating data federation capabilities. >> > Dynamic pipelining and pluggable query optimizer architecture enable >> > HAWQ to perform queries on Hadoop with the speed and scalability >> > required for enterprise data warehouse (EDW) workloads. HAWQ provides >> > strong support for low-latency analytic SQL queries, coupled with >> > massively parallel machine learning capabilities. This enables >> > discovery-based analysis of large data sets and rapid, iterative >> > development of data analytics applications that apply deep machine >> > learning – significantly shortening data-driven innovation cycles for >> > the enterprise. >> > >> > Hundreds of companies and thousands of servers are running >> > mission-critical applications today on HAWQ managing over PBs of data. >> > >> > == Rationale == >> > Hadoop and HDFS-based data management architectures continue their >> > expansion into the enterprise. As the amount of data stored on Hadoop >> > clusters grows, unlocking the analytics capabilities and democratizing >> > access to that treasure trove of data becomes one of the key concerns. >> > While Hadoop has no shortage of purposefully designed analytical >> > frameworks, the easiest and most cost-effective way to onboard the >> > largest amount of data consumers is provided by offering SQL APIs for >> > data retrieval at scale. Of course, given the high velocity of >> > innovation happening in the underlying Hadoop ecosystem, any >> > SQL-on-Hadoop solution has to keep up with the community. We strongly >> > believe that in the Big Data space, this can be optimally achieved >> > through a vibrant, diverse, self-governed community collectively >> > innovating around a single codebase while at the same time >> > cross-pollinating with various other data management communities. >> > Apache Software Foundation is the ideal place to meet those ambitious >> > goals. We also believe that our initial experience of bringing Pivotal >> > GemfireⓇ into ASF as Apache Geode (incubating) could be leveraged thus >> > improving the chances of HAWQ becoming a vibrant Apache community. >> > >> > == Initial Goals == >> > Our initial goals are to bring HAWQ into the ASF, transition internal >> > engineering processes into the open, and foster a collaborative >> > development model according to the "Apache Way." Pivotal and its >> > partners plan to develop new functionality in an open, >> > community-driven way. To get there, the existing internal build, test >> > and release processes will be refactored to support open development. >> > >> > == Current Status == >> > Currently, the project code base is commercially licensed and is not >> > available to the general public. The documentation and wiki pages are >> > available at FIXME. Although Pivotal HAWQ was developed as a >> > proprietary, closed-source product, its roots are in the PostgreSQL >> > community and the internal engineering practices adopted by the >> > development team lend themselves well to an open, collaborative and >> > meritocratic environment. >> > >> > The Pivotal HAWQ team has always focused on building a robust end user >> > community of paying and non-paying customers. The existing >> > documentation along with StackOverflow and other similar forums are >> > expected to facilitate conversions between our existing users so as to >> > transform them into an active community of HAWQ members, stakeholders >> > and developers. >> > >> > === Meritocracy === >> > Our proposed list of initial committers include the current HAWQ R&D >> > team, Pivotal Field Engineers, and several existing partners. This >> > group will form a base for the broader community we will invite to >> > collaborate on the codebase. We intend to radically expand the initial >> > developer and user community by running the project in accordance with >> > the "Apache Way". Users and new contributors will be treated with >> > respect and welcomed. By participating in the community and providing >> > quality patches/support that move the project forward, contributors >> > will earn merit. They also will be encouraged to provide non-code >> > contributions (documentation, events, community management, etc.) and >> > will gain merit for doing so. Those with a proven support and quality >> > track record will be encouraged to become committers. >> > >> > === Community === >> > If HAWQ is accepted for incubation, the primary initial goal will be >> > transitioning the core community towards embracing the Apache Way of >> > project governance. We would solicit major existing contributors to >> > become committers on the project from the start. >> > >> > === Core Developers === >> > >> > A few of HAWQ's core developers are skilled in working as part of >> > openly governed Apache communities (mainly around Hadoop ecosystem). >> > That said, most of the core developers are currently NOT affiliated >> > with the ASF and would require new ICLAs before committing to the >> > project. >> > >> > === Alignment === >> > The following existing ASF projects can be considered when reviewing >> > HAWQ proposal: >> > >> > Apache Hadoop is a distributed storage and processing framework for >> > very large datasets, focusing primarily on batch processing for >> > analytic purposes. HAWQ builds on top of two key pieces of Hadoop: >> > YARN and HDFS. HAWQ's community roadmap includes plans for >> > contributing Hadoop around HDFS features and increasing support for C >> > and C++ clients. >> > >> > Apache Spark™ is a fast engine for processing large datasets, >> > typically from a Hadoop cluster, and performing batch, streaming, >> > interactive, or machine learning workloads. Recently, Apache Spark >> > has embraced SQL-like APIs around DataFrames at its core. Because of >> > that we would expect a level of collaboration between the two projects >> > when it comes to query optimization and exposing HAWQ tables to Spark >> > analytical pipelines. >> > >> > Apache Hive™ is a data warehouse software that facilitates querying >> > and managing large datasets residing in distributed storage. Hive >> > provides a mechanism to project structure onto this data and query the >> > data using a SQL-like language called HiveQL. Hive is also providing >> > HCatalog capabilities as table and storage management layer for >> > Hadoop, enabling users with different data processing tools to more >> > easily define structure for the data on the grid. Currently the core >> > Hive and HAWQ are viewed as complimentary solutions, but we expect >> > close integration with HCatalog given its dominant position for >> > metadata management on the Hadoop clusters. >> > >> > Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL and >> > Cloud Storage. Drill is similar to HAWQ but focuses on slightly >> > different areas (FIXME). Given Drill's implementation based on C and >> > C++ and and overall architecture there could be quite a lot of >> > collaboration focused on lower level building blocks. >> > >> > Apache Phoenix is a high performance relational database layer over >> > HBase for low latency applications. Given Phoenix's exclusive focus on >> > HBase for its data management backend and its overall architecture >> > around HBase's co-processors, it is unlikely that there will be much >> > collaboration between the two projects. >> > >> > == Known Risks == >> > Development has been sponsored mostly by a single company (or its >> > predecessors) thus far and coordinated mainly by the core Pivotal HAWQ >> > team. >> > >> > For the project to fully transition to the Apache Way governance >> > model, development must shift towards the meritocracy-centric model of >> > growing a community of contributors balanced with the needs for >> > extreme stability and core implementation coherency. >> > >> > The tools and development practices in place for the Pivotal HAWQ >> > product are compatible with the ASF infrastructure and thus we do not >> > anticipate any on-boarding pains. >> > >> > The project currently includes a modified version of PostgreSQL 8.3 >> > source code. Given the ASF's position that the PostgreSQL License is >> > compatible with the Apache License version 2.0, we do NOT anticipate >> > any issues with licensing the code base. However, any new capabilities >> > developed by the HAWQ team once part of the ASF would need to be >> > consumed by the PostgreSQL community under the Apache License version >> > 2.0. >> > >> > === Orphaned products === >> > Pivotal is fully committed to maintaining its position as one of the >> > leading providers of SQL-on-Hadoop solutions and the corresponding >> > Pivotal commercial product will continue to be based on the HAWQ >> > project. Moreover, Pivotal has a vested interest in making HAWQ >> > successful by driving its close integration with both existing >> > projects contributed by Pivotal including Apache Geode (incubating) >> > and MADlib (which is requesting Incubation), and sister ASF projects. >> > We expect this to further reduces the risk of orphaning the product. >> > >> > === Inexperience with Open Source === >> > Pivotal has embraced open source software since its formation by >> > employing contributors/committers and by shepherding open source >> > projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals >> > working at Pivotal have experience with the formation of vibrant >> > communities around open technologies with the Cloud Foundry >> > Foundation, and continuing with the creation of a community around >> > Apache Geode (incubating). Although some of the initial committers >> > have not had the experience of developing entirely open source, >> > community-driven projects, we expect to bring to bear the open >> > development practices that have proven successful on longstanding >> > Pivotal open source projects to the HAWQ community. Additionally, >> > several ASF veterans have agreed to mentor the project and are listed >> > in this proposal. The project will rely on their collective guidance >> > and wisdom to quickly transition the entire team of initial committers >> > towards practicing the Apache Way. >> > >> > === Homogeneous Developers === >> > While most of the initial committers are employed by Pivotal, we have >> > already seen a healthy level of interest from existing customers and >> > partners. We intend to convert that interest directly into >> > participation and will be investing in activities to recruit >> > additional committers from other companies. >> > >> > === Reliance on Salaried Developers === >> > Most of the contributors are paid to work in the Big Data space. While >> > they might wander from their current employers, they are unlikely to >> > venture far from their core expertise and thus will continue to be >> > engaged with the project regardless of their current employers. >> > >> > === Relationships with Other Apache Products === >> > As mentioned in the Alignment section, HAWQ may consider various >> > degrees of integration and code exchange with Apache Hadoop, Apache >> > Spark, Apache Hive and Apache Drill projects. We expect integration >> > points to be inside and outside the project. We look forward to >> > collaborating with these communities as well as other communities >> > under the Apache umbrella. >> > >> > === An Excessive Fascination with the Apache Brand === >> > While we intend to leverage the Apache ‘branding’ when talking to >> > other projects as testament of our project’s ‘neutrality’, we have no >> > plans for making use of Apache brand in press releases nor posting >> > billboards advertising acceptance of HAWQ into Apache Incubator. >> > >> > == Documentation == >> > The documentation is currently available at http://hawq.docs.pivotal.io/ >> > >> > == Initial Source == >> > Initial source code will be available immediately after Incubator PMC >> > approves HAWQ joining the Incubator and will be licensed under the >> > Apache License v2. >> > >> > == Source and Intellectual Property Submission Plan == >> > As soon as HAWQ is approved to join the Incubator, the source code >> > will be transitioned via an exhibit to Pivotal's current Software >> > Grant Agreement onto ASF infrastructure and in turn made available >> > under the Apache License, version 2.0. We know of no legal >> > encumberments that would inhibit the transfer of source code to the >> > ASF. >> > >> > == External Dependencies == >> > >> > Runtime dependencies: >> > * gimli (BSD) >> > * openldap (The OpenLDAP Public License) >> > * openssl (OpenSSL License and the Original SSLeay License, BSD style) >> > * proj (MIT) >> > * yaml (Creative Commons Attribution 2.0 License) >> > * python (Python Software Foundation License Version 2) >> > * apr-util (Apache Version 2.0) >> > * bzip2 (BSD-style License) >> > * curl (MIT/X Derivate License) >> > * gperf (GPL Version 3) >> > * protobuf (Google) >> > * libevent (BSD) >> > * json-c (https://github.com/json-c/json-c/blob/master/COPYING) >> > * krb5 (MIT) >> > * pcre (BSD) >> > * libedit (BSD) >> > * libxml2 (MIT) >> > * zlib (Permissive Free Software License) >> > * libgsasl (LGPL Version 2.1) >> > * thrift (Apache Version 2.0) >> > * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD) >> > * libuuid-2.26 (LGPL Version 2) >> > * apache hadoop (Apache Version 2.0) >> > * apache avro (Apache Version 2.0) >> > * glog (BSD) >> > * googlemock (BSD) >> > >> > Build only dependencies: >> > * ant (Apache Version 2.0) >> > * maven (Apache Version 2.0) >> > * cmake (BSD) >> > >> > Test only dependencies: >> > * googletest (BSD) >> > >> > Cryptography N/A >> > >> > == Required Resources == >> > >> > === Mailing lists === >> > * priv...@hawq.incubator.apache.org (moderated subscriptions) >> > * comm...@hawq.incubator.apache.org >> > * d...@hawq.incubator.apache.org >> > * iss...@hawq.incubator.apache.org >> > * u...@hawq.incubator.apache.org >> > >> > === Git Repository === >> > https://git-wip-us.apache.org/repos/asf/incubator-hawq.git >> > >> > === Issue Tracking === >> > JIRA Project HAWQ (HAWQ) >> > >> > === Other Resources === >> > >> > Means of setting up regular builds for HAWQ on builds.apache.org will >> > require integration with Docker support. >> > >> > == Initial Committers == >> > * Lirong Jian >> > * Hubert Huan Zhang >> > * Radar Da Lei >> > * Ivan Yanqing Weng >> > * Zhanwei Wang >> > * Yi Jin >> > * Lili Ma >> > * Jiali Yao >> > * Zhenglin Tao >> > * Ruilong Huo >> > * Ming Li >> > * Wen Lin >> > * Lei Chang >> > * Alexander V Denissov >> > * Newton Alex >> > * Oleksandr Diachenko >> > * Jun Aoki >> > * Bhuvnesh Chaudhary >> > * Vineet Goel >> > * Shivram Mani >> > * Noa Horn >> > * Sujeet S Varakhedi >> > * Junwei (Jimmy) Da >> > * Ting (Goden) Yao >> > * Mohammad F (Foyzur) Rahman >> > * Entong Shen >> > * George C Caragea >> > * Amr El-Helw >> > * Mohamed F Soliman >> > * Venkatesh (Venky) Raghavan >> > * Carlos Garcia >> > * Zixi (Jesse) Zhang >> > * Michael P Schubert >> > * C.J. Jameson >> > * Jacob Frank >> > * Ben Calegari >> > * Shoabe Shariff >> > * Rob Day-Reynolds >> > * Mel S Kiyama >> > * Charles Alan Litzell >> > * David Yozie >> > * Caleb Welton >> > * Parham Parvizi >> > * Dan Baskette >> > * Christian Tzolov >> > * Tushar Pednekar >> > * Greg Chase >> > * Chloe Jackson >> > * Michael Nixon >> > * Roman Shaposhnik >> > * Alan Gates >> > * Owen O'Malley >> > * Thejas Nair >> > * Don Bosco Durai >> > * Konstantin Boudnik >> > * Sergey Soldatov >> > * Atri Sharma >> > >> > == Affiliations == >> > * Barclays: Atri Sharma >> > * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai >> > * WANDisco: Konstantin Boudnik, Sergey Soldatov >> > * Pivotal: everyone else on this proposal >> > >> > == Sponsors == >> > >> > === Champion === >> > Roman Shaposhnik >> > >> > === Nominated Mentors === >> > >> > The initial mentors are listed below: >> > * Alan Gates - Apache Member, Hortonworks >> > * Owen O'Malley - Apache Member, Hortonworks >> > * Thejas Nair - Apache Member, Hortonworks >> > * Konstantin Boudnik - Apache Member, WANDisco >> > * Roman Shaposhnik - Apache Member, Pivotal >> > >> > === Sponsoring Entity === >> > We would like to propose Apache incubator to sponsor this project. >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >> > For additional commands, e-mail: general-h...@incubator.apache.org >> > >> > >> --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org