+1 (binding)

Regards
JB

On 05/07/2018 21:22, Dave Fisher wrote:
> Hi All,
>
> I would like to start a VOTE to bring the Doris project in as an Apache
> Incubator podling.
>
> The ASF voting rules are described at:
>
> https://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote
> for which only Incubator PMC member votes are binding.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
> [] +1 Accept Doris into the Apache Incubator
> [] +0 Abstain.
> [] -1 Do not accept Doris into the Apache Incubator because ...
>
> The proposal is listed below, but you can also access it on the wiki:
>
> https://wiki.apache.org/incubator/DorisProposal
>
> Best regards,
> Dave
>
> = Apache Doris =
>
> == Abstract ==
>
> Doris is an MPP-based interactive SQL data warehouse for reporting and
> analysis.
>
> == Proposal ==
>
> We propose to contribute the Doris codebase and associated artifacts
> (e.g. documentation, web-site content etc.) to the Apache Software
> Foundation, and aim to build an open community around Doris’s continued
> development in the ‘Apache Way’.
>
> === Overview of Doris ===
>
> Doris’s implementation consists of two daemons: Frontend (FE) and
> Backend (BE).
>
> **Frontend daemon** consists of a query coordinator and a catalog
> manager. The query coordinator is responsible for receiving users’ SQL
> queries, compiling queries and managing query execution. The catalog
> manager is responsible for managing metadata such as databases, tables,
> partitions and replicas. Several frontend daemons can be deployed to
> provide fault tolerance and load balancing.
>
> **Backend daemon** stores the data and executes the query fragments.
> Many backend daemons can also be deployed to provide scalability and
> fault tolerance.
>
> A typical Doris cluster generally consists of several frontend daemons
> and dozens to hundreds of backend daemons.
>
> Users can use MySQL client tools to connect to any frontend daemon and
> submit SQL queries. The Frontend receives the query and compiles it into
> query plans executable by the Backend. The Frontend then sends the query
> plan fragments to the Backend, which builds a query execution DAG. Data
> is fetched and pipelined into the DAG. The final result is sent to the
> client via the Frontend. The distribution of query fragment execution has
> minimizing data movement and maximizing scan locality as its main goals.
>
> == Background ==
>
> At Baidu, prior to Doris, different tools were deployed to meet diverse
> requirements in many ways. When a use case required the simultaneous
> availability of capabilities that could not all be provided by a single
> tool, users were forced to build hybrid architectures that stitch
> multiple tools together. We believe that they shouldn’t need to
> accept such inherent complexity. A storage system built to provide great
> performance across a broad range of workloads provides a more elegant
> solution to the problems that hybrid architectures aim to solve. Doris
> is the solution.
>
> Doris is designed to be a simple, single, tightly coupled system that
> does not depend on other systems. Doris provides high-concurrency,
> low-latency point query performance, and also provides high-throughput
> queries for ad-hoc analysis. Doris provides bulk-batch data loading, and
> also provides near real-time mini-batch data loading. Doris also provides
> high availability, reliability, fault tolerance, and scalability.
>
> == Rationale ==
>
> Doris mainly integrates the technology of Google Mesa and Apache Impala.
>
> Mesa is a highly scalable analytic data storage system that stores
> critical measurement data related to Google's Internet advertising
> business.
> Mesa is designed to satisfy a complex and challenging set of
> users’ and systems’ requirements, including near real-time data
> ingestion and query ability, as well as high availability, reliability,
> fault tolerance, and scalability for large data and query volumes.
>
> Impala is a modern, open-source MPP SQL engine architected from the
> ground up for the Hadoop data processing environment. At present, by
> virtue of its superior performance and rich functionality, Impala is
> comparable to many commercial MPP database query engines. Mesa can
> satisfy many of our storage requirements; however, Mesa itself does not
> provide a SQL query engine. Impala is a very good MPP SQL query engine,
> but it lacks a perfect distributed storage engine.
> So in the end we chose the combination of these two technologies.
>
> Learning from Mesa’s data model, we developed a distributed storage
> engine. Unlike Mesa, this storage engine does not rely on any
> distributed file system. We then deeply integrated this storage engine
> with the Impala query engine. Query compiling, query execution
> coordination and catalog management of the storage engine are integrated
> into the frontend daemon; query execution and data storage are integrated
> into the backend daemon. With this integration, we implemented a single,
> full-featured, high-performance, state-of-the-art MPP database while
> maintaining simplicity.
>
> == Current Status ==
>
> Doris has been an open source project on GitHub
> (https://github.com/baidu/palo).
>
> === Meritocracy ===
>
> Doris has been deployed in production at Baidu and is used by more than
> 200 lines of business. It has demonstrated great performance benefits
> and has proved to be a better way to do reporting and analysis on big
> data. Still, we look forward to growing a rich user and developer
> community.
>
> === Community ===
>
> Doris seeks to develop developer and user communities during incubation.
>
> Doris makes use of Apache Impala. It was identified during early review
> of the proposal that the Doris community will need to work with Impala
> to define a suitable API.
>
> === Core Developers ===
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
> * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
>
> === Alignment ===
>
> Doris is related to several other Apache projects:
>
> * Doris can also read data stored in Apache Hadoop clusters powered by
> the HDFS filesystem.
> * Doris is closely integrated with Impala, which has graduated from
> the Apache Incubator.
> * Doris uses Apache Thrift as its RPC and serialization framework of
> choice.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The core developers of the Doris team plan to work full time on this
> project. There is very little risk of Doris getting orphaned since at
> least one large company (Baidu) is extensively using it in
> production. For example, currently there are more than 200 use cases
> using Doris in production. Furthermore, since Doris was open sourced at
> the beginning of October 2017, it has received more than 660 stars and
> been forked nearly 170 times. We plan to extend and diversify this
> community further through Apache.
>
> === Inexperience with Open Source ===
>
> The core developers are all active users and followers of open source.
> They are already committers and contributors to the Doris GitHub
> project. All have been involved with source code that has been
> released under an open source license, and several of them also have
> experience developing code in an open source environment.
> Though the
> core set of developers does not have Apache open source experience, there
> are plans to onboard individuals with Apache open source experience
> onto the project.
>
> === Homogenous Developers ===
>
> Most of the core developers are from Baidu, but after Doris was open
> sourced, Doris received a lot of bug fixes and enhancements from other
> developers not working at Baidu.
>
> === Reliance on Salaried Developers ===
>
> Baidu invested in Doris as the OLAP solution and some of its key
> engineers are working full time on the project. In addition, since there
> is a growing Big Data need for scalable OLAP solutions, we look forward
> to other Apache developers and researchers contributing to the project.
> Also key to addressing the risk associated with relying on salaried
> developers from a single entity is to increase the diversity of the
> contributors and actively lobby for domain experts in the BI space to
> contribute. Apache Doris intends to do this.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Doris is proposing to enter incubation at Apache in order to help
> efforts to diversify the committer-base, not so much to capitalize on
> the Apache brand. The Doris project is in production use already inside
> Baidu, but is not expected to be a Baidu product for external
> customers. As such, the Doris project is not seeking to use the Apache
> brand as a marketing tool.
>
> == Documentation ==
>
> Information about Doris can be found at https://github.com/baidu/palo.
> The following links provide more information about Doris in open source:
>
> * Doris wiki site: https://github.com/baidu/palo/wiki
> * Codebase at Github: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/Doris/wiki/palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/palo-FAQ
>
> == Initial Source ==
>
> Doris has been under development since 2017 by a team of engineers at
> Baidu Inc. It is currently hosted on Github.com
> under an Apache license at https://github.com/baidu/palo.
>
> == External Dependencies ==
>
> Doris has the following external dependencies.
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * Apache Ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> == Required Resources ==
>
> === Mailing List ===
>
> There are currently no mailing lists. The usual mailing lists are
> expected to be set up when entering incubation:
>
> * priv...@doris.incubator.apache.org
> * d...@doris.incubator.apache.org
> * comm...@doris.incubator.apache.org
>
> === Subversion Directory ===
>
> Upon entering incubation, we want to move (or copy) the existing repo
> from https://github.com/baidu/palo to Apache infrastructure at
> https://github.com/apache/incubator-doris.
>
> === Issue Tracking ===
>
> Doris currently uses GitHub to track issues.
> We would like to continue to
> do so while we discuss migration possibilities with the ASF Infra
> committee.
>
> === Other Resources ===
>
> The existing code already has unit tests so we will make use of existing
> Apache continuous testing infrastructure. The resulting load should not
> be very large.
>
> == Initial Committers ==
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
> * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
> * Sijie Guo (guosijie@gmail dot com)
> * Zheng Shao (zs...@apache.org)
>
> == Affiliations ==
>
> The initial committers are employees of Baidu Inc.
>
> == Sponsors ==
>
> === Champion ===
>
> * Dave Fisher, w...@apache.org
>
> === Nominated Mentors ===
>
> * Luke Han, luke...@apache.org
> * Dave Fisher, w...@apache.org
> * Willem Jiang, ningji...@apache.org
>
> === Sponsoring Entity ===
>
> We are requesting the Incubator to sponsor this project.
>

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com