Hi, I'm willing to be a Mentor. Please count me in.

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2w...@comcast.net> wrote:

> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.
>
> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
> >
> > Hi all,
> >
> > I am Reed, a developer working with the team on Palo (an MPP-based interactive SQL data warehouse).
> > https://github.com/baidu/palo/wiki/Palo-Overview
> >
> > We propose to contribute Palo as an Apache Incubator project, and we are still looking for a possible Champion if anyone would like to volunteer. Thanks a lot.
> >
> > Best Regards,
> > Reed
> >
> > ===================
> > The draft of the proposal is below:
> >
> > #Apache Palo
> >
> > ##Abstract
> >
> > Palo is an MPP-based interactive SQL data warehouse for reporting and analysis.
> >
> > ##Proposal
> >
> > We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
> >
> > Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
> >
> > ###Overview of Palo
> >
> > Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> >
> > **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them, and managing their execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.
> >
> > **Backend daemon** stores the data and executes the query fragments.
> > Many backend daemons can also be deployed to provide scalability and fault tolerance.
> >
> > A typical Palo cluster is generally composed of several frontend daemons and dozens to hundreds of backend daemons.
> >
> > Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The Frontend receives a query and compiles it into query plans executable by the Backend. The Frontend then sends the query plan fragments to the Backend. The Backend builds a query execution DAG; data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The distribution of query fragment execution has minimizing data movement and maximizing scan locality as its main goals.
> >
> > ##Background
> >
> > At Baidu, prior to Palo, different tools were deployed to meet diverse requirements in many ways. When a use case required the simultaneous availability of capabilities that no single tool could provide, users were forced to build hybrid architectures that stitched multiple tools together. We believe that they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads provides a more elegant solution to the problems that hybrid architectures aim to solve. Palo is that solution.
> >
> > Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. Palo provides bulk batch data loading as well as near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
> >
> > ##Rationale
> >
> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >
> > Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and system requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
> >
> > Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala is comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements; however, Mesa itself does not provide a SQL query engine. Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
> >
> > Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compilation, query execution coordination, and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.
> >
> > ##Current Status
> >
> > Palo has been an open source project on GitHub (https://github.com/baidu/palo).
> >
> > ###Meritocracy
> >
> > Palo has been deployed in production at Baidu and is used by more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data.
> > Still, we look forward to growing a rich user and developer community.
> >
> > ###Community
> >
> > Palo seeks to develop developer and user communities during incubation.
> >
> > ###Core Developers
> >
> > * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
> >
> > ###Alignment
> >
> > Palo is related to several other Apache projects:
> >
> > * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
> > * Palo is closely integrated with Impala, which is also being proposed to the Incubator.
>
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
> > * Palo uses Apache Thrift as its RPC and serialization framework of choice.
> >
> > ##Known Risks
> >
> > ###Orphaned Products
> >
> > The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo getting orphaned since at least one large company (Baidu) is extensively using it in production. For example, there are currently more than 200 use cases using Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
> >
> > ###Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the Palo GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience on to the project.
> >
> > ###Homogenous Developers
> >
> > Most of the core developers are from Baidu, but after Palo was open sourced, it received many bug fixes and enhancements from developers not working at Baidu.
> >
> > ###Reliance on Salaried Developers
> >
> > Baidu invested in Palo as its OLAP solution and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is to increase the diversity of the contributors and actively lobby for domain experts in the BI space to contribute. Apache Palo intends to do this.
> >
> > ###An Excessive Fascination with the Apache Brand
> >
> > Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
> >
> > ##Documentation
> >
> > Information about Palo can be found at https://github.com/baidu/palo.
> > The following links provide more information about Palo in open source:
> >
> > * Palo wiki site: https://github.com/baidu/palo/wiki
> > * Codebase at GitHub: https://github.com/baidu/palo
> > * Issue Tracking: https://github.com/baidu/palo/issues
> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >
> > ##Initial Source
> >
> > Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on GitHub.com under an Apache license at https://github.com/baidu/palo.
> >
> > ##External Dependencies
> >
> > Palo has the following external dependencies.
> >
> > * Google gflags (BSD)
> > * Google glog (BSD)
> > * Apache Thrift (Apache Software License v2.0)
> > * Apache Commons (Apache Software License v2.0)
> > * Boost (Boost Software License)
> > * OpenLDAP (OpenLDAP Software License)
> > * rapidjson (Tencent)
> > * Google RE2 (BSD-style)
> > * lz4 (BSD)
> > * snappy (BSD)
> > * cyrus-sasl (CMU License)
> > * Twitter Bootstrap (Apache Software License v2.0)
> > * d3 (BSD)
> > * LLVM (BSD-like)
> >
> > Build and test dependencies:
> >
> > * ant (Apache Software License v2.0)
> > * Apache Maven (Apache Software License v2.0)
> > * cmake (BSD)
> > * clang (BSD)
> > * Google gtest (Apache Software License v2.0)
> >
> > ##Required Resources
> >
> > ###Mailing List
> >
> > There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
> >
> > priv...@palo.incubator.apache.org
> > d...@palo.incubator.apache.org
> > comm...@palo.incubator.apache.org
> >
> > ###Subversion Directory
> >
> > Upon entering incubation: https://github.com/baidu/palo.
> > After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
> >
> > ###Issue Tracking
> >
> > Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
> >
> > ###Other Resources
> >
> > The existing code already has unit tests, so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
> >
> > ##Initial Committers
> >
> > * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
> >
> > ##Affiliations
> >
> > The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.
> >
> > ##Sponsors
> >
> > ###Champion
> >
> > TODO
> >
> > ###Nominated Mentors
> >
> > * sijie guo, guosi...@gmail.com
> > * Luke Han, luke...@apache.org
> > * Zheng Shao, zs...@apache.org
>
> Mentors must be members of the IPMC and almost always Members of the ASF.
>
> At this moment only Luke Han is qualified.
>
> Regards,
> Dave
>
> > ###Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.