Hi Dave,

Thank you very much for your help, and a warm welcome as Palo’s Champion and Mentor. Regarding licenses, here is what we know so far:
------
1. aes/*                 mysql-5.6                  GPL v2.1
2. util/mysql_dtoa.cpp   Percona Server for MySQL   GPL
3. http/mongoose.h       mongoose                   MIT License
------

We will resolve these ASAP.

On 2018/6/8 at 8:59 PM, "Dave Fisher" <dave2w...@comcast.net> wrote:

>Hi -
>
>I’m willing to Champion and Mentor. I have a couple of comments inline.
>I’ll look at dependency licenses later today. It’s early for me.
>
>
>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
>>
>> Hi all,
>>
>> I am Reed, as a developer worked with the team for Palo (a MPP-based interactive SQL data warehousing).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>>
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for possible Champion if anyone would like to volunteer. Thanks a lot.
>>
>> Best Regards,
>> Reed
>>
>> ===================
>> The draft of the proposal as below:
>>
>> #Apache Palo
>>
>> ##Abstract
>>
>> Palo is a MPP-based interactive SQL data warehousing for reporting and analysis.
>>
>> ##Proposal
>>
>> We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
>>
>> Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
>>
>> ###Overview of Palo
>>
>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
>>
>> **Frontend daemon** consists of query coordinator and catalog manager. Query coordinator is responsible for receiving users’ sql queries, compiling queries and managing queries execution. Catalog manager is responsible for managing metadata such as databases, tables, partitions, replicas and etc. Several frontend daemons could be deployed to guarantee fault-tolerance, and load balancing.
>>
>> **Backend daemon** stores the data and executes the query fragments. Many backend daemons could also be deployed to provide scalability and fault-tolerance.
>>
>> A typical Palo cluster generally composes of several frontend daemons and dozens to hundreds of backend daemons.
>>
>> Users can use MySQL client tools to connect any frontend daemon to submit SQL query. Frontend receives the query and compiles it into query plans executable by the Backend. Then Frontend sends the query plan fragments to Backend. Backend will build a query execution DAG. Data is fetched and pipelined into the DAG. The final result response is sent to client via Frontend. The distribution of query fragment execution takes minimizing data movement and maximizing scan locality as the main goal.
>>
>> ##Background
>>
>> At Baidu, Prior to Palo, different tools were deployed to solve diverse requirements in many ways. And when a use case requires the simultaneous availability of capabilities that cannot all be provided by a single tool, users were forced to build hybrid architectures that stitch multiple tools together, but we believe that they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads provides a more elegant solution to the problems that hybrid architectures aim to solve. Palo is the solution.
>>
>> Palo is designed to be a simple and single tightly coupled system, not depending on other systems. Palo provides high concurrent low latency point query performance, but also provides high throughput queries of ad-hoc analysis. Palo provides bulk-batch data loading, but also provides near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
>>
>> ##Rationale
>>
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>
>> Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
>>
>> Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala has been comparable to many commercial MPP database query engine. Mesa can satisfy the needs of many of our storage requirements, however Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but the lack of a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
>>
>> Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. Then we deeply integrate this storage engine with Impala query engine. Query compiling, query execution coordination and catalog management of storage engine are integrated to be frontend daemon; query execution and data storage are integrated to be backend daemon. With this integration, we implemented a single, full-featured, high performance state the art of MPP database, as well as maintaining the simplicity.
>>
>> ##Current Status
>>
>> Palo has been an open source project on GitHub (https://github.com/baidu/palo).
>>
>> ###Meritocracy
>>
>> Palo has been deployed in production at Baidu and is applying more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way for reporting and analysis based big data.
>> Still We look forward to growing a rich user and developer community.
>>
>> ###Community
>>
>> Palo seeks to develop developer and user communities during incubation.
>>
>> ###Core Developers
>>
>> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>>
>> ###Alignment
>>
>> Palo is related to several other Apache projects:
>>
>> * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
>> * Palo is closely integrated with Impala, which is also being proposed to the Incubator.
>
>Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
>> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
>>
>> ##Known Risks
>>
>> ###Orphaned Products
>>
>> The core developers of Palo team plan to work full time on this project. There is very little risk of Palo getting orphaned since at least one large company (Baidu) is extensively using it in their production. For example, currently there are more than 200 use cases using Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
>>
>> ###Inexperience with Open Source
>>
>> The core developers are all active users and followers of open source.
>> They are already committers and contributors to the Palo Github project. All have been involved with the source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of Developers do not have Apache Open Source experience, there are plans to onboard individuals with Apache open source experience on to the project.
>>
>> ###Homogenous Developers
>>
>> The most of core developers are from Baidu, but after Palo was open sourced, Palo received a lot of bug fixes and enhancements from other developers not working at Baidu.
>>
>> ###Reliance on Salaried Developers
>>
>> Baidu invested in Palo as the OLAP solution and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers to contribute to the project. Also key to addressing the risk associated with relying on Salaried developers from a single entity is to increase the diversity of the contributors and actively lobby for Domain experts in the BI space to contribute. Apache Palo intends to do this.
>>
>> ###An Excessive Fascination with the Apache Brand
>>
>> Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer-base, not so much to capitalize on the Apache brand. The Palo project is in production use already inside Baidu, but is not expected to be an Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
>>
>> ##Documentation
>>
>> Information about Palo can be found at https://github.com/baidu/palo.
>> The following links provide more information about Palo in open source:
>>
>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> * Codebase at Github: https://github.com/baidu/palo
>> * Issue Tracking: https://github.com/baidu/palo/issues
>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>
>> ##Initial Source
>>
>> Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on Github.com under an Apache license at https://github.com/baidu/palo.
>>
>> ##External Dependencies
>>
>> Palo has the following external dependencies.
>>
>> * Google gflags (BSD)
>> * Google glog (BSD)
>> * Apache Thrift (Apache Software License v2.0)
>> * Apache Commons (Apache Software License v2.0)
>> * Boost (Boost Software License)
>> * OpenLdap (OpenLDAP Software License)
>> * rapidjson (Tencent)
>> * Google RE2 (BSD-style)
>> * lz4 (BSD)
>> * snappy (BSD)
>> * cyrus-sasl (CMU License)
>> * Twitter Bootstrap (Apache Software License v2.0)
>> * d3 (BSD)
>> * LLVM (BSD-like)
>>
>> Build and test dependencies:
>>
>> * ant (Apache Software License v2.0)
>> * Apache Maven (Apache Software License v2.0)
>> * cmake (BSD)
>> * clang (BSD)
>> * Google gtest (Apache Software License v2.0)
>>
>> ##Required Resources
>>
>> ###Mailing List
>>
>> There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
>>
>> priv...@palo.incubator.apache.org
>> d...@palo.incubator.apache.org
>> comm...@palo.incubator.apache.org
>>
>> ###Subversion Directory
>>
>> Upon entering incubation: https://github.com/baidu/palo.
>> After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
>>
>> ###Issue Tracking
>>
>> Palo currently uses GitHub to track issues.
>> Would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
>>
>> ###Other Resources
>>
>> The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
>>
>> ##Initial Committers
>>
>> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>>
>> ##Affiliations
>>
>> The initial committers are employees of Baidu Inc.. The nominated mentors are employees of TODO.
>>
>> ##Sponsors
>>
>> ###Champion
>>
>> TODO
>>
>> ###Nominated Mentors
>>
>> * sijie guo, guosi...@gmail.com
>> * Luke Han, luke...@apache.org
>> * Zheng Shao, zs...@apache.org
>
>Mentors must be members of the IPMC and almost always Members of the ASF.
>
>At this moment only Luke Han is qualified.
>
>Regards,
>Dave
>
>>
>> ###Sponsoring Entity
>>
>> We are requesting the Incubator to sponsor this project.
>