Thank you, Willem, and a warm welcome.

On 2018/6/8 at 11:03 PM, "Willem Jiang" <willem.ji...@gmail.com> wrote:
>Hi,
>
>I'm willing to be the Mentor.
>Please count me in.
>
>Willem Jiang
>
>Twitter: willemjiang
>Weibo: 姜宁willem
>
>On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
>> >
>> > Hi all,
>> >
>> > I am Reed, a developer working with the team on Palo (an MPP-based interactive SQL data warehouse).
>> > https://github.com/baidu/palo/wiki/Palo-Overview
>> >
>> > We propose to contribute Palo as an Apache Incubator project, and we are still looking for a possible Champion if anyone would like to volunteer. Thanks a lot.
>> >
>> > Best Regards,
>> > Reed
>> >
>> > ===================
>> > The draft of the proposal is below:
>> >
>> > #Apache Palo
>> >
>> > ##Abstract
>> >
>> > Palo is an MPP-based interactive SQL data warehouse for reporting and analysis.
>> >
>> > ##Proposal
>> >
>> > We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content, etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
>> >
>> > Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
>> >
>> > ###Overview of Palo
>> >
>> > Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
>> >
>> > The **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling queries, and managing query execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, and replicas.
>> > Several frontend daemons can be deployed to provide fault tolerance and load balancing.
>> >
>> > The **Backend daemon** stores the data and executes the query fragments. Many backend daemons can also be deployed to provide scalability and fault tolerance.
>> >
>> > A typical Palo cluster generally consists of several frontend daemons and dozens to hundreds of backend daemons.
>> >
>> > Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The Frontend receives the query and compiles it into query plans executable by the Backend. The Frontend then sends the query plan fragments to the Backend, which builds a query execution DAG. Data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The distribution of query fragment execution takes minimizing data movement and maximizing scan locality as its main goals.
>> >
>> > ##Background
>> >
>> > At Baidu, prior to Palo, different tools were deployed to meet diverse requirements in many ways. When a use case required the simultaneous availability of capabilities that could not all be provided by a single tool, users were forced to build hybrid architectures that stitch multiple tools together, but we believe that they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads provides a more elegant solution to the problems that hybrid architectures aim to solve. Palo is the solution.
>> >
>> > Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point query performance, and also provides high-throughput queries for ad-hoc analysis. Palo provides bulk-batch data loading, but also provides near real-time mini-batch data loading.
>> > Palo also provides high availability, reliability, fault tolerance, and scalability.
>> >
>> > ##Rationale
>> >
>> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> >
>> > Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and system requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
>> >
>> > Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala is comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements; however, Mesa itself does not provide a SQL query engine. Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
>> >
>> > Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compiling, query execution coordination, and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.
>> >
>> > ##Current Status
>> >
>> > Palo has been an open source project on GitHub (https://github.com/baidu/palo).
>> >
>> > ###Meritocracy
>> >
>> > Palo has been deployed in production at Baidu and is used by more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
>> >
>> > ###Community
>> >
>> > Palo seeks to develop developer and user communities during incubation.
>> >
>> > ###Core Developers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> >
>> > ###Alignment
>> >
>> > Palo is related to several other Apache projects:
>> >
>> > * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
>> > * Palo is closely integrated with Impala, which is also being proposed to the Incubator.
>>
>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>
>> > * Palo uses Apache Thrift as its RPC and serialization framework of choice.
>> >
>> > ##Known Risks
>> >
>> > ###Orphaned Products
>> >
>> > The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo getting orphaned since at least one large company (Baidu) is extensively using it in production. For example, currently there are more than 200 use cases using Palo in production.
>> > Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
>> >
>> > ###Inexperience with Open Source
>> >
>> > The core developers are all active users and followers of open source. They are already committers and contributors to the Palo GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.
>> >
>> > ###Homogenous Developers
>> >
>> > Most of the core developers are from Baidu, but since Palo was open sourced, it has received many bug fixes and enhancements from developers not working at Baidu.
>> >
>> > ###Reliance on Salaried Developers
>> >
>> > Baidu invested in Palo as the OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Also key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of the contributors and actively lobbying for domain experts in the BI space to contribute. Apache Palo intends to do this.
>> >
>> > ###An Excessive Fascination with the Apache Brand
>> >
>> > Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is in production use already inside Baidu, but is not expected to be a Baidu product for external customers.
>> > As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
>> >
>> > ##Documentation
>> >
>> > Information about Palo can be found at https://github.com/baidu/palo. The following links provide more information about Palo in open source:
>> >
>> > * Palo wiki site: https://github.com/baidu/palo/wiki
>> > * Codebase at GitHub: https://github.com/baidu/palo
>> > * Issue Tracking: https://github.com/baidu/palo/issues
>> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> >
>> > ##Initial Source
>> >
>> > Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on GitHub.com under an Apache license at https://github.com/baidu/palo.
>> >
>> > ##External Dependencies
>> >
>> > Palo has the following external dependencies.
>> >
>> > * Google gflags (BSD)
>> > * Google glog (BSD)
>> > * Apache Thrift (Apache Software License v2.0)
>> > * Apache Commons (Apache Software License v2.0)
>> > * Boost (Boost Software License)
>> > * OpenLDAP (OpenLDAP Software License)
>> > * rapidjson (Tencent)
>> > * Google RE2 (BSD-style)
>> > * lz4 (BSD)
>> > * snappy (BSD)
>> > * cyrus-sasl (CMU License)
>> > * Twitter Bootstrap (Apache Software License v2.0)
>> > * d3 (BSD)
>> > * LLVM (BSD-like)
>> >
>> > Build and test dependencies:
>> >
>> > * ant (Apache Software License v2.0)
>> > * Apache Maven (Apache Software License v2.0)
>> > * cmake (BSD)
>> > * clang (BSD)
>> > * Google gtest (Apache Software License v2.0)
>> >
>> > ##Required Resources
>> >
>> > ###Mailing List
>> >
>> > There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
>> >
>> > private@palo.incubator.apache.org
>> > d...@palo.incubator.apache.org
>> > commits@palo.incubator.apache.org
>> >
>> > ###Subversion Directory
>> >
>> > Upon entering incubation: https://github.com/baidu/palo.
>> > After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
>> >
>> > ###Issue Tracking
>> >
>> > Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
>> >
>> > ###Other Resources
>> >
>> > The existing code already has unit tests, so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
>> >
>> > ##Initial Committers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> >
>> > ##Affiliations
>> >
>> > The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.
>> >
>> > ##Sponsors
>> >
>> > ###Champion
>> >
>> > TODO
>> >
>> > ###Nominated Mentors
>> >
>> > * Sijie Guo, guosi...@gmail.com
>> > * Luke Han, luke...@apache.org
>> > * Zheng Shao, zs...@apache.org
>>
>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>
>> At this moment only Luke Han is qualified.
>>
>> Regards,
>> Dave
>>
>> >
>> > ###Sponsoring Entity
>> >
>> > We are requesting the Incubator to sponsor this project.
>>