Hi Todd,

Thank you for your response.
It was a serious mistake to replace the Oracle license with an Apache license when we updated the license headers with a script. We had not checked carefully. In fact, those files are no longer used, so I removed them in a new commit:

https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7

Best Regards,
Reed

On 2018/6/9, 12:37 AM, "Todd Lipcon" <t...@cloudera.com> wrote:

>On Fri, Jun 8, 2018 at 9:18 AM, Tim Armstrong <tarmstr...@cloudera.com>
>wrote:
>
>> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
>> > integrated them together.
>>
>> Palo didn't integrate with Impala, it forked Impala's codebase and embedded
>> it in its own repository. I don't remember any attempts from the Palo team
>> to engage with the Impala community or attempt to work with us to
>> contribute any improvements.
>>
>> It looks like Palo is still pulling in new code from Impala. E.g. this
>> commit includes a bunch of code I wrote as part of IMPALA-3200:
>> https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf
>>
>> The code isn't owned by any individual, I contributed it to Apache and it's
>> free for anyone to do what they want to do with it, but pulling in
>> improvements from other projects without any attempt to attribute it or
>> contribute improvements back seems contrary to the Apache way.
>>
>
>+1. Also briefly browsing the code I found suspicious commits like this one:
>https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
>... in which a GPL license copyright by Oracle was "fixed" to be an Apache
>license copyright Baidu.
>
>So if this project does enter incubation I think we should be extra careful
>to audit the origins of all of the source code.
>
>-Todd
>
>
>> On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> > On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <l...@baidu.com> wrote:
>> >
>> > > Hi, Jim
>> > >
>> > > Thank you for your response.
>> > > Actually, we start Palo in several years ago, and that time we developed
>> > > the storage engine based on Mesa technology.
>> > > Meanwhile we found Impala is a very good MPP SQL query engine, so we
>> > > integrated them together.
>> > >
>> >
>> > From what I can tell of the Palo source, it's not so much an integration as
>> > a copied-and-modified codebase, right? i.e Palo does not use Impala as a
>> > dependency, but rather shares a lot of code from the Impala project that
>> > has since diverged.
>> >
>> >
>> > >
>> > > With this integration, the goal of Palo is to implement a single,
>> > > full-featured, mysql protocol compatible data warehousing.
>> > >
>> >
>> > That sounds pretty similar to the goals of the Impala project. Impala isn't
>> > MySQL-compatible at the moment but that seems more like a particular
>> > feature that could be added rather than a distinct identity of the project.
>> > Otherwise, Impala's goal is to be a full featured data warehouse engine as
>> > well.
>> >
>> > Generally Apache has no rules against multiple projects fulfilling similar
>> > goals or use cases, even when those projects might compete. However I think
>> > it would be relatively unusual to incubate a project that appears to be
>> > derived from a fork of an existing project, at least without first
>> > considering whether the additional feature set could be contributed back to
>> > the existing community.
>> >
>> > -Todd
>> >
>> >
>> > > On 2018/6/8, 1:55 PM, "Jim Apple" <jbap...@apache.org> wrote:
>> > >
>> > > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
>> > > >from the Palo community about integration between Impala and Palo.
>> > > >
>> > > >For instance, are there any apparent design goals of Impala that the Palo
>> > > >community thinks are fundamentally incompatible with Palo?
>> > > >
>> > > >Thanks,
>> > > >Jim
>> > > >
>> > > >On 2018/06/08 04:45:32, "Li,De(BDG)" <l...@baidu.com> wrote:
>> > > >> Hi all,
>> > > >>
>> > > >> I am Reed, as a developer worked with the team for Palo (a MPP-based
>> > > >> interactive SQL data warehousing).
>> > > >> https://github.com/baidu/palo/wiki/Palo-Overview
>> > > >>
>> > > >> We propose to contribute Palo as an Apache Incubator project, and
>> > > >> we are still looking for possible Champion if anyone would like to
>> > > >> volunteer. Thanks a lot.
>> > > >>
>> > > >> Best Regards,
>> > > >> Reed
>> > > >>
>> > > >> ===================
>> > > >> The draft of the proposal as below:
>> > > >>
>> > > >> #Apache Palo
>> > > >>
>> > > >> ##Abstract
>> > > >>
>> > > >> Palo is a MPP-based interactive SQL data warehousing for reporting and
>> > > >> analysis.
>> > > >>
>> > > >> ##Proposal
>> > > >>
>> > > >> We propose to contribute the Palo codebase and associated artifacts
>> > > >> (e.g. documentation, web-site content etc.) to the Apache Software
>> > > >> Foundation with the intent of forming a productive, meritocratic and
>> > > >> open community around Palo’s continued development, according to the
>> > > >> ‘Apache Way’.
>> > > >>
>> > > >> Baidu owns several trademarks regarding Palo, and proposes to transfer
>> > > >> ownership of those trademarks in full to the ASF.
>> > > >>
>> > > >> ###Overview of Palo
>> > > >>
>> > > >> Palo’s implementation consists of two daemons: Frontend (FE) and
>> > > >> Backend (BE).
>> > > >>
>> > > >> **Frontend daemon** consists of query coordinator and catalog manager.
>> > > >> Query coordinator is responsible for receiving users’ sql queries,
>> > > >> compiling queries and managing queries execution.
>> > > >> Catalog manager is responsible for managing metadata such as
>> > > >> databases, tables, partitions, replicas and etc. Several frontend
>> > > >> daemons could be deployed to guarantee fault-tolerance, and load
>> > > >> balancing.
>> > > >>
>> > > >> **Backend daemon** stores the data and executes the query fragments.
>> > > >> Many backend daemons could also be deployed to provide scalability and
>> > > >> fault-tolerance.
>> > > >>
>> > > >> A typical Palo cluster generally composes of several frontend daemons
>> > > >> and dozens to hundreds of backend daemons.
>> > > >>
>> > > >> Users can use MySQL client tools to connect any frontend daemon to
>> > > >> submit SQL query. Frontend receives the query and compiles it into query
>> > > >> plans executable by the Backend. Then Frontend sends the query plan
>> > > >> fragments to Backend. Backend will build a query execution DAG. Data is
>> > > >> fetched and pipelined into the DAG. The final result response is sent to
>> > > >> client via Frontend. The distribution of query fragment execution takes
>> > > >> minimizing data movement and maximizing scan locality as the main goal.
>> > > >>
>> > > >> ##Background
>> > > >>
>> > > >> At Baidu, Prior to Palo, different tools were deployed to solve diverse
>> > > >> requirements in many ways. And when a use case requires the simultaneous
>> > > >> availability of capabilities that cannot all be provided by a single
>> > > >> tool, users were forced to build hybrid architectures that stitch
>> > > >> multiple tools together, but we believe that they shouldn’t need to
>> > > >> accept such inherent complexity. A storage system built to provide great
>> > > >> performance across a broad range of workloads provides a more elegant
>> > > >> solution to the problems that hybrid architectures aim to solve. Palo is
>> > > >> the solution.
>> > > >>
>> > > >> Palo is designed to be a simple and single tightly coupled system, not
>> > > >> depending on other systems. Palo provides high concurrent low latency
>> > > >> point query performance, but also provides high throughput queries of
>> > > >> ad-hoc analysis. Palo provides bulk-batch data loading, but also
>> > > >> provides near real-time mini-batch data loading. Palo also provides high
>> > > >> availability, reliability, fault tolerance, and scalability.
>> > > >>
>> > > >> ##Rationale
>> > > >>
>> > > >> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> > > >>
>> > > >> Mesa is a highly scalable analytic data storage system that stores
>> > > >> critical measurement data related to Google's Internet advertising
>> > > >> business. Mesa is designed to satisfy complex and challenging set of
>> > > >> users’ and systems’ requirements, including near real-time data
>> > > >> ingestion and query ability, as well as high availability, reliability,
>> > > >> fault tolerance, and scalability for large data and query volumes.
>> > > >>
>> > > >> Impala is a modern, open-source MPP SQL engine architected from the
>> > > >> ground up for the Hadoop data processing environment. At present, by
>> > > >> virtue of its superior performance and rich functionality, Impala has
>> > > >> been comparable to many commercial MPP database query engine. Mesa can
>> > > >> satisfy the needs of many of our storage requirements, however Mesa
>> > > >> itself does not provide a SQL query engine; Impala is a very good MPP
>> > > >> SQL query engine, but the lack of a perfect distributed storage engine.
>> > > >> So in the end we chose the combination of these two technologies.
>> > > >>
>> > > >> Learning from Mesa’s data model, we developed a distributed storage
>> > > >> engine. Unlike Mesa, this storage engine does not rely on any
>> > > >> distributed file system.
>> > > >> Then we deeply integrate this storage engine
>> > > >> with Impala query engine. Query compiling, query execution coordination
>> > > >> and catalog management of storage engine are integrated to be frontend
>> > > >> daemon; query execution and data storage are integrated to be backend
>> > > >> daemon. With this integration, we implemented a single, full-featured,
>> > > >> high performance state the art of MPP database, as well as maintaining
>> > > >> the simplicity.
>> > > >>
>> > > >> ##Current Status
>> > > >>
>> > > >> Palo has been an open source project on GitHub
>> > > >> (https://github.com/baidu/palo).
>> > > >>
>> > > >> ###Meritocracy
>> > > >>
>> > > >> Palo has been deployed in production at Baidu and is applying more than
>> > > >> 200 lines of business. It has demonstrated great performance benefits
>> > > >> and has proved to be a better way for reporting and analysis based big
>> > > >> data. Still We look forward to growing a rich user and developer
>> > > >> community.
>> > > >>
>> > > >> ###Community
>> > > >>
>> > > >> Palo seeks to develop developer and user communities during incubation.
>> > > >>
>> > > >> ###Core Developers
>> > > >>
>> > > >> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> > > >> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> > > >> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> > > >> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> > > >> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> > > >> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> > > >> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> > > >>
>> > > >> ###Alignment
>> > > >>
>> > > >> Palo is related to several other Apache projects:
>> > > >>
>> > > >> * Palo can also read data stored in Apache Hadoop clusters powered by
>> > > >>   the HDFS filesystem.
>> > > >> * Palo is closely integrated with Impala, which is also being proposed
>> > > >>   to the Incubator.
>> > > >> * Palo uses Apache Thrift as its RPC and serialization framework of
>> > > >>   choice.
>> > > >>
>> > > >> ##Known Risks
>> > > >>
>> > > >> ###Orphaned Products
>> > > >>
>> > > >> The core developers of Palo team plan to work full time on this
>> > > >> project. There is very little risk of Palo getting orphaned since at
>> > > >> least one large company (Baidu) is extensively using it in their
>> > > >> production. For example, currently there are more than 200 use cases
>> > > >> using Palo in production. Furthermore, since Palo was open sourced at
>> > > >> the beginning of October 2017, it has received more than 660 stars and
>> > > >> been forked nearly 170 times. We plan to extend and diversify this
>> > > >> community further through Apache.
>> > > >>
>> > > >> ###Inexperience with Open Source
>> > > >>
>> > > >> The core developers are all active users and followers of open source.
>> > > >> They are already committers and contributors to the Palo Github project.
>> > > >> All have been involved with the source code that has been released under
>> > > >> an open source license, and several of them also have experience
>> > > >> developing code in an open source environment. Though the core set of
>> > > >> Developers do not have Apache Open Source experience, there are plans to
>> > > >> onboard individuals with Apache open source experience on to the project.
>> > > >>
>> > > >> ###Homogenous Developers
>> > > >>
>> > > >> The most of core developers are from Baidu, but after Palo was open
>> > > >> sourced, Palo received a lot of bug fixes and enhancements from other
>> > > >> developers not working at Baidu.
>> > > >>
>> > > >> ###Reliance on Salaried Developers
>> > > >>
>> > > >> Baidu invested in Palo as the OLAP solution and some of its key
>> > > >> engineers are working full time on the project. In addition, since there
>> > > >> is a growing Big Data need for scalable OLAP solutions, we look forward
>> > > >> to other Apache developers and researchers to contribute to the project.
>> > > >> Also key to addressing the risk associated with relying on Salaried
>> > > >> developers from a single entity is to increase the diversity of the
>> > > >> contributors and actively lobby for Domain experts in the BI space to
>> > > >> contribute. Apache Palo intends to do this.
>> > > >>
>> > > >> ###An Excessive Fascination with the Apache Brand
>> > > >>
>> > > >> Palo is proposing to enter incubation at Apache in order to help
>> > > >> efforts to diversify the committer-base, not so much to capitalize on
>> > > >> the Apache brand.
>> > > >> The Palo project is in production use already inside
>> > > >> Baidu, but is not expected to be an Baidu product for external
>> > > >> customers. As such, the Palo project is not seeking to use the Apache
>> > > >> brand as a marketing tool.
>> > > >>
>> > > >> ##Documentation
>> > > >>
>> > > >> Information about Palo can be found at https://github.com/baidu/palo.
>> > > >> The following links provide more information about Palo in open source:
>> > > >>
>> > > >> * Palo wiki site: https://github.com/baidu/palo/wiki
>> > > >> * Codebase at Github: https://github.com/baidu/palo
>> > > >> * Issue Tracking: https://github.com/baidu/palo/issues
>> > > >> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > > >> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> > > >>
>> > > >> ##Initial Source
>> > > >>
>> > > >> Palo has been under development since 2017 by a team of engineers at
>> > > >> Baidu Inc. It is currently hosted on Github.com under an Apache license
>> > > >> at https://github.com/baidu/palo.
>> > > >>
>> > > >> ##External Dependencies
>> > > >>
>> > > >> Palo has the following external dependencies.
>> > > >>
>> > > >> * Google gflags (BSD)
>> > > >> * Google glog (BSD)
>> > > >> * Apache Thrift (Apache Software License v2.0)
>> > > >> * Apache Commons (Apache Software License v2.0)
>> > > >> * Boost (Boost Software License)
>> > > >> * OpenLdap (OpenLDAP Software License)
>> > > >> * rapidjson (Tencent)
>> > > >> * Google RE2 (BSD-style)
>> > > >> * lz4 (BSD)
>> > > >> * snappy (BSD)
>> > > >> * cyrus-sasl (CMU License)
>> > > >> * Twitter Bootstrap (Apache Software License v2.0)
>> > > >> * d3 (BSD)
>> > > >> * LLVM (BSD-like)
>> > > >>
>> > > >> Build and test dependencies:
>> > > >>
>> > > >> * ant (Apache Software License v2.0)
>> > > >> * Apache Maven (Apache Software License v2.0)
>> > > >> * cmake (BSD)
>> > > >> * clang (BSD)
>> > > >> * Google gtest (Apache Software License v2.0)
>> > > >>
>> > > >> ##Required Resources
>> > > >>
>> > > >> ###Mailing List
>> > > >>
>> > > >> There are currently no mailing lists. The usual mailing lists are
>> > > >> expected to be set up when entering incubation:
>> > > >>
>> > > >> priv...@palo.incubator.apache.org
>> > > >> d...@palo.incubator.apache.org
>> > > >> comm...@palo.incubator.apache.org
>> > > >>
>> > > >> ###Subversion Directory
>> > > >>
>> > > >> Upon entering incubation: https://github.com/baidu/palo.
>> > > >> After incubation, we want to move the existing repo from
>> > > >> https://github.com/baidu/palo to Apache infrastructure.
>> > > >>
>> > > >> ###Issue Tracking
>> > > >>
>> > > >> Palo currently uses GitHub to track issues. Would like to continue to
>> > > >> do so while we discuss migration possibilities with the ASF Infra
>> > > >> committee.
>> > > >>
>> > > >> ###Other Resources
>> > > >>
>> > > >> The existing code already has unit tests so we will make use of
>> > > >> existing Apache continuous testing infrastructure. The resulting load
>> > > >> should not be very large.
>> > > >>
>> > > >> ##Initial Committers
>> > > >>
>> > > >> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> > > >> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> > > >> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> > > >> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> > > >> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> > > >> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> > > >> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> > > >>
>> > > >> ##Affiliations
>> > > >>
>> > > >> The initial committers are employees of Baidu Inc.. The nominated
>> > > >> mentors are employees of TODO.
>> > > >>
>> > > >> ##Sponsors
>> > > >>
>> > > >> ###Champion
>> > > >>
>> > > >> TODO
>> > > >>
>> > > >> ###Nominated Mentors
>> > > >>
>> > > >> * sijie guo, guosi...@gmail.com
>> > > >> * Luke Han, luke...@apache.org
>> > > >> * Zheng Shao, zs...@apache.org
>> > > >>
>> > > >> ###Sponsoring Entity
>> > > >>
>> > > >> We are requesting the Incubator to sponsor this project.
>> > > >>
>> > > >
>> > > >---------------------------------------------------------------------
>> > > >To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> > > >For additional commands, e-mail: general-h...@incubator.apache.org
>> > > >
>> > >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>
>--
>Todd Lipcon
>Software Engineer, Cloudera