Love to see this in the incubator as well. +1

Tim
On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail <kevin.mcgr...@mcgrail.com> wrote:
> Agreed. It could use more mentors from ASF which I'm too overloaded to help
> with but I'd be inclined to +1 this. Do you have some thoughts on getting
> more community people outside of LI and Uber to help?
>
> On 2/13/2018 7:07 PM, Dave Fisher wrote:
>>
>> Noir or Blanc? Gris or Grigio? What’s the vintage?
>>
>> All kidding aside this looks interesting.
>>
>> Regards,
>> Dave
>>
>> Sent from my iPhone
>>
>>> On Feb 13, 2018, at 12:10 AM, kishore g <g.kish...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I would like to propose Pinot as an Apache Incubator project. The proposal
>>> is available as a draft at https://wiki.apache.org/incubator/PinotProposal.
>>> I have also included the text of the proposal below.
>>>
>>> Any feedback from the community is much appreciated.
>>>
>>> Regards,
>>> Kishore G
>>>
>>> = Pinot Proposal =
>>>
>>> == Abstract ==
>>>
>>> Pinot is a distributed columnar storage engine that can ingest data in
>>> real-time and serve analytical queries at low latency. There are two modes
>>> of data ingestion - batch and/or realtime. Batch mode allows users to
>>> generate Pinot segments externally using systems such as Hadoop. These
>>> segments can be uploaded into Pinot via simple curl calls. Pinot can ingest
>>> data in near real-time from streaming sources such as Kafka. Data ingested
>>> into Pinot is stored in a columnar format. Pinot provides a SQL-like
>>> interface (PQL) that supports filters, aggregations, and group by
>>> operations. It does not support joins by design, in order to guarantee
>>> predictable latency. It leverages other Apache projects such as Zookeeper,
>>> Kafka, and Helix, along with many libraries from the ASF.
>>>
>>> == Proposal ==
>>>
>>> Pinot was open sourced by LinkedIn and is hosted on GitHub. The majority of
>>> the development happens at LinkedIn with other contributions from Uber and
>>> Slack.
>>> We believe that being a part of the Apache Software Foundation will
>>> improve the diversity and help form a strong community around the
>>> project.
>>>
>>> LinkedIn submits this proposal to donate the code base to the Apache
>>> Software Foundation. The code is already under Apache License 2.0. Code and
>>> the documentation are hosted on GitHub.
>>> * Code: http://github.com/linkedin/pinot
>>> * Documentation: https://github.com/linkedin/pinot/wiki
>>>
>>>
>>> == Background ==
>>>
>>> LinkedIn, similar to other companies, has many applications that provide
>>> rich real-time insights to members and customers (internal and external).
>>> The workload characteristics for these applications vary a lot. Some
>>> internal applications simply need ad-hoc query capabilities with sub-second
>>> to multiple seconds latency. But external site facing applications require
>>> strong SLAs even under very high workloads. Prior to Pinot, LinkedIn had
>>> multiple solutions depending on the workload generated by the application
>>> and this was inefficient. Pinot was developed to be the one single platform
>>> that addresses all classes of applications. Today at LinkedIn, Pinot powers
>>> more than 50 site facing products with workload ranging from a few queries
>>> per second to 1000’s of queries per second while maintaining a 99th
>>> percentile latency that can be as low as a few milliseconds. All internal
>>> dashboards at LinkedIn are powered by Pinot.
>>>
>>> == Rationale ==
>>>
>>> We believe that the requirement to develop rich real-time analytic
>>> applications is applicable to other organizations. Both Pinot and the
>>> interested communities would benefit from this work being openly available.
>>>
>>> == Current Status ==
>>>
>>> Pinot is currently open sourced under the Apache License Version 2.0 and
>>> available at github.com/linkedin/pinot. All the development is done using
>>> GitHub Pull Requests.
>>> We cut releases on a weekly basis and deploy them at
>>> LinkedIn. mp-0.1.468 is the latest release tag that is deployed in
>>> production.
>>>
>>> == Meritocracy ==
>>>
>>> Following the Apache meritocracy model, we intend to build an open and
>>> diverse community around Pinot. We will encourage the community to
>>> contribute to the discussion and the codebase.
>>>
>>> == Community ==
>>>
>>> Pinot is currently used extensively at LinkedIn and Uber. Several companies
>>> have expressed interest in the project. We hope to extend the contributor
>>> base significantly by bringing Pinot into Apache.
>>>
>>> == Core Developers ==
>>>
>>> Pinot was started by engineers at LinkedIn, and now has committers from
>>> Uber.
>>>
>>> == Alignment ==
>>>
>>> Apache is the most natural home for taking Pinot forward. Pinot leverages
>>> several existing Apache Projects such as Kafka, Helix, Zookeeper, and Avro.
>>> As Pinot gains adoption, we plan to add support for the ORC and Parquet
>>> formats, as well as adding integration with Yarn and Mesos.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned Products ===
>>>
>>> The risk of the Pinot project being abandoned is minimal. The teams at
>>> LinkedIn and Uber are highly incentivized to continue development of Pinot
>>> as it is a critical part of their infrastructure.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> Since being open sourced, Pinot has been developed entirely on GitHub. All
>>> the current developers on Pinot are well aware of the open source
>>> development process. However, most of the developers are new to the Apache
>>> process. Kishore Gopalakrishna, one of the lead developers in Pinot, is VP
>>> and committer of the Apache Helix project.
>>>
>>> === Homogenous Developers ===
>>>
>>> The current core developers are all from LinkedIn and Uber.
>>> However, we hope to establish a developer community that includes
>>> contributors from several corporations and we are actively encouraging new
>>> contributors via the mailing lists and public presentations of Pinot.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> It is expected that Pinot development will occur on both salaried time and
>>> on volunteer time, after hours. The majority of initial committers are paid
>>> by their employer to contribute to this project. However, they are all
>>> passionate about the project, and we are confident that the project will
>>> continue even if no salaried developers contribute to the project. We are
>>> committed to recruiting additional committers including non-salaried
>>> developers.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> As mentioned earlier, Pinot uses several Apache Projects such as Kafka to
>>> ingest data in real-time, Zookeeper and Helix for cluster management. Pinot
>>> also uses Maven for build and release. We foresee adding support for the
>>> Parquet and ORC formats. Adding the ability to deploy on Yarn and Mesos
>>> clusters is another interesting project we might pursue.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> While we respect the reputation of the Apache brand and have no doubts that
>>> it will attract contributors and users, we believe ASF is the right home
>>> for Pinot to foster a great community that will lead to a better outcome in
>>> the long term.
>>>
>>> == Documentation ==
>>>
>>> * Code: https://github.com/linkedin/pinot/
>>> * Documentation: https://github.com/linkedin/pinot/wiki
>>> * User group: https://groups.google.com/forum/#!forum/pinot_users
>>>
>>> == Initial Source ==
>>>
>>> The current Pinot codebase is hosted on GitHub and licensed under the
>>> Apache License V2. The source tree is self-contained and relies on Maven as
>>> its build and dependency resolution mechanism.
>>>
>>> == External Dependencies ==
>>>
>>> All dependencies in Pinot have licenses that are compatible with Apache
>>> License V2, except for the org.json library, which will be removed prior to
>>> Apache incubation. The list below summarizes the external dependencies of
>>> Pinot grouped by license and ASF license category.
>>>
>>> Dependencies from the ASF Category A
>>> === Apache License 2.0 ===
>>> * com.101tec:zkclient:0.7
>>> * com.alibaba:fastjson:1.1.24
>>> * com.clearspring.analytics:stream:2.7.0
>>> * com.fasterxml.jackson.core:jackson-annotations:2.8.0
>>> * com.fasterxml.jackson.core:jackson-core:2.8.0
>>> * com.fasterxml.jackson.core:jackson-databind:2.8.0
>>> * com.google.code.findbugs:jsr305:3.0.0
>>> * com.google.guava:guava:19
>>> * com.ning:async-http-client:1.9.21
>>> * com.yammer.metrics:metrics-core:2.2.0
>>> * commons-beanutils:commons-beanutils:1.8.3
>>> * commons-cli:commons-cli:1.2
>>> * commons-codec:commons-codec:1.6
>>> * commons-configuration:commons-configuration:1.6
>>> * commons-fileupload:commons-fileupload:1.2.2
>>> * commons-httpclient:commons-httpclient:3.1
>>> * commons-io:commons-io:2.1
>>> * commons-validator:commons-validator:1.4.0
>>> * io.netty:netty-all:4.1.4.Final
>>> * io.swagger:swagger-jaxrs:1.5.10
>>> * io.swagger:swagger-jersey2-jaxrs:1.5.10
>>> * it.unimi.dsi:fastutil:6.5.16
>>> * joda-time:joda-time:2
>>> * log4j:log4j:1.2.17
>>> * me.lemire.integercompression:JavaFastPFOR:0.0.13
>>> * nl.jqno.equalsverifier:equalsverifier:1.7.2
>>> * org.apache.avro:avro:1.7.6
>>> * org.apache.commons:commons-compress:1.9
>>> * org.apache.commons:commons-lang3:3.5
>>> * org.apache.commons:commons-math:2.1
>>> * org.apache.hadoop:hadoop-client:2.7.0
>>> * org.apache.hadoop:hadoop-common:2.7.0
>>> * org.apache.helix:helix-core:0.6.8
>>> * org.apache.httpcomponents:httpclient:4.1.3
>>> * org.apache.httpcomponents:httpclient:4.2.5
>>> * org.apache.httpcomponents:httpcore:4.2.5
>>> * org.apache.httpcomponents:httpmime:4.2.5
>>> * org.apache.kafka:kafka_2.10:0.9.0.1
>>> * org.apache.thrift:libthrift:0.9.1
>>> * org.apache.zookeeper:zookeeper:3.4.9
>>> * org.codehaus.jackson:jackson-core-asl:1.9.6
>>> * org.codehaus.jackson:jackson-mapper-asl:1.9.6
>>> * org.json:json:20080701
>>> * org.roaringbitmap:RoaringBitmap:0.5.10
>>> * org.testng:testng:6.0.1
>>> * org.twitter4j:twitter4j-core:4.0.3
>>> * org.webjars:swagger-ui:2.2.2
>>> * org.xerial.larray:larray:0.2.1
>>> * org.yaml:snakeyaml:1.16
>>> * xml-apis:xml-apis:1.0.b2
>>> === Dual license (Apache License 2.0 + LGPL 2.1), using under the Apache License ===
>>> * org.codehaus.jackson:jackson-jaxrs:1.9.6
>>> * org.codehaus.jackson:jackson-xc:1.9.6
>>> === BSD ===
>>> * com.jcabi:jcabi-log:0.17.1
>>> * org.antlr:antlr4-annotations:4.3
>>> * org.antlr:antlr4-runtime:4.3
>>> === MIT ===
>>> * com.github.nkzawa:socket.io-client:0.5.1
>>> * org.mockito:mockito-core:2.10.0
>>> * org.slf4j:slf4j-api:1.7.7
>>> * org.slf4j:slf4j-log4j12:1.7.7
>>>
>>> === Dependencies from the ASF Category B ===
>>> Dual license (CDDL 1.1 + GPL 2 w/ CPE), using under the CDDL
>>> * com.sun.jersey:jersey-client:1.19.2
>>> * javax.servlet:javax.servlet-api:3.0.1
>>> * org.glassfish.jersey.containers:jersey-container-grizzly2-http:2.23
>>> * org.glassfish.jersey.core:jersey-common:2.23
>>> * org.glassfish.jersey.core:jersey-server:2.23
>>> * org.glassfish.jersey.media:jersey-media-json-jackson:2.24
>>> * org.glassfish.jersey.media:jersey-media-multipart:2.23
>>>
>>> === Dependencies from the ASF Category X ===
>>> JSON License
>>> * org.json:json:20080701 (to be removed before Apache incubation)
>>>
>>>
>>> == Cryptography ==
>>>
>>> None
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> * pinot-private (with moderated subscriptions)
>>> * pinot-user
>>> * pinot-dev
>>> * pinot-commits
>>>
>>> === Git repository ===
>>>
>>> * git://git.apache.org/pinot
>>> * https://git-wip-us.apache.org/repos/asf/incubator-pinot.git
>>>
>>> === Issue Tracking ===
>>>
>>> A JIRA Issue tracker (PINOT)
>>>
>>> === Other Resources ===
>>>
>>> The existing code already has unit and integration tests and we use travis
>>> to test the patch before committing it to master. We would like to have an
>>> instance of Jenkins to achieve similar functionality.
>>>
>>> == Initial Committers ==
>>>
>>> * Kishore Gopalakrishna
>>> * Ravi Aringunram
>>> * Jean-François Im
>>> * Mayank Shrivastava
>>> * Subbu Subramaniam
>>> * Adwait Tumbde
>>> * Xiaotian Jiang
>>> * Jennifer Dai
>>> * Seunghyun Lee
>>> * Xiang Fu
>>> * Dhaval Patel
>>> * Neha Pawar
>>> * Alex Pucher
>>> * Yen-Jung Chang
>>>
>>> == Affiliations ==
>>>
>>> * Kishore Gopalakrishna (LinkedIn)
>>> * Ravi Aringunram (LinkedIn)
>>> * Jean-François Im (LinkedIn)
>>> * Mayank Shrivastava (LinkedIn)
>>> * Subbu Subramaniam (LinkedIn)
>>> * Adwait Tumbde (LinkedIn)
>>> * Xiaotian Jiang (LinkedIn)
>>> * Jennifer Dai (LinkedIn)
>>> * Seunghyun Lee (LinkedIn)
>>> * Xiang Fu (Uber)
>>> * Dhaval Patel (Uber)
>>> * Neha Pawar (LinkedIn)
>>> * Alex Pucher (LinkedIn)
>>> * Yen-Jung Chang (LinkedIn)
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> * Olivier Lamy <olamy at apache dot org>
>>>
>>> === Nominated Mentors ===
>>>
>>> * Olivier Lamy <olamy at apache dot org>
>>>
>>> === Sponsoring Entity ===
>>>
>>> The Apache Incubator
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org