Hi JB,

Curious to know how this compares to Apache Crunch? The constructs look very familiar (I had used Crunch long ago).
Thoughts?

- Ashish

On Fri, Jan 22, 2016 at 6:33 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Seshu,
>
> I blogged about the Apache Dataflow proposal:
> http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/
>
> You can see in the "What's next?" section that new runners, skins, and
> sources are on our roadmap. Definitely, a Storm runner could be part of
> this.
>
> Regards
> JB
>
> On 01/22/2016 03:31 PM, Adunuthula, Seshu wrote:
>
>> Awesome to see Cloud Dataflow coming to Apache. The stream processing
>> area has in general been fragmented across a variety of solutions;
>> hoping the community galvanizes around Apache Dataflow.
>>
>> We are still in the "Apache Storm" world. Any chance of folks building
>> a "Storm Runner"?
>>
>> On 1/20/16, 9:39 AM, "James Malone" <jamesmal...@google.com.INVALID> wrote:
>>
>>>> Great proposal. I like that your proposal includes a well-presented
>>>> roadmap, but I don't see any goals that directly address building a
>>>> larger community. Y'all have any ideas around outreach that will help
>>>> with adoption?
>>>
>>> Thank you, and fair point. We have a few additional ideas which we can
>>> put into the Community section.
>>>
>>>> As a start, I recommend y'all add a section to the proposal on the
>>>> wiki page for "Additional Interested Contributors" so that folks who
>>>> want to sign up to participate in the project can do so without
>>>> requesting additions to the initial committer list.
>>>
>>> This is a great idea and I think it makes a lot of sense to add an
>>> "Additional Interested Contributors" section to the proposal.

On Wed, Jan 20, 2016 at 10:32 AM, James Malone <jamesmal...@google.com.invalid> wrote:

Hello everyone,

Attached to this message is a proposed new project - Apache Dataflow, a unified programming model for data processing and integration.

The text of the proposal is included below. Additionally, the proposal is in draft form on the wiki where we will make any required changes:

https://wiki.apache.org/incubator/DataflowProposal

We look forward to your feedback and input.

Best,

James

----

= Apache Dataflow =

== Abstract ==

Dataflow is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, as well as data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes such as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Dataflow also provides DSLs in different languages, allowing users to easily implement their data integration processes.

== Proposal ==

Dataflow is a simple, flexible, and powerful system for distributed data processing at any scale. Dataflow provides a unified programming model, a software development kit to define and construct data processing pipelines, and runners to execute Dataflow pipelines on several runtime engines, such as Apache Spark, Apache Flink, or Google Cloud Dataflow. Dataflow can be used for a variety of streaming or batch data processing goals including ETL, stream analysis, and aggregate computation. The underlying programming model for Dataflow provides MapReduce-like parallelism, combined with support for powerful data windowing and fine-grained correctness control.
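The windowing support mentioned above can be illustrated with a small sketch, assuming the pre-incubation Dataflow Java SDK (com.google.cloud.dataflow.sdk.*); the class and method names below are placeholders chosen for illustration, not anything prescribed by the proposal. The same transform chain works for bounded (batch) and unbounded (streaming) input.

import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowedCountSketch {
  // Divides a collection of event names into fixed one-minute windows and
  // counts occurrences of each name within every window.
  public static PCollection<KV<String, Long>> perMinuteCounts(PCollection<String> events) {
    return events
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Count.<String>perElement());
  }
}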
== Background ==

Dataflow started as a set of Google projects focused on making data processing easier, faster, and less costly. The Dataflow model is a successor to MapReduce, FlumeJava, and MillWheel inside Google and is focused on providing a unified solution for batch and stream processing. The projects on which Dataflow is based have been described in several publicly available papers:

* MapReduce - http://research.google.com/archive/mapreduce.html

* Dataflow model - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf

* FlumeJava - http://notes.stephenholiday.com/FlumeJava.pdf

* MillWheel - http://research.google.com/pubs/pub41378.html

Dataflow was designed from the start to provide a portable programming layer. When you define a data processing pipeline with the Dataflow model, you are creating a job which is capable of being processed by any number of Dataflow processing engines. Several engines have been developed to run Dataflow pipelines in other open source runtimes, including a Dataflow runner for Apache Flink and for Apache Spark. There is also a "direct runner" for execution on the developer machine (mainly for dev/debug purposes). Another runner allows a Dataflow program to run on a managed service, Google Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is already available on GitHub and is independent of the Google Cloud Dataflow service. A Python SDK is currently in active development.

In this proposal, the Dataflow SDKs, model, and a set of runners will be submitted as an OSS project under the ASF. The runners which are a part of this proposal include those for Spark (from Cloudera), Flink (from data Artisans), and local development (from Google); the Google Cloud Dataflow service runner is not included in this proposal. Further references to Dataflow will refer only to the Dataflow model, SDKs, and runners which are a part of this proposal (Apache Dataflow). The initial submission will contain the already-released Java SDK; Google intends to submit the Python SDK later in the incubation process. The Google Cloud Dataflow service will continue to be one of many runners for Dataflow, built on Google Cloud Platform, to run Dataflow pipelines. Necessarily, Cloud Dataflow will develop against the Apache project's additions, updates, and changes. Google Cloud Dataflow will become one user of Apache Dataflow and will participate in the project openly and publicly.
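The "portable programming layer" point can be sketched briefly. The example below assumes the pre-incubation Dataflow Java SDK (com.google.cloud.dataflow.sdk.*); the class name is illustrative, and the exact runner names published by the Flink and Spark runner projects should be treated as assumptions rather than confirmed API.

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class RunnerPortabilitySketch {
  public static void main(String[] args) {
    // The execution engine is chosen from the command line, for example
    // --runner=DirectPipelineRunner for local dev/debug execution, while the
    // pipeline construction code below stays the same for every engine.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

    Pipeline p = Pipeline.create(options);
    // ... apply the same PTransforms here regardless of the chosen runner ...
    p.run();
  }
}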
The Dataflow programming model has been designed with simplicity, scalability, and speed as key tenets. In the Dataflow model, you only need to think about four top-level concepts when constructing your data processing job (each is illustrated in the sketch after this list):

* Pipelines - The data processing job, made of a series of computations including input, processing, and output

* PCollections - Bounded (or unbounded) datasets which represent the input, intermediate, and output data in pipelines

* PTransforms - A data processing step in a pipeline that takes one or more PCollections as input and produces one or more PCollections as output

* I/O Sources and Sinks - APIs for reading and writing data which are the roots and endpoints of the pipeline
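For illustration of these four concepts, here is a minimal word-count sketch. It assumes the pre-incubation Dataflow Java SDK packages (com.google.cloud.dataflow.sdk.*); the class name and file paths are placeholders chosen for this example, not anything prescribed by the proposal.

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class WordCountSketch {
  public static void main(String[] args) {
    // Pipeline: the overall data processing job; the runner is picked via options.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // I/O source -> PCollection: a bounded dataset of input lines.
    PCollection<String> lines = p.apply(TextIO.Read.from("/tmp/input.txt"));

    // PTransforms: split each line into words, then count occurrences per word.
    PCollection<KV<String, Long>> counts = lines
        .apply(ParDo.of(new DoFn<String, String>() {
          @Override
          public void processElement(ProcessContext c) {
            for (String word : c.element().split("\\s+")) {
              if (!word.isEmpty()) {
                c.output(word);
              }
            }
          }
        }))
        .apply(Count.<String>perElement());

    // PTransform + I/O sink: format the counts and write them out.
    counts
        .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {
          @Override
          public void processElement(ProcessContext c) {
            c.output(c.element().getKey() + ": " + c.element().getValue());
          }
        }))
        .apply(TextIO.Write.to("/tmp/word-counts"));

    p.run();
  }
}

The same pipeline can be handed to the local direct runner for development or to the Flink, Spark, or Cloud Dataflow runners without changing the transform code.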
== Rationale ==

With Dataflow, Google intended to develop a framework which allowed developers to be maximally productive in defining the processing, and then be able to execute the program at various levels of latency/cost/completeness without re-architecting or re-writing it. This goal was informed by Google's past experience developing several models, frameworks, and tools useful for large-scale and distributed data processing. While Google has previously published papers describing some of its technologies, Google decided to take a different approach with Dataflow. Google open-sourced the SDK and model alongside commercialization of the idea and ahead of publishing papers on the topic. As a result, a number of open source runtimes exist for Dataflow, such as the Apache Flink and Apache Spark runners.

We believe that submitting Dataflow as an Apache project will provide an immediate, worthwhile, and substantial contribution to the open source community. As an incubating project, we believe Dataflow will have a better opportunity to provide a meaningful contribution to OSS and also integrate with other Apache projects.

In the long term, we believe Dataflow can be a powerful abstraction layer for data processing. By providing an abstraction layer for data pipelines and processing, data workflows can be increasingly portable, resilient to breaking changes in tooling, and compatible across many execution engines, runtimes, and open source projects.

== Initial Goals ==

We are breaking our initial goals into immediate (< 2 months), short-term (2-4 months), and intermediate-term (> 4 months).

Our immediate goals include the following:

* Plan for reconciling the Dataflow Java SDK and various runners into one project

* Plan for refactoring the existing Java SDK for better extensibility by SDK and runner writers

* Validating that all dependencies are ASL 2.0 or compatible

* Understanding and adapting to the Apache development process

Our short-term goals include:

* Moving the newly merged lists and build utilities to Apache

* Starting to refactor the codebase and moving code to the Apache Git repository

* Continuing development of new features, functions, and fixes in the Dataflow Java SDK and Dataflow runners

* Cleaning up the Dataflow SDK sources and crafting a roadmap and plan for how to include new major ideas, modules, and runtimes

* Establishing an easy and clear build/test framework for Dataflow and associated runtimes; creating a testing, rollback, and validation policy

* Analyzing and designing the work needed to make Dataflow a better data processing abstraction layer for multiple open source frameworks and environments

Finally, we have a number of intermediate-term goals:

* Roadmapping, planning, and execution of integrations with other OSS and non-OSS projects/products

* Inclusion of an additional SDK for Python, which is under active development

== Current Status ==

=== Meritocracy ===

Dataflow was initially developed based on ideas from many employees within Google. As an ASL OSS project on GitHub, the Dataflow SDK has received contributions from data Artisans, Cloudera Labs, and other individual developers. As a project under incubation, we are committed to expanding our effort to build an environment which supports a meritocracy. We are focused on engaging the community and other related projects for support and contributions. Moreover, we are committed to ensuring that contributors and committers to Dataflow come from a broad mix of organizations through a merit-based decision process during incubation. We believe strongly in the Dataflow model and are committed to growing an inclusive community of Dataflow contributors.

=== Community ===

The core of the Dataflow Java SDK has been developed by Google for use with Google Cloud Dataflow. Google has active community engagement in the SDK GitHub repository (https://github.com/GoogleCloudPlatform/DataflowJavaSDK) and on Stack Overflow (http://stackoverflow.com/questions/tagged/google-cloud-dataflow), and has had contributions from a number of organizations and individuals. Every day, Cloud Dataflow is actively used by a number of organizations and institutions for batch and stream processing of data. We believe acceptance will allow us to consolidate existing Dataflow-related work, grow the Dataflow community, and deepen connections between Dataflow and other open source projects.
=== Core Developers ===

The core developers for Dataflow and the Dataflow runners are:

* Frances Perry
* Tyler Akidau
* Davor Bonaci
* Luke Cwik
* Ben Chambers
* Kenn Knowles
* Dan Halperin
* Daniel Mills
* Mark Shields
* Craig Chambers
* Maximilian Michels
* Tom White
* Josh Wills

=== Alignment ===

The Dataflow SDK can be used to create Dataflow pipelines which can be executed on Apache Spark or Apache Flink. Dataflow is also related to other Apache projects, such as Apache Crunch. We plan on expanding functionality for Dataflow runners, support for additional domain-specific languages, and increased portability, so that Dataflow is a powerful abstraction layer for data processing.

== Known Risks ==

=== Orphaned Products ===

The Dataflow SDK is presently used by several organizations, from small startups to Fortune 100 companies, to construct production pipelines which are executed in Google Cloud Dataflow. Google has a long-term commitment to advance the Dataflow SDK; moreover, Dataflow is seeing increasing interest, development, and adoption from organizations outside of Google.

=== Inexperience with Open Source ===

Google believes strongly in open source and the exchange of information to advance new ideas and work. Examples of this commitment are active OSS projects such as Chromium (https://www.chromium.org) and Kubernetes (http://kubernetes.io/). With Dataflow, we have tried to be increasingly open and forward-looking; we published a paper describing the Dataflow model at the VLDB conference (http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf) and were quick to release the Dataflow SDK as open source software with the launch of Cloud Dataflow. Our submission to the Apache Software Foundation is a logical extension of our commitment to open source software.

=== Homogeneous Developers ===

The majority of committers in this proposal belong to Google because Dataflow emerged from several internal Google projects. This proposal also includes committers outside of Google who are actively involved with other Apache projects, such as Hadoop, Flink, and Spark. We expect our entry into incubation will allow us to expand the number of individuals and organizations participating in Dataflow development. Additionally, separation of the Dataflow SDK from Google Cloud Dataflow allows us to focus on the open source SDK and model and do what is best for this project.

=== Reliance on Salaried Developers ===

The Dataflow SDK and Dataflow runners have been developed primarily by salaried developers supporting the Google Cloud Dataflow project.
While the Dataflow SDK and Cloud Dataflow have been developed by different teams (and this proposal would reinforce that separation), we expect our initial set of developers will still primarily be salaried. Contribution has not been exclusively from salaried developers, however. For example, the contrib directory of the Dataflow SDK (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib) contains items from free-time contributors. Moreover, separate projects, such as ScalaFlow (https://github.com/darkjh/scalaflow), have been created around the Dataflow model and SDK. We expect our reliance on salaried developers will decrease over time during incubation.

=== Relationship with other Apache products ===

Dataflow directly interoperates with or utilizes several existing Apache projects:

* Build
** Apache Maven

* Data I/O, Libraries
** Apache Avro
** Apache Commons

* Dataflow runners
** Apache Flink
** Apache Spark

When used in batch mode, Dataflow shares similarities with Apache Crunch; however, Dataflow is focused on a model, SDK, and abstraction layer beyond Spark and Hadoop (MapReduce). One key goal of Dataflow is to provide an intermediate abstraction layer which can easily be implemented and utilized across several different processing frameworks.
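Since Apache Avro is listed above under data I/O, a small sketch of that interoperation may help. It assumes the pre-incubation Dataflow Java SDK's AvroIO (com.google.cloud.dataflow.sdk.io.AvroIO); the record class and file paths are hypothetical stand-ins (an Avro-generated class would normally be used), so treat the exact API shape as an assumption rather than something the proposal specifies.

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.AvroIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class AvroInteropSketch {
  // Hypothetical stand-in for an Avro-generated record class.
  public static class UserEvent {
    public String user;
    public long timestampMillis;
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read Avro records into a typed PCollection using the SDK's Avro source.
    PCollection<UserEvent> events =
        p.apply(AvroIO.Read.from("/tmp/events-*.avro").withSchema(UserEvent.class));

    // ... transforms over `events` would go here ...

    // Write the records back out as Avro files.
    events.apply(AvroIO.Write.to("/tmp/events-out").withSchema(UserEvent.class));

    p.run();
  }
}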
=== An excessive fascination with the Apache brand ===

With this proposal we are not seeking attention or publicity. Rather, we firmly believe in the Dataflow model, the SDK, and the ability to make Dataflow a powerful yet simple framework for data processing. While the Dataflow SDK and model have been open source, we believe putting code on GitHub can only go so far. We see the Apache community, processes, and mission as critical for ensuring the Dataflow SDK and model are truly community-driven, positively impactful, and innovative open source software. While Google has taken a number of steps to advance its various open source projects, we believe Dataflow is a great fit for the Apache Software Foundation due to its focus on data processing and its relationships to existing ASF projects.

== Documentation ==

The following documentation is relevant to this proposal. Relevant portions of the documentation will be contributed to the Apache Dataflow project.

* Dataflow website: https://cloud.google.com/dataflow

* Dataflow programming model: https://cloud.google.com/dataflow/model/programming-model

* Codebases
** Dataflow Java SDK: https://github.com/GoogleCloudPlatform/DataflowJavaSDK
** Flink Dataflow runner: https://github.com/dataArtisans/flink-dataflow
** Spark Dataflow runner: https://github.com/cloudera/spark-dataflow

* Dataflow Java SDK issue tracker: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues

* google-cloud-dataflow tag on Stack Overflow: http://stackoverflow.com/questions/tagged/google-cloud-dataflow

== Initial Source ==

The initial source for Dataflow which we will submit to the Apache Software Foundation includes several related projects which are currently hosted in GitHub repositories:

* Dataflow Java SDK (https://github.com/GoogleCloudPlatform/DataflowJavaSDK)

* Flink Dataflow runner (https://github.com/dataArtisans/flink-dataflow)

* Spark Dataflow runner (https://github.com/cloudera/spark-dataflow)

These projects have always been Apache 2.0 licensed. We intend to bundle all of these repositories since they are all complementary and should be maintained in one project. Prior to our submission, we will combine all of these projects into a new git repository.

== Source and Intellectual Property Submission Plan ==

The source for the Dataflow SDK and the three runners (Spark, Flink, Google Cloud Dataflow) is already licensed under an Apache 2 license:

* Dataflow SDK - https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/LICENSE

* Flink runner - https://github.com/dataArtisans/flink-dataflow/blob/master/LICENSE

* Spark runner - https://github.com/cloudera/spark-dataflow/blob/master/LICENSE

Contributors to the Dataflow SDK have also signed the Google Individual Contributor License Agreement (https://cla.developers.google.com/about/google-individual) in order to contribute to the project.

With respect to trademark rights, Google does not hold a trademark on the phrase "Dataflow." Based on feedback and guidance we receive during the incubation process, we are open to renaming the project if necessary for trademark or other concerns.

== External Dependencies ==

All external dependencies are licensed under an Apache 2.0 or Apache-compatible license. As we grow the Dataflow community, we will configure our build process to require and validate that all contributions and dependencies are licensed under the Apache 2.0 license or an Apache-compatible license.

== Required Resources ==

=== Mailing Lists ===

We currently use a mix of mailing lists.
We will migrate our existing mailing lists to the following:

* d...@dataflow.incubator.apache.org
* u...@dataflow.incubator.apache.org
* priv...@dataflow.incubator.apache.org
* comm...@dataflow.incubator.apache.org

=== Source Control ===

The Dataflow team currently uses Git and would like to continue to do so. We request a Git repository for Dataflow with mirroring to GitHub enabled.

=== Issue Tracking ===

We request the creation of an Apache-hosted JIRA. The Dataflow project is currently using both a public GitHub issue tracker and internal Google issue tracking. We will migrate and combine issues from these two sources into the Apache JIRA.

== Initial Committers ==

* Aljoscha Krettek [aljos...@apache.org]
* Amit Sela [amitsel...@gmail.com]
* Ben Chambers [bchamb...@google.com]
* Craig Chambers [chamb...@google.com]
* Dan Halperin [dhalp...@google.com]
* Davor Bonaci [da...@google.com]
* Frances Perry [f...@google.com]
* James Malone [jamesmal...@google.com]
* Jean-Baptiste Onofré [jbono...@apache.org]
* Josh Wills [jwi...@apache.org]
* Kostas Tzoumas [kos...@data-artisans.com]
* Kenneth Knowles [k...@google.com]
* Luke Cwik [lc...@google.com]
* Maximilian Michels [m...@apache.org]
* Stephan Ewen [step...@data-artisans.com]
* Tom White [t...@cloudera.com]
* Tyler Akidau [taki...@google.com]

== Affiliations ==

The initial committers are from six organizations. Google developed Dataflow and the Dataflow SDK, data Artisans developed the Flink runner, and Cloudera (Labs) developed the Spark runner.
* Cloudera
** Tom White

* data Artisans
** Aljoscha Krettek
** Kostas Tzoumas
** Maximilian Michels
** Stephan Ewen

* Google
** Ben Chambers
** Dan Halperin
** Davor Bonaci
** Frances Perry
** James Malone
** Kenneth Knowles
** Luke Cwik
** Tyler Akidau

* PayPal
** Amit Sela

* Slack
** Josh Wills

* Talend
** Jean-Baptiste Onofré

== Sponsors ==

=== Champion ===

* Jean-Baptiste Onofre [jbono...@apache.org]

=== Nominated Mentors ===

* Jim Jagielski [j...@apache.org]
* Venkatesh Seetharam [venkat...@apache.org]
* Bertrand Delacretaz [bdelacre...@apache.org]
* Ted Dunning [tdunn...@apache.org]

=== Sponsoring Entity ===

The Apache Incubator

>>>> --
>>>> Sean

> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org