Re: [DISCUSS] Apache Dataflow Incubator Proposal

Jean-Baptiste Onofré Fri, 22 Jan 2016 02:10:49 -0800

Hi Mayank,

sure: you are in.


Thanks !
Regards
JB

On 01/22/2016 12:29 AM, Mayank Bansal wrote:

Hi Jean,

Nice Proposal.

I wanted to contribute to this project. Can you please add me too?

Thanks a lot for the help

Thanks,
Mayank

On Thu, Jan 21, 2016 at 8:07 AM, Jean-Baptiste Onofré <j...@nanthrax.net
<mailto:j...@nanthrax.net>> wrote:

    Hey Alex,

    awesome: I added you on the proposal.

    Thanks,
    Regards
    JB


    On 01/21/2016 05:03 PM, Alexander Bezzubov wrote:

        Hi,

        it's great to see DataFlow becoming part of the Apache ecosystem,
        thank you for bringing it in.
        I would be happy to get involved and help.

        --
        Alex

        On Thu, Jan 21, 2016 at 8:42 PM, Jean-Baptiste Onofré
        <j...@nanthrax.net <mailto:j...@nanthrax.net>>
        wrote:

            Perfect: done, you are on the proposal.

            Thanks !
            Regards
            JB


            On 01/21/2016 11:55 AM, chatz wrote:

                Charitha Elvitigala

                On 21 January 2016 at 16:17, Jean-Baptiste Onofré
                <j...@nanthrax.net <mailto:j...@nanthrax.net>>
                wrote:

                    Hi Chatz,


                    sure, what name should I use on the proposal, Charitha ?

                    Regards
                    JB


                    On 01/21/2016 11:32 AM, chatz wrote:

                        Hi Jean,


                        I’d be interested in contributing as well.

                        Thanks,

                        Chatz


                        On 21 January 2016 at 14:22, Jean-Baptiste
                        Onofré <j...@nanthrax.net <mailto:j...@nanthrax.net>>
                        wrote:

                            Sweet: you are on the proposal ;)


                            Thanks !
                            Regards
                            JB


                            On 01/21/2016 08:55 AM, Byung-Gon Chun wrote:

                                This looks very interesting. I'm interested
                                in contributing.


                                Thanks.
                                -Gon

                                ---
                                Byung-Gon Chun


                                On Thu, Jan 21, 2016 at 1:32 AM, James
                                Malone <
                                jamesmal...@google.com.invalid> wrote:

                                    Hello everyone,


                                    Attached to this message is a
                                    proposed new project - Apache
                                    Dataflow, a
                                    unified programming model for data
                                    processing and integration.

                                    The text of the proposal is included
                                    below. Additionally, the proposal is in
                                    draft form on the wiki where we will
                                    make any required changes:

                                    https://wiki.apache.org/incubator/DataflowProposal

                                    We look forward to your feedback and
                                    input.

                                    Best,

                                    James

                                    ----

                                    = Apache Dataflow =

                                    == Abstract ==

                                    Dataflow is an open source, unified model
                                    and set of language-specific SDKs for
                                    defining and executing data processing
                                    workflows, and also data ingestion and
                                    integration flows, supporting Enterprise
                                    Integration Patterns (EIPs) and Domain
                                    Specific Languages (DSLs). Dataflow
                                    pipelines simplify the mechanics of
                                    large-scale batch and streaming data
                                    processing and can run on a number of
                                    runtimes like Apache Flink, Apache Spark,
                                    and Google Cloud Dataflow (a cloud
                                    service). Dataflow also brings DSLs in
                                    different languages, allowing users to
                                    easily implement their data integration
                                    processes.

                                    == Proposal ==

                                    Dataflow is a simple, flexible, and
                                    powerful system for distributed data
                                    processing at any scale. Dataflow
                                    provides a unified programming model, a
                                    software development kit to define and
                                    construct data processing pipelines, and
                                    runners to execute Dataflow pipelines in
                                    several runtime engines, like Apache
                                    Spark, Apache Flink, or Google Cloud
                                    Dataflow. Dataflow can be used for a
                                    variety of streaming or batch data
                                    processing goals including ETL, stream
                                    analysis, and aggregate computation. The
                                    underlying programming model for Dataflow
                                    provides MapReduce-like parallelism,
                                    combined with support for powerful data
                                    windowing, and fine-grained correctness
                                    control.
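
As a rough illustration of the windowing idea mentioned above, here is a minimal, framework-free Python sketch. It is not the Dataflow SDK API: the function names `fixed_window_start` and `sum_per_key_and_window`, the sample events, and the 60-second window size are all assumptions made for this example. It assigns timestamped (key, value) events to fixed windows and sums the values per key within each window, which is a MapReduce-style grouping applied per window.

```python
from collections import defaultdict


def fixed_window_start(timestamp, size):
    """Return the start of the fixed window [start, start + size) that contains timestamp."""
    return timestamp - (timestamp % size)


def sum_per_key_and_window(events, size):
    """Group (timestamp, key, value) events by (window start, key) and sum the values."""
    totals = defaultdict(int)
    for timestamp, key, value in events:
        totals[(fixed_window_start(timestamp, size), key)] += value
    return dict(totals)


# Hypothetical input: (event time in seconds, key, value), deliberately out of order.
events = [(5, "a", 1), (42, "a", 2), (61, "a", 10), (75, "b", 7), (59, "b", 3)]
result = sum_per_key_and_window(events, size=60)
```

Because each event carries its own timestamp, the out-of-order event `(59, "b", 3)` still lands in the first window. The sketch does not attempt to model late data or the fine-grained correctness controls the proposal refers to.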

                                    == Background ==

                                    Dataflow started as a set of Google
                                    projects focused on making data
                                    processing easier, faster, and less
                                    costly. The Dataflow model is a successor
                                    to MapReduce, FlumeJava, and MillWheel
                                    inside Google and is focused on providing
                                    a unified solution for batch and stream
                                    processing. These projects on which
                                    Dataflow is based have been published in
                                    several papers made available to the
                                    public:

                                    * MapReduce - http://research.google.com/archive/mapreduce.html

                                    * Dataflow model - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf

                                    * FlumeJava - http://notes.stephenholiday.com/FlumeJava.pdf

                                    * MillWheel - http://research.google.com/pubs/pub41378.html

                                    Dataflow was designed from the start to
                                    provide a portable programming layer.
                                    When you define a data processing
                                    pipeline with the Dataflow model, you are
                                    creating a job which is capable of being
                                    processed by any number of Dataflow
                                    processing engines. Several runners have
                                    been developed to run Dataflow pipelines
                                    in other open source runtimes, including
                                    runners for Apache Flink and Apache
                                    Spark. There is also a “direct runner”,
                                    for execution on the developer machine
                                    (mainly for dev/debug purposes). Another
                                    runner allows a Dataflow program to run
                                    on a managed service, Google Cloud
                                    Dataflow, in Google Cloud Platform. The
                                    Dataflow Java SDK is already available on
                                    GitHub, and is independent of the Google
                                    Cloud Dataflow service. A Python SDK is
                                    currently in active development.

                                    In this proposal, the Dataflow SDKs,
                                    model, and a set of runners will be
                                    submitted as an OSS project under the
                                    ASF. The runners which are a part of this
                                    proposal include those for Spark (from
                                    Cloudera), Flink (from data Artisans),
                                    and local development (from Google); the
                                    Google Cloud Dataflow service runner is
                                    not included in this proposal. Further
                                    references to Dataflow will refer to the
                                    Dataflow model, SDKs, and runners which
                                    are a part of this proposal (Apache
                                    Dataflow) only. The initial submission
                                    will contain the already-released Java
                                    SDK; Google intends to submit the Python
                                    SDK later in the incubation process. The
                                    Google Cloud Dataflow service will
                                    continue to be one of many runners for
                                    Dataflow, built on Google Cloud Platform,
                                    to run Dataflow pipelines. Necessarily,
                                    Cloud Dataflow will develop against the
                                    Apache project additions, updates, and
                                    changes. Google Cloud Dataflow will
                                    become one user of Apache Dataflow and
                                    will participate in the project openly
                                    and publicly.

                                    The Dataflow programming model has been
                                    designed with simplicity, scalability,
                                    and speed as key tenets. In the Dataflow
                                    model, you only need to think about four
                                    top-level concepts when constructing your
                                    data processing job:

                                    * Pipelines - The data processing job,
                                    made of a series of computations
                                    including input, processing, and output

                                    * PCollections - Bounded (or unbounded)
                                    datasets which represent the input,
                                    intermediate, and output data in
                                    pipelines

                                    * PTransforms - A data processing step in
                                    a pipeline which takes one or more
                                    PCollections as input and produces one or
                                    more PCollections as output

                                    * I/O Sources and Sinks - APIs for
                                    reading and writing data which are the
                                    roots and endpoints of the pipeline
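
To make the four concepts above more concrete, here is a toy, in-memory Python sketch. It is only an illustration and does not use or reproduce the Dataflow SDK API: the class `PCollection` and the helpers `read_from`, `map_elements`, `filter_elements`, `write_to`, and `run_pipeline` are hypothetical names invented for this example, and only bounded (list-backed) data is modelled.

```python
class PCollection:
    """Toy stand-in for a dataset flowing between steps (an in-memory list)."""

    def __init__(self, elements):
        self.elements = list(elements)

    def apply(self, transform):
        # A transform is a callable taking a PCollection and returning a new one.
        return transform(self)


def read_from(source):
    """Toy source: turns an iterable into the root PCollection of a pipeline."""
    return PCollection(source)


def map_elements(fn):
    """Toy PTransform factory: apply fn to every element."""
    return lambda pcoll: PCollection(fn(x) for x in pcoll.elements)


def filter_elements(predicate):
    """Toy PTransform factory: keep only elements for which predicate is true."""
    return lambda pcoll: PCollection(x for x in pcoll.elements if predicate(x))


def write_to(sink):
    """Toy sink: appends every element to a list, the endpoint of the pipeline."""
    def transform(pcoll):
        sink.extend(pcoll.elements)
        return pcoll
    return transform


def run_pipeline(source, sink):
    """The pipeline: input, a series of processing steps, then output."""
    return (read_from(source)
            .apply(map_elements(str.strip))
            .apply(filter_elements(bool))
            .apply(map_elements(str.upper))
            .apply(write_to(sink)))


lines = [" flink ", "", "spark", "  "]
output = []
run_pipeline(lines, output)
```

Each step returns a new collection rather than mutating its input, so intermediate results stay separate from the source and the sink. A real runner would additionally decide where and how each step executes, which this sketch leaves out.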

                                    == Rationale ==

                                    With Dataflow, Google intended to develop
                                    a framework which allowed developers to
                                    be maximally productive in defining the
                                    processing, and then be able to execute
                                    the program at various levels of
                                    latency/cost/completeness without
                                    re-architecting or re-writing it. This
                                    goal was informed by Google’s past
                                    experience developing several models,
                                    frameworks, and tools useful for
                                    large-scale and distributed data
                                    processing. While Google has previously
                                    published papers describing some of its
                                    technologies, Google decided to take a
                                    different approach with Dataflow. Google
                                    open-sourced the SDK and model alongside
                                    commercialization of the idea and ahead
                                    of publishing papers on the topic. As a
                                    result, a number of open source runtimes
                                    exist for Dataflow, such as the Apache
                                    Flink and Apache Spark runners.

                                    We believe that submitting Dataflow as an
                                    Apache project will provide an immediate,
                                    worthwhile, and substantial contribution
                                    to the open source community. As an
                                    incubating project, we believe Dataflow
                                    will have a better opportunity to provide
                                    a meaningful contribution to OSS and also
                                    integrate with other Apache projects.

                                    In the long term, we believe Dataflow can
                                    be a powerful abstraction layer for data
                                    processing. By providing an abstraction
                                    layer for data pipelines and processing,
                                    data workflows can be increasingly
                                    portable, resilient to breaking changes
                                    in tooling, and compatible across many
                                    execution engines, runtimes, and open
                                    source projects.

                                    == Initial Goals ==

                                    We are breaking our initial goals into
                                    immediate (< 2 months), short-term
                                    (2-4 months), and intermediate-term
                                    (> 4 months).

                                    Our immediate goals include the
                                    following:

                                    * Plan for reconciling the Dataflow Java
                                    SDK and various runners into one project

                                    * Plan for refactoring the existing Java
                                    SDK for better extensibility by SDK and
                                    runner writers

                                    * Validating all dependencies are ASL 2.0
                                    or compatible

                                    * Understanding and adapting to the
                                    Apache development process

                                    Our short-term goals include:

                                    * Moving the newly-merged lists, and
                                    build utilities to Apache

                                    * Start refactoring codebase and
                                    move code to Apache Git repo

                                    * Continue development of new
                                    features, functions, and fixes in the
                                    Dataflow Java SDK, and Dataflow runners

                                    * Cleaning up the Dataflow SDK sources
                                    and crafting a roadmap and plan for how
                                    to include new major ideas, modules, and
                                    runtimes

                                    * Establishment of easy and clear
                                    build/test framework for Dataflow and
                                    associated runtimes; creation of testing,
                                    rollback, and validation policy

                                    * Analysis and design for work needed to
                                    make Dataflow a better data processing
                                    abstraction layer for multiple open
                                    source frameworks and environments

                                    Finally, we have a number of
                                    intermediate-term goals:

                                    * Roadmapping, planning, and execution of
                                    integrations with other OSS and non-OSS
                                    projects/products

                                    * Inclusion of additional SDK for Python,
                                    which is under active development

                                    == Current Status ==

                                    === Meritocracy ===

                                    Dataflow was initially developed based on
                                    ideas from many employees within Google.
                                    As an ASL OSS project on GitHub, the
                                    Dataflow SDK has received contributions
                                    from data Artisans, Cloudera Labs, and
                                    other individual developers. As a project
                                    under incubation, we are committed to
                                    expanding our effort to build an
                                    environment which supports a meritocracy.
                                    We are focused on engaging the community
                                    and other related projects for support
                                    and contributions. Moreover, we are
                                    committed to ensuring that contributors
                                    and committers to Dataflow come from a
                                    broad mix of organizations through a
                                    merit-based decision process during
                                    incubation. We believe strongly in the
                                    Dataflow model and are committed to
                                    growing an inclusive community of
                                    Dataflow contributors.

                                    === Community ===

                                    The core of the Dataflow Java SDK has
                                    been developed by Google for use with
                                    Google Cloud Dataflow. Google has active
                                    community engagement in the SDK GitHub
                                    repository
                                    (https://github.com/GoogleCloudPlatform/DataflowJavaSDK),
                                    on Stack Overflow
                                    (http://stackoverflow.com/questions/tagged/google-cloud-dataflow)
                                    and has had contributions from a number
                                    of organizations and individuals.

                                    Every day, Cloud Dataflow is actively
                                    used by a number of organizations and
                                    institutions for batch and stream
                                    processing of data. We believe acceptance
                                    will allow us to consolidate existing
                                    Dataflow-related work, grow the Dataflow
                                    community, and deepen connections between
                                    Dataflow and other open source projects.

                                    === Core Developers ===

                                    The core developers for Dataflow and
                                    the Dataflow runners are:

                                    * Frances Perry

                                    * Tyler Akidau

                                    * Davor Bonaci

                                    * Luke Cwik

                                    * Ben Chambers

                                    * Kenn Knowles

                                    * Dan Halperin

                                    * Daniel Mills

                                    * Mark Shields

                                    * Craig Chambers

                                    * Maximilian Michels

                                    * Tom White

                                    * Josh Wills

                                    === Alignment ===

                                    The Dataflow SDK can be used to create
                                    Dataflow pipelines which can be executed
                                    on Apache Spark or Apache Flink. Dataflow
                                    is also related to other Apache projects,
                                    such as Apache Crunch. We plan on
                                    expanding functionality for Dataflow
                                    runners, support for additional domain
                                    specific languages, and increased
                                    portability so Dataflow is a powerful
                                    abstraction layer for data processing.

                                    == Known Risks ==

                                    === Orphaned Products ===

                                    The Dataflow SDK is presently used by
                                    several organizations, from small
                                    startups to Fortune 100 companies, to
                                    construct production pipelines which are
                                    executed in Google Cloud Dataflow. Google
                                    has a long-term commitment to advance the
                                    Dataflow SDK; moreover, Dataflow is
                                    seeing increasing interest, development,
                                    and adoption from organizations outside
                                    of Google.

                                    === Inexperience with Open Source ===

                                    Google believes strongly in open source
                                    and the exchange of information to
                                    advance new ideas and work. Examples of
                                    this commitment are active OSS projects
                                    such as Chromium
                                    (https://www.chromium.org) and Kubernetes
                                    (http://kubernetes.io/). With Dataflow,
                                    we have tried to be increasingly open and
                                    forward-looking; we have published a
                                    paper in the VLDB conference describing
                                    the Dataflow model
                                    (http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf)
                                    and were quick to release the Dataflow
                                    SDK as open source software with the
                                    launch of Cloud Dataflow. Our submission
                                    to the Apache Software Foundation is a
                                    logical extension of our commitment to
                                    open source software.

                                    === Homogeneous Developers ===

                                    The majority of committers in this
                                    proposal belong to Google due to the fact
                                    that Dataflow has emerged from several
                                    internal Google projects. This proposal
                                    also includes committers outside of
                                    Google who are actively involved with
                                    other Apache projects, such as Hadoop,
                                    Flink, and Spark. We expect our entry
                                    into incubation will allow us to expand
                                    the number of individuals and
                                    organizations participating in Dataflow
                                    development. Additionally, separation of
                                    the Dataflow SDK from Google Cloud
                                    Dataflow allows us to focus on the open
                                    source SDK and model and do what is best
                                    for this project.

                                    === Reliance on Salaried Developers ===

                                    The Dataflow SDK and Dataflow runners
                                    have been developed primarily by salaried
                                    developers supporting the Google Cloud
                                    Dataflow project. While the Dataflow SDK
                                    and Cloud Dataflow have been developed by
                                    different teams (and this proposal would
                                    reinforce that separation), we expect our
                                    initial set of developers will still
                                    primarily be salaried. Contribution has
                                    not been exclusively from salaried
                                    developers, however. For example, the
                                    contrib directory of the Dataflow SDK
                                    (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib)
                                    contains items from free-time
                                    contributors. Moreover, separate
                                    projects, such as ScalaFlow
                                    (https://github.com/darkjh/scalaflow),
                                    have been created around the Dataflow
                                    model and SDK. We expect our reliance on
                                    salaried developers will decrease over
                                    time during incubation.

                                    === Relationship with other Apache products ===

                                    Dataflow directly interoperates with
                                    or utilizes several existing
                                    Apache
                                    projects.

                                    * Build

                                    ** Apache Maven

                                    * Data I/O, Libraries

                                    ** Apache Avro

                                    ** Apache Commons

                                    * Dataflow runners

                                    ** Apache Flink

                                    ** Apache Spark

                                    Dataflow, when used in batch mode,
                                    shares similarities with Apache Crunch;
                                    however, Dataflow is focused on a model,
                                    SDK, and abstraction layer beyond Spark
                                    and Hadoop (MapReduce). One key goal of
                                    Dataflow is to provide an intermediate
                                    abstraction layer which can easily be
                                    implemented and utilized across several
                                    different processing frameworks.

                                    === An excessive fascination with the Apache brand ===

                                    With this proposal we are not seeking
                                    attention or publicity. Rather, we
                                    firmly believe in the Dataflow model,
                                    SDK, and the ability to make Dataflow a
                                    powerful yet simple framework for data
                                    processing. While the Dataflow SDK and
                                    model have been open source, we believe
                                    putting code on GitHub can only go so
                                    far. We see the Apache community,
                                    processes, and mission as critical for
                                    ensuring the Dataflow SDK and model are
                                    truly community-driven, positively
                                    impactful, and innovative open source
                                    software. While Google has taken a
                                    number of steps to advance its various
                                    open source projects, we believe
                                    Dataflow is a great fit for the Apache
                                    Software Foundation due to its focus on
                                    data processing and its relationships to
                                    existing ASF projects.

                                    == Documentation ==

                                    The following documentation is relevant
                                    to this proposal. Relevant portions of
                                    the documentation will be contributed to
                                    the Apache Dataflow project.

                                    * Dataflow website:
                                    https://cloud.google.com/dataflow

                                    * Dataflow programming model:
                                    https://cloud.google.com/dataflow/model/programming-model

                                    * Codebases

                                    ** Dataflow Java SDK:
                                    https://github.com/GoogleCloudPlatform/DataflowJavaSDK

                                    ** Flink Dataflow runner:
                                    https://github.com/dataArtisans/flink-dataflow

                                    ** Spark Dataflow runner:
                                    https://github.com/cloudera/spark-dataflow

                                    * Dataflow Java SDK issue tracker:
                                    https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues

                                    * google-cloud-dataflow tag on Stack Overflow:
                                    http://stackoverflow.com/questions/tagged/google-cloud-dataflow

                                    == Initial Source ==

                                    The initial source for Dataflow which we
                                    will submit to the Apache Software
                                    Foundation will include several related
                                    projects which are currently hosted in
                                    the following GitHub repositories:

                                    * Dataflow Java SDK
                                    (https://github.com/GoogleCloudPlatform/DataflowJavaSDK)

                                    * Flink Dataflow runner
                                    (https://github.com/dataArtisans/flink-dataflow)

                                    * Spark Dataflow runner
                                    (https://github.com/cloudera/spark-dataflow)

                                    These projects have always been Apache
                                    2.0 licensed. We intend to bundle all of
                                    these repositories since they are all
                                    complementary and should be maintained
                                    in one project. Prior to our submission,
                                    we will combine all of these projects
                                    into a new git repository.

                                    == Source and Intellectual Property Submission Plan ==

                                    The source for the Dataflow SDK and the
                                    three runners (Spark, Flink, Google
                                    Cloud Dataflow) is already licensed
                                    under an Apache 2 license.

                                    * Dataflow SDK -
                                    https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/LICENSE

                                    * Flink runner -
                                    https://github.com/dataArtisans/flink-dataflow/blob/master/LICENSE

                                    * Spark runner -
                                    https://github.com/cloudera/spark-dataflow/blob/master/LICENSE

                                    Contributors to the Dataflow SDK have
                                    also signed the Google Individual
                                    Contributor License Agreement
                                    (https://cla.developers.google.com/about/google-individual)
                                    in order to contribute to the project.

                                    With respect to trademark rights,
                                    Google does not hold a trademark on
                                    the
                                    phrase “Dataflow.” Based on feedback
                                    and guidance we receive during
                                    the
                                    incubation process, we are open to
                                    renaming the project if necessary
                                    for
                                    trademark or other concerns.

                                    == External Dependencies ==

                                    All external dependencies are licensed
                                    under an Apache 2.0 or Apache-compatible
                                    license. As we grow the Dataflow
                                    community we will configure our build
                                    process to require and validate that all
                                    contributions and dependencies are
                                    licensed under the Apache 2.0 license or
                                    an Apache-compatible license.

                                    == Required Resources ==

                                    === Mailing Lists ===

                                    We currently use a mix of mailing
                                    lists. We will migrate our existing
                                    mailing lists to the following:

                                    * d...@dataflow.incubator.apache.org
                                    <mailto:d...@dataflow.incubator.apache.org>

                                    * u...@dataflow.incubator.apache.org
                                    <mailto:u...@dataflow.incubator.apache.org>

                                    * priv...@dataflow.incubator.apache.org
                                    <mailto:priv...@dataflow.incubator.apache.org>

                                    * comm...@dataflow.incubator.apache.org
                                    <mailto:comm...@dataflow.incubator.apache.org>

                                    === Source Control ===

                                    The Dataflow team currently uses Git
                                    and would like to continue to do
                                    so.
                                    We request a Git repository for
                                    Dataflow with mirroring to GitHub
                                    enabled.

                                    === Issue Tracking ===

                                    We request the creation of an
                                    Apache-hosted JIRA. The Dataflow project
                                    is currently using both a public GitHub
                                    issue tracker and internal Google issue
                                    tracking. We will migrate and combine
                                    the issues from these two sources into
                                    the Apache JIRA.

                                    == Initial Committers ==

                                    * Aljoscha Krettek
                                    [aljos...@apache.org <mailto:aljos...@apache.org>]

                                    * Amit Sela
                                    [amitsel...@gmail.com <mailto:amitsel...@gmail.com>]

                                    * Ben Chambers
                                    [bchamb...@google.com <mailto:bchamb...@google.com>]

                                    * Craig Chambers
                                    [chamb...@google.com <mailto:chamb...@google.com>]

                                    * Dan Halperin
                                    [dhalp...@google.com <mailto:dhalp...@google.com>]

                                    * Davor Bonaci
                                    [da...@google.com <mailto:da...@google.com>]

                                    * Frances Perry
                                    [f...@google.com <mailto:f...@google.com>]

                                    * James Malone
                                    [jamesmal...@google.com <mailto:jamesmal...@google.com>]

                                    * Jean-Baptiste Onofré
                                    [jbono...@apache.org <mailto:jbono...@apache.org>]

                                    * Josh Wills
                                    [jwi...@apache.org <mailto:jwi...@apache.org>]

                                    * Kostas Tzoumas
                                    [kos...@data-artisans.com <mailto:kos...@data-artisans.com>]

                                    * Kenneth Knowles
                                    [k...@google.com <mailto:k...@google.com>]

                                    * Luke Cwik
                                    [lc...@google.com <mailto:lc...@google.com>]

                                    * Maximilian Michels
                                    [m...@apache.org <mailto:m...@apache.org>]

                                    * Stephan Ewen
                                    [step...@data-artisans.com <mailto:step...@data-artisans.com>]

                                    * Tom White
                                    [t...@cloudera.com <mailto:t...@cloudera.com>]

                                    * Tyler Akidau
                                    [taki...@google.com <mailto:taki...@google.com>]

                                    == Affiliations ==

                                    The initial committers are from six
                                    organizations. Google developed
                                    Dataflow and the Dataflow SDK, data
                                    Artisans developed the Flink
                                    runner,
                                    and Cloudera (Labs) developed the
                                    Spark runner.

                                    * Cloudera

                                    ** Tom White

                                    * Data Artisans

                                    ** Aljoscha Krettek

                                    ** Kostas Tzoumas

                                    ** Maximilian Michels

                                    ** Stephan Ewen

                                    * Google

                                    ** Ben Chambers

                                    ** Dan Halperin

                                    ** Davor Bonaci

                                    ** Frances Perry

                                    ** James Malone

                                    ** Kenneth Knowles

                                    ** Luke Cwik

                                    ** Tyler Akidau

                                    * PayPal

                                    ** Amit Sela

                                    * Slack

                                    ** Josh Wills

                                    * Talend

                                    ** Jean-Baptiste Onofré

                                    == Sponsors ==

                                    === Champion ===

                                    * Jean-Baptiste Onofré
                                    [jbono...@apache.org <mailto:jbono...@apache.org>]

                                    === Nominated Mentors ===

                                    * Jim Jagielski
                                    [j...@apache.org <mailto:j...@apache.org>]

                                    * Venkatesh Seetharam
                                    [venkat...@apache.org <mailto:venkat...@apache.org>]

                                    * Bertrand Delacretaz
                                    [bdelacre...@apache.org <mailto:bdelacre...@apache.org>]

                                    * Ted Dunning
                                    [tdunn...@apache.org <mailto:tdunn...@apache.org>]

                                    === Sponsoring Entity ===

                                    The Apache Incubator





                                --

                            Jean-Baptiste Onofré
                            jbono...@apache.org <mailto:jbono...@apache.org>
                            http://blog.nanthrax.net
                            Talend - http://www.talend.com

                            
---------------------------------------------------------------------
                            To unsubscribe, e-mail:
                            general-unsubscr...@incubator.apache.org
                            <mailto:general-unsubscr...@incubator.apache.org>
                            For additional commands, e-mail:
                            general-h...@incubator.apache.org
                            <mailto:general-h...@incubator.apache.org>
















--
Thanks and Regards,
Mayank
Cell: 408-718-9370


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

