Hi Hari,

I'm on the Apache Falcon PMC. Since Falcon is a data pipeline management
solution for Hadoop, there may be enough common interest to explore a
collaboration, either as part of Falcon or as a separate project.

Can you please elaborate on the scope, and on whether orchestration is part
of it? Falcon also integrates with Apache Atlas, a metadata solution that
I'm part of as well.

Thanks!
Venkatesh

On Fri, Jul 15, 2016 at 6:49 AM Srihari Srinivasan
<arya...@yahoo.com.invalid> wrote:

> Hi Folks,
> I am Hari, a developer with a company called ThoughtWorks. We've been
> developing data pipelines on Hadoop, Spark, etc. for a while now. From
> our experiences with different customers, we've noticed a recurring need to
> carry out tasks such as data preparation, data anonymization, etc. on large
> datasets using Java MR and Spark. Based on this experience, we have been
> working on building a couple of libraries targeted at data preparation and
> data protection to begin with. They are hosted under an umbrella project
> called Data Commons at the moment (inspired by the Apache Commons project,
> which is organized around a similar theme).
> At the moment this is a fledgling project and its contributions are driven
> by our data team. However, we are very keen on making this part of the
> larger Apache collective and making it a community-driven effort.
> Hence, I am reaching out to you folks for advice on what could be the best
> way forward for this effort. We are also open to exploring collaborations
> with other existing projects that are already part of Apache. Please share
> your thoughts and advice.
> -- Hari
>
>
