Hi Hari,

I'm on the Apache Falcon PMC. Since Falcon is a data pipeline management solution for Hadoop, there might be enough interest to explore whether we can collaborate, with your work either becoming part of Falcon or living as a separate project.
Can you please elaborate on the scope, and on whether orchestration is part of it? Falcon also integrates with Apache Atlas, a metadata solution that I'm part of as well.

Thanks!
Venkatesh

On Fri, Jul 15, 2016 at 6:49 AM Srihari Srinivasan <arya...@yahoo.com.invalid> wrote:

> Hi Folks,
>
> I am Hari, a developer with a company called ThoughtWorks. We've been
> developing data pipelines on Hadoop, Spark, etc. for a while now. From
> our experiences with different customers we've noticed a recurring need to
> carry out tasks such as data preparation, data anonymization, etc. on large
> datasets using Java MR and Spark. Based on this experience, we have been
> working on building a couple of libraries targeted at data preparation and
> data protection to begin with. It's hosted under an umbrella project
> called Data Commons at the moment (inspired by the Apache Commons project,
> which is organized around a similar theme).
>
> At the moment this is a fledgling project and its contributions are driven
> by our data team. However, we are very keen on making this part of the
> larger Apache collective and making it a community-driven effort.
>
> Hence, I am reaching out to you folks for advice on what could be the best
> way forward for this effort. We are also open to exploring collaborations
> with other existing projects that are already part of Apache. Please share
> your thoughts and advice.
>
> -- Hari