Exciting stuff, it may have already been said but the name is pretty bad. To my (native) English ear it sounds like "Amateur".
Ross -----Original Message----- From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] Sent: Monday, March 20, 2017 11:39 PM To: general@incubator.apache.org Subject: Re: [DISCUSS] Apache Amaterasu Incubator Proposal Hi all, gently reminder about this thread. I would like to start a formal vote pretty soon. Thanks, Regards JB On 03/07/2017 09:49 PM, Jean-Baptiste Onofré wrote: > Hi all, > > I would like to submit a new proposal to bring Amaterasu to the Apache > Software Foundation incubator. > > The proposal is included below and available on the wiki: > > https://wiki.apache.org/incubator/AmaterasuProposal > > We are eager to get your comments and questions. > > Thanks ! > JB (on behalf of the Amaterasu community) > > = Apache Amaterasu = > > == Abstract == > > Apache Amaterasu is a framework providing continuous deployment for > Big Data pipelines. > > It provides the following capabilities: > > * '''Continuous integration''' tools to '''package pipelines and run > tests'''. > * A repository to store those packaged applications: the > '''applications repository'''. > * A repository to store the pipelines, and engine configuration (for > instance, location of the Spark master, etc.): per environment - the > '''configuration repository'''. > * A '''dashboard''' to monitor the pipelines. > * A '''DSL and integration hooks''' allowing third parties to easily > integrate. > > == Proposal == > > Amaterasu is a simple and powerful framework to build and dispense pipelines. > It aims to help data engineers and data scientists to compose, > configure, test, package, deploy and execute data pipelines written > using multiple tools, languages and frameworks. Amaterasu provides a > standard repo structure to package big data pipelines, a YAML based > Domain Specific Languages (DSL) for data engineers, data scientists > and operations engineers to manage complex pipelines throughout their entire > lifecycle (Dev, UAT, Prod, etc.). > > == Background == > > Amaterasu is a relatively new project that was created to deal with > some of the issues that as Consultants, we have seen recurring at different > client sites. > Mainly the need to continuously deploy complex pipelines built in > multiple tools and languages. > Amaterasu started as a pet project and is currently being evaluated by > a couple of organizations, supported by the contributors, on a > personal time and voluntary bases. > > == Rational == > > As software engineers working on big data projects we have straggled > for a long time to apply the same CI/CD practices that have become the > standard in the software industry for the last few years. While some > of them are possible, for example Apache Spark is easy to unit test. > However large scale pipelines are more complex and often use data, > which might be un-structured as integration point, which requires heavy > integration tests. > > To automate such tests and complex deployments, we have found the need > to often handcraft scripts and use a mixture tools, so we have decided > to finally build a tool we can apply in a general way and not on a project by > project basis. > > Another issue Amaterasu is trying to tackle is the Integrating between > the work of software engineers, data scientists, and sometimes > operations engineers. The approach Amaterasu takes to integrate > between those three schools of thought it to provide a simple YAML > based DSL that provides a simple way to integrate different pipeline > written in the native tools for each task (R, Spark in different languages, > etc.). > > == Initial Goals == > > Our initial goals are to bring Amaterasu into the ASF, transition > internal engineering processes into the open, and foster a > collaborative development model according to the "Apache Way". > > In addition, we intend to continue the development of Amaterasu, add > new features as well as integrate better with other frameworks, including: > > * Apache Arrow > * Apache Hive > * Apache Drill > * Apache Beam > * Apache YARN > * Farther and more complete integration with Apache Spark > > Other frameworks will be evaluated after those initial goals are reached. > > == Current Status == > > Amaterasu is preview state but provide a large set of features. We > plan to stabilize and head to a first production ready release during > the incubation process. The current license is already Apache 2.0. > > === Meritocracy === > > We intend to radically expand the initial developer and user community > by running the project in accordance with the "Apache Way". Users and > new contributors will be treated with respect and welcomed. By > participating in the community and providing quality patches/support > that move the project forward, they will earn merit. They also will be > encouraged to provide non-code contributions (documentation, events, > community management, etc.) and will gain merit for doing so. Those > with a proven support and quality track record will be encouraged to become > committers. > > === Community === > > As a relatively new project, Amaterasu has a small, but growing community. > Amaterasu is an open project, not just with it’s source code but also > with our discussions which are held openly in our slack > https://shintoio.slack.com which contains channels for design, tech and > future directions discussions. > > If Amaterasu is accepted for incubation, the primary initial goal is > to build a large and strong community. We are confident that Amaterasu > can become a key project for big data operations, which hopefully will > create a large community of users and developers. > > === Known Risks === > > Development has been sponsored mostly by a one company. For the > project to fully transition to the Apache Way governance model, > development must shift towards the meritocracy-centric model of > growing a community of contributors balanced with the needs for extreme > stability and core implementation coherency. > > === Orphaned products === > > We are fully committed on Amaterasu. A few organizations have > expressed their interest in using Amaterasu. > > === Inexperience with Open Source === > > We have been developing and using open source software for a long time. > Additionally, several ASF veterans have agreed to mentor the project > and they are listed in this proposal. The project will rely on their > guidance and collective wisdom to quickly transition the entire team > of initial committers towards practicing the Apache Way. > > === Reliance on Salaried Developers === > > Most of the current contributors are employed in the Big Data space. > While they might wander from their current employers, they are > unlikely to venture far from their core expertises and thus will > continue to be engaged with the project regardless of their current employers. > > === An Excessive Fascination with the Apache Brand === > > While we intend to leverage the Apache ‘branding’ when talking to > other projects as testament of our project’s ‘neutrality’, we have no > plans for making use of Apache brand in press releases nor posting > billboards advertising acceptance of Amaterasu into Apache Incubator. > > The main purpose in applying for Apache incubation is due to the fact > that Amaterasu is built with integration already in mind for many > tools which are Apache projects, and we see Amaterasu as an extension > of these projects. We hope that by being an Apache project, we can > integrate better, and collaborate more effectively with the relevant > projects. As Amaterasu matures, we see mutual benefits for all involved. > > === Initial Source === > > https://github.com/shintoio/amaterasu > > === External Dependencies === > > All external dependencies are licensed under an Apache 2.0 license or > Apache-compatible license. As we grow the Amaterasu community we will > configure our build process to require and validate all contributions > and dependencies are licensed under the Apache 2.0 license or are under an > Apache-compatible license. > > * Apache Spark > * Apache Hadoop > * Apache Maven (maven-core) > * Apache Commons > * Apache Log4j > * Apache Mesos > * Apache Zookeeper > * Apache Curator > * Scala > * Junit > * Py4j > > Future versions are planned to integrate with: > > * Apache YARN > * Apache Hive > * Apache Drill > > === Required Resources === > > ==== Mailing lists ==== > > * priv...@amaterasu.incubator.apache.org (moderated subscriptions) > * comm...@amaterasu.incubator.apache.org > * d...@amaterasu.incubator.apache.org > * iss...@amaterasu.incubator.apache.org > > ==== Git Repository ==== > > * https://git-wip-us.apache.org/repos/asf/incubator-amaterasu.git > > ==== Issue Tracking ==== > > * JIRA Project Amaterasu > > ==== Initial Committers ===== > > * Yaniv Rodenski > * Jean-Baptiste Onofré > * Eyal Ben Ivri > * Karel Alfonso > * Kirupagaran (Kirupa) Devarajan > * Nadav Har Tzvi > > ==== Affiliations ==== > > * Yaniv Rodenski - Shinto > * Jean-Baptiste Onofré - Talend > * Olivier Lamy - Webtide > > ==== Sponsors ==== > > ==== Champion ==== > > * Jean-Baptiste Onofré > > ==== Mentors ==== > > * Jean-Baptiste Onofré > * Olivier Lamy > > ==== Sponsoring Entity ==== > > The Apache Incubator > -- Jean-Baptiste Onofré jbono...@apache.org http://blog.nanthrax.net Talend - http://www.talend.com --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org