Thank you very much Kevin!

On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera <djkevincr1...@gmail.com> wrote:
> +1 (binding)
>
> Interesting project. Please add me as a mentor to the project.
>
> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
> <matt.cast...@neotechnology.com.invalid> wrote:
>
> > Hello Apache,
> >
> > Our community is eager to propose for Hop to join the Apache Incubator.
> > The Hop Orchestration Platform aims to help people with complex data and
> > metadata orchestration problems.
> >
> > Below is the complete text of the proposal but you can also find it here:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >
> > Any help with respect to the incubation is appreciated including help from
> > a few more mentors to set us on the right track. On behalf of my community
> > I'd be happy to answer any questions you might have regarding Hop. Our
> > thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >
> > Thanks in advance for your time!
> >
> > Best regards,
> >
> > Matt - Hop co-founder
> > www.project-hop.org
> > ---
> >
> > Abstract
> > =========
> > Hop is short for the Hop Orchestration Platform. Written completely in Java
> > it aims to provide a wide range of data orchestration tools, including a
> > visual development environment, servers, metadata analysis, auditing
> > services and so on. As a platform Hop also wants to be a re-usable library
> > so that it can be easily re-used by other software.
> >
> > Proposal
> > =========
> > Hop provides all the tools to build, maintain and deploy data
> > orchestration, ETL and data integration solutions. For example, Hop allows
> > you to diagram a data flow that propagates changes from a database via
> > Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> > The core concepts of Hop are Pipelines and Workflows.
> > * Pipelines do the core data manipulation work (read, manipulate, write
> > data). The main items of work in pipelines are transforms.
> > A pipeline consists of two or more (usually many) transforms that each
> > perform a granular piece of work. The transforms in a pipeline run in
> > parallel, and together create a powerful data processing tool.
> > * Workflows take care of the orchestration of actions: execute pipelines,
> > run child workflows, environment checks, preparation, problem alerting and
> > so on.
> > If these terms sound familiar it’s because they are taken from the Apache
> > Beam and Apache Airflow projects.
> >
> > The main components of the Hop platform are:
> > * hop-gui: a visual data orchestration IDE
> > * hop-run: a CLI tool to run workflows or pipelines
> > * hop-config: a CLI tool to configure Hop and its components
> > * hop-server: a light-weight web server to run and monitor workflows and
> > pipelines
> > * hop-translator: a tool for translating the various parts of the Hop tools
> > (i18n).
> > * hop-web: a thin client version of hop-gui for web browsers and mobile
> > devices
> >
> > The cornerstone of the Hop platform is extensibility: all major components
> > of the platform are designed to be pluggable. This allows any possible
> > missing functionality to be created in a short amount of time.
> >
> > Background
> > ===========
> > The Hop Orchestration Platform has its origins in the Kettle community.
> > Kettle was acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> > in 2015, the community struck out to solve problems less aligned with
> > Hitachi’s interests.
> >
> > Rationale
> > ==========
> > In the Hop community, we have always aimed to function as a meritocracy,
> > where contributions are accepted based on merit, and individuals gain
> > status in the community based on their contributions (coding and
> > otherwise). We’re proud to have a diverse group of people doing all the
> > required things in a project: development, documentation, tutorials,
> > architecture, testing, graphics design and much more.
> > Bringing the project under the Apache Software Foundation would allow us
> > to continue and grow, but also give our users confidence about the
> > governance, IP status, and future of the project.
> >
> > ASF Preparation Phase
> > ======================
> > The very first goal of project Hop is to find a good way to cooperate on
> > the development across wide geographical, economic and social spectra. To
> > make this possible real changes were needed to a codebase which is
> > essentially 20 years old. Most of these changes have been tackled by now.
> > We think it’s fair to say that by now, Hop is a new platform even though it
> > shares a common background as it partly started from the Kettle code base.
> > Here are a few of the key focus areas we’re trying to safeguard going
> > forward:
> > * Plugins: lightweight plugins for all major functionality. This makes it
> > possible to extend Hop or reduce Hop in size. It also allows people to
> > implement or change functionality with minimal coding. In other words it
> > makes it easier to contribute.
> > * Maintain an open and responsive community where every concern, feedback
> > and contribution is welcome.
> > * Maintain a clear focus on data orchestration user requirements, not on
> > “industry trends”
> > * Documentation: we set up a version controlled “adoc” system with
> > automated builds which is open, controlled and reviewed. This is
> > incredibly important for every Hop user and developer.
> > * Testing and stability: we want to massively increase stability by
> > implementing integration tests beyond the standard Java unit testing
> > because of the dynamic nature of data orchestration work. We still have a
> > long way to go. This work will never be finished. It’s a clear and
> > important goal nevertheless.
> > * Simplicity: things are complex enough.
> > We follow the example of projects like Apache Spark and Flink: for
> > example, “hop-run.sh” does exactly what the name says without the need to
> > dive into documentation. As much as possible we make things self-evident
> > and will re-use existing terminology.
> >
> > For a list of the changes you can look at the monthly roundups which have
> > been compiled since February 2020. They document the hard work of our
> > community so far:
> >
> > http://www.project-hop.org/news/roundup-2020-02/
> > http://www.project-hop.org/news/roundup-2020-03/
> > http://www.project-hop.org/news/roundup-2020-04/
> > http://www.project-hop.org/news/roundup-2020-05/
> > http://www.project-hop.org/news/roundup-2020-06/
> > http://www.project-hop.org/news/roundup-2020-08/
> >
> > Goals
> > ======
> > Here are a few more details and specifics of things we still want to take
> > on going forward:
> > * Add more plugin metadata to Transforms and Action plugins as well as
> > their supported engines. This will make it easier to refine the user
> > interface and make the user experience better by giving to-the-point
> > feedback on what operations are supported and required. Example metadata
> > to add: extra version and build information, dependencies, tags and labels
> > (replacing categories), documentation links, input and output capabilities,
> > engine capabilities and so on.
> > * SWT: While the Eclipse SWT project is still supported we want to make a
> > list of all the commonly used API calls and stick to those with our own
> > API. This will help the development of hop-web and allow us to possibly
> > more easily migrate to different user interfaces later on.
> > * Integration testing: every transform and action should have an
> > integration test before it is released to ensure quality. Java unit
> > testing has proven insufficient to guard against backward compatibility,
> > stability and functionality problems. We need to do better.
> > * Apache VFS: Hop makes extensive use of this API to handle files. As such
> > we want to implement the various drivers for gs://, hdfs://, s3:// through
> > standard Kettle plugins making it easier to choose which protocols to
> > support.
> > * Variables & Parameters: make this experience more intuitive, clean up
> > the underlying API and add more options to the various user interfaces
> > responsible for setting and passing variables and parameters.
> > * Make Hop-Web an integral part of the Apache Hop project removing the code
> > duplication (fork) we’re dealing with now. This includes the need to
> > improve various user interfaces which were designed for non-web clients.
> > * Make best practices and governance functionality an integral part of the
> > API of the project:
> >   * Data sets and unit testing (already done)
> >   * Environments and lifecycle management (partly done)
> >   * Git support (partly done)
> >   * Auditing and lineage
> >   * Software policies and enforcement thereof
> >   * Configuration management (partly done)
> >
> > Current Status
> > ===============
> >
> > Meritocracy
> > ------------
> > With Project Hop, we actively work to foster the existing community and
> > encourage community contributions. As of September 1st 2020 we received
> > over 250 pull requests and have around 600 tickets in our JIRA platform (a
> > lot of which were created by community members) and have active discussions
> > in our Mattermost chat platform with over 80 members.
> >
> > Over the last half year we started to ask users on our chat server for
> > specific feedback on terminology, features and so on. It’s been a
> > wonderfully positive experience to have in-depth discussions on complex
> > issues with industry experts. We look forward to moving these discussions
> > and votes to an Apache mailing list.
> >
> > Community
> > ------------
> > Hop is developed, extended and maintained by a global community of users
> > and developers.
> > The Hop community is what has driven its development and growth.
> > The particular past history of Hop has led to a lot of interest in the
> > project and has already led to a number of contributions, documentation and
> > translations.
> >
> > Core Developers
> > ----------------
> > We have a diverse group of core developers with people joining on a regular
> > basis. Matt Casters, Rodrigo Haces and David Rosenblum are part time
> > developers on Hop, salaried by Neo Solutions. Bart Maertens, Hans Van
> > Akelyen and Yannick Mols are part time Hop developers paid for by company
> > know.bi. Doug and Gretchen Moran were Pentaho employees but along with
> > Rafael Valenzuela, Dan Keeley, Jason Chu, Sergio Ramazzina and many others
> > they can be considered to be long time consultants and community members
> > for over a decade that joined the Hop community in the last year or two.
> >
> > Alignment
> > ----------
> > We want to anchor and safeguard our development and community building
> > efforts for the future. We strongly believe that as an Apache project this
> > can be achieved in the best possible way. The Hop project also started to
> > align with projects like Apache Beam, Spark and Flink in its use of
> > terminology, tools, manner of configuration and so on. As mentioned
> > elsewhere in this document Hop is a large user of other Apache projects and
> > libraries and we believe that becoming an Apache project is beneficial.
> > Specifically for Apache Beam we believe that providing a visual pipeline
> > development tool can be of great value.
> >
> > Known Risks
> > ============
> > While the current code-base of Kettle from which we started is already
> > released under the Apache License 2.0, proper attribution needs to happen
> > to Hitachi Vantara.
> > We have no knowledge of existing patents on any part of the Kettle
> > codebase.
> > To further reduce any risk of there even being any discussion on naming the
> > Hop team decided to rename the project, its tools (to be more self-evident
> > as well), the Java API and even the main concepts (Transformations are now
> > called Pipelines, in line with Apache Beam naming conventions).
> >
> > Orphaned products
> > ------------------
> > There is little risk that the project will become orphaned. The list of
> > active developers is large, and consists of a mix of developers who have
> > been working on the code for several years and recent arrivals in the
> > community.
> >
> > Inexperience with Open Source
> > ------------------------------
> > The project team has a long history in open source and has contributed to
> > Apache licensed open source projects, mostly in the Kettle ecosystem such
> > as Kettle itself and the many plugins and projects surrounding it. The
> > experience gained there has allowed us to quickly set up all required build
> > tools and processes. In its fairly short history, Hop has been advocating
> > open source in all aspects of the project. Our submission to the Apache
> > Software Foundation is a logical extension of our commitment to open source
> > software.
> >
> > Licensing
> > ----------
> > The original source code we started from (see below) has been open source
> > since December 2005, initially under the Lesser GPL but since January 2012
> > all under the Apache License version 2.0. All Hop code has been scanned for
> > compliance with the Apache License 2.0. We integrated Apache Rat with our
> > build process.
> >
> > Heterogeneous Developers
> > -------------------------
> > Hop is built, developed and maintained by a global community of
> > developers. Input comes from a large group of developers and users from
> > all over the world. At this moment over 7 companies contribute to Hop
> > through the developers along with a list of individuals and consultants.
> >
> > Reliance on Salaried Developers
> > --------------------------------
> > Hop developers are a mix of volunteers, enthusiasts and people working for
> > an employer. There is also a group of consultants who want to be involved
> > in Hop because it allows them to do projects with it. They are in fact our
> > most important users and developers since they provide valuable feedback
> > from the trenches.
> >
> > Relationships with Other Apache Products
> > -----------------------------------------
> > Hop is a heavy user of Apache software libraries.
> >
> > Apache Commons usage:
> > * commons-beanutils
> > * commons-cli
> > * commons-codec
> > * commons-collections
> > * commons-collections4
> > * commons-compiler
> > * commons-compress
> > * commons-configuration
> > * commons-database-model
> > * commons-dbcp
> > * commons-digester
> > * commons-el
> > * commons-httpclient
> > * commons-io
> > * commons-lang and commons-lang3
> > * commons-logging
> > * commons-math and commons-math3-3.5.jar
> > * commons-net
> > * commons-pool
> > * commons-validator
> > * commons-vfs2
> >
> > Other libraries:
> > * Apache Batik : for the front-end SVG drawing
> > * Apache Xerces (XSLT, XML processing)
> >
> > Other usage of Apache projects related to Hop (plugins):
> > * Apache Avro
> > * Apache Beam w/ Apache Spark, Apache Flink, …
> > * Apache Cassandra
> > * Apache CouchDB
> > * Apache Derby
> > * Apache Flume
> > * Apache Hadoop
> > * Apache Hive
> > * Apache Kafka
> > * Apache Solr
> > * Apache Subversion
> > * Apache Zookeeper
> >
> > For the build process
> > * Apache Maven
> > * Apache Jenkins
> >
> > An Excessive Fascination with the Apache Brand
> > -----------------------------------------------
> > With this proposal we are not seeking attention or publicity. Rather, we
> > firmly believe in Hop, visual data pipeline development and the ability to
> > treat the developed data pipelines (ETL) as software code.
> > While the original Hop code has been open source for about 15 years, we
> > believe putting code on GitHub can only go so far. We see the Apache
> > community, processes, and mission as critical for ensuring Hop is truly
> > community-driven, positively impactful, and innovative open source
> > software. We believe Hop is a great fit for the Apache Software Foundation
> > due to its focus on visual data processing and its relationships to
> > existing ASF projects.
> >
> > Documentation
> > ==============
> > Over the years, the community has contributed extensive documentation to
> > wiki.pentaho.com. Over time, areas of the available information have become
> > incomplete or outdated. Most of this documentation has been reviewed,
> > updated and will be contributed to the Apache foundation with the Hop
> > source code. Documentation for the extensive new functionality that was
> > added to Hop in recent months is being written.
> > We consider documentation to be a core piece of the Hop platform and will
> > treat documentation as any other item of code.
> >
> > Initial Source
> > ===============
> > While there isn’t a Java class in Hop which is unchanged from its origins
> > we should mention we selected this source code to form the base of Apache
> > Kettle:
> > https://github.com/pentaho/pentaho-kettle/tree/8.2.0.7-R
> >
> > We merged various changes from the WebSpoon fork found over here:
> > https://github.com/HiromuHota/pentaho-kettle
> >
> > Various community driven Kettle plugins were written to bypass bugs, slow
> > down code-rot and to implement missing features.
> > They were merged into Hop from these locations:
> > https://github.com/mattcasters/kettle-debug-plugin (better debugging)
> > https://github.com/mattcasters/kettle-beam (Apache Beam support)
> > https://github.com/mattcasters/pentaho-pdi-dataset (Unit Testing)
> > https://github.com/mattcasters/kettle-needful-things (Bug fixes &
> > workarounds)
> > https://github.com/mattcasters/kettle-environment (Environment management)
> >
> > The Hop repositories are currently hosted at:
> > https://github.com/project-hop/
> > * Hop: source code for the Hop project
> > * Hop-doc: technical documentation for the Hop project
> > * Hop-website: Hop website and content repository
> > * Hop-docker: Docker containers, Kubernetes
> >
> > Source and Intellectual Property Submission Plan
> > =================================================
> > The originating source code is already licensed under an Apache 2 license:
> > * https://github.com/pentaho/pentaho-kettle/blob/8.2.0.7-R/LICENSE.txt
> > * https://github.com/HiromuHota/pentaho-kettle/blob/webspoon-8.3/LICENSE.txt
> > * https://github.com/mattcasters/kettle-debug-plugin/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-beam/blob/master/LICENSE
> > * https://github.com/mattcasters/pentaho-pdi-dataset/blob/master/LICENSE.txt
> > * https://github.com/mattcasters/kettle-needful-things/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-environment/blob/master/LICENSE
> >
> > For all contributions we have an agreement in place:
> > https://cla-assistant.io/project-hop/hop
> >
> > External Dependencies
> > ======================
> > Over the course of the last year we removed non-essential dependencies as
> > much as possible and replaced them by interfaces and plugin types. We did
> > this to simplify the architecture.
> > It’s important to note all external dependencies are licensed under an
> > Apache 2.0 or Apache-compatible license.
> > As we grow the Hop community we will configure our build process to
> > require and validate that all contributions and dependencies are licensed
> > under the Apache 2.0 license or an Apache-compatible license.
> >
> > Cryptography
> > =============
> >
> > Required Resources
> > ===================
> >
> > Mailing lists
> > --------------
> > We currently use a mix of email and Mattermost. We will migrate our
> > existing mailing lists to the following:
> >
> > d...@hop.incubator.apache.org
> > u...@hop.incubator.apache.org
> > priv...@hop.incubator.apache.org
> > comm...@hop.incubator.apache.org
> >
> > Git Repository
> > ---------------
> > The Hop code is currently in git and we’d like to keep it that way. We
> > request a git repository for incubator-hop with mirroring to GitHub.
> >
> > Issue Tracking
> > ---------------
> > We request the creation of an Apache-hosted JIRA.
> >
> > Jira ID: HOP
> >
> > Other Resources
> > ----------------
> > To allow other projects to use Hop as a library we would love to publish
> > artifacts on a Maven server like maven.apache.org.
> >
> > Initial Committers
> > ===================
> > * Nicholas Adment <nadm...@gmail.com>
> > * Hans Van Akelyen <hans.van.akel...@know.bi>
> > * Lokke Bruyndonckx <lokke.bruyndon...@know.bi>
> > * Matt Casters <matt.cast...@neo4j.com>
> > * Jason Chu <jianjun...@gmail.com>
> > * Peter Fabricius <i...@peter-fabricius.de>
> > * Rodrigo Haces <rodrigo.ha...@neo4j.com>
> > * Dave Henry <dshenr...@gmail.com>
> > * Hiromu Hota <hiromu.h...@gmail.com>
> > * Brandon Jackson <usbran...@gmail.com>
> > * Dan Keeley <d...@dankeeley.co.uk>
> > * Bart Maertens <bart.maert...@know.bi>
> > * Yannick Mols <yannick.m...@know.bi>
> > * Doug Moran <d...@dougandgretchen.com>
> > * Gretchen Moran <gretc...@dougandgretchen.com>
> > * Sergio Ramazzina <sergio.ramazz...@serasoft.it>
> > * Maria Carina Roldan <maria.carina.rol...@gmail.com>
> > * David Rosenblum <david.rosenb...@neo4j.com>
> > * Rafael Valenzuela <rav...@gmail.com>
> >
> > Affiliations
> > =============
> > * Neo4J
> >   * Matt Casters
> >   * Rodrigo Haces
> >   * David Rosenblum
> > * Know.bi
> >   * Bart Maertens
> >   * Hans Van Akelyen
> >   * Lokke Bruyndonckx
> >   * Yannick Mols
> > * eHealth Africa
> >   * Doug & Gretchen Moran
> > * Schemetrica
> >   * Dave Henry
> > * Beijing Auphi Data Co
> >   * Jason Chu
> > * Serasoft Italy
> >   * Sergio Ramazzina
> > * Hitachi Research
> >   * Hiromu Hota
> >
> > Sponsors
> > =========
> > Champion
> > ---------
> > Maximilian Michels (m...@apache.org)
> >
> > Nominated Mentors
> > ------------------
> > Tom Barber (magicaltr...@apache.org)
> > Julian Hyde (jh...@apache.org)
> > Maximilian Michels (m...@apache.org)
> >
> > Sponsoring Entity
> > ==================
> > The Apache Incubator
> >
>

--
Neo4j Chief Solutions Architect
✉ matt.cast...@neo4j.com
☎ +32486972937