Hi, +1
The project seems to be very interesting and we can see that there is
documentation, a contribution guide... I will be more than happy to help as a
mentor.

Regards,
François
fpa...@apache.org

On 10/09/2020 at 13:05, Julian Feinauer wrote:
> Hey,
>
> thanks for your statement Max, and that's already a great start as we cannot
> expect fresh podlings to know the Apache Way (at all?) as then there would be
> no point for the incubator.
> But knowing you and your motivation and reading your statement about the team
> makes me very confident that this could be a very smooth ride : )
>
> So, best from my side!
>
> Julian
>
> On 10.09.20, 12:40, "Maximilian Michels" <m...@apache.org> wrote:
>
> I've met Matt and other folks from the Hop project more than a year ago
> through Beam Summit Europe. I can say that they are genuinely passionate
> about open-source. Initially, they were not familiar with the Apache
> Way, but throughout the past year, everyone has ramped up their
> knowledge about the ASF. You will also see that reflected in the proposal.
>
> Hop is a great project in the sense that it adds GUI-based integration
> to many data processing projects at Apache. This is appealing to me
> because we are leveraging many of the existing projects such as Spark,
> Flink, Hadoop, Cassandra, Kafka, etc. The project would be a great
> addition to the Apache project portfolio.
>
> This is going to be my first project as a Champion and I'm very much
> looking forward to guiding the project throughout the incubation process.
>
> Please post your questions or let us know if you want to help with
> mentoring the project.
>
> -Max
>
> On 08.09.20 12:30, Matt Casters wrote:
> > Thank you very much Kevin!
> >
> > On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera <djkevincr1...@gmail.com>
> > wrote:
> >
> >> +1 ( binding ) Interesting project. Please add me as a mentor to the
> >> project.
> >>
> >> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
> >> <matt.cast...@neotechnology.com.invalid> wrote:
> >>
> >>> Hello Apache,
> >>>
> >>> Our community is eager to propose for Hop to join the Apache Incubator.
> >>> The Hop Orchestration Platform aims to help people with complex data and
> >>> metadata orchestration problems.
> >>>
> >>> Below is the complete text of the proposal but you can also find it here:
> >>> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >>>
> >>> Any help with respect to the incubation is appreciated including help from
> >>> a few more mentors to set us on the right track. On behalf of my community
> >>> I'd be happy to answer any questions you might have regarding Hop. Our
> >>> thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >>>
> >>> Thanks in advance for your time!
> >>>
> >>> Best regards,
> >>>
> >>> Matt - Hop co-founder
> >>> www.project-hop.org
> >>> ---
> >>>
> >>> Abstract
> >>> =========
> >>> Hop is short for the Hop Orchestration Platform. Written completely in Java
> >>> it aims to provide a wide range of data orchestration tools, including a
> >>> visual development environment, servers, metadata analysis, auditing
> >>> services and so on. As a platform Hop also wants to be a re-usable library
> >>> so that it can be easily re-used by other software.
> >>>
> >>> Proposal
> >>> =========
> >>> Hop provides all the tools to build, maintain and deploy data
> >>> orchestration, ETL and data integration solutions. For example, Hop allows
> >>> you to diagram a data flow that propagates changes from a database via
> >>> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> >>> The core concepts of Hop are Pipelines and Workflows.
> >>> * Pipelines do the core data manipulation work (read, manipulate, write
> >>> data). The main items of work in pipelines are transforms.
> >>> A pipeline consists of two or more (usually many) transforms that each
> >>> perform a granular piece of work. The transforms in a pipeline run in
> >>> parallel, and together create a powerful data processing tool.
> >>> * Workflows take care of the orchestration of actions: execute pipelines,
> >>> run child workflows, environment checks, preparation, problem alerting and
> >>> so on.
> >>> If these terms sound familiar it’s because they are taken from the Apache
> >>> Beam and Apache Airflow projects.
> >>>
> >>>
> >>> The main components of the Hop platform are:
> >>> * hop-gui: a visual data orchestration IDE
> >>> * hop-run: a CLI tool to run workflows or pipelines
> >>> * hop-config: a CLI tool to configure Hop and its components
> >>> * hop-server: a light-weight web server to run and monitor workflows and
> >>> pipelines
> >>> * hop-translator: a tool for translating the various parts of the Hop tools
> >>> (i18n).
> >>> * hop-web: a thin client version of hop-gui for web browsers and mobile
> >>> devices
> >>>
> >>>
> >>> The cornerstone of the Hop platform is extensibility: all major components
> >>> of the platform are designed to be pluggable. This allows any possible
> >>> missing functionality to be created in a short amount of time.
> >>>
> >>> Background
> >>> ===========
> >>> The Hop Orchestration Platform has its origins in the Kettle community.
> >>> Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> >>> in 2015, the community struck out to solve problems less aligned with
> >>> Hitachi’s interests.
> >>>
> >>> Rationale
> >>> ==========
> >>> In the Hop community, we have always aimed to function as a meritocracy,
> >>> where contributions are accepted based on merit, and individuals gain
> >>> status in the community based on their contributions (coding and
> >>> otherwise).
> >>> We’re proud to have a diverse group of people doing all the
> >>> required things in a project: development, documentation, tutorials,
> >>> architecture, testing, graphics design and much more. Bringing the project
> >>> under the Apache Software Foundation would allow us to continue and grow,
> >>> but also give our users confidence about the governance, IP status, and
> >>> future of the project.
> >>>
> >>> ASF Preparation Phase
> >>> ======================
> >>> The very first goal of project Hop is to find a good way to cooperate on
> >>> the development across wide geographical, economic and social spectra. To
> >>> make this possible real changes were needed to a codebase which is
> >>> essentially 20 years old. Most of these changes have been tackled by now.
> >>> We think it’s fair to say that by now, Hop is a new platform even though it
> >>> shares a common background as it partly started from the Kettle code base.
> >>> Here are a few of the key focus areas we’re trying to safeguard going
> >>> forward:
> >>> * Plugins: lightweight plugins for all major functionality. This makes it
> >>> possible to extend Hop or reduce Hop in size. It also allows people to
> >>> implement or change functionality with minimal coding. In other words it
> >>> makes it easier to contribute.
> >>> * Maintain an open and responsive community where every concern, feedback
> >>> and contribution is welcome.
> >>> * Maintain a clear focus on data orchestration user requirements, not on
> >>> “industry trends”
> >>> * Documentation: we set up a version controlled “adoc” system with
> >>> automated builds which is open, controlled and reviewed. This is
> >>> incredibly important for every Hop user and developer.
> >>> * Testing and stability: we want to massively increase stability by
> >>> implementing integration tests beyond the standard Java unit testing
> >>> because of the dynamic nature of data orchestration work.
> >>> We still have a long way to go. This work will never be finished. It’s a
> >>> clear and important goal nevertheless.
> >>> * Simplicity: things are complex enough. We follow the example of projects
> >>> like Apache Spark and Flink, so for example “hop-run.sh” does exactly
> >>> what the name says without the need to dive into documentation. As much as
> >>> possible we make things self-evident and will re-use existing terminology.
> >>>
> >>>
> >>> For a list of the changes you can look at the monthly roundup which has
> >>> been compiled since February 2020. It documents the hard work of our
> >>> community so far:
> >>>
> >>>
> >>> http://www.project-hop.org/news/roundup-2020-02/
> >>> http://www.project-hop.org/news/roundup-2020-03/
> >>> http://www.project-hop.org/news/roundup-2020-04/
> >>> http://www.project-hop.org/news/roundup-2020-05/
> >>> http://www.project-hop.org/news/roundup-2020-06/
> >>> http://www.project-hop.org/news/roundup-2020-08/
> >>>
> >>>
> >>> Goals
> >>> ======
> >>> Here are a few more details and specifics of things we still want to take
> >>> on going forward:
> >>> * Add more plugin metadata to Transforms and Action plugins as well as
> >>> their supported engines. This will make it easier to refine the user
> >>> interface and make the user experience better by giving to-the-point
> >>> feedback on what operations are supported and required. Example metadata
> >>> to add: extra version and build information, dependencies, tags and labels
> >>> (replacing categories), documentation links, input and output capabilities,
> >>> engine capabilities and so on.
> >>> * SWT: While the Eclipse SWT project is still supported we want to make a
> >>> list of all the commonly used API calls and stick to those with our own
> >>> API. This will help the development of hop-web and allow us to possibly
> >>> more easily migrate to different user interfaces later on.
> >>> * Integration testing: every transform and action should have an
> >>> integration test before it is released to ensure quality. Java unit
> >>> testing has proven to be insufficient in safeguarding backward
> >>> compatibility, stability and functionality. We need to do better.
> >>> * Apache VFS: Hop makes extensive use of this API to handle files. As such
> >>> we want to implement the various drivers for gs://, hdfs://, s3:// through
> >>> standard Kettle plugins making it easier to choose which protocols to
> >>> support.
> >>> * Variables & Parameters: make this experience more intuitive, clean up
> >>> the underlying API and add more options to the various user interfaces
> >>> responsible for setting and passing variables and parameters.
> >>> * Make Hop-Web an integral part of the Apache Hop project removing the code
> >>> duplication (fork) we’re dealing with now. This includes the need to
> >>> improve various user interfaces which were designed for non-web clients.
> >>> * Make best practices and governance functionality an integral part of the
> >>> API of the project:
> >>>   * Data sets and unit testing (already done)
> >>>   * Environments and lifecycle management (partly done)
> >>>   * Git support (partly done)
> >>>   * Auditing and lineage
> >>>   * Software policies and enforcement thereof
> >>>   * Configuration management (partly done)
> >>>
> >>>
> >>> Current Status
> >>> ===============
> >>>
> >>> Meritocracy
> >>> ------------
> >>> With Project Hop, we actively work to foster the existing community and
> >>> encourage community contributions. As of September 1st 2020 we have received
> >>> over 250 pull requests and have around 600 tickets in our JIRA platform (a
> >>> lot of which were created by community members) and have active discussions
> >>> in our Mattermost chat platform with over 80 members.
> >>>
> >>>
> >>> Over the last half year we started to ask users on our chat server for
> >>> specific feedback on terminology, features and so on. It’s been a
> >>> wonderfully positive experience to have in-depth discussions on complex
> >>> issues with industry experts. We look forward to moving these discussions
> >>> and votes to an Apache mailing list.
> >>>
> >>> Community
> >>> ----------
> >>> Hop is developed, extended and maintained by a global community of users
> >>> and developers. The Hop community is what has driven its development and
> >>> growth.
> >>> The particular past history of Hop has led to a lot of interest in the
> >>> project and already led to a number of contributions, documentation and
> >>> translations.
> >>>
> >>> Core Developers
> >>> ----------------
> >>> We have a diverse group of core developers with people joining on a regular
> >>> basis. Matt Casters, Rodrigo Haces and David Rosenblum are part time
> >>> developers on Hop, salaried by Neo Solutions. Bart Maertens, Hans Van
> >>> Akelyen, Yannick Mols are part time Hop developers paid for by company
> >>> know.bi. Doug and Gretchen Moran were Pentaho employees but along with
> >>> Rafael Valenzuela, Dan Keeley, Jason Chu, Sergio Ramazzina and many others
> >>> they can be considered to be long time consultants and community members
> >>> for over a decade that joined the Hop community in the last year or two.
> >>>
> >>>
> >>> Alignment
> >>> ----------
> >>> We want to anchor and safeguard our development and community building
> >>> efforts for the future. We strongly believe that as an Apache project this
> >>> can be achieved in the best possible way. The Hop project also started to
> >>> align with projects like Apache Beam, Spark and Flink in its use of
> >>> terminology, tools, manner of configuration and so on.
> >>> As mentioned elsewhere in this document Hop is a large user of other
> >>> Apache projects and libraries and we believe that becoming an Apache
> >>> project is beneficial.
> >>> Specifically for Apache Beam we believe that providing a visual pipeline
> >>> development tool can be of great value.
> >>>
> >>> Known Risks
> >>> ============
> >>> While the current code-base of Kettle from which we started is
> >>> already released under the Apache License 2.0, proper attribution
> >>> needs to happen to Hitachi Vantara.
> >>> We have no knowledge of existing patents on any part of the Kettle
> >>> codebase.
> >>> To further reduce any risk of there even being any discussion on naming the
> >>> Hop team decided to rename the project, its tools (to be more self-evident
> >>> as well), the Java API and even the main concepts (Transformations are now
> >>> called Pipelines, in line with Apache Beam naming conventions).
> >>>
> >>> Orphaned products
> >>> ------------------
> >>> There is little risk that the project will become orphaned. The list of
> >>> active developers is large, and consists of a mix of developers who have
> >>> been working on the code for several years and recent arrivals in the
> >>> community.
> >>>
> >>> Inexperience with Open Source
> >>> ------------------------------
> >>> The project team has a long history in open source and has contributed to
> >>> Apache licensed open source projects, mostly in the Kettle ecosystem such
> >>> as Kettle itself and the many plugins and projects surrounding it. The
> >>> experience gained there has allowed us to quickly set up all required build
> >>> tools and processes. In its fairly short history, Hop has been advocating
> >>> open source in all aspects of the project. Our submission to the Apache
> >>> Software Foundation is a logical extension of our commitment to open source
> >>> software.
> >>>
> >>> Licensing
> >>> ----------
> >>> The original source code we started from (see below) has been open source
> >>> since December 2005, initially under the Lesser GPL but since January 2012
> >>> all under the Apache License version 2.0. All Hop code has been scanned for
> >>> compliance with the Apache License 2.0. We integrated Apache Rat with our
> >>> build process.
> >>>
> >>> Heterogeneous Developers
> >>> -------------------------
> >>> Hop is built, developed and maintained by a global community of
> >>> developers. Input comes from a large group of developers and users from
> >>> all over the world. At this moment over 7 companies contribute to Hop
> >>> through the developers along with a list of individuals and consultants.
> >>>
> >>> Reliance on Salaried Developers
> >>> --------------------------------
> >>> Hop developers are a mix of volunteers, enthusiasts and people working for
> >>> an employer. There is also a group of consultants who want to be involved
> >>> in Hop because it allows them to do projects with it. They are in fact our
> >>> most important users and developers since they provide valuable feedback
> >>> from the trenches.
> >>>
> >>> Relationships with Other Apache Products
> >>> -----------------------------------------
> >>> Hop is a heavy user of Apache software libraries.
> >>>
> >>> Apache Commons usage:
> >>> * commons-beanutils
> >>> * commons-cli
> >>> * commons-codec
> >>> * commons-collections
> >>> * commons-collections4
> >>> * commons-compiler
> >>> * commons-compress
> >>> * commons-configuration
> >>> * commons-database-model
> >>> * commons-dbcp
> >>> * commons-digester
> >>> * commons-el
> >>> * commons-httpclient
> >>> * commons-io
> >>> * commons-lang and commons-lang3
> >>> * commons-logging
> >>> * commons-math and commons-math3-3.5.jar
> >>> * commons-net
> >>> * commons-pool
> >>> * commons-validator
> >>> * commons-vfs2
> >>>
> >>>
> >>> Other libraries:
> >>> * Apache Batik : for the front-end SVG drawing
> >>> * Apache Xerces (XSLT, XML processing)
> >>>
> >>>
> >>> Other usage of Apache projects related to Hop (plugins):
> >>> * Apache Avro
> >>> * Apache Beam w/ Apache Spark, Apache Flink, …
> >>> * Apache Cassandra
> >>> * Apache CouchDB
> >>> * Apache Derby
> >>> * Apache Flume
> >>> * Apache Hadoop
> >>> * Apache Hive
> >>> * Apache Kafka
> >>> * Apache Solr
> >>> * Apache Subversion
> >>> * Apache Zookeeper
> >>>
> >>>
> >>> For the build process
> >>> * Apache Maven
> >>> * Apache Jenkins
> >>>
> >>> An excessive Fascination with the Apache Brand
> >>> -----------------------------------------------
> >>> With this proposal we are not seeking attention or publicity. Rather, we
> >>> firmly believe in Hop, visual data pipeline development and the ability to
> >>> treat the developed data pipelines (ETL) as software code. While the
> >>> original Hop code has been open source for about 15 years, we believe
> >>> putting code on GitHub can only go so far. We see the Apache community,
> >>> processes, and mission as critical for ensuring Hop is truly
> >>> community-driven, positively impactful, and innovative open source
> >>> software.
> >>> We believe Hop is a great fit for the Apache Software Foundation
> >>> due to its focus on visual data processing and its relationships to
> >>> existing ASF projects.
> >>>
> >>> Documentation
> >>> ==============
> >>> Over the years, the community has contributed extensive documentation to
> >>> wiki.pentaho.com. Over time, areas of the available information have
> >>> become incomplete or outdated. Most of this documentation has been reviewed,
> >>> updated and will be contributed to the Apache foundation with the Hop
> >>> source code. Documentation for the extensive new functionality that was
> >>> added to Hop in recent months is being written.
> >>> We consider documentation to be a core piece of the Hop platform and will
> >>> treat documentation as any other item of code.
> >>>
> >>> Initial Source
> >>> ===============
> >>> While there isn’t a Java class in Hop which is unchanged from its origins
> >>> we should mention we selected this source code to form the base of Apache
> >>> Kettle:
> >>> https://github.com/pentaho/pentaho-kettle/tree/8.2.0.7-R
> >>>
> >>> We merged various changes from the WebSpoon fork found over here:
> >>> https://github.com/HiromuHota/pentaho-kettle
> >>>
> >>>
> >>> Various community driven Kettle plugins were written to bypass bugs, slow
> >>> down code-rot and to implement missing features.
> >>> They were merged into Hop from these locations:
> >>> https://github.com/mattcasters/kettle-debug-plugin (better debugging)
> >>> https://github.com/mattcasters/kettle-beam (Apache Beam support)
> >>> https://github.com/mattcasters/pentaho-pdi-dataset (Unit Testing)
> >>> https://github.com/mattcasters/kettle-needful-things (Bug fixes &
> >>> workarounds)
> >>> https://github.com/mattcasters/kettle-environment (Environment management)
> >>>
> >>>
> >>> The Hop repositories are currently hosted at:
> >>> https://github.com/project-hop/
> >>> * Hop: source code for the Hop project
> >>> * Hop-doc: technical documentation for the Hop project
> >>> * Hop-website: Hop website and content repository
> >>> * Hop-docker: Docker containers, Kubernetes
> >>>
> >>> Source and Intellectual Property Submission Plan
> >>> =================================================
> >>> The originating source code is already licensed under an Apache 2 license:
> >>> * https://github.com/pentaho/pentaho-kettle/blob/8.2.0.7-R/LICENSE.txt
> >>> * https://github.com/HiromuHota/pentaho-kettle/blob/webspoon-8.3/LICENSE.txt
> >>> * https://github.com/mattcasters/kettle-debug-plugin/blob/master/LICENSE
> >>> * https://github.com/mattcasters/kettle-beam/blob/master/LICENSE
> >>> * https://github.com/mattcasters/pentaho-pdi-dataset/blob/master/LICENSE.txt
> >>> * https://github.com/mattcasters/kettle-needful-things/blob/master/LICENSE
> >>> * https://github.com/mattcasters/kettle-environment/blob/master/LICENSE
> >>>
> >>>
> >>> For all contributions we have an agreement in place:
> >>> https://cla-assistant.io/project-hop/hop
> >>>
> >>> External Dependencies
> >>> ======================
> >>> Over the course of the last year we removed non-essential dependencies as
> >>> much as possible and replaced them by interfaces and plugin types. We did
> >>> this to simplify the architecture.
> >>> It’s important to note all external dependencies are licensed under an
> >>> Apache 2.0 or Apache-compatible license. As we grow the Hop community we
> >>> will configure our build process to require and validate that all
> >>> contributions and dependencies are licensed under the Apache 2.0 license or
> >>> are under an Apache-compatible license.
> >>>
> >>> Cryptography
> >>> =============
> >>>
> >>> Required Resources
> >>> ===================
> >>>
> >>> Mailing lists
> >>> --------------
> >>> We currently use a mix of email and Mattermost. We will migrate our
> >>> existing mailing lists to the following:
> >>>
> >>> d...@hop.incubator.apache.org
> >>> u...@hop.incubator.apache.org
> >>> priv...@hop.incubator.apache.org
> >>> comm...@hop.incubator.apache.org
> >>>
> >>> Git Repository
> >>> ---------------
> >>> The Hop code is currently in git and we’d like to keep it that way. We
> >>> request a git repository for incubator-hop with mirroring to GitHub.
> >>>
> >>> Issue Tracking
> >>> ---------------
> >>> We request the creation of an Apache-hosted JIRA.
> >>>
> >>> Jira ID: HOP
> >>>
> >>>
> >>> Other Resources
> >>> ----------------
> >>> To allow other projects to use Hop as a library we would love to publish
> >>> artifacts on a Maven server like maven.apache.org.
> >>>
> >>> Initial Committers
> >>> ===================
> >>> * Nicholas Adment <nadm...@gmail.com>
> >>> * Hans Van Akelyen <hans.van.akel...@know.bi>
> >>> * Lokke Bruyndonckx <lokke.bruyndon...@know.bi>
> >>> * Matt Casters <matt.cast...@neo4j.com>
> >>> * Jason Chu <jianjun...@gmail.com>
> >>> * Peter Fabricius <i...@peter-fabricius.de>
> >>> * Rodrigo Haces <rodrigo.ha...@neo4j.com>
> >>> * Dave Henry <dshenr...@gmail.com>
> >>> * Hiromu Hota <hiromu.h...@gmail.com>
> >>> * Brandon Jackson <usbran...@gmail.com>
> >>> * Dan Keeley <d...@dankeeley.co.uk>
> >>> * Bart Maertens <bart.maert...@know.bi>
> >>> * Yannick Mols <yannick.m...@know.bi>
> >>> * Doug Moran <d...@dougandgretchen.com>
> >>> * Gretchen Moran <gretc...@dougandgretchen.com>
> >>> * Sergio Ramazzina <sergio.ramazz...@serasoft.it>
> >>> * Maria Carina Roldan <maria.carina.rol...@gmail.com>
> >>> * David Rosenblum <david.rosenb...@neo4j.com>
> >>> * Rafael Valenzuela <rav...@gmail.com>
> >>>
> >>> Affiliations
> >>> =============
> >>> * Neo4J
> >>>   * Matt Casters
> >>>   * Rodrigo Haces
> >>>   * David Rosenblum
> >>> * Know.bi
> >>>   * Bart Maertens
> >>>   * Hans Van Akelyen
> >>>   * Lokke Bruyndonckx
> >>>   * Yannick Mols
> >>> * eHealth Africa
> >>>   * Doug & Gretchen Moran
> >>> * Schemetrica
> >>>   * Dave Henry
> >>> * Beijing Auphi Data Co
> >>>   * Jason Chu
> >>> * Serasoft Italy
> >>>   * Sergio Ramazzina
> >>> * Hitachi Research
> >>>   * Hiromu Hota
> >>>
> >>>
> >>> Sponsors
> >>> =========
> >>> Champion
> >>> ---------
> >>> Maximilian Michels (m...@apache.org)
> >>>
> >>> Nominated Mentors
> >>> ------------------
> >>> Tom Barber (magicaltr...@apache.org)
> >>> Julian Hyde (jh...@apache.org)
> >>> Maximilian Michels (m...@apache.org)
> >>>
> >>> Sponsoring Entity
> >>> ==================
> >>> The Apache Incubator
> >>>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional
> commands, e-mail: general-h...@incubator.apache.org