WOW that is so awesome guys! Taverna at Apache FTW!! Let me know if you need a mentor, I'm in! :)
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Stian Soiland-Reyes <soiland-re...@cs.manchester.ac.uk>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Tuesday, September 23, 2014 5:43 AM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Cc: List for general discussion and hacking of the Taverna project <taverna-hack...@lists.sourceforge.net>
Subject: [Proposal] Taverna workflow

>I hereby present the Apache Incubator proposal for the project Taverna.
>
>Also available in rich text in the Taverna wiki (with more hyperlinks!):
>
>http://dev.mygrid.org.uk/wiki/display/developer/Taverna+incubator+proposal
>
>(Could someone grant me access to edit the Incubator wiki pages? My
>wiki username is soilandreyes)
>
>
># Abstract
>
>Taverna is an open source and domain-independent suite of tools used
>to design and execute data-driven workflows.
>
>
># Proposal
>
>The Taverna suite includes:
>
>* Taverna Workbench, a Java-based desktop application for graphically
>composing, editing and executing workflows of distributed web services
>and local tools
>* Taverna Commandline Tool, which allows repeated execution of
>parameterized workflow definitions
>* Taverna Server, which provides a REST and SOAP API for executing
>workflows
>* Taverna Player, a Ruby-based web interface towards the Server,
>providing a high-level view of workflow executions and their results,
>and allowing further integrations with Ruby on Rails applications.
>
>Taverna can browse and combine different service types, allowing
>workflows to integrate steps of arbitrary REST and SOAP web services
>with command line tools (local and SSH), scripts (Beanshell, R,
>Jython) and finally visualize the results.
>
>The goal of the Taverna suite is to help researchers to access
>distributed datasets and processing capabilities by the construction
>of pipelines, and also to simplify the execution of these pipelines
>in various environments.
>
>The Taverna suite of products is already successful and in wide use
>across different domains. The software is currently licensed as LGPL
>2.1, with copyright owned by University of Manchester. External
>contributors have all signed Apache-like CLAs.
>
>
># Background
>
>Taverna workflows coordinate inputs and outputs between computational
>processes and Web Services. The workflow is designed in a graphical
>interface which shows the workflow as a series of boxes and arrows,
>representing processes and their data connections. The different
>processes in a workflow can be command line tools, REST and WSDL Web
>Services, which are used for combining steps such as data acquisition,
>filtering, cleaning, integrating, analysis and visualization. Taverna
>calls these processes "services", as they generally are provided by
>remote (third-party) servers.
>
>These kinds of computational workflows, also known as pipelines and
>dataflows, focus on the movement of data rather than the execution
>order of the underlying processes. Features such as implicit
>iterations (where an input list of values causes multiple process
>executions) and parallel invocations (independent processes are
>executed as soon as their data is available) are intrinsic to a
>dataflow system, not requiring any particular constructs by the
>workflow designer.
>
>As a visual programming environment, workflows aid collaboration and
>reuse. At the highest level, a workflow represents the
>conceptual level of an analysis, allowing understanding, discussion
>and communication of the overall analysis protocol. More detail can be
>revealed and modified for individual steps. At the individual process
>level, the workflow defines execution specifics such as operations,
>parameters and command line tools.
>
>Sharing of the workflow definitions allows re-use and re-purposing of
>the computational analysis. During workflow execution, provenance can
>be collected from every step, allowing deep inspection of intermediate
>values for the purpose of debugging and validation.
>
>
># Rationale
>
>There is a strong need to lower the barrier of entry to datasets and
>computational resources widely available on the Internet, to increase
>their use by researchers who understand the computational steps needed
>to produce their results, but who are not necessarily expert
>programmers. Taverna has already shown its success and popularity in a
>wide range of scientific disciplines.
>
>
># Initial Goals
>
>* Transition mailing lists to Apache (keep existing subscribers, but
>invite more)
>* Taverna developer workshop (2014-10-30)
>* Prepare git repositories for move:
>  * Update headers/metadata to indicate Apache License 2.0
>  * Restructure git repositories
>  * Rename Maven groupIds to org.apache.taverna.*
>  * Rename packages to org.apache.taverna.*
>
>* Move Github repositories to Apache git
>* Automated builds in Apache's Jenkins
>* Update to latest releases of Apache dependencies
>* Propose updated release & testing procedure under Apache
>* Move website and documentation
>
>We intend to only release the current development version Taverna 3.x
>http://www.taverna.org.uk/developers/work-in-progress/taverna-3/ under
>the Apache umbrella. 3.0 is not yet officially released - however
>the Taverna 3.0 Command Line can be released almost "as-is" after
>migration. The Taverna 3.0 Server is at beta quality, while the
>Taverna 3.0 Workbench is at alpha stage and would need to be
>stabilized to an initial beta release.
>
>* Before first release: Maven Central releases of Taverna support
>libraries (e.g. taverna-scufl2 and taverna-databundle)
>* First release: Apache Taverna Command Line 3.0 (OSGi-based)
>* Release: Apache Taverna Server 3.0
>* Release: Apache Taverna Workbench 3.0 beta
>* Provenance exchange with relevant Apache products (e.g. Apache
>CXF->Taverna->CouchDB)
>* Release: Apache Taverna Workbench 3.0
>
>It is not yet decided if the current Workbench Editions
>http://www.taverna.org.uk/download/workbench/2-5/ will be carried over
>to Taverna 3, or if this can be solved by having an "Install extra
>plugin" step on first start-up of Apache Taverna. In any case, we
>imagine that some of these specializing editions will be maintained
>outside (but in collaboration with) the Apache project. This is
>particularly the case for the Astronomy edition as it depends on
>several LGPL/GPL libraries and is maintained by the AstroTaverna team.
>
>
># Current Status
>
>## Meritocracy
>
>Taverna was initially created by the myGrid consortium in 2003. Since
>2006, the majority of contributions to Taverna's core code-base, its
>architecture and direction have been led by staff at The University of
>Manchester and The European Bioinformatics Institute (EMBL-EBI).
>
>The project has benefited from a high degree of extensions and
>integrations by other developers - but mainly in the form of plugins
>(http://www.taverna.org.uk/documentation/taverna-2-x/taverna-2-x-plugins/)
>and integrations
>(http://www.taverna.org.uk/developers/work-in-progress/taverna-online/
>http://www.taverna.org.uk/download/associated-tools/).
>
>Taverna's developer community has unfortunately not had a culture of
>submitting patches that would warrant later commit access - perhaps
>due to its background in the science community. However, contributors
>have been added as committers when their plugin becomes a part of the
>core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
>and AstroTaverna by Garrido), or when their development has required
>patches to the existing code base.
>
>
>## Community
>
>Taverna has an active community of plug-in developers and users. The
>developer mailing list (taverna-hack...@lists.sourceforge.net) has 248
>members, the user mailing list (taverna-us...@lists.sourceforge.net)
>has 370 members.
>
>1500 users have registered as of 19 August 2014. Total downloads of
>all products since version 2.1 (released December 2009) is 35000.
>
>A Taverna Developer workshop is being arranged for 30 October 2014 to
>bring together developers and integrators of Taverna. We want to
>encourage plug-in developers to participate further also in the core
>development of Taverna, by introducing them to the code base and how
>to contribute.
>http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Open+Development+Workshop
>
>Active steps are being taken to grow the communities of users and
>developers by targeting specific research domains, such as the work by
>Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
>community. Susheel Varma is increasing usage of Taverna within the
>Biomedical domain. Julián Garrido and his work on AstroTaverna is
>promoting Taverna within the IVOA Virtual Astronomy community. Sonja
>Holl and Björn Hagemeier are targeting high performance computing.
>
>
>## Core Developers
>
>What we currently consider to be the core Taverna Team is (in
>alphabetical order):
>
>Christian Brenninkmeijer (University of Manchester)
>Donal Fellows (University of Manchester)
>Robert Haines (University of Manchester)
>Aleksandra Nenadic (University of Manchester)
>Dmitry Repchevsky (Barcelona Supercomputing Center)
>Stian Soiland-Reyes (University of Manchester)
>Shoaib Sufi (University of Manchester)
>Vadim Surpin (Institute for Information Transmission Problems in Moscow)
>Alan Williams (University of Manchester)
>
>The team consists of experienced developers who have worked on a
>multitude of projects, particularly in writing software for supporting
>scientists. The committers list (see below) additionally includes
>plugin developers whose contributions have become part of Taverna.
>Part of our desire to join the Apache Foundation is to recognise their
>effort and promote them into also being "core developers".
>
>
>## Alignment
>
>Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
>Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
>Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment of
>the Taverna Server.
>As part of moving to Apache-compatible dependencies, Taverna will
>probably adopt OpenJPA to replace (LGPL) Hibernate.
>
>
># Known Risks
>
>## Orphaned products
>
>Most of the core developers are from the myGrid team at University of
>Manchester, but are funded through a series of projects - see
>http://www.mygrid.org.uk/projects/. Many of these projects incorporate
>Taverna, so the effort from Manchester is partially based on direct
>project requirements, but also partially a volunteer effort for
>project maintenance and general development. The myGrid team has
>guaranteed funding until 2017.
>
>The developers that are outside Manchester are generally funded for
>other activities, and so their effort on Taverna is to a greater
>extent a volunteer effort - although again project-specific
>requirements steer their effort (e.g. for a new Taverna plugin).
>
>One of the reasons for our desire to move to the Apache Foundation is
>to formalise this volunteering/contribution effort so that it becomes
>obvious that it is not just University of Manchester that is
>contributing to the core code base - and therefore reducing the
>impression that Taverna is vulnerable to Manchester's future funding
>and projects.
>
>
>## Inexperience with Open Source
>
>Taverna has been an open-source project since its first release in
>2003. Most of the contributors also have experience with working with
>and contributing to other open source projects (e.g. TCL, CXF, Jena),
>particularly as Taverna strongly relies on other open source tools.
>Most of the research projects which the myGrid members have
>participated in produce open-source software.
>
>
>## Homogeneous Developers
>
>The committers list includes many people from myGrid, University of
>Manchester in the United Kingdom - but these developers have been
>working on a range of distributed and European projects in the field of
>scientific software - see http://www.mygrid.org.uk/projects/
>
>The other developers on the committers list come from many different
>projects and institutions across the world, from Russia, Canada,
>Germany and Spain.
>
>
>## Reliance on Salaried Developers
>
>Development for Taverna is mainly performed as part of the developers'
>salaried work, but funded through many different projects at several
>institutions (see above). These projects don't generally have
>"contribute to Taverna" as their main goals - so therefore in many
>ways the effort is still volunteer-based - contributing to Taverna as
>a way to support one's own work.
>
>From our experience of running Taverna over the last 10 years, new
>contributors will continue to join as Taverna becomes an ingredient in
>new projects, while existing contributors more slowly fade out of
>their involvement. Often existing contributors and users provide the
>personal link to the new contributors.
>
>
>## Relationships with Other Apache Products
>
>Apache already contains projects that seem relevant to Taverna.
>
>Apache Pig https://pig.apache.org/ is a high-level language for
>creating Map-Reduce programs for Apache Hadoop. There already exist
>third-party efforts to convert Taverna Workflows to Hadoop and Pig -
>https://github.com/umaqsud/taverna-to-pig
>https://github.com/schenck/taverna-to-hadoop (thus making a graphical
>interface for building Apache Pig workflows) - and part of the Apache
>Taverna effort would be to invite these to join the project.
>
>Apache Airavata http://airavata.apache.org/ is a software framework
>for executing and managing computational jobs and workflows on
>distributed computing resources. Taverna's concern is not as much job
>coordination, but more of a data flow between services. Airavata's
>XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
>but could be updated to work with Taverna 3's SCUFL2 format.
>
>Apache ODE https://ode.apache.org/ is a WS-BPEL workflow engine. BPEL
>as a workflow language is quite verbose compared to dataflow languages
>like Taverna, and is additionally bound to a particular protocol
>(SOAP). Nevertheless, a sub-section of Taverna workflows could in
>theory run on the Apache ODE engine - and the Taverna 3 Platform API
>has facilities for plugging in alternative workflow engines. We have
>previously considered Apache Hadoop as one such alternate engine for
>executing a different subset of workflows with local command line
>tools.
>
>Apache Storm http://storm.incubator.apache.org/ is a distributed
>realtime computation framework. Experiments are under development to
>use Taverna as a front-end for creating Apache Storm workflows -
>http://markmail.org/message/zg5ylo2aucpwfc5j
>
>Apache has several popular frameworks for building REST/SOAP web
>services (Apache CXF, Apache Clerezza), data services (Apache Jena,
>Apache Hive, Apache CouchDB) and specific workflow engines (Apache
>Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
>and SOAP service client can be used for combining, testing and
>demonstrating such services.
>
>
>## An Excessive Fascination with the Apache Brand
>
>Taverna is a long-running project (since 2003) with an existing user
>and developer base across the academic world. Our main motivation for
>moving to Apache is to further encourage an open development process
>and engage existing and new developers to contribute to the core code
>base. We also want to ensure long-term continuity of the Taverna
>products, and for its future directions to be decided by the whole
>Taverna community rather than one of the parties involved.
>
>
># Documentation
>
>Taverna's documentation is available from
>http://www.taverna.org.uk/documentation/taverna-2-x/, including an
>extensive user manual at
>http://www.mygrid.org.uk/dev/wiki/display/taverna/User+Manual and
>tutorials http://www.taverna.org.uk/documentation/taverna-2-x/tutorials/
>and videos http://www.taverna.org.uk/documentation/taverna-2-x/videos/.
>
>The developer documentation
>http://dev.mygrid.org.uk/wiki/display/developer/Developers+Guide
>includes tutorials
>http://dev.mygrid.org.uk/wiki/display/developer/Tutorials for working
>with Taverna's source code and creating plugins.
>
>
># Initial Source
>
>Taverna's source code is available from the 'taverna' github team
>account: https://github.com/taverna/. These 85 git repositories
>reflect the current modules of Taverna's plugin system after recently
>transitioning from Google Code SVN at
>http://taverna.googlecode.com/svn/taverna/. The history of Taverna's
>code base goes back to being hosted in CVS at SourceForge
>http://taverna.cvs.sourceforge.net/, transitioned as of
>http://taverna.googlecode.com/svn/archived/cvs2svn-2008-09-25/. Note
>that while reasonable steps have been made to preserve commit history
>when moving between version control systems, this has not always been
>achieved when moving between modules and refactoring larger Java
>packages. Some source files might therefore in git have initial
>commits like "Moved from /taverna/utils/trunk" referring to SVN paths.
>
>One of the reasons for the many repositories is that we rely on Apache
>Maven and a plugin system (since Taverna 3 OSGi-based) where different
>modules have different version numbers and release cycles (e.g.
>tags/branches). This is essential for the plug-in support of Taverna
>as the plug-ins depend on the semantic versioning of the APIs and
>required implementations.
>
>It is however in our current plans to merge repositories that have
>similar release cycles and greatly reduce the number of repositories.
>
>Taverna source code uses the package names (and children packages):
>
>net.sf.taverna - since Taverna 2
>uk.org.taverna - new from Taverna 3
>org.taverna (sic) - Taverna Server
>
>Some contributed code uses package names depending on their
>originating projects:
>
>org.purl.wf4ever.provtaverna
>org.biomart.martservice
>
>We intend to release only the upcoming Taverna 3.0 version under the
>Apache umbrella (not 2.x) - therefore, according to semantic
>versioning rules http://semver.org/, the transition period of the
>Apache Incubator would be the best (and possibly only) chance to
>rename Java packages and Maven groupIDs to org.apache.taverna.* Under
>OSGi the packaging and JAR goes hand-in-hand (several JARs don't
>normally provide the same package), and therefore any package rename
>would be done together with the repository restructuring.
>
>
># Source and Intellectual Property Submission Plan
>
>Taverna source code from http://github.com/taverna/
>
>(c) University of Manchester.
>Signed Apache-like CLAs for all external contributors.
>Current license is LGPL 2.1 (and GPL3 for one domain-specific
>download), as copyright holder Manchester can change this to Apache
>License 2.0
>
>taverna.org.uk domain - registrant University of Manchester
>http://www.taverna.org.uk/ content (c) University of Manchester
>http://dev.mygrid.org.uk/wiki/display/tav250/ Confluence wiki content
>(c) University of Manchester
>http://dev.mygrid.org.uk/wiki/display/developer Confluence wiki
>content (c) University of Manchester
>
>The details of intellectual property submission will be worked out
>together with myGrid project manager Shoaib Sufi and the University of
>Manchester's Contracts Office.
>
>
># External Dependencies
>
>Taverna, as an integrating workflow system, has a fairly large number
>of dependencies - the latest 2.5.0 Core Workbench distribution has 517
>JARs (although many of those are duplicates in different versions).
>
>We are intending for our first Apache-based release to be Taverna 3,
>which has already reduced this dependency list.
>
>We have performed an analysis of our dependencies of Taverna 3 at
>http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Dependencies -
>but this is not yet a complete list.
>
>A second analysis looks at the license of those dependencies at
>http://dev.mygrid.org.uk/wiki/display/developer/Third-party+licenses -
>where we have some incompatible (LGPL) dependencies. Most of these are
>resolvable as they are part of optional plugins to Taverna (e.g. R
>support, BioMart). The dependency on Hibernate requires some developer
>effort to be replaced with either Apache OpenJPA or a "No-SQL"
>solution.
>
>
># Cryptography
>
>Taverna uses these cryptography dependencies:
>
>BouncyCastle
>OpenJDK builds with the default JCE full encryption policy (bundled in
>installer)
>
>Taverna uses these to form an encrypted keystore (storing
>username/password and client certificates for third-party services
>accessed by the designed workflow) with corresponding user interface,
>and additionally binds to Java's SSL support to provide UI and command
>line options for security interactions, e.g. accepting new server
>certificates, or asking for username/passwords for HTTP Basic
>authentication (which can then be stored in the keystore).
>
>
># Required Resources
>
>Taverna currently relies on a mixture of infrastructure hosted for
>free by third-parties (e.g. Github, SourceForge, GoogleCode,
>Launchpad, Bitbucket) and infrastructure hosted by myGrid at
>University of Manchester (Jenkins, Jira, Confluence, Wordpress).
>
>## Mailing lists
>
>Existing mailing lists for Taverna are hosted at Sourceforge with
>archives at markmail. See http://www.taverna.org.uk/about/
>
>comm...@taverna.incubator.apache.org (replacing
>taverna-...@lists.sourceforge.net)
>priv...@taverna.incubator.apache.org (replacing supp...@mygrid.org.uk
>- to a lesser degree as we would want to encourage openness)
>d...@taverna.incubator.apache.org (replacing
>taverna-hack...@lists.sourceforge.net, 240 members)
>us...@taverna.incubator.apache.org (replacing
>taverna-us...@lists.sourceforge.net, 370 members)
>
>
>## Git repositories
>
>The Taverna community would prefer to keep using git and Github, and
>we would request experimental writable git repositories
>http://www.apache.org/dev/writable-git with mirroring to Github.
>
>The repositories would be named taverna-*, as the current repositories
>on the github team: https://github.com/taverna/. This repository
>organization is styled like the git repositories of cordova-*
>and couchdb-*.
>
>Exactly how repositories are split/merged is open for discussion - it
>is part of our current plan to reduce the number of repositories by
>merging common modules with a similar release cycle - this could be
>done at an early phase of the incubation period.
>
>
>## Issue Tracking
>
>JIRA Taverna (TAV)
>
>Existing issues in Taverna 3's current JIRA -
>http://dev.mygrid.org.uk/issues/browse/T3 - should be imported - but
>its current list of Modules should be further agreed.
>
>
>## Other Resources
>
>Wiki spaces in Confluence https://cwiki.apache.org/confluence -
>importing the most recent Taverna-related spaces and documentation
>from
>http://dev.mygrid.org.uk/wiki/spacedirectory/view.action?startIndex=24
>Jenkins - replacing myGrid Jenkins at http://build.mygrid.org.uk/ci/
>Maven repository at https://repository.apache.org/ - replacing myGrid
>artifactory http://repository.mygrid.org.uk/
>File-based web space for Plugin Update Site - replacing
>http://updates.taverna.org.uk/ and
>http://www.mygrid.org.uk/taverna/updates/
>Home pages - to be transitioned from http://www.taverna.org.uk/
>(Wordpress)
>Binary distribution download hosting, about 8 GB per release,
>replacing http://www.taverna.org.uk/download/ (currently downloads are
>hosted by http://launchpad.net/ and https://bitbucket.org/)
>
>
># Initial Committers
>
>The initial list of committers reflects the current list of active
>developers at the Github team: https://github.com/orgs/taverna/people
>(Note that not all of these have made their membership public on
>Github)
>
>
>Alan R Williams              alan.r.willi...@manchester.ac.uk
>Aleksandra Nenadic           a.nena...@manchester.ac.uk
>Christian Y. Brenninkmeijer  brenn...@cs.man.ac.uk
>David Withers                david.with...@gmail.com
>Dmitriy Repchevsky           dmitry.repchev...@bsc.es
>Donal K. Fellows             donal.k.fell...@manchester.ac.uk
>Finn Bacall                  finn.bac...@manchester.ac.uk
>Hajo Nils Krabbenhöft        h...@krabbenhoeft.de
>Ian Dunlop                   ian.dun...@manchester.ac.uk
>Ingo Wassink                 i.h.c.wass...@ewi.utwente.nl
>Julián Garrido               jgarr...@iaa.es
>Mark Wilkinson               ma...@illuminae.com
>Luke McCarthy                elmccar...@gmail.com
>Robert Haines                rhai...@manchester.ac.uk
>Shoaib Sufi                  shoaib.s...@manchester.ac.uk
>Steffen Möller               moel...@inb.uni-luebeck.de
>Stian Soiland-Reyes          st...@soiland-reyes.com (Apache CLA Signed)
>Stuart Owen                  so...@cs.manchester.ac.uk
>
>In addition to the Core Team (mentioned earlier), this list also
>reflects Taverna's existing meritocracy as it includes plugin
>developers whose contributions have been merged into the main code
>base. We acknowledge that not all of these are likely to continue as
>"Core" developers, but would like to encourage that during the
>Incubating process.
>
>
># Affiliations
>
>The majority of the initial committers are employed by University of
>Manchester as part of the myGrid team, including responsibilities for
>contributing to and supporting Taverna.
>http://www.mygrid.org.uk/about-us/people/core-mygrid-team/.
>
>Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
>including responsibilities for contributing to Taverna. Steffen Möller
>is employed by University of Lübeck. Julián Garrido is employed by
>Instituto de Astrofísica de Andalucía.
>
>
># Sponsor Champion
>
>Andy Seaborne
>
>
># Nominated Mentors
>
>* Andy Seaborne
>
>
># Sponsoring Entity
>
>The Incubator.
>
>
>Your feedback is very much welcome!
>
>
>--
>Stian Soiland-Reyes, myGrid team
>School of Computer Science
>The University of Manchester
>http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>For additional commands, e-mail: general-h...@incubator.apache.org