Hi Folks,
OK, now that discussion has settled down, I'd like to call a VOTE for
acceptance of Apache Open Climate Workbench into the Incubator.
I'll leave the VOTE open the rest of the week and close it out next
Monday, February 11th early am PT.
[ ] +1 Accept Apache Open Climate Workbench into the Incubator
[ ] +0 Don't care.
[ ] -1 Don't accept Apache Open Climate Workbench into the Incubator
because...
Full proposal is pasted at the bottom of this email. Only VOTEs from
Incubator PMC members are binding, but all are welcome to express their
thoughts.
Thank you!
Cheers,
Chris
P.S. Here's my +1 (binding)
-------------
= Apache Open Climate Workbench, tool for scalable comparison of remote
sensing observations to climate model outputs, regionally and globally. =
=== Abstract ===
The Apache Open Climate Workbench proposal desires to contribute an
existing community of software related to the analysis and evaluation of
climate models, and related to the use of remote sensing data in that
process.
Specifically, we will bring a fundamental software toolkit for analysis
and evaluation of climate model output against remote sensing data. The
toolkit is called the [[http://rcmes.jpl.nasa.gov|Regional Climate Model
Evaluation System (RCMES)]]. RCMES provides two fundamental components for
the easy, intuitive comparison of climate model output against remote
sensing data. The first component called RCMED (for "Regional Climate
Model Evaluation Database") is a scalable cloud database that decimates
remote sensing data and renalysis data related to climate using Apache
OODT extractors, Apache Tika, etc. These transformations make
traditionally heterogeneous upstream remote sensing data and climate model
output homogeneous and unify them into a data point model of the form
(lat, lng, time, value, height) on a per parameter basis. Latitude (lat)
and Longitude (lng) are in WGS84 format, but can be reformatted on the
fly. time is in ISO 8601 format, a string sortable format independent of
underlying store. value carries with it units, related to interpretation
and height allows for different values for different atmospheric vertical
levels. All of RCMES is built on Apache OODT, Apache Sqoop/Apache Hadoop
and Apache Hive, along with hooks to PostGIS and MySQL (traditional
relational databases). The second component of the system, RCMET (for
"Regional Climate Model Evaluation Toolkit") provides facilities for
connecting to RCMED, dynamically obtaining remote sensing data for a
space/time region of interest, grabbing associated model output (that the
user brings, or from the Earth System Grid Federation) of the same form,
and then regridding the remote sensing data to be on the model output
grid, or the model output to be on the remote sensing data grid. The
regridded data spatially is then temporally regridded using techniques
including seasonal cycle compositing (e.g., all summer months, all
Januaries, etc.), or by daily, monthly, etc. The uniform model output and
remote sensing data are then analyzed using pluggable metrics, e.g.,
Probability Distribution Functions (PDFs), Root Mean Squared Error (RMSE),
Bias, and other (possibly user-defined) techniques, computing an analyzed
comparison or evaluation. This evaluation is then visualized by plugging
in to the NCAR NCL library for producing static plots (histograms, time
series, etc.)
We also have performed a great deal of work in packaging RCMES to make the
system easy to deploy. We have working Virtual Machines (VMWare VMX and
Virtual Box OVA compatible formats) and we also have an installer built on
Python Buildout (http://buildout.org/) called "Easy RCMET" for dynamically
constructing the RCMET toolkit.
RCMES is currently supporting a number of recognized climate projects of
(inter-)national significance. In particular, RCMES is supporting the
[[http://www.globalchange.gov/what-we-do/assessment|U.S. National Climate
Assessment (NCA) activities]] on behalf of NASA's contribution to the NCA;
is working with the [[http://www.narccap.ucar.edu/|North American Regional
Climate Change Assessment Program (NARCCAP)]]; and is also working with
the International [[http://wcrp-cordex.ipsl.jussieu.fr/|Coordinated
Regional Downscaling Experiment (CORDEX)]].
=== Proposal ===
We propose to transition the RCMES software community, which includes
developers of the RCMET and RCMED software, along with users of RCMES in
the CORDEX project across a variety of academic institutions, scientists
helping to improve the RCMES metrics, and visualizations, and regridding
algorithms, packagers making RCMES easier to install, and scientists
helping to lead some of these international projects that are already
using RCMES.
We have been working on the RCMES project since 2009 funded initially by
the American Recovery and Reinvestment Act (ARRA) project out at NASA, and
then branching out into other sources of support and sustainability (NASA;
NSF, etc. -- see the
[[http://rcmes.jpl.nasa.gov/about/overview|acknowledgements]] section on
the RCMES website for a full list of supporting U.S. and international
partners).
With the existing RCMES community at Apache, we will also work to
encourage other climate software projects e.g., Open Climate GIS, elements
of the Earth System Grid Federation, other NASA climate projects funded
under the Computational Modeling, Algorithms and Cyberinfrastructure
(CMAC) to contribute to the Open Climate Workbench here at Apache.
RCMED is a Big Data project that combines several underlying Apache
software -- OODT, Tika, Hadoop, HIVE, and Sqoop -- and other related data
management software. Its primary language is Java; RCMET, on the other
hand, is a Python API, associated set of classes (framework), set of
Python Bottle Web services, and a PHP "Wizard"-based User Interface that
leverages Apache OODT Balance.
=== Background ===
Bringing RCMES to Apache was the brain-child of Chris Mattmann, based on
his solid experience with Apache OODT and bringing it to the ASF. Chris
worked for a year to get the support of the JPL community including
approvals from the Software Release authority at JPL to release the
software.
The initial code drop will include the RCMES SVN repository from JPL
including prior revisions. We anticipate also including a smaller package,
CDX, which contains some useful facilities for regridding, and command
line tools for manipulating large datasets, and working with OPeNDAP, etc.
After the code drop, we will work with our developers, users, documentors,
and other members of the team to teach those unfamiliar with the Apache
way how it works around here at Apache. 30% of the community from RCMES
includes those intimately familiar with Apache including 6 ASF members --
the other 70% include a range of scientific code developers, climate
scientists that use RCMES, program officers that will help make
documentation and slides for the code, and advocate for it in the
community. Their experience with Apache ranges from using various ASF
products, to contributing patches to them, to not using any ASF software
at all.
With this diversity, we anticipate that while everything may not just work
turnkey out of the box, this represents a unique opportunity to
demonstrate Apache to the international community and to show the benefits
of its community and social models. That said, we also have a lot of ASF
experience to make sure everyone learns the Apache way.
=== Rationale ===
We are bringing RCMES to Apache for a few reasons. First, we feel that it
will immediately enable our collaborators across a number of institutions
both nationally and internationally have the opportunity to work on a
common software base, and to improve it with contributions from their own
sites. Currently these are difficult to negotiate now because of varied
legal and contribution frameworks -- Apache allows us to simplify this to
a unified one. Second, using the ASF's world-wide mirroring system, we
will be able to deliver climate software broadly to the community as we
release it, rather than sneaker netting the software around or
establishing our own point release infrastructure.
Bringing this project to Apache also immediately thrusts the ASF into the
thriving ecosystem of the Coordinated Regional Downscaling Experiment
(CORDEX), the US National Cimate Assessment, the North American Regional
Climate Change Assessment Program (the US contribution to CORDEX) and into
relevance for upcoming Intergovernmental Panel on Climate Change (IPCC)
assessment activities at a number of different institutions. We also seek
to help lead and encourage du jour standard development rather than top
down level dictating of standards for climate software and the ASF will
provide us a means for that.
=== Initial Goals ===
The initial goals of the proposed project are:
* Stand up a sustaining Apache-based community around the JPL RCMES
codebase.
* Active relationships and possible cooperation with related projects and
communities, including end user and scientific communities, CORDEX,
NARCCAP, US NCA, IPCC, ESG, etc.
* Active relationships and possible cooperation with existing Apache
communities, e.g., OODT, Hadoop/HIVE, Sqoop, Tika, SIS, etc.
* Initial Apache release.
* Leverage Apache Open Climate Workbench in climate activities at NASA,
in the international community as mentioned above, and beyond.
* Vetting all software licenses and making sure IP is clear (software
grant from JPL forthcoming).
== Current Status ==
=== Meritocracy ===
30% of the proposed initial committers are familiar with the meritocracy
principles of Apache. As stated above this includes 6 ASF members. Of the
mentorship list, we have included Chris Douglas, a PMC member from Hadoop
and ASF member to help guide the community. Chris M. and Chris D. have
guided a number of projects through the Incubator over the years. The
other mentor includes Paul Ramirez, who has experience with the Incubator
-- he was a mentor for Apache Any23, and also was one of the PPMC members
and eventual mentor for Apache SIS. The 70% of proposed initial committers
that aren't as familiar with Apache have a broad range of experience in
other open source projects, and have a deep respect and affinity for the
foundation and the work that gets done here. The more experience ASF
mentors and project members will help to guide them.
=== Community ===
There is an existing, established community of developers and users of
this projet. This includes established communities including the
Coordinated Regional Downscaling Experiment, the U.S. National Climate
Assessment (NCA), the North American Regional Climate Change Assessment
Program (NARCCAP), and more. The Coordinated Regional Climate Downscaling
Experiment (CORDEX, http://wcrp.ipsl.jussieu.fr/cordex/about.html) is a
world wide effort of coordination of regional climate downscaling (RCD)
experiments driven by the World Climate Research Program (WRCP,
http://www.wcrp-climate.org/index.shtml). Recently, a large number of RCD
projects have been carried out on a large parts of the world. To maximize
the benefits of these research activities the WCRP designed a framework
(Giorgi, WMO-Bulletin, 2009) focused on "quality-control [of] data sets of
RCD-based information for the recent historical past and 21st century
projections, covering the majority of populated land regions on the
globe". CORDEX defined different control domains (up to 10,
http://cordex.dmi.dk/joomla/) for almost all the populated regions of the
world in a way to standardize the experiments and make them comparable. A
key region focused on Africa was also designated as the top priority by
WRCP. CORDEX also provides a a series of conventions and list of variables
that have to be followed by any project that wants to contribute to the
experiment. Each CORDEX region has a coordinator and regional and
international periodic meetings are scheduled in a way to ensure the
global well being. NARCCAP is the U.S. contribution to CORDEX. From the
[[http://www.globalchange.gov/what-we-do/assessment/nca-overview|US
National Climate Assessment]] site, work is "being conducted under the
auspices of the Global Change Research Act of 1990. The GCRA requires a
report to the President and the Congress every four years that integrates,
evaluates, and interprets the findings of the U.S. Global Change Research
Program (USGCRP); analyzes the effects of global change on the natural
environment, agriculture, energy production and use, land and water
resources, transportation, human health and welfare, human social systems,
and biological diversity; and analyzes current trends in global change,
both human-induced and natural, and projects major trends for the
subsequent 25 to 100 years."
Apache Open Climate Workbench will support all of these communities above,
with an eye towards being a general purpose climate evaluation toolkit for
model output and remote sensing data.
=== Core Developers ===
The initial set of developers comes from various NASA centers (JPL, and
Goddard Space Flight Center), NASA HQ, various Universities participating
in CORDEX (Cape Town, University of New South Wales), the Indian Institute
of Tropical Meteorology, the Free Univ. Berlin), the University of
California Los Angeles, and Howard University. As mentioned previously
several of our developers are Apache veterans and understand how it works
around here and for those that don't, they will have great mentorship.
=== Alignment ===
Our proposed effort aligns with the U.S. National Climate Assessment, the
CORDEX effort, other efforts, including the Earth System Grid Federation,
other climate software including the Open Climate GIS toolkit, other
science portals for climate including the Climate Information Portal (CIP)
at the University of Cape Town, and other related projects.
There are also a number of related Apache projects and dependencies, that
will be mentioned in the Relationships with Other Apache products section.
== Known Risks ==
=== Orphaned products ===
Our project has a history of funding support from JPL, NASA (Applications
program/ARRA, NCA, AIST), NSF (ExArch project), international investment
from collaborators, and from other funding sources. The funding sources
are all target future deliverables and activities, so there is little
chance this software and community will be orphaned.
=== Inexperience with Open Source ===
All the initial developers have worked on open source before -- 30% of the
proposed initial community are experience with the ASF, and are PMC
members and committers on ASF project including 6 ASF members. Our mentors
are all ASF members, and we welcome any interest from additional Apache
mentors in the effort. Those 70% of our project that aren't Apache
committers, PMC members, or members will benefit from the leadership of
the other 30% of the project.
=== Homogenous Developers ===
The initial developers come from a variety of backgrounds and with a
variety of needs for the proposed framework. Everyone is used to
communicating on mailing lists as the project spans timezones,
international institutions and centers of excellence for climate science.
=== Reliance on Salaried Developers ===
All of the proposed initial developers are paid to work on this or related
projects, but the proposed project is not the primary task for anyone.
=== Relationships with Other Apache Products ===
As mentioned above, RCMES and the Apache Open Climate Workbench already
depend on Apache OODT for facade interfaces to underlying data warehouses
for storing remote sensing data; and for metadata extraction and
transformation. The software also uses Apache Tika for this (through a
transitive dependency from OODT). In addition, we have hooks to Apache
Hadoop/HIVE, as well as dependencies on Apache Sqoop for dumping out
remote sensing data from MySQL and into HIVE.
=== A Excessive Fascination with the Apache Brand ===
All of us are familiar with Apache and have a respect for its brand and
community. Chris Mattmann is a big proponent of Apache's sustainability
factor -- and it's ability to grow software communities, in an
institution, or funding source neutral manner. All of the community have
an extreme respect for Apache, including those in our communities who
aren't necessarily trained computer scientists, but are Scientists (big
"S", e.g., land, physical, Earth/Climate scientists).
== Documentation ==
The initial RCMES code base will come from the internal JPL Subversion
repository. The [[Regional Climate Model Evaluation System (RCMES)
project|http://rcmes.jpl.nasa.gov]] at [[JPL|http://www.jpl.nasa.gov/]]
has documentation on the existing software, including links to funding
support, communities, and other projects. We will continue to maintain
that site at JPL, which is part of the reason for rebranding the project
here at Apache with a new name to not interfere with the existing RCMES
one that has a following. In addition, we hope to evolve RCMES@JPL to have
increasing levels of dependency on Apache Open Climate Workbench, so that
we can incrementally transition with little impact to existing customers.
In addition, JPL's [[http://cdx.jpl.nasa.gov|Climate Data eXchange (CDX)]]
website also has documentation on the existing software.
== Initial Source ==
The project will start with seed code donated by NASA JPL via Mattmann and
the rest of the initial committers, which consists of the Regional Climate
Model Evaluation System (RCMES) toolkit, and the Climate Data eXchange
(CDX) software. This will include the core Python API for RCMET, the RCMED
OODT catalog project (which stores remote sensing data to MySQL/PostGIS,
and HIVE), and the RCMED extractors for various climate formats. The
source will also include Easy-RCMET, the Python Buildout for RCMET. In
addition, we will bring along the CDX toolkit, which includes a CDX client
package that performs subsetting, access, regridding of climate data; and
also includes a Python Buildout installer of its own called Uber CDX.
== Source and Intellectual Property Submission Plan ==
All seed code and other contributions will be handled through the normal
Apache contribution process. Mattmann has been authorized by NASA JPL to
lead the contribution of RCMES and CDX into the Incubator via his existing
Apache CLA, and a Software Grant to be provided.
We will also contact other related efforts for possible cooperation and
contributions.
== External Dependencies ==
Our project depends on a number of external libraries with various
licensing conditions. An initial list of such dependencies is shown below.
||<tableclass="bodyTable"rowclass="b">'''Library''' ||'''License''' ||
||<rowclass="b">[[http://www.ncl.ucar.edu/|NCAR NCL]]||MIT compat||
||<rowclass="a">[[http://www.pyngl.ucar.edu/Nio.shtml|PyNIO]]||MIT compat||
||<rowclass="b">[[http://www.pyngl.ucar.edu/|PyNGL]]||MIT compat||
||<rowclass="a">[[http://matplotlib.sourceforge.net/|Matplotlib]]||Modified
PSF license||
||<rowclass="b">[[http://scipy.org/|Scipy]]||MIT compat||
||<rowclass="a">[[http://numpy.scipy.org/|NumPy]]||MIT compat||
||<rowclass="b">[[http://www.hdfgroup.org/HDF5/|HDF5]]||BSD||
||<rowclass="a">[[http://www.unidata.ucar.edu/software/netcdf/|NetCDF]]||MI
T||
== Cryptography ==
The project itself will not use cryptography, but it is possible that some
of the external software libraries will include cryptographic code to
handle features present in various science data formats. If we need to
provide an export control statement regarding cryptographic code per
Apache policy, we will follow a similar approach by Mattmann in
[[http://lucene.apache.org/nutch/|Apache Nutch]] and by Jukka Zitting lead
this effort in Apache Tika. Mattmann is familiar with this process.
== Required Resources ==
Mailing lists
* d...@climate.incubator.apache.org
* comm...@climate.incubator.apache.org
* priv...@climate.incubator.apache.org
Subversion Directory
* https://svn.apache.org/repos/asf/incubator/climate
Issue Tracking
* JIRA CLIMATE (CLIMATE)
Other Resources
* CLIMATE Wiki http://cwiki.apache.org/CLIMATE
* Review Board instance - CLIMATE
* Jenkins instance - CLIMATE
== Initial Committers ==
||'''Name''' ||'''Email''' ||'''Affiliation''' ||'''CLA''' ||
||Chris A. Mattmann ||mattmann at apache dot org
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes ||
||Cameron E. Goodale ||goodale at apache dot org
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes ||
||Paul Ramirez ||pramirez at apache dog org
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes ||
||Andrew F. Hart ||ahart at apache dot org
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes ||
||Jinwon Kim||jkim at atmos dot ucla dot edu
||[[http://jifresse.ucla.edu|UCLA Joint Institute for Regional Earth
System Science and Engineering]] || no||
||Duane Waliser||duane dot waliser at jpl dot nasa dot gov
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || no ||
||Huikyo Lee||Huikyo dot Lee at jpl dot nasa dot
gov||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || no ||
||Paul Loikith|| Paul dot C dot Loikith at jpl dot nasa dot gov
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || no ||
||Daniel J. Crichton||crichton at apache dot
org||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || yes ||
||Kim Whitehall||Kim dot D dot Whitehall at jpl dot nasa dot gov
||[[http://www.physics1.howard.edu/~pmisra/HUPAS/HUPAS/HUPAS%20Jenkins.html
|Howard University]] || no ||
||Paul Zimdars||pzimdars at apache dot
org||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || yes ||
||Chris Jack||cjack at csag dot uct dot ac dot
za||[[http://www.csag.uct.ac.za/|University of Cape Town]] || no ||
||Bruce Hewitson||hewitson at csag dot uct dot ac dot
za||[[http://www.csag.uct.ac.za/|University of Cape Town]] || no ||
||Lluis Fita Borrell||l dot fitaborrell at unsw dot edu dot
au||[[http://unsw.edu.au/|University of New South Wales]] || yes ||
||Jason Evans||jason dot evans at unsw dot edu dot
au||[[http://unsw.edu.au/|University of New South Wales]] || no ||
||Estani Gonzalez||estanislao dot gonzalez at met dot fu-berlin dot de
||[[http://www.geo.fu-berlin.de/met/|Free University Berlin]] || yes ||
||Luca Cinquini||luca dot cinquini at jpl dot nasa dot gov
||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] || yes ||
||J. Sanjay||sanjay at tropmet dot res dot in ||
[[http://tropmet.res.in/|Indian Institute of Tropical Meteorology]] || yes
||
||M. V. S. Rama Rao||ramarao at tropmet dot res dot in
||[[http://tropmet.res.in/|Indian Institute of Tropical Meteorology]] ||
yes ||
||Tsengdar Lee||tsengdar dot j dot lee at nasa dot gov ||
[[http://hq.nassa.gov/|NASA HQ]] || no ||
||Laura Carriere||laura dot carriere at nasa dot gov
||[[http://www.gsfc.nasa.gov/|NASA Goddard Space Flight Center]] || no ||
||Denis Nadeau|| denis dot nadeau at nasa dot
gov||[[http://www.gsfc.nasa.gov/|NASA Goddard Space Flight Center]] || no
||
||Michael Joyce|| Michael dot J dot Joyce at jpl dot nasa dot
gov||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes||
||Shakeh Khudikyan||Shakeh dot E dot Khudikyan at jpl dot nasa dot
gov||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||yes||
||Maziyar Boustani||Maziyar dot Boustani at jpl dot nasa dot
gov||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion Laboratory]] ||no||
||Suresh Marru||smarru at apache dot org||[[http://pti.iu.edu/|Indiana
University]] ||yes||
== Sponsors ==
Champion
* Chris Mattmann (mattmann at apache dot org)
Nominated Mentors
* Chris A. Mattmann (mattmann at apache dot org)
* Chris Douglas (cdouglas at apache dot org)
* Paul Ramirez (pramirez at apache dot org)
Sponsoring Entity
* Apache Incubator
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org