Re: [VOTE] accept UIMA as a podling

Ian Holsman Mon, 18 Sep 2006 16:12:17 -0700

as per Garrett's suggestion.

[ ] +1 Accept UIMA as an Incubator podling
[ ]  0 Don't care
[ ] -1 Reject this proposal for the following reason:



Hello everyone -

We are submitting this proposal to the community for a
new project in the incubator, and look forward to starting to work with
this community.

This is a slightly modified and extended version of the proposal that has already been posted to [EMAIL PROTECTED]. The whole mail thread can be found [http://www.nabble.com/Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-tf2154324.html here].

If you don't feel like reading the whole thread, the main question that came up was: this is all very well, but what does it really '''do'''? Attempts to answer that question were made [http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5986403.html here] and [http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5987788.html here]. We'll try to work these into the proposal over the next few days.


----

= Proposal for Incubation Project: Unstructured Information Management Architecture - UIMA =


The Unstructured Information Management Architecture (UIMA) is an
architecture and software framework for creating, discovering, composing
and deploying a broad range of multi-modal analysis capabilities.  We
propose a project to develop, implement, support and enhance UIMA
framework implementations that comply with the UIMA standard (being put
forward concurrently for standardization within OASIS
http://www.oasis-open.org - not yet submitted, but we plan to do this
early in September.).

The proposal includes both a UIMA framework, as well as tools to develop,
describe, compose and deploy UIMA-based components and applications. The
initial work will be based on the UIMA Version 2 framework code developed
by IBM; snapshots of each release of this code are currently made
available on http://sourceforge.net/projects/uima-framework. The
!SourceForge versions would be stabilized in maintenance mode, if we are
successful in moving to Apache.

The framework provides a run-time environment in which developers can plug
in and run their UIMA component implementations and with which they can
build and deploy UIM applications. The framework is not specific to any
IDE or platform.

Motivation for UIMA: Databases are core components of nearly all
applications; they store information in structured tables.  But more and
more of the available digital data is unstructured (e.g. email, web
documents, images, audio clips, video streams) with little information
(metadata) attached to explain its content or context.  Although many
applications have been built to process unstructured data, they have
either managed it as a BLOB or they have developed isolated applications
for analyzing the content.  In the absence of a standardized means for
analytical applications to share insights extracted from the content,
analytical applications cannot build upon one another. As a result, the
industry has barely begun to tap the value locked in unstructured
information.

Standardization is key to achieving component interoperability, with
capabilities to mix components developed in different places and in Java,
C++ and other languages.  The Unstructured Information Management
Architecture defines standards for component interoperability and
application composition that will provide this needed unifying standard,
and allow a variety of framework implementations to exist, while
preserving the goal of unstructured information analytic component reuse.


This project provides both:
* UIMA frameworks that provide runtime environments into which the
developers can plug in and run their UIMA component implementations and
* Tooling for the development, description, composition and deployment of
UIMA components and applications.

It will follow and conform to the emerging work on the UIMA standard being
proposed as a new standards effort to the OASIS standards organization; we
expect to submit this proposal to OASIS in early September. OASIS has an
open approach to granting Technical Committee voting rights to members of
OASIS, described here:
http://www.oasis-open.org/committees/process.php#2.4

UIMA was built to help developers create solutions that get more value
from unstructured information more quickly and at lower cost by making it
easy to reuse and combine analytic modules from different sources into new
analytic applications. The architecture and the framework have been
validated through work with the USA's DARPA, which is using it as a standard
for key projects with several universities involved in advanced
linguistics analysis, such as Carnegie Mellon, Columbia, Stanford and
University of Massachusetts. Other companies, such as the Mayo Clinic and
Sloan Kettering, are also building efforts around UIMA. In addition, over
15 software vendors, including companies such as Inxight, Attensity,
!ClearForest, Temis, SPSS, SAS, Cognos, Endeca, Factiva and others,
announced plans to support UIMA.

The UIMA framework (binary and/or source code) has been downloaded over
8000 times from [http://www.alphaworks.ibm.com/tech/uima IBM alphaWorks]
or [http://uima-framework.sourceforge.net SourceForge].

We believe that moving the UIMA framework development to the Apache
development community will lead to faster innovation, better integration
with other open source software, and broader adoption of UIMA,
accelerating the industry's ability to get the most value from text,
audio, and video content. The UIMA framework is becoming attractive to
developers who want to build components; we believe that having UIMA on
Apache will encourage the development of a basic set of open source
components that will jumpstart these developers' efforts. One of the first
components we see possible synergy with is a search component based on
Apache Lucene that would enable semantic search.  We like the concept of
the Lucene Sandbox as a way to encourage innovation around UIMA, and would
envision something similar for this project.

Some initial work we see in the incubator includes the following:
* redoing the parts of the tooling that were done as derivative works of
Eclipse source code, to enable everything to be licensable under the
Apache license
* extending the framework to better support "scale-out"
* extending the framework to better align with the emerging UIMA Standards
work
* extending the framework to support XMI-based SOAP and/or other service
interfaces
* extending the framework to support OSGi-based approaches to
componentization and packaging
* exploring embeddings of the framework within other interested Apache
projects, including synergies with Lucene
* providing aids to the community to migrate from previous versions of the
framework to the Apache version
* setting up community support: hosting a facility similar to the Lucene
Sandbox to encourage innovation and experimentation; establishing a wiki
and some process to allow better documentation to be developed by the
community, and linking our existing XHTML documentation via an XSL
transform to Apache FOP

== Criteria ==

=== Community ===

Currently, the UIMA Framework development is being done by IBM, with input
from a group of early adopters in industry and government. Going forward,
we see IBM continuing to support several committers working on it.  We
have already begun talking with other people outside of IBM that have
expressed interest in contributing towards the development. This includes
members of academic institutions, people working for some of the software
vendors that have announced plans to support UIMA, and others from
companies that have expressed interest since initial announcements about
our open source plans.  Multiple non-IBM people have already expressed
desires to become committers.

==== Core Developers: ====

The previous core developers of UIMA are Adam Lally, Thilo Goetz, Marshall
Schor, Michael Baessler, Edward Epstein, Jaroslaw Cwiklik and Thomas
Hampp. Many others have also contributed.

==== Alignment: ====
UIMA has significant synergy with search applications, and we expect to
see integration with Lucene in the future. UIMA makes use of the Apache
Portable Runtime (APR) for C++ support.  It is designed to be embeddable
into other frameworks, such as web application servers.  Part of UIMA is
Eclipse-based tooling. We use ANT for build scripting. UIMA has support
for various language bindings including C++ and Java; we also have more
limited bindings for Perl, Python, and TCL.  UIMA uses Web Services as
part of its approach to wiring up components in its domain. It makes use
of XML services such as Xerces and Xalan.

==== License: ====
The current license for the source code is CPL, with a small number of
files licensed under the EPL (Eclipse Public License), because these were
created as "derivative works" of existing Eclipse open source code. When
the code base is moved to Apache, it will be relicensed under the Apache
license, except for the small number of files licensed under the EPL as
derivative works of Eclipse source files.  We plan to work in the
incubator to redo these parts, so the entire offering can be licensed
under the Apache license.

The distribution for the C++ enablement layer includes the open source
component ICU (a Unicode package), which has its own license. We plan to
work with the community to properly make use of this non-Apache licensed
component.

Our current vision for the future of UIMA has it aligning with and
incorporating other standards-based open source components/protocols, some
of which may have licensing other than the Apache license (for example,
the XML Metadata Interchange (XMI), and the EMF ECore Model from Eclipse);
we will work with the community in figuring out how to move forward on
this.

==== Orphaned Software: ====
UIMA has been in active development for 5 years.  The community of users
has steadily grown, and there are now significant commercial and research
organizations actively using it.  UIMA is embedded in IBM software
products and is delivered through IBM services engagements. IBM has
developers assigned to it, and is continuing to support its development.
In addition, several people outside of IBM have already expressed interest
in working on UIMA, and have been providing IBM with initial feedback. One

of the objectives of starting this Apache project is to provide a
meritocratic structure for those people to begin more actively
contributing to UIMA.

==== Experience With Open Source: ====
The individuals working on this software have background as IBM software
developers.  While many of them have experience working with open source
software, none of them has had extensive experience contributing to other
open source software.  However, IBM as an organization has extensive
experience contributing to open source projects and will make available
resources to provide guidance to the developers working on this project.

==== Homogenous Developers: (work for same company?) ====
Currently all the developers work for IBM, although they come from
different geographically dispersed organizations within IBM.  We will
reach out during the incubation time to get others to contribute; we have
already received interest from several parties.

==== Reliance on salaried developers: ====
Currently the developers are paid employees of IBM.

==== No Ties to Other Apache Products: ====
We make use of several Apache components (SOAP / Web Services, XML
(Xerces, Xalan), languages (Perl), scripting languages (ANT), Apache
Portable Runtime).  In addition, UIMA has been embedded in other
frameworks, such as web application servers, and integrated with search
engines. We are exploring Lucene extensions that could take advantage of
UIMA processed data. We are currently investigating and prototyping some
software packaging concepts based on OSGi; the Apache Incubator project
Felix may have relevance as we go forward.  The documentation is being
moved to XHTML, and we plan to use Apache FOP for producing PDF reference
materials.

==== Achieving the Apache Brand is a Prominent Goal: ====
UIMA is already being adopted by a wide cross section of users, both
commercial and academic, world-wide. Our experience shows that analytic
modules can be reused and combined through UIMA making it easier and
faster for developers to build new analytic applications for specific
industries or domains. Given the diversity of content and analytics that
will be required to address the multitude of opportunities - from military
intelligence to quality assurance to contact center analytics - growing
this infrastructure so that it better aligns with other major Open Source
communities should help accelerate industry's ability to get value from
content assets.

We believe that the Apache community of developers has the experience,
background, visibility, and synergistic resources to encourage and foster
a vibrant developer community around this project.

== Scope of the project ==

The project will develop implementations of the UIMA architecture (which
is concurrently being submitted to the OASIS standards process),
supporting the breadth of platforms that developers working in this field
are using, including Java, C++, Perl, Python and TCL; and utilities and
tooling to support component and application developers and assemblers /
packagers.  It will initially include the Java UIMA framework for UIMA
Version 2 (you can see a snapshot of the Version 2 release on SourceForge;
the delivered code would be this code base plus normal incremental bug fixes
and improvements), plus additional components (mainly documentation and
test cases, which are not currently on SourceForge).  Over time, the
project is expected to grow to include supporting various embeddings and
integrations with other Apache components such as search engines and web
application frameworks.

Over time, we envision the project becoming an umbrella for related
open-source work around UIMA, including things like open-source pre-annotated
corpora, and hosting a facility similar to the Lucene Sandbox to encourage
innovation and experimentation.

The UIMA framework is primarily a set of libraries (in Java, C++, Perl,
etc.), test cases, and UIMA utilities and tools (scripts, plugins,
executables, etc.) used to build, test and debug UIMA analytic components.
The tooling includes several Eclipse platform plugins.

== Initial source from which the project is to be populated ==

The source currently is maintained in IBM internal software control
systems. At the time of launch, we plan to contribute the latest version
of the code base (with some renaming of package prefixes to reflect
apache.org), test cases, build files, and documentation, under the terms
specified in the ASF Corporate Contributor License. We plan to donate the
existing C++ enablement layer and the support for Perl, Python, and TCL a
few months later than the initial donation; this delay is to give us time
to finish preparing that code base for Open Source.

== Identify the ASF resources to be created ==

Mailing lists:

* uima-dev
* uima-commits
* uima-user (we already have a substantial user community and expect them to turn up at Apache soon after we've hopefully been accepted into the incubator)

For other resources such as Subversion repository, JIRA etc. we hope for guidance from our mentors.


== Identify the Initial Set of Committers ==

* Michael Baessler ([EMAIL PROTECTED])
* Edward Epstein ([EMAIL PROTECTED])
* Thilo Goetz  ([EMAIL PROTECTED])
* Adam Lally  ([EMAIL PROTECTED])
* Marshall Schor ([EMAIL PROTECTED])

== Identify ASF Sponsor ==

We are requesting the Incubator to sponsor this.  Our current vision is
that it will become a top level project (other projects that develop UIMA
components could become subprojects, for instance).

== Champions: ==

== Mentors: ==

* Sam Ruby
* Ken Coar
* Ian Holsman

On 19/09/2006, at 7:38 AM, Otis Gospodnetic wrote:

Damn, and I was going to give it +1.

UIMA folks answered questions about what it is that UIMA really does in emails, but yes, making sure it's answered in the proposal (I can't connect to wiki.apache.org at the moment to see the final proposal for myself).


Otis

----- Original Message ----
From: Garrett Rooney <[EMAIL PROTECTED]>
To: general@incubator.apache.org
Sent: Monday, September 18, 2006 5:11:13 PM
Subject: Re: [VOTE] accept UIMA as a podling

On 9/18/06, Ian Holsman <[EMAIL PROTECTED]> wrote:

[ ] +1 Accept UIMA as an Incubator podling
[ ]  0 Don't care
[X] -1 Reject this proposal for the following reason:


I'm sorry, but I have to vote -1 based on my new policy of rejecting
any potential podling that can't explain what it is that they do
within the first paragraph of the proposal.  I'm a fairly intelligent
person, but honestly I have no clue what "an architecture and software
framework for creating, discovering, composing and deploying a broad
range of multi-modal analysis capabilities" actually is, and I see
little potential for any project that's so bad at selling themselves
to actually grow a useful community.

Additionally, I believe we decided that having the final vote thread
point to a Wiki page was a bad idea.  It would be good to resend this
with the actual proposal content inline so everyone can be sure what
they're actually voting on.

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







--
Ian Holsman
[EMAIL PROTECTED]
http://VC-chat.com It's what the VC's talk about



