Greetings,

I understand from the message traffic that there are some concerns about the current state of the UIMA proposal, but I'd like to offer my support (and my hope that the issues with the proposal are resolved).

Carnegie Mellon has been building and deploying text analysis programs using pluggable components for the last 3-4 years. Large-scale text analysis (e.g., for text data mining or populating a knowledge base) requires significant programming at many different levels of text representation: segmentation into sentences and tokens; recognition of basic entities such as organization and person names; analysis of grammatical structure (parse trees); assignment of domain-specific meaning to parse trees; and so on.

Until UIMA came along, there was no standard for how all these separate analysis steps could be integrated, and those of us trying to build end-to-end applications had to either write everything ourselves using a one-off proprietary design, or spend lots of time writing wrapper code to integrate existing components that didn't share the same underlying data model.

UIMA provides all the necessary ingredients to ease these issues. The data models used by individual components are represented by formal type systems; the components themselves implement (or are wrapped by implementations of) well-designed abstract interfaces; and tools are provided for creating aggregate analysis engines which integrate components in (possibly distributed) run-time configurations. The fact that IBM has made UIMA open source, and is searching for an appropriate open-source development venue, represents a significant opportunity. If things continue to move ahead, I expect that the students and staff working with me will be contributing cycles to the development effort.
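
As a rough illustration of the architecture described above, here is a minimal sketch in Python. It is not UIMA's actual API: every class, method, and type name below (Annotation, Document, Annotator, AggregateEngine, "Sentence", "Token") is made up for the example, and the rules inside each component are deliberately naive. It only shows the three ingredients working together: a shared typed data model, a common component interface, and an aggregate that chains components.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import re


@dataclass(frozen=True)
class Annotation:
    """A typed span over the document text (hypothetical, not a UIMA class)."""
    type_name: str
    begin: int
    end: int


@dataclass
class Document:
    """Shared data model that every component reads from and writes to."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, type_name):
        # Return all annotations of one type, in insertion order.
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


class Annotator(ABC):
    """Abstract interface that each pluggable component implements."""

    @abstractmethod
    def process(self, doc: Document) -> None:
        ...


class SentenceAnnotator(Annotator):
    def process(self, doc):
        # Naive rule: a sentence is a run of text ending in '.', '!' or '?'.
        for m in re.finditer(r"[^.!?]+[.!?]", doc.text):
            leading = len(m.group()) - len(m.group().lstrip())
            doc.annotations.append(
                Annotation("Sentence", m.start() + leading, m.end()))


class TokenAnnotator(Annotator):
    def process(self, doc):
        # Depends only on the shared "Sentence" type, not on which
        # component produced it.
        for sent in doc.select("Sentence"):
            for m in re.finditer(r"\w+", doc.covered_text(sent)):
                doc.annotations.append(
                    Annotation("Token", sent.begin + m.start(),
                               sent.begin + m.end()))


class AggregateEngine(Annotator):
    """Runs its components in order; it is itself an Annotator, so
    aggregates can be nested inside other aggregates."""

    def __init__(self, components):
        self.components = list(components)

    def process(self, doc):
        for component in self.components:
            component.process(doc)


doc = Document("UIMA is open source. Students use it.")
AggregateEngine([SentenceAnnotator(), TokenAnnotator()]).process(doc)
sentences = [doc.covered_text(a) for a in doc.select("Sentence")]
tokens = [doc.covered_text(a) for a in doc.select("Token")]
```

Because the tokenizer depends only on the shared "Sentence" type, either component could be swapped for a better implementation without wrapper code, which is the integration problem described earlier in this message.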

In addition to using UIMA on various R&D projects at CMU-LTI, we're also using UIMA in our Software Engineering course to teach architectural design for text analysis (http://durazno.lti.cs.cmu.edu/wiki/moin.cgi/11-792). Our students recently created the UIMA Component Repository (uima.lti.cs.cmu.edu), which we are promoting as a venue for sharing of completed components, type systems, and end-to-end solutions.

Eric Nyberg
Associate Professor
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

Ian Holsman wrote:
Hi,

There has been some discussion around the UIMA proposal.
We feel that all the issues forwarded have been addressed, and we
would now like to officially propose UIMA to the Incubator for
consideration.


The proposal can be found in the Incubator wiki here:
http://wiki.apache.org/incubator/UIMA

[ ] +1 Accept UIMA as an Incubator podling
[ ]  0 Don't care
[ ] -1 Reject this proposal for the following reason:


--
Ian Holsman
[EMAIL PROTECTED]
http://parent-chatter.com -- what do parents know?



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


