Hi Thilo,
your explanation attracted me ;-)
Is UIMA just an interface specification (i.e., to produce a
standard in the unstructured text-processing world so that other
people can plug and play), or does UIMA also provide tools for
each component?
I'm interested and, time permitting, could help as a mentor. I'm
not a Java expert (compared to others on this list), or a text-
processing expert, but I know a bit about the processes around
the Incubator.
regards
Ian
On 26/08/2006, at 2:04 AM, Thilo Goetz wrote:
Leo Simons wrote:
<snip/>
What does it *do*? How does it *work*? I understand there's a
runtime and a framework and a standardization process and a
component-based interoperability goal, but what I don't
understand is what they are *for*.
The unstructured content we're talking about is mainly plain text
today. There is also some work going on analyzing video streams,
as well as multi-modal streams (e.g., video + closed captioning).
I'm not really competent to talk about those, so I'll stick to
text. A typical processing chain for text analysis starts out
something like this:
"language identification" -> "language specific segmentation" ->
"sentence boundary detection" -> "entity detection (person/place
names etc.)" -> ...
So you start by identifying the language the text is in (Chinese,
English etc.). Then you do token segmentation based on that
information (it's completely different for Chinese than for
English). Based on the tokens you discovered, you may want to do
sentence boundary detection, so you know what entities occur in the
same sentence. Then, again based on the tokens you've found, you
can do so-called named entity detection, such as place names,
person names etc. After that, you may have another module that can
discover relations between the entities that you have found. And
so on.
UIMA at its core is a component architecture that allows you to
create analysis applications like the one described above. It
provides facilities for creating meta-information on documents,
as in the example above. That is, the original artifact (i.e., the
text) is not modified, and the derived information is kept separately.
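To make the chain and the "derived information kept separately" idea concrete, here is a minimal sketch of such a pipeline. Note this is not the real UIMA API; the `Annotator`, `Document`, and `Pipeline` names below are made up purely for illustration, and the stages are trivial stand-ins for real analysis components.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analysis component (not the UIMA API).
interface Annotator {
    void process(Document doc);
}

// The original text is never modified; derived metadata lives beside it.
class Document {
    final String text;
    final List<String> annotations = new ArrayList<>();
    Document(String text) { this.text = text; }
}

class LanguageIdentifier implements Annotator {
    public void process(Document doc) {
        // Trivial stand-in: real identifiers use e.g. character n-gram models.
        doc.annotations.add("language=en");
    }
}

class Tokenizer implements Annotator {
    public void process(Document doc) {
        // Whitespace split as a placeholder for language-specific segmentation.
        for (String tok : doc.text.split("\\s+")) {
            doc.annotations.add("token=" + tok);
        }
    }
}

class Pipeline {
    // Run each stage in order; later stages can build on earlier annotations.
    static Document run(String text, Annotator... stages) {
        Document doc = new Document(text);
        for (Annotator a : stages) a.process(doc);
        return doc;
    }
}
```

The component-model payoff is visible in `Pipeline.run`: swapping one language identifier for another just means passing a different `Annotator` instance.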
UIMA is mostly a framework, not an application. So it is not
concerned with fetching documents, like the crawler of a search
engine. Nor does UIMA provide facilities to do very much with the
information you have extracted from the text (or other artifact).
Rather, the use case is that you have an application that needs
to process unstructured information. This application provides
the input data, and it knows what to do with the results. The
value of UIMA derives from its component model: it is easy to
reuse existing analysis components that other people have
written, and easy to exchange, say, one language identifier for
another.
One standard application scenario is to use UIMA to extract some
named entities from text, feed the results into a relational
database, and use the database's mining capabilities to do, e.g.,
association analysis. Another area of application is enhanced text
search, where in addition to regular free-form text search, you can
search for documents containing certain entities. Trivial standard
example: you're looking for John's phone number in your email, so
you use semantic search to look for documents that contain John's
name and a phone number. You'll use a UIMA component that knows
that a pattern like 123-456-7890 is a phone number and will
create a phone-number entity.
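The pattern-matching part of such a component could be sketched with a plain regular expression. Again, this is not the UIMA annotator API; `PhoneNumberAnnotator` and `findPhoneNumbers` are hypothetical names, and the regex only covers the 123-456-7890 shape used in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneNumberAnnotator {
    // Matches only the simple NNN-NNN-NNNN shape from the example above.
    private static final Pattern PHONE =
            Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");

    // Returns every phone-number match; the input text is left untouched.
    static List<String> findPhoneNumbers(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) hits.add(m.group());
        return hits;
    }
}
```

In a real system, each match would become an entity annotation stored alongside the document, which a semantic search index could then query for "John + phone number".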
I hope this gives you a better idea of what UIMA is about.
--Thilo
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Ian Holsman
[EMAIL PROTECTED]
http://personalinjuryfocus.com/