what wicket is (was: Re: [VOTE] Accept Wicket into the Incubator)

2006-08-25 Thread Leo Simons
Greg,

Basically wicket creates a session for every user and then attaches a java
object graph to that session, with parts shared between sessions. Then there
are some mechanisms for attaching "id"-ed objects in that graph to "id"-ed
elements in an HTML template, and rendering directions for the "merge" between
the template and the object graph. In effect you have java developers pretending
the web is like swing (i.e. a stateful UI), web designers designing the "fluff"
around the "active UI" bits, and then various kinds of magic in the middle to
make that work.
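
To make the id-matching idea concrete, here is a toy sketch in Python. It is
not Wicket's API (Wicket itself is Java); the `Label` class, the `sessions`
store, and the `render` function are all made up for illustration, and a real
framework would use a proper markup parser rather than a regular expression.
It keeps one object graph per session and merges it into an HTML template by
matching ids:

```python
import re


class Label:
    """A stateful component: holds a value and knows how to render it."""

    def __init__(self, component_id, text):
        self.component_id = component_id
        self.text = text

    def render(self):
        return str(self.text)


# One component graph per user session (hypothetical in-memory store).
sessions = {}


def page_for(session_id):
    """Return the session's components, creating them on first use."""
    if session_id not in sessions:
        sessions[session_id] = {"counter": Label("counter", 0)}
    return sessions[session_id]


# The designer's template: plain HTML plus an id on the "active" element.
TEMPLATE = '<p>Clicks: <span wicket:id="counter">placeholder</span></p>'


def render(template, components):
    """Replace the body of each id-ed element with its component's output."""
    pattern = re.compile(r'(<(\w+) wicket:id="(\w+)">)(.*?)(</\2>)')

    def merge(match):
        component = components[match.group(3)]
        return match.group(1) + component.render() + match.group(5)

    return pattern.sub(merge, template)


page = page_for("alice")
page["counter"].text += 1  # state lives in the object, not in the request
html = render(TEMPLATE, page)
```

The point of the sketch is the division of labour: the template never
contains code, and the java-side (here python-side) objects never contain
markup; the only contract between them is the id.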

Why, one might even say it's a little bit like the google web toolkit, except
it's also a little bit more "full stack" like ruby on rails, and fully open
source. The template language made me think of kid (you know, the python one),
only it's a lot simpler and doesn't allow embedding of source code.

Or like .net web development without ASP and visual basic.

Of course, once you have a java object graph with all your data in it, using
some kind of object persistence thing (probably using OR mapping) is the next
step towards not having to think about the web and just doing java development.

Wicket goes quite far that way; you don't even need to know how to write XML
files or even valid XHTML in order to use it. And making things "AJAX" is all
but transparent (since the request/response is hidden, making it into another
kind of request/response is not so difficult).

It's uber cool if you want java developers to build web applications
quickly. It's not so cool if you want to use XSLT or similar stuff (use cocoon),
process 100s of megabytes of XML documents (use cocoon), want some kind of
java-ish programming model which still keeps request/response somewhere in there
(use struts or a similar action-based framework), or want efficient memory use
scaling up to 1000s of concurrent users (in which case, don't put any state in
java objects and don't use any framework like any of these; in fact, anything
servlet-based kinda sucks automatically).

The project at the ASF that comes closest is tapestry, but I haven't ever fully
understood what tapestry actually is (I know it builds on hivemind, which is
somewhat like excalibur/avalon/osgi, automatically making it different from
wicket since wicket is not "IOC"), so I can't comment further.

Wicket *is* different. Whether this is the right way to do things is
debatable, but I would say now is not the right time for the incubator to start
having those kinds of debates. Various ASF members like working this way, are
working this way, and are backing this proposal. Trust darwinism.

LSD

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [PROPOSAL] InfoEng project proposal

2006-08-25 Thread Leo Simons
Hi J. (?)

I managed to read this proposal all the way to the end, but I simply
have no real clue what you're on about, what this software actually *does*,
or why anyone should care. I looked around using google and found, among
other things,

  http://svn.haxx.se/dev/archive-2005-12/0486.shtml

where you wrote:

  "The simplest possibility for usage of information currency is to
pay software developers by purchasing the information currency they
generate by writing software. To illustrate, in today's world, many (but
not all :) software developers have jobs where they are compensated for
writing software. In the simplest case, a developer is today paid, for
example, $2000/week for being an employee, and their future pay and
continued employment are dependent on the quality of their work,
determined after they are paid. "

Which for me reduces down to "you use software to calculate some
economic value for digital information (such as source code), and then
you can do stuff economists do", which brings up memories of programmers
being paid for every line of code they write, which is a bad idea.

Regardless, IMHO this kind of software doesn't belong at apache since we're
all about creating value (both in economic and non-economic sense) *without*
measuring or turning it into any kind of currency.

take care!

LSD

PS: As an aside, if you want something like this to become successful "on the
web", I really suggest you find someone to help you with turning your ideas
(of which there seem to be loads in InfoEng) into words in a way that more
people can readily understand. The common phrase used to describe this stuff
is "technical writing", and there are also lots of resources online about it,
e.g.:

  http://www.google.com/search?q=technical%20writing




Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Leo Simons
Hi Marshall!

I'm sure all this is potentially interesting, but you're going to have
to help us understand why.

On Wed, Aug 23, 2006 at 03:21:55PM -0400, Marshall Schor wrote:
> Proposal for Incubation Project: Unstructured Information Management 
> Architecture - UIMA
> 
> The Unstructured Information Management Architecture (UIMA) is an 
> architecture and software framework for creating, discovering, composing 
> and deploying a broad range of multi-modal analysis capabilities.  We 
> propose a project to develop, implement, support and enhance UIMA 
> framework implementations that comply with the UIMA standard (being put 
> forward concurrently for standardization within OASIS 
> http://www.oasis-open.org - not yet submitted, but we plan to do this 
> early in September.). 

> Motivation for UIMA: Databases are core components of nearly all 
> applications; they store information in structured tables.  But more and 
> more of the available digital data is unstructured (e.g. email, web 
> documents, images, audio clips, video streams) with little information 
> (metadata) attached to explain its content or context.  Although many 
> applications have been built to process unstructured data, they have 
> either managed it as a BLOB or they have developed isolated applications 
> for analyzing the content.  In the absence of a standardized means for 
> analytical applications to share insights extracted from the content, 
> analytical applications cannot build upon one another. As a result, the 
> industry has barely begun to tap the value locked in unstructured 
> information.


What does it *do*? How does it *work*? I understand there's a runtime and
a framework and a standardization process and a component-based
interoperability goal, but what I don't understand is what they are *for*.

Can you please write a paragraph or two, that

1) doesn't mention "what the industry is doing" or needs to do
2) doesn't mention frameworks, standards, or current problematic
   industry practices, SOA, SOAP, DARPA, OASIS, or other acronyms
3) outlines what problem this UIMA thing is meant to solve
4) outlines what the approach is to solving that problem
5) outlines how this turns into software
6) gives an example or even two of such software in use in the real world to
   solve some kind of tangible problem

For example, one kind of "unstructured information" is "the web", and one
way to process that is "as plain text, indexing it, create a keyword-based
search engine", and then there are also fancier ways such as all the things
that google does. And then there are also various ways to make the unstructured
mess that is the web more structured by attaching metadata, e.g. dublin core
metadata or the whole semantic web thing, so right now I might walk away with
the understanding that you're devising a way for google and yahoo to interop
(which I doubt they really want) by re-inventing the semantic web movement
(which I doubt is really productive). Enlighten me, please. If it helps,
imagine I'm 12 and write PHP and have difficulty with words such as
interoperability since English is not my first language.

cheers,

LSD




Re: documentation

2006-08-25 Thread robert burrell donkin

On 8/24/06, Susanne Lefvert <[EMAIL PROTECTED]> wrote:
> Hi there,

hi Susanne

> I've been looking at the apache ftp server as a solution for one of
> my projects. It looks really promising but I would like to get my
> hands on more documentation. Do you know if there's more info out
> there apart from http://incubator.apache.org/ftpserver/? I've googled
> without much success.

IIRC the documentation is one area which needs work...

probably the best plan would be to ask on the development list. to
subscribe send a mail to [EMAIL PROTECTED]
and follow the instructions returned.

> I'm especially interested in how to use the
> ftplet functionality to customize the server. Also, the archive links
> for the email lists on the web page are broken...

http://mail-archives.apache.org/mod_mbox/incubator-mod_ftp-dev/

- robert




Re: [VOTE] Accept Wicket into the Incubator

2006-08-25 Thread robert burrell donkin

On 8/24/06, Upayavira <[EMAIL PROTECTED]> wrote:

[X] +1 Accept Wicket as an Incubator podling
[ ]  0 Don't care
[ ] -1 Reject this proposal for the following reason:


- robert




Re: what wicket is (was: Re: [VOTE] Accept Wicket into the Incubator)

2006-08-25 Thread Greg Stein

On 8/25/06, Leo Simons <[EMAIL PROTECTED]> wrote:
> ...
> Wicket *is* different.

Excellent. Thanks a bunch for the thorough reply and comparison
points. Very helpful.

> Whether this is the right way to do things is
> debatable, but I would say now is not the right time for the incubator to start
> having those kinds of debates.

I'm not trying to start a debate, nor engaging in any debate. I
offered my opinion. The model sounds cool, but I don't happen to like
it. I am fine explaining offlist cuz it really is irrelevant here, as
you note.

> Various ASF members like working this way, are
> working this way, and are backing this proposal. Trust darwinism.


More power to 'em. I get a vote just like any other Incubator PMC
member. Please don't attempt to deny me that. Just because I didn't +1
the proposal doesn't mean you should try to coerce me into changing my
vote. Darwin also says that proposals could be voted down :-)  (but
I'm not even doing that... it's just a -0 for cryin' out loud)

To be honest, I am rather amazed at the amount of text written because
one single person votes -0 rather than +1. Seriously... wow. God
forbid somebody votes -1. What happens then? Ten times as many words
written to convince them of the error of their ways? What are we
saying to people: don't vote anything but +1 or your inbox will get
slammed? Follow the groupmind, or you shall be mailbombed? Personally,
I'd prefer an environment MUCH more accepting of alternate votes --
that means you'll actually *get* those votes, rather than people being
quiet, too afraid to counter the majority.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/




Re: what wicket is (was: Re: [VOTE] Accept Wicket into the Incubator)

2006-08-25 Thread Leo Simons
On Fri, Aug 25, 2006 at 04:46:13AM -0700, Greg Stein wrote:
> >Whether this is the right way to do things is
> >debatable, but I would say now is not the right time for the incubator to
> >start having those kinds of debates.
> 
> I'm not trying to start a debate, nor engaging in any debate. I
> offered my opinion.

C'mon greg, opinions that are not shared often start a debate! Offering
an opinion without wanting follow-up is kinda hard around here...

> The model sounds cool, but I don't happen to like
> it. I am fine explaining offlist cuz it really is irrelevant here, as
> you note.
> 
> >Various ASF members like working this way, are
> >working this way, and are backing this proposal. Trust darwinism.
> 
> More power to 'em. I get a vote just like any other Incubator PMC
> member. Please don't attempt to deny me that.

Wouldn't dare. You said something which read to me like "+0 pending FOO"
and my "trust darwinism" was meant more as a "don't worry about the 'pending'
stuff or reading my whole e-mail in detail, we'll be fine anyhow". I'm such
an arse with words.

> Just because I didn't +1
> the proposal doesn't mean you should try to coerce me into changing my
> vote.

Not trying to. But the co-operative process we use says that I, like anyone
on this planet, am completely free to try if I want to. If I were a
wicket developer and totally convinced that it absolutely is the best
thing since sliced bread I would probably try to do that. Which is very
much a healthy response. Evangelism, baby!

> Darwin also says that proposals could be voted down :-)  (but
> I'm not even doing that... it's just a -0 for cryin' out loud)

Sssh! Speak softly, or you might provoke more discussion! :-P

> To be honest, I am rather amazed at the amount of text written because
> one single person votes -0 rather than +1. Seriously... wow. God
> forbid somebody votes -1. What happens then? Ten times as many words
> written to convince them of the error of their ways? What are we
> saying to people: don't vote anything but +1 or your inbox will get
> slammed? Follow the groupmind, or you shall be mailbombed? Personally,
> I'd prefer an environment MUCH more accepting of alternate votes --
> that means you'll actually *get* those votes, rather than people being
> quiet, too afraid to counter the majority.

You didn't just -1 or -0, you did so conditionally on not having some kind
of understanding of differences or something. I didn't care much for the
actual vote (it's going to get in anyway), but the conditional was interesting
to me. I figured the same conditional might be true for other people as well
(wicket simply is a bit weird, and I'd just spent time figuring out *how* it
is weird) so it was quite worthwhile to spend an e-mail on it irrespective of
any vote going on.

I personally couldn't be more accepting of -1s, especially when it concerns
things I don't have a stake in, haven't worked on, and haven't proposed, and if
this really is an environment that isn't similarly accepting we should change
that, but I hardly see a mailbomb around here. Of course, we might have one
now because of self-fulfilling prophecy and all that ;)

*ducks*

LSD




Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Thilo Goetz

Leo Simons wrote:
> What does it *do*? How does it *work*? I understand there's a runtime and
> a framework and a standardization process and a component-based
> interoperability goal, but what I don't understand is what they are *for*.


The unstructured content we're talking about is mainly plain text today. 
 There is also some work going on analyzing video streams, as well as 
multi-modal streams (e.g., video + closed captioning).  I'm not really 
competent to talk about those, so I'll stick to text.  A typical 
processing chain for text analysis starts out something like this:


"language identification" -> "language specific segmentation" -> 
"sentence boundary detection" -> "entity detection (person/place names 
etc.)" -> ...


So you start by identifying the language the text is in (Chinese, 
English etc.).  Then you do token segmentation based on that information 
(it's completely different for Chinese than for English).  Based on the 
tokens you discovered, you may want to do sentence boundary detection, 
so you know what entities occur in the same sentence.  Then, again based 
on the tokens you've found, you can do so-called named entity detection, 
such as place names, person names etc.  After that, you may have another 
module that can discover relations between the entities that you have 
found.  And so on.


UIMA at its core is a component architecture that allows you to create 
analysis applications like the one described above.  It provides 
facilities for creating meta-information on documents like in the 
example above.  That is, the original artifact (i.e., the text) is not 
modified and the derived information is kept separately.
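
The processing chain described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not UIMA's API: the
`Document` and `Annotation` classes, the component functions, and the
deliberately naive heuristics inside them are all assumptions made up for
this example. Each component reads the document, adds annotations that point
back into the text by character offset, and leaves the text itself alone:

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token" or "sentence"
    begin: int  # character offset into the original text
    end: int


@dataclass
class Document:
    text: str                 # the original artifact, never modified
    language: str = "unknown"
    annotations: list = field(default_factory=list)


def identify_language(doc):
    # Placeholder heuristic: any CJK ideograph means Chinese, else English.
    has_cjk = any("\u4e00" <= c <= "\u9fff" for c in doc.text)
    doc.language = "zh" if has_cjk else "en"


def segment_tokens(doc):
    # Segmentation depends on the language found by the previous step.
    if doc.language == "zh":
        spans = [(i, i + 1) for i, c in enumerate(doc.text) if c.isalnum()]
    else:
        spans = [(m.start(), m.end()) for m in re.finditer(r"\w+", doc.text)]
    doc.annotations += [Annotation("token", b, e) for b, e in spans]


def detect_sentences(doc):
    # Naive rule: a sentence runs up to the next terminator character.
    for m in re.finditer(r"[^.!?。\s][^.!?。]*[.!?。]?", doc.text):
        doc.annotations.append(Annotation("sentence", m.start(), m.end()))


# The chain is just an ordered list, so one component is easy to swap out.
PIPELINE = [identify_language, segment_tokens, detect_sentences]


def analyze(text):
    doc = Document(text)
    for component in PIPELINE:
        component(doc)
    return doc


doc = analyze("Hello world. Bye now.")
tokens = [doc.text[a.begin:a.end] for a in doc.annotations if a.kind == "token"]
sentences = [(a.begin, a.end) for a in doc.annotations if a.kind == "sentence"]
```

Because annotations only hold offsets, a later component (say, an entity
detector) can ask which tokens fall inside which sentence without anyone
having rewritten the text.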


UIMA is mostly a framework, not an application.  So it is not concerned 
with fetching documents, like the crawler of a search engine.  Nor does 
UIMA provide facilities to do very much with the information you have 
extracted from the text (or other artifact).  Rather, the use case is 
that you have an application that has a need for the processing of 
unstructured information.  This application will provide the input data, 
and it will know what to do with the results.  The value of UIMA derives 
from the component model: it is easy to reuse existing analysis 
components that other people have written, and it's easy to exchange, 
say, one language identifier for another.


One standard application scenario is to use UIMA to extract some named 
entities from text, feed the results into a relational database, and use 
the database's mining capabilities to do, e.g., association analysis. 
Another area of application is enhanced text search, where in addition 
to regular free-form text search, you can search for documents 
containing certain entities.  Trivial standard example: you're looking 
for John's phone number in your email, so you use semantic search to 
look for documents that contain John's name and a phone number.  You'll 
use a UIMA component that knows that a pattern 123-456-7890 is a phone 
number and will create a phone number entity.
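
As a rough sketch of that last example (again plain Python, not UIMA code;
the function names, the sample mails, and the assumption that every phone
number looks like 123-456-7890 are all made up for illustration), a phone
number component plus a search that combines a name with "contains a phone
number entity" might look like this:

```python
import re

# Assumed format only: three digits, three digits, four digits.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def phone_numbers(text):
    """Return (begin, end, value) for every phone-number-shaped match."""
    return [(m.start(), m.end(), m.group()) for m in PHONE.finditer(text)]


def mentions(text, name):
    """True if `name` occurs in the text as a whole word."""
    return re.search(rf"\b{re.escape(name)}\b", text) is not None


def search(mails, name):
    """Mails that mention `name` and contain at least one phone entity."""
    hits = []
    for mail in mails:
        phones = phone_numbers(mail)
        if phones and mentions(mail, name):
            hits.append((mail, [value for _, _, value in phones]))
    return hits


mails = [
    "John called, reach him at 555-867-5309 after lunch.",
    "John says the build is green.",
    "Call the office on 555-010-2000.",
]
results = search(mails, "John")
```

A keyword search for "John" alone would return two mails here; adding the
entity constraint narrows it to the one that actually carries a number.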


I hope this gives you a better idea what UIMA is about.

--Thilo





Re: documentation

2006-08-25 Thread Susanne Lefvert

thanks




xml -> doap + xml - take #1

2006-08-25 Thread david reid
I've done some initial work on creating the xsl templates to manage the
conversion, but there is some more work I want to do. Question is where
should I put the files? I'm keen that others see what I'm doing and
where I'm aiming to take the work - hence my question :-)

Answers on a postcard or email if you're that way inclined!

david




Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Marshall Schor

Hi Leo,

Here's a response to your good questions; apologies to you and others 
(David Welton also commented) that we were not clearer, initially.

> I understand there's a runtime and
> a framework and a standardization process and a component-based
> interoperability goal, but what I don't understand is what they are
> *for*.
>
> outline what problem this UIMA thing is meant to solve

We are working to move the community of folks who write unstructured 
analysis software to a place where these things can be easily put 
together.  Here are some examples of the kind of software we mean:

* code that works with text and identifies words or phrases as 
particular kinds of entities, such as persons, places, organizations, 
chemical names, times, dates, telephone numbers, or sentiment analysis 
(e.g., likes, hates), etc.


* code that works with audio samples and extracts text (think of speech 
recognition)


* code that translates text in one language to another

* code that finds similarities among images, or computes a similarity 
score for images


Today, you can find these components bundled up inside various 
solutions, etc.  Our goal was to enable putting together these parts 
into interesting new kinds of applications.  For instance:


* an application that takes several approaches to speech recognition, or 
machine translation, and combines them (in the hope of getting a better 
quality result)


* a search application that wants to search for "concepts" as well as 
key-words, where these concepts have been added by pre-processing the 
data being searched with a set of these components.  You might imagine 
the set of components used would depend on the kind of searching - if 
you were working with genetics, you might have textual information about 
"genes" identified, or you might be searching for a kind of image, along 
with some text.


* A "business intelligence" (not an oxymoron :-) (sorry for the 
buzzword) application that looks for trends in various kinds of messages 
- using components that might tag texts with concepts like "positive 
remark" or "negative remark".


---

> What does it *do*?

It runs components that are written to conform to its architecture 
(remote or local, in several computer languages) in a flow, which is 
configured with external-to-the-component-part configuration files. 


> How does it *work*?
>  outline what the approach is to solving that problem

Components work with one another by adopting a model where data is 
passed from component to component.  Each component examines the data, 
runs its particular unstructured analysis special capability, and adds 
to the data.  Components are required to specify descriptive information 
which UIMA uses in common tooling and running.
Common services needed by components and solutions built with these 
components are provided.
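
A minimal sketch of that model, in Python and with made-up names throughout
(the `component` decorator, `COMPONENTS` registry, `FLOW_CONFIG`, and the toy
sentiment word lists are assumptions for illustration, not UIMA's descriptor
format or API): each component declares what it needs and what it adds, and
the order of the flow lives in configuration outside the components.

```python
# Registry of available components, keyed by name.
COMPONENTS = {}


def component(name, inputs, outputs):
    """Register an analysis function with a description of what it needs and adds."""
    def register(func):
        COMPONENTS[name] = {"run": func, "inputs": inputs, "outputs": outputs}
        return func
    return register


@component("tokenizer", inputs=[], outputs=["tokens"])
def tokenizer(data):
    data["tokens"] = data["text"].split()


@component("sentiment", inputs=["tokens"], outputs=["sentiment"])
def sentiment(data):
    positive = {"likes", "love", "great"}
    negative = {"hates", "awful"}
    words = [t.lower() for t in data["tokens"]]
    score = sum((w in positive) - (w in negative) for w in words)
    if score > 0:
        data["sentiment"] = "positive remark"
    elif score < 0:
        data["sentiment"] = "negative remark"
    else:
        data["sentiment"] = "neutral"


# Stand-in for an external configuration file: the flow is data, not code.
FLOW_CONFIG = {"flow": ["tokenizer", "sentiment"]}


def run_flow(config, text):
    """Pass one shared data record through the configured components in order."""
    data = {"text": text}
    for name in config["flow"]:
        spec = COMPONENTS[name]
        missing = [key for key in spec["inputs"] if key not in data]
        if missing:
            raise ValueError(f"{name} needs {missing}; check the flow order")
        spec["run"](data)
    return data


result = run_flow(FLOW_CONFIG, "Everyone likes the new release")
```

The declared inputs and outputs are what let generic tooling check a flow
before running it, which is the role the descriptive information plays in
the paragraph above.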


---

>  outline how this turns into software

The data passed from part to part is described with a single-inheritance 
type system.  Support is provided for parts written in Java, C++, and 
some scripting languages (Perl, Python, Tcl).  A variety of tradeoffs in 
programming styles are supported, from Java centric object-oriented 
styles, to styles that go for very high performance and avoid creating 
Java "objects".  Common services to process very large collections of 
things through the flow of components, with pragmatic error handling and 
recovery, are provided.
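
To illustrate what a single-inheritance type system for the shared data
could look like, here is a small Python sketch. The `TypeSystem` class, the
type names, and the feature names are all hypothetical and chosen for this
example; they are not taken from UIMA. Every type has exactly one parent, and
a type's features are its own plus everything inherited up the chain:

```python
class TypeSystem:
    """Types with exactly one parent each; features are inherited downwards."""

    def __init__(self):
        # name -> (parent name or None, features declared on this type)
        self._types = {"Annotation": (None, {"begin": "int", "end": "int"})}

    def add_type(self, name, parent, features=None):
        if parent not in self._types:
            raise KeyError(f"unknown parent type: {parent}")
        if name in self._types:
            raise ValueError(f"type already defined: {name}")
        self._types[name] = (parent, dict(features or {}))

    def ancestors(self, name):
        """The type itself followed by its parents, up to the root."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self._types[name][0]
        return chain

    def features(self, name):
        """All features of a type, including inherited ones."""
        merged = {}
        for type_name in reversed(self.ancestors(name)):
            merged.update(self._types[type_name][1])
        return merged

    def subsumes(self, general, specific):
        """True if `specific` is `general` or one of its descendants."""
        return general in self.ancestors(specific)


ts = TypeSystem()
ts.add_type("NamedEntity", "Annotation", {"confidence": "float"})
ts.add_type("Person", "NamedEntity", {"canonical_name": "str"})
ts.add_type("PhoneNumber", "NamedEntity")
```

Describing types as plain data like this, rather than as classes in one
programming language, is one way components written in different languages
could agree on what they pass to each other.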


---

> give an example or even two of such software in use in the
> real world to solve some kind of tangible problem

As part of a company's monitoring of its outgoing email (say, as part 
of efforts to comply with the Sarbanes-Oxley Act), it could deploy 
components that detect a variety of named entities (persons, places, 
organizations, etc.) and relationships among these.  The company could 
deploy commercially available components, plus some customized for their 
particular domain. (As an example, a recent announcement of some 
commercially available components can be seen at 
http://be.sys-con.com/read/262873.htm )


Another example might involve an insurance brokerage whose employees 
need access to information from insurance policies, notes from field 
adjusters, emails from customers, etc. To enable more focussed 
searching, these data could be augmented with the results of 
unstructured information analysis, and the resulting "structured" 
information could be used in searching.  The company might want to 
integrate commercial, more generic named entity detectors, with specific 
recognizers for their particular needs.  Search could be performed with 
engines capable of searching both using traditional key-words, as well 
as looking for concepts (added by the various UIMA parts), and key-words 
contained within the span of particular concepts.  [Note: these search 
engines already exist].  In addition to search, the broker might process 
this information, looking for early indication o

Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Brian McCallister

On Aug 25, 2006, at 4:07 AM, Leo Simons wrote:
> What does it *do*?


I believe it is basically a big, pluggable, harness for analyzing and  
annotating streams of arbitrary data, if this is the same thing I  
talked with a bunch of folks about (including Martin, I believe) a  
couple years ago at Hawthorne.


I think it actually grew out of a real need in the research community  
- lots of cool tools were being built to do unstructured information  
analysis and none of them could be used together.


-Brian

ps: I may also be totally wrong. My guesstimate is based on a couple  
hours of conversations a few years ago.





Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Brian McCallister

On Aug 25, 2006, at 11:11 AM, Brian McCallister wrote:
> (including Martin, I believe)


s/Martin/Marshall/g

Doh! Sorry :-)

-Brian




[ANNOUNCE] Apache Abdera v-0.1.0-incubating

2006-08-25 Thread James M Snell
The Apache Abdera community is pleased to announce its first developer
preview release (version 0.1.0-incubating)

You can download binary and source distributions from:

http://people.apache.org/dist/incubator/abdera/0.1.0-incubating

Builds are available for Java 1.5 and Java 1.4.2.

For further information, visit our web site at:
http://incubator.apache.org/abdera

Introduction
============

The goal of the Apache Abdera project is to build a
functionally-complete, high-performance implementation of the IETF Atom
Syndication Format (RFC 4287) and Atom Publishing Protocol (in-progress)
specifications.

Abdera is an effort undergoing incubation at the Apache Software
Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is
required of all newly accepted projects until a further review indicates
that the infrastructure, communications, and decision making process
have stabilized in a manner consistent with other successful ASF
projects. While incubation status is not necessarily a reflection of the
completeness or stability of the code, it does indicate that the project
has yet to be fully endorsed by the ASF.

Release Summary
===============

 * Parsing Atom Syndication Format 1.0 Feeds
 * Serializing Atom Syndication Format 1.0 Feeds
 * Extension support
 * XML Digital Signature and XML Encryption support
 * High performance incremental parsing model
 * Feed Object Model API


Please feel free to send any feedback to our mailing lists:
abdera-dev@incubator.apache.org
abdera-user@incubator.apache.org

Any contribution in the form of coding, testing, improving the
documentation, and reporting bugs is always welcome. For more
information on how to get involved with the development of Abdera, visit
our website at: http://incubator.apache.org/abdera


Thank you for your interest in Apache Abdera!

- James Snell






Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread David Welton

> > What does it *do*?
>
> I believe it is basically a big, pluggable, harness


Harness - will it be able to do something out of the box as a
demonstration of its capabilities?

--
David N. Welton
- http://www.dedasys.com/davidw/

Linux, Open Source Consulting
- http://www.dedasys.com/




Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Ian Holsman

Hi Thilo
your explanation attracted me ;-)

Is UIMA just the interface specification (i.e. a standard for the
unstructured text-processing world so that other people can plug and
play), or does UIMA also provide tools for each component?

I'm interested and, time permitting, could help as a mentor. I'm
not a java expert (compared to others on this list), or a text
processing expert, but I know a bit about the processes around the
incubator.

regards
Ian



--
Ian Holsman
[EMAIL PROTECTED]
http://personalinjuryfocus.com/







Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Niclas Hedhman
On Thursday 24 August 2006 03:21, Marshall Schor wrote:

> Proposal for Incubation Project: Unstructured Information Management
> Architecture - UIMA

I went from "WTF is this" to "Hmmm... interesting" after Leo's
brilliant (and reusable) "please clarify" mail.

I think this is an area that has plenty of potential, possibly with a lot of
interested parties in academia at large, and I think the ASF could be a good
community breeding ground.

I'm in favour of this, but not capable of contributing in any form.


Cheers
Niclas




Re: [PROPOSAL] InfoEng project proposal

2006-08-25 Thread Niclas Hedhman
On Friday 25 August 2006 18:52, Leo Simons wrote:
> Regardless, IMHO this kind of software doesn't belong at apache since we're
> all about creating value (both in economic and non-economic sense)
> *without* measuring or turning it into any kind of currency.

I must agree with Leo to a large extent.

Perhaps you should move your ideas towards user-centric micro-payments and 
other forms of digital trade.

Cheers
Niclas
