[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Tommaso Teofili (JIRA) Fri, 05 Nov 2010 00:54:11 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928531#action_12928531 ]


Tommaso Teofili commented on SOLR-2129:
---------------------------------------

bq. Try to reuse the same syntax as the mapping in the ExtractingRequestHandler.

Inside <uimaConfig> the mapping could be defined in several ways.
Say we want to map the feature 'text' of type 'ConceptFS' to the field 
'concept'. I thought of three options, listed here:

1. Exactly the same syntax as ExtractingRequestHandler. Solr-UIMA is not a 
RequestHandler but an UpdateRequestProcessor, though; could this create 
confusion?
    <lst name="defaults">
      <str name="fmap.org.apache.uima.alchemy.ts.categorization.concep...@text">concept</str>
    </lst>

2. Map a feature of a type to a field with a single tag:
    <map field="concept" feature="org.apache.uima.alchemy.ts.categorization.concep...@text"/>

3. Use a stricter, more hierarchical structure, which is less immediate to 
understand but may be easier for UIMA experts:
    <type name="org.apache.uima.alchemy.ts.categorization.ConceptFS">
      <feature name="text">concept</feature>
    </type>
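
To compare the options, here is a minimal Python sketch (not part of the 
patch, and not how Solr-UIMA reads its configuration) showing that the 
hierarchical form in option 3 can be flattened into one type@feature key per 
field, which is the shape options 1 and 2 use. The function name 
flatten_type_mapping and the choice of '@' as the separator in the generated 
key are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Option 3 exactly as written above.
OPTION_3 = """
<type name="org.apache.uima.alchemy.ts.categorization.ConceptFS">
  <feature name="text">concept</feature>
</type>
"""


def flatten_type_mapping(xml_text):
    """Return {"<type>@<feature>": "<field>"} for a single <type> element.

    Hypothetical helper: it only illustrates that the nested form carries
    the same information as a flat type@feature -> field mapping.
    """
    type_el = ET.fromstring(xml_text.strip())
    type_name = type_el.get("name")
    mapping = {}
    for feature_el in type_el.findall("feature"):
        key = f"{type_name}@{feature_el.get('name')}"
        mapping[key] = (feature_el.text or "").strip()
    return mapping


if __name__ == "__main__":
    # Print each entry in the single-tag style of option 2.
    for key, field in flatten_type_mapping(OPTION_3).items():
        print(f'<map field="{field}" feature="{key}"/>')
```

Since the three forms carry the same information, the choice is mostly about 
readability for Solr users versus UIMA users.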

What do you think?
Thanks for any advice,
Tommaso

> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, 
> either from UIMA repositories on the web or by creating new ones from scratch.

