[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Tommaso Teofili (JIRA) Tue, 05 Oct 2010 08:58:58 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918037#action_12918037 ]


Tommaso Teofili commented on SOLR-2129:
---------------------------------------

Robert, thanks for that. I confirm that UIMA doesn't require Java 6; Java 5 is 
fine, so this works for branch_3x too.

Jörn, good to see you here too :) You can also run custom UIMA analysis 
engines (AEs). The default AEs are WhitespaceTokenizer, Tagger, 
AlchemyAPIAnnotator and OpenCalaisAnnotator.


To customize the default behavior you should:
a) change the OverridingParamsExtServicesAEDescriptor and, if needed, 
extend BaseUIMAUpdateRequestProcessor and its SolrUIMAConsumers

or

b) define a new AE descriptor and create a new class for it extending 
UIMAUpdateRequestProcessor (or extend BaseUIMAUpdateRequestProcessor), then 
modify the UIMAUpdateRequestProcessorFactory to instantiate that class instead 
of the base one.


If you need any parameters to be set at runtime for a delegate AE, you must 
define, inside the aggregate AE, an overriding parameter that overrides the 
corresponding parameter in the delegate AE, and then set its runtime value in 
solrconfig.xml with:

<uimaConfig>
  <runtimeParameters>
      <overriding_param_name>RUNTIMEVALUE</overriding_param_name>
  </runtimeParameters>
</uimaConfig>
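
As a rough sketch of the aggregate AE side, an overriding parameter is declared 
in the aggregate descriptor's metadata and mapped onto a delegate parameter 
with an <overrides> element. The parameter name "alchemy_apikey", the delegate 
key "AlchemyAPIAnnotator" and the delegate parameter "apikey" below are only 
assumptions for illustration; check the actual descriptors for the real names.

```xml
<!-- Fragment of an aggregate AE descriptor (hypothetical names throughout) -->
<analysisEngineMetaData>
  <configurationParameters>
    <configurationParameter>
      <!-- Name of the overriding parameter exposed by the aggregate -->
      <name>alchemy_apikey</name>
      <type>String</type>
      <multiValued>false</multiValued>
      <mandatory>false</mandatory>
      <overrides>
        <!-- delegateKey/parameterName of the delegate AE parameter to override -->
        <parameter>AlchemyAPIAnnotator/apikey</parameter>
      </overrides>
    </configurationParameter>
  </configurationParameters>
</analysisEngineMetaData>
```

With a declaration like this, the element inside <runtimeParameters> would be 
named after the overriding parameter (here <alchemy_apikey>).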




> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, either 
> from UIMA repositories on the web or by creating new ones from scratch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

