[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Tommaso Teofili (JIRA) Tue, 04 Jan 2011 01:06:19 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977177#action_12977177
 ]


Tommaso Teofili commented on SOLR-2129:
---------------------------------------

bq. StringBuffer usage in UpdateRequestProcessor - should be StringBuilder 
right?

yes, right.
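
As a minimal sketch of that change (the helper below is hypothetical and not taken from the patch): when the buffer is a local variable that never escapes the method, the unsynchronized StringBuilder can replace StringBuffer without changing behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the patch: joins field values into one text to analyze.
final class TextJoiner {
    static String join(List<String> values) {
        // StringBuilder is unsynchronized; that is fine because 'sb' never leaves this method.
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("title text", "body text")));
    }
}
```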

bq. private void executeAE(AnalysisEngine ae, JCas jcas) throws 
AnalysisEngineProcessException { ae.getLogger().log(Level.INFO, new 
StringBuffer("Analazying text").toString()); ae.process(jcas); 
ae.getLogger().log(Level.INFO, new StringBuffer("Text processing 
completed").toString()); }

I wanted to logically isolate everything regarding the actual processing of text, 
but I agree that this piece of code would look better inside the calling method, 
processText(String).

bq. AEProviderFactory should be thread safe?? At a min, you have to consider 
multicore ... consider that you could be sharing AEProvider across threads 
because of this as well (static cache in AEProviderFactory). Perhaps the cache 
should not be static?

Thanks Mark for this. I agree the cache shouldn't be static, especially in cases 
where each core has AEs with the same classpaths but different runtime parameters.
As for OverridingParamsAEProvider (the only AEProvider impl available at the 
moment) being used by different threads, we can make the getAE() method 
synchronized (or, perhaps, make the cachedAE field volatile, but I need to 
check that more carefully).
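
A rough sketch of both ideas, under stated assumptions: Engine is a placeholder for the UIMA AnalysisEngine, and SketchAEProvider / SketchAEProviderFactory are illustrative names, not the classes in the patch. The factory keeps an instance-level cache (one factory per core), and getAE() is synchronized so that concurrent callers share a single lazily created engine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Placeholder for the UIMA AnalysisEngine (assumption, keeps the sketch self-contained).
final class Engine {
    final String descriptorPath;

    Engine(String descriptorPath) {
        this.descriptorPath = descriptorPath;
    }
}

// Illustrative provider: the engine is created lazily and at most once.
final class SketchAEProvider {
    private final String descriptorPath;
    private Engine cachedAE; // guarded by 'this'

    SketchAEProvider(String descriptorPath) {
        this.descriptorPath = descriptorPath;
    }

    // synchronized makes the null check and the assignment one atomic step.
    synchronized Engine getAE() {
        if (cachedAE == null) {
            cachedAE = new Engine(descriptorPath);
        }
        return cachedAE;
    }
}

// Illustrative factory: the cache is an instance field, so each core can own its own factory.
final class SketchAEProviderFactory {
    private final ConcurrentMap<String, SketchAEProvider> cache = new ConcurrentHashMap<>();

    SketchAEProvider getProvider(String descriptorPath) {
        // computeIfAbsent creates at most one provider per key, even under contention.
        return cache.computeIfAbsent(descriptorPath, SketchAEProvider::new);
    }
}
```

Note that declaring cachedAE volatile only guarantees visibility of the write; on its own it would not stop two threads from both seeing null and each creating an engine, so it would need to be combined with some form of locking.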

bq. Don't want to at least log this? } catch (AnalysisEngineProcessException e) 
{ // do nothing }

I wanted the UIMA enrichment pipeline to be error safe but I agree it'd be 
reasonable to log the error in this case (even if I don't like logging 
exceptions in general).
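
One way this could look, as a sketch only: EnrichmentException stands in for AnalysisEngineProcessException, and SafeEnricher and Step are hypothetical names. The failure is logged with its stack trace and the caller carries on, so indexing is not interrupted.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for AnalysisEngineProcessException (assumption, keeps the sketch self-contained).
class EnrichmentException extends Exception {
    EnrichmentException(String message) {
        super(message);
    }
}

final class SafeEnricher {
    private static final Logger LOG = Logger.getLogger(SafeEnricher.class.getName());

    // Hypothetical callback representing the UIMA processing step.
    interface Step {
        void run(String text) throws EnrichmentException;
    }

    // Returns true if enrichment succeeded; failures are logged and never propagated.
    static boolean enrich(String text, Step step) {
        try {
            step.run(text);
            return true;
        } catch (EnrichmentException e) {
            LOG.log(Level.WARNING, "UIMA enrichment failed; indexing the document without extra fields", e);
            return false;
        }
    }
}
```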


> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, either 
> from UIMA repositories on the web or by creating new ones from scratch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


