[
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924691#action_12924691
]
Grant Ingersoll commented on SOLR-2129:
---------------------------------------
Cool stuff, Tommaso. I'm starting to look at adding classifiers into Solr via
Mahout, so thought I would look at this too.
A couple of early things, based on looking at the getting started instructions:
# I think we should do as we do with Tika and provide a way for users to map
UIMA output to Solr fields, as opposed to having to hardcode specific fields.
# For the jars, have a look at how the clustering is set up. We should be able
to just point at the UIMA libs under contrib/uima/lib from solrconfig.xml instead
of having to copy them around.
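
To make the first point concrete, here is a rough sketch of what a user-supplied
mapping could look like. It uses plain Java collections only, not the Solr or
UIMA APIs. The class name, the annotation type names ("language", "entity",
"sentence") and the field names ("lang_s", "entity_ss") are all made up for
illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: route extracted annotation values to configurable field names. */
public class FieldMappingSketch {

    /**
     * Copies each annotation type's values into the field named by the mapping.
     * Annotation types with no mapping entry are skipped rather than being
     * written to a hardcoded default field.
     */
    public static Map<String, List<String>> mapToFields(
            Map<String, String> typeToField,
            Map<String, List<String>> annotations) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : annotations.entrySet()) {
            String field = typeToField.get(e.getKey());
            if (field == null) {
                continue; // unmapped type: the user did not ask for it
            }
            fields.computeIfAbsent(field, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed user configuration: annotation type -> target field name.
        Map<String, String> typeToField = new LinkedHashMap<>();
        typeToField.put("language", "lang_s");
        typeToField.put("entity", "entity_ss");

        // Assumed extraction output for one document.
        Map<String, List<String>> annotations = new LinkedHashMap<>();
        annotations.put("language", List.of("en"));
        annotations.put("entity", List.of("Apache", "Solr"));
        annotations.put("sentence", List.of("Not mapped, so not indexed."));

        System.out.println(mapToFields(typeToField, annotations));
    }
}
```

In a real module the mapping would presumably come from configuration rather
than code, so changing target fields would not require a rebuild.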
> Provide a Solr module for dynamic metadata extraction/indexing with Apache
> UIMA
> -------------------------------------------------------------------------------
>
> Key: SOLR-2129
> URL: https://issues.apache.org/jira/browse/SOLR-2129
> Project: Solr
> Issue Type: New Feature
> Reporter: Tommaso Teofili
> Assignee: Robert Muir
> Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch,
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences
> (with a tokenizer and a hidden Markov model tagger), named entities,
> language, suggested category, keywords and concepts (exploiting external
> services from OpenCalais and AlchemyAPI). Such an implementation can be
> easily extended by adding or selecting different UIMA analysis engines, either
> from UIMA repositories on the web or by creating new ones from scratch.