[
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jan Høydahl updated SOLR-1979:
------------------------------
Attachment: SOLR-1979.patch
First raw patch implementing language identification.
> Create LanguageIdentifierUpdateProcessor
> ----------------------------------------
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Jan Høydahl
> Priority: Minor
> Attachments: SOLR-1979.patch
>
>
> We need the ability to detect the language of arbitrary text so that we can
> act on it, for example by indexing the content into language-aware fields.
> Another use case is filtering or faceting by language on unstructured
> content.
> To do this, we should wrap the [Nutch
> LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
> in an UpdateProcessor. The processor should be configured like this:
> {code:xml}
> <processor
> class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
> <str name="inputFields">title,teaser,body</str>
> <str name="isoOutputField">language</str>
> <str name="fullOutputField">language_display</str>
> </processor>
> {code}
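
As a rough illustration of what the configuration above implies, here is a minimal, self-contained Java sketch. It is not the attached patch and does not use the real Solr UpdateRequestProcessor API or the Nutch LanguageIdentifier. The class name LanguageIdSketch, the plain Map standing in for a Solr document, and the pluggable detector function are all hypothetical. It assumes the processor joins the configured inputFields, asks a detector for an ISO language code, and writes that code to isoOutputField and an English display name to fullOutputField.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a Map stands in for a Solr input document.
class LanguageIdSketch {

    /** Joins the values of the configured input fields, skipping missing ones. */
    static String joinInputFields(Map<String, Object> doc, List<String> inputFields) {
        StringBuilder sb = new StringBuilder();
        for (String field : inputFields) {
            Object value = doc.get(field);
            if (value != null) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(value);
            }
        }
        return sb.toString();
    }

    /**
     * Detects the language of the joined input fields and stores the result.
     * The detector is an assumed stand-in for a real identifier such as the
     * Nutch LanguageIdentifier; it maps text to an ISO 639-1 code or null.
     */
    static void process(Map<String, Object> doc,
                        List<String> inputFields,
                        String isoOutputField,
                        String fullOutputField,
                        Function<String, String> detector) {
        String text = joinInputFields(doc, inputFields);
        if (text.isEmpty()) {
            return; // nothing to identify
        }
        String iso = detector.apply(text);
        if (iso == null || iso.isEmpty()) {
            return; // detector could not decide; leave the document untouched
        }
        doc.put(isoOutputField, iso);
        // Display name derived from the JDK's locale data, in English.
        doc.put(fullOutputField,
                Locale.forLanguageTag(iso).getDisplayLanguage(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Language identification");
        doc.put("body", "Some unstructured text to classify.");

        // Stub detector that always answers "en"; a real one would inspect the text.
        Function<String, String> stubDetector = text -> "en";

        process(doc, List.of("title", "teaser", "body"),
                "language", "language_display", stubDetector);
        System.out.println(doc);
    }
}
```

Keeping the detector behind a function makes the field handling testable without any language model, and a document for which no language is found passes through unchanged.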
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.