[
https://issues.apache.org/jira/browse/SOLR-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joan Codina updated SOLR-1997:
------------------------------
Attachment: SOLR-1997-1.4.patch
            SOLR-1997-1.5.patch

Patches for the 1.4 and 1.5 versions.
> analyzed field: Store internal value instead of input one
> ---------------------------------------------------------
>
> Key: SOLR-1997
> URL: https://issues.apache.org/jira/browse/SOLR-1997
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 1.4, 1.4.1, 1.5
> Reporter: Joan Codina
> Fix For: 1.4, 1.4.1, 1.5
>
> Attachments: SOLR-1997-1.4.patch, SOLR-1997-1.5.patch
>
>
> Solr implements a set of filters and tokenizers for filtering and processing
> text, but when a field is set to be stored, the stored text is the original
> input. This is useful when the end user reads the stored value, but not in
> other cases: for example, when the text carries payloads and looks like
> A|2.0 good|1.0 day|3.0, or when the result of a query is processed by
> something like Carrot2.
> So this is a simple new kind of field that takes as input the output of a
> given type (the source) and then performs the normal processing with the
> desired tokenizers and filters. The difference is that the stored value is
> the output of the source type, and this is what is returned when the
> document is retrieved.
> The field type is named AnalyzedField. It is declared in the schema as
> follows, creating analyzedSourceType from SourceType:
> <fieldType name="SourceType" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class......." />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter ....." />
>   </analyzer>
> </fieldType>
> <fieldType name="analyzedSourceType" class="solr.AnalyzedField"
>            positionIncrementGap="100" preProcessType="SourceType">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> Often only the WhitespaceTokenizerFactory is needed, because the tokens have
> already been split by SourceType.
> Finally, a field can be declared as
> <field name="analyzedData" type="analyzedSourceType" indexed="true"
>        stored="true" termVectors="true" multiValued="true"/>
> which can be written to directly or defined as a copy of the source field:
> <field name="Data" type="analyzedSourceType" indexed="true" stored="true"
>        termVectors="true" multiValued="true"/>
> ...
> <copyField source="Data" dest="analyzedData"/>
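The two-stage behaviour described above can be illustrated with a small stand-alone sketch. This is not Solr code and not the patch itself: the function names are hypothetical, and the source analysis chain (whitespace split, drop a trailing "|payload", lowercase) is only an assumption chosen to match the payload example in the description.

```python
def source_analyze(text):
    """Hypothetical stand-in for the SourceType index analyzer."""
    tokens = []
    for raw in text.split():
        term = raw.split("|", 1)[0]  # assumption: drop a trailing "|payload"
        tokens.append(term.lower())  # assumption: a lowercase filter
    return tokens


def analyzed_field(text):
    """Return (stored_value, indexed_tokens) for this sketch."""
    # The stored value is the source analyzer's output, not the raw input.
    stored = " ".join(source_analyze(text))
    # The second stage only needs whitespace tokenization, because the
    # tokens were already split by the source analysis.
    indexed = stored.split()
    return stored, indexed


stored, indexed = analyzed_field("A|2.0 good|1.0 day|3.0")
print(stored)   # a good day
print(indexed)  # ['a', 'good', 'day']
```

With a plain stored TextField the stored value would instead be the raw input string; here the stored value and the indexed tokens both derive from the source type's output.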
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.