[ https://issues.apache.org/jira/browse/SOLR-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-1997:
-------------------------------

    Fix Version/s:     (was: 4.7)
                   4.8

> analyzed field: Store internal value instead of input one
> ---------------------------------------------------------
>
>                 Key: SOLR-1997
>                 URL: https://issues.apache.org/jira/browse/SOLR-1997
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joan Codina
>             Fix For: 4.8
>
>         Attachments: SOLR-1997-1.4.patch, SOLR-1997-1.5.patch
>
>
> Solr implements a set of filters and tokenizers that allow the filtering and
> treatment of text, but when a field is set to be stored, the text stored is
> the original input. This is useful when the end user reads the stored value,
> but not in other cases, for example when the input carries payloads and looks
> like A|2.0 good|1.0 day|3.0, or when the result of a query is processed by
> something like Carrot2.
> So this is a simple new kind of field that takes as input the output of a
> given type (the source), and then performs the normal processing with the
> desired tokenizers and filters. The difference is that the stored value is
> the output of the source type, and this is what is retrieved when getting
> the document.
> The name of the field type is AnalyzedField, and it is introduced in the
> schema in the following way to create the analyzedSourceType from the
> SourceType:
>     <fieldType name="SourceType" class="solr.TextField">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class......." />
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter ....." />
>       </analyzer>
>     </fieldType>
>     <fieldType name="analyzedSourceType" class="solr.AnalyzedField"
>                positionIncrementGap="100" preProcessType="SourceType">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
> Often just the WhitespaceTokenizerFactory is needed, as the tokens have
> already been split by the SourceType.
> Finally, a field can be declared as
>     <field name="analyzedData" type="analyzedSourceType" indexed="true"
>            stored="true" termVectors="true" multiValued="true"/>
> which can be written directly or can be defined as a copy of the source one:
>     <field name="Data" type="analyzedSourceType" indexed="true" stored="true"
>            termVectors="true" multiValued="true"/>
>     ...
>     <copyField source="Data" dest="analyzedData"/>
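
The behaviour described in the quoted issue can be illustrated with a small sketch. It is not Solr code and does not use the patch's classes: the function names source_analyze and analyzed_field are hypothetical, and because the filter chain is elided in the description, the lowercase step is only an assumed stand-in for it. The sketch shows only the idea that the stored value is the output of the source type, which is then re-tokenized on whitespace for indexing.

```python
import re


def source_analyze(text):
    """Hypothetical stand-in for the SourceType index analyzer.

    Assumption: word tokenization followed by lowercasing; the real
    filter chain is not given in the issue description.
    """
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


def analyzed_field(text):
    """Return (stored_value, indexed_tokens) for one input value."""
    # The stored value is the output of the source type, not the raw input.
    stored_value = " ".join(source_analyze(text))
    # The analyzed type then only needs a whitespace tokenizer.
    indexed_tokens = stored_value.split()
    return stored_value, indexed_tokens


stored, indexed = analyzed_field("A Good Day!")
print(stored)
print(indexed)
```

With a regular stored field, retrieving the document would return the raw input "A Good Day!"; in this sketch the retrieved value is the analyzed text instead.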



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
