Why ever would you not index words shorter than three characters?
“To be or not to be”
Seems like a significant search.

> On Oct 23, 2021, at 7:28 AM, son hoang <sonhoan...@gmail.com> wrote:
> 
> Yep, words shorter than 3 chars will not be indexed. But if the text "Al Abbas"
> is split into the token "Abbas" (plus "Al", which is dropped because it has
> only 2 chars), can we then apply an OR condition in the query?
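
A query-side sketch of that OR idea, which needs no re-indexing: make OR the default operator, so a match on "abbas" alone is enough even though "al" was never indexed. The handler name /select and its placement in solrconfig.xml are assumptions, not taken from this thread, and the sketch presumes the handler currently requires all terms to match. The same effect should be available per request by passing q.op=OR.

```xml
<!-- Hypothetical solrconfig.xml fragment; adjust the handler name to your setup. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Match documents containing any query term rather than all of them. -->
    <str name="q.op">OR</str>
  </lst>
</requestHandler>
```

The trade-off is precision: with OR, a document matching only one of the query terms is also returned.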
> 
>> On 2021/10/22 14:37:51, Andy C <andycs...@gmail.com> wrote: 
>> The issue looks to me to be with the use of EdgeNGramFilterFactory in your
>> field type. You have configured it with minGramSize="3" and have not
>> specified preserveOriginal="true".
>> 
>> So words less than 3 characters will not be indexed, and therefore can't be
>> searched.
>> 
>> See
>> https://solr.apache.org/guide/8_8/filter-descriptions.html#edge-n-gram-filter
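
For illustration, the index analyzer from the original post with preserveOriginal added might look like the sketch below. It is untested against the poster's schema, and existing documents would have to be re-indexed before short tokens such as "al" become searchable.

```xml
<!-- Sketch only: index analyzer of the "tok" field type, with preserveOriginal added. -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- preserveOriginal="true" keeps tokens shorter than minGramSize (e.g. "al") in the index. -->
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"
          preserveOriginal="true"/>
</analyzer>
```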
>> 
>> - Andy -
>> 
>>> On Fri, Oct 22, 2021 at 10:12 AM son hoang <sonhoan...@gmail.com> wrote:
>>> 
>>> Thanks, Thamiz
>>> 
>>> It seems that using StandardTokenizerFactory in my index analyzer is
>>> causing the issue.
>>> 
>>> I do not want to re-index. Is there any solution? Should I use an "OR"
>>> query so that the search returns "Al Abbas" when I have "Al Abbas" in
>>> the query field (e.g., through an OR match on "Abbas")?
>>> 
>>> Thanks
>>> 
>>> On 2021/10/21 07:56:20, Thamizhazhagan B <thamizhazhagan....@kp.org>
>>> wrote:
>>>> Hi,
>>>> 
>>>> Create a copy field as below and use it in your query.
>>>> 
>>>> <copyField source="_name" dest="itemFullName"/>
>>>> <field name="itemFullName" type="itemFullName_type" stored="true"
>>>>        indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
>>>> 
>>>> <fieldType name="itemFullName_type" class="solr.TextField"
>>>>            sortMissingLast="true" omitNorms="true" positionIncrementGap="100"
>>>>            multiValued="false">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>>>             ignoreCase="true"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>>>             ignoreCase="true"/>
>>>>     <filter class="solr.SynonymFilterFactory" expand="true"
>>>>             ignoreCase="true" synonyms="synonyms.txt"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>> 
>>>> Thanks,
>>>> Thamizh
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: son hoang <sonhoan...@gmail.com>
>>>> Sent: Thursday, October 21, 2021 8:19 AM
>>>> To: users@solr.apache.org
>>>> Subject: Index for text with space
>>>> 
>>>> 
>>>> Hello
>>>> 
>>>> I have a config like this:
>>>> 
>>>> <fieldtype name="tok" class="solr.TextField" positionIncrementGap="100">
>>>>     <analyzer type="index">
>>>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
>>>>     </analyzer>
>>>>     <analyzer type="query">
>>>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>         <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/> -->
>>>>     </analyzer>
>>>> </fieldtype>
>>>> 
>>>> Using this config:
>>>> 
>>>> 1. When I search for "Abbas", the result for "Al Abbas" appears.
>>>> 
>>>> 2. When I search for "Al Abbas" in the search field, I get no results.
>>>> 
>>>> It seems that "Al Abbas" is not indexed. What should I do in the config
>>>> so that #2 returns the result?
>>>> 
>>>> Many thanks
>>>> 
>>> 
>> 
