Re: Index for text with space

Dave Mon, 25 Oct 2021 05:48:06 -0700

You can preprocess the query to remove any term that was never indexed 
(anything shorter than 3 characters), but that initial schema decision was a 
mistake and should be remedied by fixing the field type and reindexing.
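
As a rough illustration of that query-side workaround, here is a minimal Python sketch. It assumes the index analyzer quoted further down in this thread (edge n-grams with minGramSize="3", maxGramSize="15", no preserveOriginal). The names edge_ngrams and preprocess_query are hypothetical helpers, not Solr APIs, and splitting on whitespace is only a stand-in for what StandardTokenizerFactory really does.

```python
MIN_GRAM = 3   # mirrors minGramSize="3" in the quoted index analyzer
MAX_GRAM = 15  # mirrors maxGramSize="15"


def edge_ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Approximate the terms an edge n-gram filter emits for one token.

    Without preserveOriginal, a token shorter than min_gram yields
    no terms at all, so it can never match at query time.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def preprocess_query(raw_query, min_gram=MIN_GRAM):
    """Drop query terms that are too short to have been indexed.

    Assumption: whitespace splitting is close enough to the real
    tokenizer for simple name searches such as "Al Abbas".
    """
    kept = [term for term in raw_query.split() if len(term) >= min_gram]
    return " ".join(kept)


if __name__ == "__main__":
    print(edge_ngrams("al"))                       # nothing indexed for "al"
    print(edge_ngrams("abbas"))                    # prefixes of "abbas"
    print(preprocess_query("Al Abbas"))            # only "Abbas" survives
    print(preprocess_query("To be or not to be"))  # only "not" survives
```

The last call shows why this is only a stopgap: a query made up mostly of short words is reduced to almost nothing, and a query of only short words becomes empty, which the caller would have to handle.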


> On Oct 25, 2021, at 8:36 AM, son hoang <[email protected]> wrote:
> 
> Is there any way to handle this in the query, so that I do not need to 
> reindex all of the data?
> 
>> On 2021/10/23 15:39:18, Walter Underwood <[email protected]> wrote: 
>> Agreed. There is a simple fix. Index all the words. Also, stop using 
>> EdgeNgramFilter.
>> That is only used for completion, not word search.
>> 
>> wunder
>> Walter Underwood
>> [email protected]
>> http://observer.wunderwood.org/  (my blog)
>> 
>>>> On Oct 23, 2021, at 4:31 AM, Dave <[email protected]> wrote:
>>> 
>>> Why ever would you not index words shorter than three characters?
>>> “To be or not to be”
>>> Seems like a significant search.
>>> 
>>>> On Oct 23, 2021, at 7:28 AM, son hoang <[email protected]> wrote:
>>>> 
>>>> Yep, words shorter than 3 chars will not be indexed. But if the text 
>>>> "Al Abbas" is split into the token "Abbas" (plus "Al", which does not 
>>>> count as a token because it has only 2 chars), can we then apply an OR 
>>>> condition in the query?
>>>> 
>>>>> On 2021/10/22 14:37:51, Andy C <[email protected]> wrote: 
>>>>> The issue looks to me to be with the use of EdgeNGramFilterFactory in your
>>>>> field type. You have configured it with minGramSize="3" and have not
>>>>> specified preserveOriginal="true".
>>>>> 
>>>>> So words less than 3 characters will not be indexed, and therefore
>>>>> can't be searched.
>>>>> 
>>>>> See
>>>>> https://solr.apache.org/guide/8_8/filter-descriptions.html#edge-n-gram-filter
>>>>> 
>>>>> - Andy -
>>>>> 
>>>>>> On Fri, Oct 22, 2021 at 10:12 AM son hoang <[email protected]> wrote:
>>>>>> 
>>>>>> Thanks, Thamiz
>>>>>> 
>>>>>> It seems that having StandardTokenizerFactory in my index analyzer is
>>>>>> causing the issue.
>>>>>> 
>>>>>> I do not want to re-index. Is there any solution? Should I use an "OR"
>>>>>> query so that the search returns "Al Abbas" when I have "Al Abbas" in
>>>>>> the query field (e.g., through an OR match on "Abbas")?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> On 2021/10/21 07:56:20, Thamizhazhagan B <[email protected]>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Create a copy field as below and use it in your query.
>>>>>>> 
>>>>>>> <copyField source="_name" dest="itemFullName"/>
>>>>>>> <field name="itemFullName" type="itemFullName_type" stored="true"
>>>>>>>        indexed="true" termVectors="true" termPositions="true"
>>>>>>>        termOffsets="true"/>
>>>>>>> 
>>>>>>> <fieldType name="itemFullName_type" class="solr.TextField"
>>>>>>>            sortMissingLast="true" omitNorms="true"
>>>>>>>            positionIncrementGap="100" multiValued="false">
>>>>>>>  <analyzer type="index">
>>>>>>>    <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>    <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>>>>>>            ignoreCase="true"/>
>>>>>>>    <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>  </analyzer>
>>>>>>>  <analyzer type="query">
>>>>>>>    <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>    <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>>>>>>            ignoreCase="true"/>
>>>>>>>    <filter class="solr.SynonymFilterFactory" expand="true"
>>>>>>>            ignoreCase="true" synonyms="synonyms.txt"/>
>>>>>>>    <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>  </analyzer>
>>>>>>> </fieldType>
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Thamizh
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: son hoang <[email protected]>
>>>>>>> Sent: Thursday, October 21, 2021 8:19 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Index for text with space
>>>>>>> 
>>>>>>> Hello
>>>>>>> 
>>>>>>> I have a config like this:
>>>>>>> 
>>>>>>> <fieldtype name="tok" class="solr.TextField" positionIncrementGap="100">
>>>>>>>     <analyzer type="index">
>>>>>>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
>>>>>>>                 maxGramSize="15"/>
>>>>>>>     </analyzer>
>>>>>>>     <analyzer type="query">
>>>>>>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>         <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
>>>>>>>                      maxGramSize="15"/> -->
>>>>>>>     </analyzer>
>>>>>>> </fieldtype>
>>>>>>> 
>>>>>>> Using this config:
>>>>>>> 
>>>>>>> 1. When I search for "Abbas", the result for "Al Abbas" appears.
>>>>>>> 
>>>>>>> 2. When I search for "Al Abbas" in the search field, I get no results.
>>>>>>> 
>>>>>>> It seems that "Al Abbas" is not indexed. What should I do in the
>>>>>>> config so that #2 returns the result?
>>>>>>> 
>>>>>>> Many thanks
>>>>>>> 
>>>>>> 
>>>>> 
>> 
>>
