Why ever would you not index words of fewer than three characters? “To be or not to be” seems like a significant search.

> On Oct 23, 2021, at 7:28 AM, son hoang <sonhoan...@gmail.com> wrote:
>
> Yep, words shorter than 3 chars will not be indexed. But if the "Al Abbas" text can
> be separated into a token "Abbas" (and "Al", but it is not counted as a token
> as it has only 2 chars), then can we apply an OR condition in the query?
>
>> On 2021/10/22 14:37:51, Andy C <andycs...@gmail.com> wrote:
>> The issue looks to me to be with the use of EdgeNGramFilterFactory in your
>> field type. You have configured it with minGramSize="3" and have not
>> specified preserveOriginal="true".
>>
>> So words shorter than 3 characters will not be indexed, and therefore can't be
>> searched.
>>
>> See
>> https://solr.apache.org/guide/8_8/filter-descriptions.html#edge-n-gram-filter
>>
>> - Andy -
>>
>>> On Fri, Oct 22, 2021 at 10:12 AM son hoang <sonhoan...@gmail.com> wrote:
>>>
>>> Thanks, Thamiz
>>>
>>> It seems that having index=StandardTokenizerFactory is causing the issue.
>>>
>>> I do not want to re-index. Is there any solution? Should I use an "OR"
>>> query so that the search can return "Al Abbas" when I have "Al Abbas" in
>>> the query field (e.g., there is an OR match on "Abbas")?
>>>
>>> Thanks
>>>
>>> On 2021/10/21 07:56:20, Thamizhazhagan B <thamizhazhagan....@kp.org>
>>> wrote:
>>>> Hi,
>>>>
>>>> Create a copy field as below and use this copy field in your query.
>>>>
>>>> <copyField source="_name" dest="itemFullName"/>
>>>> <field name="itemFullName" type="itemFullName_type" stored="true"
>>>>        indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
>>>>
>>>> <fieldType name="itemFullName_type" class="solr.TextField"
>>>>            sortMissingLast="true" omitNorms="true" positionIncrementGap="100"
>>>>            multiValued="false">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>>>>     <filter class="solr.SynonymFilterFactory" expand="true"
>>>>             ignoreCase="true" synonyms="synonyms.txt"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> Thanks,
>>>> Thamizh
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: son hoang <sonhoan...@gmail.com>
>>>> Sent: Thursday, October 21, 2021 8:19 AM
>>>> To: users@solr.apache.org
>>>> Subject: Index for text with space
>>>>
>>>> Hello
>>>>
>>>> I have a config like this:
>>>>
>>>> <fieldtype name="tok" class="solr.TextField" positionIncrementGap="100">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
>>>>             maxGramSize="15"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
>>>>          maxGramSize="15"/> -->
>>>>   </analyzer>
>>>> </fieldtype>
>>>>
>>>> Using this config:
>>>>
>>>> 1. When I search for "Abbas", the result for "Al Abbas" appears.
>>>>
>>>> 2. When I search for "Al Abbas" in the search field, I get no results.
>>>>
>>>> It seems that "Al Abbas" is not indexed. What should I do in the config
>>>> so #2 can return the result?
>>>>
>>>> Many thanks
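
For reference, here is a minimal sketch of the change Andy describes: the "tok" field type from the original message with preserveOriginal="true" added to the index-time EdgeNGramFilterFactory. It assumes the Solr version in use supports that attribute, as documented on the 8.8 reference guide page linked in the thread. Everything else is copied from the posted config. This has not been tested against the poster's data, so treat it as a starting point rather than a confirmed fix.

```xml
<!-- Sketch only: same "tok" field type as in the original message, with
     preserveOriginal="true" added on the index side. -->
<fieldType name="tok" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- With preserveOriginal="true", terms shorter than minGramSize
         (such as "al") are kept whole instead of being dropped. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"
            preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this analyzer, "Al Abbas" should be indexed as "al" plus the edge n-grams of "abbas" ("abb", "abba", "abbas"), so both query terms have something to match. Index-time analysis changes only affect documents indexed after the change, so existing documents would need to be re-indexed, which the poster wanted to avoid. The OR approach raised in the thread is the alternative that leaves the index untouched.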