I don’t use that graph filter, but from the documentation it looks like a
couple of its other options may still be splitting those tokens (for
example splitOnCaseChange, splitOnNumerics, and generateNumberParts, which
all default to on).
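
One way to test that theory is to turn those options off explicitly. A sketch only — the parameter names come from the WordDelimiterGraphFilterFactory documentation, but I haven't tested these values against your data:

```xml
<filter name="wordDelimiterGraph"
        types="wordDelimiters.txt"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        generateWordParts="0"
        generateNumberParts="0"
        preserveOriginal="1"/>
```

With the splitting options disabled and preserveOriginal on, a value like "640-01" should pass through as a single token instead of being broken into "640" and "01".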

Some of the apparent complexity here comes from using text-oriented fields
and tokenizers to capture what appear to be structured article
identifiers. If you specifically need to find these inside free text, you
might be better served by a different tokenizer (maybe even the
ClassicTokenizer) or a regex matcher.
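
The ClassicTokenizer is worth a look because it keeps hyphenated tokens that contain digits (it was designed with things like part numbers in mind). A minimal analyzer sketch, untested against your schema:

```xml
<analyzer type="query">
  <tokenizer name="classic"/>
  <filter name="lowercase"/>
</analyzer>
```

Use the same tokenizer on the index side as well, otherwise index and query tokens won't line up.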

If you don’t actually need to find those numbers inside text, you might be
better served by a plain string field.
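
Something like the following (the `*_s` suffix is just an illustration; the stock Solr schema ships a similar dynamic field). A string field is not analyzed at all, so a prefix query such as artikelnummer_s:640-0* matches the stored value literally:

```xml
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
```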

On Wed, Apr 17, 2024 at 8:43 AM Carsten Klement <kont...@carsten-klement.de>
wrote:

> Hello, doesn't anyone have an idea? ☹
>
>
>
> On 10.04.24 at 11:40, "Carsten Klement" <kont...@carsten-klement.de
> <mailto:kont...@carsten-klement.de>> wrote:
>
>
> Hello,
> I think I have a problem with configured Word separators.
>
> For example, I would like three items to be found when searching for 640
> or 640-0, and two items when searching for 640-01.
>
> #1
> artikelnummer_txt:"640*" AND lng:"de"
> "docs":[{
> "artikelnummer_txt":"640-01"
> },{
> "artikelnummer_txt":"640-02"
> },{
> "artikelnummer_txt":"640-01LFM"
> }]
>
> This is perfect, everything from the “artikelnummer_txt” field that starts
> with 640 will be found.
>
> #2
> artikelnummer_txt:"640-0*" AND lng:"de"
> "docs":[ ]
>
>
> However, if I enter a "-" followed by a "0", no articles are found. Here
> I expect all three items.
>
>
> #3
> artikelnummer_txt:"640-01*" AND lng:"de"
> "docs":[{
> "artikelnummer_txt":"640-01"
> }]
>
> Here I get only one item, but I expect two.
>
> My configuration in schema.xml
> <dynamicField name="*_txt" type="text_general" indexed="true"
> stored="true"/>
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="false">
> <analyzer type="index">
> <tokenizer name="standard"/>
> <filter ignoreCase="true" words="stopwords.txt" name="stop"/>
> <filter name="lowercase"/>
> </analyzer>
>
> <analyzer type="query">
> <tokenizer name="standard"/>
> <!-- Test START -->
> <filter name="wordDelimiterGraph" types="wordDelimiters.txt"/>
> <filter name="flattenGraph"/>
> <!-- Test END -->
> <filter ignoreCase="true" words="stopwords.txt" name="stop"/>
> <filter ignoreCase="true" synonyms="synonyms.txt" name="synonymGraph"
> expand="true"/>
> <filter name="lowercase"/>
> </analyzer>
> </fieldType>
>
> ### wordDelimiters.txt
> # Don't split on '$' or '.' (treated as DIGIT) or '-' (treated as ALPHANUM)
> $ => DIGIT
> . => DIGIT
> - => ALPHANUM
>
>
> Maybe someone has an idea what I'm doing wrong?
>
> Thanks
> Carsten
>
