Hi Kirstine,

You might be able to narrow down the issue by looking at what happens to
the keyword in the Analysis Screen (
https://solr.apache.org/guide/solr/latest/indexing-guide/analysis-screen.html
).

I suspect that lægen is stemmed to læg at index time, so your wildcard
search for læge* has nothing to match: wildcard terms are not stemmed, and
the prefix læge is longer than the indexed token læg. You might want to add
a KeywordRepeatFilter (
https://solr.apache.org/guide/solr/latest/indexing-guide/language-analysis.html#keywordrepeatfilterfactory)
to preserve the unstemmed token in the analysis chain.
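
A minimal sketch of what that could look like, assuming the text_da type in
the _default configset uses the standard tokenizer, lowercasing, Danish stop
words and the Danish Snowball stemmer (please check this against your own
managed-schema.xml). The type name text_da_keep_original is made up for
illustration:

```xml
<!-- Hypothetical field type; the name is illustrative and not part of the _default configset. -->
<fieldType name="text_da_keep_original" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer name="standard"/>
    <filter name="lowercase"/>
    <!-- Assumed to mirror the stop word setup of the stock text_da type. -->
    <filter name="stop" ignoreCase="true" words="lang/stopwords_da.txt" format="snowball"/>
    <!-- Emits each token twice: once flagged as a keyword (left unstemmed), once not. -->
    <filter name="keywordRepeat"/>
    <filter name="snowballPorter" language="Danish"/>
    <!-- Drops the second copy when stemming did not change the token. -->
    <filter name="removeDuplicates"/>
  </analyzer>
</fieldType>
```

With a chain like this, both lægen and læg should end up in the index at the
same position, so læge* has a token to match. You would need to reindex
after changing the field type.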

Best regards
Matthias


On Tue, Apr 30, 2024 at 1:13 PM Kirstine Wilfred Christensen
<k...@dbc.dk.invalid> wrote:

> Hi,
>
> I'm trying to figure out whether it's intentional or a bug that truncated
> search in fields with field type text_da only works for 4-6 characters;
> longer queries give 0 results.
>
> I've tried starting up both Solr 9.4 and 9.6 using the tutorial for
> launching Solr in SolrCloud mode, but instead of choosing the techproducts
> configset I've used the _default configset, because its managed-schema.xml
> contains a dynamic field for Danish.
>
>  <!-- Danish -->
>     <dynamicField name="*_txt_da" type="text_da" indexed="true" stored="true"/>
>
> I've then posted a modified record (like the ones in the books.json
> example) to my collection with this data:
>
>   {
>     "id" : "978-8776075224",
>     "cat" : ["book","paperback"],
>     "name" : "Lægen",
>     "author" : "Kirsten Ahlburg",
>     "sequence_i" : 1,
>     "genre_s" : "fiction",
>     "inStock" : true,
>     "price" : 30.50,
>     "pages_i" : 36,
>     "abstract_txt_da": "Da Lisa skal opereres, møder hun lægen Jacob. Hun forelsker sig i ham, men kan man blive kæreste med sin læge?"
>   }
>
> If I send this query from the Solr GUI (expected to hit "læge"/"lægen")
> abstract_txt_da:læg*
> Numfound: 1
>
> If I add a letter, I get no results (expected to hit "lægen")
> abstract_txt_da:læge*
> Numfound: 0
>
> The same can be seen with this query (expected to hit "opereres")
> abstract_txt_da:oper*
> Numfound: 1
>
> abstract_txt_da:opere*  (expected to hit "opereres")
> Numfound: 0
>
> If I try truncating the word 'kæreste', however, I can add a few more
> characters before it gives no results
>
> abstract_txt_da:kærest* (expected to hit "kæreste")
> Numfound: 1
>
> abstract_txt_da:kæreste* (expected to hit "kæreste")
> Numfound: 0
>
> It doesn't seem to be a problem to truncate a word that doesn't really need
> truncating in the genre_s field
>
> genre_s:fiction*
> Numfound: 1
>
> Is this working as intended, and if yes, then why?
>
> Or is this a bug?
>
> Link to the tutorial I used.
>
>
> https://solr.apache.org/guide/solr/latest/getting-started/tutorial-techproducts.html
>
> Best regards,
> Kirstine Christensen,
> Developer at Danish company
>
>
