Hello dear Solr Team

In my documents I use fields with field type text_de. Lately I came across 
weird behavior with this field type. To reproduce this behavior I 
set up a local Solr server on my machine, where I could observe the same 
behavior.

To reproduce it yourself, just set up a local solr server as described here: 
https://solr.apache.org/guide/solr/latest/deployment-guide/installing-solr.html

Start the solr server: bin/solr start
Create a simple core: bin/solr create -c dario

Now in the browser go to http://localhost:8983/solr/#/dario/schema and add two 
fields.
Field 1:
Use name_general for name and text_general for the field type. Leave everything 
else as default.
Field 2:
Use name_de for name and text_de for the field type. Leave everything else as 
default.
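
For anyone who prefers a script over the Admin UI, the two fields could 
probably also be added through the Schema API. The following Python is only a 
sketch of that idea: the add-field command and the /schema endpoint are how I 
understand the Schema API, and I am assuming that leaving out all other 
attributes gives the same result as leaving the defaults in the UI. The helper 
names are my own.

```python
import json
import urllib.request

# Assumption: Schema API endpoint of the local core created above.
SCHEMA_URL = "http://localhost:8983/solr/dario/schema"

# The two fields described above: (field name, field type).
FIELDS = [("name_general", "text_general"), ("name_de", "text_de")]


def add_field_payload(name, field_type):
    """Build an add-field command that sets only the name and the type."""
    return {"add-field": {"name": name, "type": field_type}}


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Print the payloads only; nothing is sent to the server here.
for name, field_type in FIELDS:
    print(json.dumps(add_field_payload(name, field_type)))
```

With a running server, post_json(SCHEMA_URL, add_field_payload("name_de", 
"text_de")) would send one of these commands.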

Then go to http://localhost:8983/solr/#/dario/documents to index some documents.
Select CSV for Document Type.
In the text field Document(s) type in the following text: (hyphens mark start 
and end of text, should be excluded from pasted text)
-
name_general,name_de
DARIO,DARIO
DARIOT,DARIOT
DARIOTE,DARIOTE
DARIOTEN,DARIOTEN
DARIOTENU,DARIOTENU
-
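
The same five documents could also be indexed from a script instead of the 
Documents page. This is a minimal Python sketch, assuming the core's /update 
handler accepts a text/csv body and that commit=true makes the documents 
visible right away; the function names are mine.

```python
import urllib.request

# Assumption: update handler of the local core, committing immediately.
UPDATE_URL = "http://localhost:8983/solr/dario/update?commit=true"

NAMES = ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"]


def build_csv(names):
    """Build the CSV shown above: a header row, then each name in both fields."""
    lines = ["name_general,name_de"]
    lines += [f"{name},{name}" for name in names]
    return "\n".join(lines) + "\n"


def post_csv(url, csv_text):
    """POST CSV text to Solr and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Print the CSV only; call post_csv(UPDATE_URL, build_csv(NAMES)) to index it.
print(build_csv(NAMES), end="")
```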

Now go to http://localhost:8983/solr/#/dario/query to search documents.

First, I use the query *:* to find all documents. This works as expected.
Then I use the query name_general:* to find all documents with the field 
name_general. This again finds all documents as expected. The same holds for 
name_de:*

Now I want to find specific documents with the text_general field:
Searching for name_general:DARIO finds one document where name_general is 
exactly DARIO. This is what I expected. The same thing happens for all other 
names.
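
The queries in this mail can also be sent without the Admin UI. Here is a 
small Python sketch that builds the request URL, assuming the standard /select 
handler of the core; select_url is just a helper name I made up.

```python
from urllib.parse import urlencode

# Assumption: standard select handler of the local core.
SELECT_URL = "http://localhost:8983/solr/dario/select"


def select_url(field, value):
    """Build a URL for the query field:value, URL-encoding the q parameter."""
    return SELECT_URL + "?" + urlencode({"q": f"{field}:{value}"})


print(select_url("name_general", "DARIO"))
print(select_url("name_de", "DARIOT"))
```

Opening such a URL in the browser should show the same result as the Query 
page.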

Now it gets interesting. Searching the text_de field (name_de) leads to all 
kinds of weird results. Listed below are the queries, the documents each query 
finds, and whether I expected each result or not:
-
name_de:DARIO -> DARIO (expected)
name_de:DARIOT -> DARIOT (expected), DARIOTE (not expected), DARIOTEN (not 
expected)
name_de:DARIOTE -> DARIOT (not expected), DARIOTE (expected), DARIOTEN (not 
expected)
name_de:DARIOTEN -> DARIOT (not expected), DARIOTE (not expected), DARIOTEN 
(expected)
name_de:DARIOTENU -> DARIOTENU (expected)
-
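
To state the list above more compactly: I expected every query to return only 
the document with exactly that name. This short Python sketch records the 
results I observed and prints the hits I did not expect; the data is copied 
from the list, nothing else is assumed.

```python
NAMES = ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"]

# Observed results of name_de:<query>, copied from the list above.
OBSERVED = {
    "DARIO": {"DARIO"},
    "DARIOT": {"DARIOT", "DARIOTE", "DARIOTEN"},
    "DARIOTE": {"DARIOT", "DARIOTE", "DARIOTEN"},
    "DARIOTEN": {"DARIOT", "DARIOTE", "DARIOTEN"},
    "DARIOTENU": {"DARIOTENU"},
}


def unexpected_hits(query, hits):
    """Return the hits other than the exact match I expected."""
    return sorted(hits - {query})


for query in NAMES:
    print(query, "->", unexpected_hits(query, OBSERVED[query]))
```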

This probably has something to do with stemming. But why should DARIOTEN be 
stemmed to DARIOT? (Sorry for the bad English, as you can see with the usage of 
text_de my first language is German)
DARIOTEN is also not defined in the stop words file or anywhere else. So what 
is happening here?
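
To make my stemming suspicion concrete, here is a small Python sketch. The 
rule in it (lowercase the word, then strip one trailing "en" or "e") is purely 
my guess and NOT Solr's actual stemmer; it only shows that such a rule would 
put DARIOT, DARIOTE and DARIOTEN into one group, which matches what I see.

```python
from collections import defaultdict

NAMES = ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"]


def guessed_stem(word):
    """Hypothetical rule, not Solr's: lowercase and strip one 'en' or 'e'."""
    term = word.lower()
    for suffix in ("en", "e"):
        # Keep at least four characters so short words stay untouched.
        if term.endswith(suffix) and len(term) - len(suffix) >= 4:
            return term[: -len(suffix)]
    return term


# Group the names by the guessed stem to see which ones would collide.
groups = defaultdict(list)
for name in NAMES:
    groups[guessed_stem(name)].append(name)

for stem, members in groups.items():
    print(stem, "<-", members)
```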

I also tried enclosing the search term in quotes, e.g. name_de:"DARIO". 
This did not change the results.

So this is one problem that I cannot explain, but would very much like to 
understand. But this is not my only confusion with this field type. What 
follows is another problem.

Sometimes I want to find documents that contain some text (as opposed to exact 
matches), so I enclose the name in asterisks. I expect these queries to find 
all the documents that EITHER exactly match the word OR contain the word.
Again, this works as expected with the name_general field. But again the 
name_de behaves weirdly.
I will again list all the queries and the results. This time I will also note 
the documents that were missing from the results, but I expected them to be 
included. I will note these documents after the pipe |.
As this time no documents are found that should not be found, the expected / 
not expected marker is omitted.

-
name_de:*DARIO* -> DARIO, DARIOT, DARIOTE, DARIOTEN, DARIOTENU
name_de:*DARIOT* -> DARIOT, DARIOTE, DARIOTEN, DARIOTENU
name_de:*DARIOTE* -> DARIOTENU | DARIOTE, DARIOTEN
name_de:*DARIOTEN* -> DARIOTENU | DARIOTEN
name_de:*DARIOTENU* -> DARIOTENU
-
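
What I expected from these queries is plain substring matching on the original 
names. The following Python sketch spells out that expectation and compares it 
with the results I observed (copied from the list above); it reproduces the 
missing documents noted after the pipe.

```python
NAMES = ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"]

# Observed results of name_de:*<word>*, copied from the list above.
OBSERVED_WILDCARD = {
    "DARIO": ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"],
    "DARIOT": ["DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"],
    "DARIOTE": ["DARIOTENU"],
    "DARIOTEN": ["DARIOTENU"],
    "DARIOTENU": ["DARIOTENU"],
}


def expected_contains(word, names):
    """My expectation: every name that contains the word as a substring."""
    return [name for name in names if word in name]


def missing(word):
    """Names I expected for *word* that were not in the observed results."""
    found = OBSERVED_WILDCARD[word]
    return [name for name in expected_contains(word, NAMES) if name not in found]


for word in NAMES:
    print(f"*{word}*", "missing:", missing(word))
```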

As you can see, name_de:*DARIOTE* and name_de:*DARIOTEN* do not find all the 
documents that I expected to be found.
What are the mechanisms leading to this behavior?
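
In case it helps whoever looks into this: I believe the terms that actually 
end up in the index can be inspected with the field analysis handler. The 
Python sketch below only builds such a request URL. I am assuming that the 
/analysis/field handler is available in the default configuration and that the 
parameter names are analysis.fieldname, analysis.fieldvalue and analysis.query, 
as I understand them from the reference guide.

```python
from urllib.parse import urlencode

# Assumption: implicit field analysis handler of the local core.
ANALYSIS_URL = "http://localhost:8983/solr/dario/analysis/field"


def analysis_url(field_name, index_value, query_value):
    """Build a URL that analyzes one index-time value and one query value."""
    params = {
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": index_value,
        "analysis.query": query_value,
        "wt": "json",
    }
    return ANALYSIS_URL + "?" + urlencode(params)


print(analysis_url("name_de", "DARIOTEN", "DARIOT"))
```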

I also tried all queries using regular expressions, e.g. name_de:/.*DARIO.*/
This led to the same results as before.

I look forward to an easy-to-understand explanation of why text_de behaves the 
way it does.

With kind regards

Dario Viva
