Hello dear Solr Team,

In my documents I use fields with the field type text_de. Lately I came across weird behavior with this field type. To reproduce this behavior, I set up a local Solr server on my machine, where I could observe the same behavior.

To reproduce it yourself, set up a local Solr server as described here:
https://solr.apache.org/guide/solr/latest/deployment-guide/installing-solr.html

Start the Solr server:
bin/solr start

Create a simple core:
bin/solr create -c dario

Now, in the browser, go to http://localhost:8983/solr/#/dario/schema and add two fields.
Field 1: Use name_general for the name and text_general for the field type. Leave everything else as default.
Field 2: Use name_de for the name and text_de for the field type. Leave everything else as default.

Then go to http://localhost:8983/solr/#/dario/documents to index some documents. Select CSV for Document Type. In the text field Document(s), type in the following text (the hyphens mark the start and end of the text and should be excluded from the pasted text):

-
name_general,name_de
DARIO,DARIO
DARIOT,DARIOT
DARIOTE,DARIOTE
DARIOTEN,DARIOTEN
DARIOTENU,DARIOTENU
-

Now go to http://localhost:8983/solr/#/dario/query to search documents.

First, I use the query *:* to find all documents. This works as expected. Then I use the query name_general:* to find all documents that have the field name_general. This again finds all documents, as expected. The same holds for name_de:*

Now I want to find specific documents using the text_general field: Searching for name_general:DARIO finds exactly one document, the one where name_general is exactly DARIO. This is what I expected. The same thing happens for all the other names.

Now it gets interesting. Searching the text_de field (name_de) leads to all kinds of weird results. Listed below are the queries, the documents each query finds, and whether I expected each result or not:

-
name_de:DARIO -> DARIO (expected)
name_de:DARIOT -> DARIOT (expected), DARIOTE (not expected), DARIOTEN (not expected)
name_de:DARIOTE -> DARIOT (not expected), DARIOTE (expected), DARIOTEN (not expected)
name_de:DARIOTEN -> DARIOT (not expected), DARIOTE (not expected), DARIOTEN (expected)
name_de:DARIOTENU -> DARIOTENU (expected)
-

This probably has something to do with stemming.
But why should DARIOTEN be stemmed to DARIOT? (Sorry for the bad English; as you can see from the usage of text_de, my first language is German.) DARIOTEN is also not defined in the stop words file or anywhere else. So what is happening here?

I also tried enclosing the searched word in quotes, e.g. name_de:"DARIO". This did not change the results.

So this is one problem that I cannot explain, but would very much like to understand. It is not my only confusion with this field type, though. What follows is another problem.

Sometimes I want to find documents that contain some text (as opposed to exact matches), so I enclose the name in asterisks. I expect these queries to find all the documents that EITHER exactly match the word OR contain the word. Again, this works as expected with the name_general field, but again name_de behaves weirdly.

I will again list all the queries and their results. This time I also note, after the pipe |, the documents that were missing from the results although I expected them to be included. As no unexpected documents are found this time, the expected / not expected marker is omitted.

-
name_de:*DARIO* -> DARIO, DARIOT, DARIOTE, DARIOTEN, DARIOTENU
name_de:*DARIOT* -> DARIOT, DARIOTE, DARIOTEN, DARIOTENU
name_de:*DARIOTE* -> DARIOTENU | DARIOTE, DARIOTEN
name_de:*DARIOTEN* -> DARIOTENU | DARIOTEN
name_de:*DARIOTENU* -> DARIOTENU
-

As you can see, for name_de:*DARIOTE* and name_de:*DARIOTEN* not all the documents I expected are found. What are the mechanisms leading to this behavior?

I also tried all the queries using regular expressions, e.g. name_de:/.*DARIO.*/ This led to the same results as before.

I look forward to an easy-to-understand explanation of why text_de behaves the way it does.

With kind regards
Dario Viva
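
P.S. In case it saves some clicking, the manual steps above can also be scripted. The following is only a minimal sketch, assuming Solr's Schema API, CSV update handler, and select handler are reachable at their default paths on the local core dario. The helper names (add_field_payload, select_url, post, reproduce) are my own and purely illustrative, and fields added through the Schema API may get slightly different defaults than fields added through the admin UI form.

```python
import json
import urllib.parse
import urllib.request

# Core created with "bin/solr create -c dario" (assumed to run locally).
SOLR = "http://localhost:8983/solr/dario"

# The same CSV text that is pasted into the Documents screen.
CSV_DOCS = (
    "name_general,name_de\n"
    "DARIO,DARIO\n"
    "DARIOT,DARIOT\n"
    "DARIOTE,DARIOTE\n"
    "DARIOTEN,DARIOTEN\n"
    "DARIOTENU,DARIOTENU\n"
)

# The exact-match and wildcard queries from this mail.
NAMES = ["DARIO", "DARIOT", "DARIOTE", "DARIOTEN", "DARIOTENU"]
QUERIES = [f"name_de:{n}" for n in NAMES] + [f"name_de:*{n}*" for n in NAMES]


def add_field_payload(name, field_type):
    """Build the JSON body of a Schema API add-field command."""
    return json.dumps({"add-field": {"name": name, "type": field_type}}).encode("utf-8")


def select_url(query):
    """Build a select URL that returns only the name_de field."""
    params = urllib.parse.urlencode({"q": query, "fl": "name_de", "rows": 10})
    return f"{SOLR}/select?{params}"


def post(url, body, content_type):
    """POST a body to Solr and return the decoded JSON response."""
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def reproduce():
    """Add both fields, index the CSV, and print the result of each query."""
    post(f"{SOLR}/schema", add_field_payload("name_general", "text_general"), "application/json")
    post(f"{SOLR}/schema", add_field_payload("name_de", "text_de"), "application/json")
    post(f"{SOLR}/update?commit=true", CSV_DOCS.encode("utf-8"), "text/csv")
    for query in QUERIES:
        with urllib.request.urlopen(select_url(query)) as resp:
            docs = json.load(resp)["response"]["docs"]
        print(query, "->", [doc.get("name_de") for doc in docs])
```

With the server running and a freshly created core, calling reproduce() once should print each query together with the name_de values it returns. A second call would fail when adding the fields, because they already exist.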