Ah, yes, that seems to be the case. Thanks for the pointer! The discussion there is also enlightening. It looks like there hasn't been much activity on that bug, so I'll need to consider other options.

--Ere

Jan Høydahl wrote on 29.4.2021 at 11.48:
I think you are hitting this bug 
https://issues.apache.org/jira/browse/SOLR-12779

Jan

On 29 Apr 2021 at 08:51, Ere Maijala <ere.maij...@helsinki.fi> wrote:

Hello Markus,

Thanks for the reply. I'm not sure I understand. The docs state the following:

"The default value of mm is 0% (all clauses optional), unless q.op is specified as 
"AND", in which case mm defaults to 100% (all clauses required)."
(https://solr.apache.org/guide/8_8/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter)

And it obviously has an effect. You can also replace q.op=AND with mm=100%25 in my 
examples and get the same results. With the multi-word synonym, the parsed query 
shown by debugQuery=true seems wrong to me: it requires all terms to match in 
the same field, whereas normally each term can match in any of the fields 
listed in qf. For example, this is the query from my first example:

+(+DisjunctionMaxQuery((name:corsair | manu:corsair | cat:corsair)) 
+DisjunctionMaxQuery((name:microsystems | manu:microsystems | 
cat:microsystems)) +DisjunctionMaxQuery((name:memory | manu:memory | 
cat:memory)))

Using the synonym instead of `corsair microsystems` produces this:

+(+((+name:corsair +name:microsystems +name:memory) | (+manu:corsair 
+manu:microsystems +manu:memory) | (+cat:cmi +cat:memory)))
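
To make the difference between the two shapes concrete, here is a toy model in Python. It is only a sketch of the boolean structure, not of how Solr evaluates or scores queries, and the document in it is hypothetical (it merely has its terms spread across fields); the function names are made up for illustration.

```python
# Toy model of the two parsed query shapes. This is NOT Solr's implementation;
# it only mirrors the boolean structure shown by debugQuery.
# The document is hypothetical: its terms are spread across several fields.
doc = {
    "name": {"corsair", "memory"},
    "manu": {"corsair", "microsystems"},
    "cat": {"memory"},
}
FIELDS = ["name", "manu", "cat"]


def per_term_match(doc, terms, fields=FIELDS):
    """First shape: every term is required, but each may match in any field."""
    return all(any(term in doc[field] for field in fields) for term in terms)


def per_field_match(doc, terms_by_field):
    """Second shape: at least one field must contain all of its required terms."""
    return any(
        all(term in doc[field] for term in terms)
        for field, terms in terms_by_field.items()
    )


normal = per_term_match(doc, ["corsair", "microsystems", "memory"])
synonym = per_field_match(
    doc,
    {
        "name": ["corsair", "microsystems", "memory"],
        "manu": ["corsair", "microsystems", "memory"],
        "cat": ["cmi", "memory"],
    },
)
print("per-term shape matches:", normal)
print("per-field shape matches:", synonym)
```

In this toy case the per-term shape matches because each term is found in some field, while the per-field shape does not, because no single field holds all of its required terms.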

We don't use stopwords. mm.autoRelax does not make a difference here.

Best,
Ere

Markus Jelsma wrote on 28.4.2021 at 16.20:
Hello Ere,
The q.op parameter is not a dismax parameter. Instead, I think you are being
bitten by the mm parameter [1], which by default is 100%, meaning all terms
must match. Multi-word synonym handling and mm are not a very intuitive
match, and can lead to crazy problems. Also beware of mm and stopword
handling and check out mm.autoRelax [2]. But it is best not to use
stopwords at all.
Check it out,
Markus
[1] https://solr.apache.org/guide/6_6/the-dismax-query-parser.html
[2] https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html
On Wed, 28 Apr 2021 at 15:02, Ere Maijala <ere.maij...@helsinki.fi> wrote:
Hi,

Here's one that I can't wrap my head around. The main question is: why
are the search terms treated differently in eDisMax if the query expands
to a multi-word synonym, and there are different field types and q.op=AND?

This gets complicated quickly, so I tried to reproduce the results with
the techproducts example:

1. Start with vanilla Solr 8.8.2

2. echo "cor => Corsair" >>
server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt

3. echo "cmi => Corsair Microsystems" >>
server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt

4. bin/solr start -e techproducts


Now, a basic query that works fine produces 2 results:


http://localhost:8983/solr/techproducts/select?q=corsair+microsystems+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND

But if I use the synonym, I don't get any results:


http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND

If I leave cat field out, however, I get 2 results:


http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu&q.op=AND

Also, if I leave q.op out and add AND between the terms, I get 2 results
even with the cat field:


http://localhost:8983/solr/techproducts/select?q=cmi+AND+memory&debugQuery=true&defType=edismax&qf=name+manu+cat

The single-word synonym works just fine:


http://localhost:8983/solr/techproducts/select?q=cor+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
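
Since the five requests above differ only in q, qf and q.op, they can be generated from one small helper. This is a minimal sketch using Python's standard urllib.parse; the helper name select_url and the labels are made up for illustration, and it only builds the URLs without sending anything to Solr.

```python
from urllib.parse import urlencode

# Base URL of the techproducts example used in the requests above.
BASE = "http://localhost:8983/solr/techproducts/select"


def select_url(q, qf, q_op=None):
    """Build an eDisMax select URL with debugQuery enabled (hypothetical helper)."""
    params = [
        ("q", q),
        ("debugQuery", "true"),
        ("defType", "edismax"),
        ("qf", qf),
    ]
    if q_op is not None:
        params.append(("q.op", q_op))
    # urlencode encodes spaces as '+', matching the URLs in this message.
    return BASE + "?" + urlencode(params)


urls = {
    "plain terms": select_url("corsair microsystems memory", "name manu cat", "AND"),
    "multi-word synonym": select_url("cmi memory", "name manu cat", "AND"),
    "synonym without cat": select_url("cmi memory", "name manu", "AND"),
    "explicit AND, no q.op": select_url("cmi AND memory", "name manu cat"),
    "single-word synonym": select_url("cor memory", "name manu cat", "AND"),
}

for label, url in urls.items():
    print(f"{label}: {url}")
```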


Can anyone shed some light on what's happening here?

Additional notes:

1. This is a simplified example, and the real-world case is much more
complicated. It has our custom class create the synonyms for compound
words in Finnish, and the queries come from users.

2. As far as I can see mm doesn't affect the results in any meaningful
way, but I just might be doing something wrong.

3. I included the debugQuery parameter so that it's easy to see how
different the queries become.

Best Regards,
Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


