Ah, yes, that seems to be the case. Thanks for the pointer! The
discussion there is enlightening, too. It looks like not much has
happened on that issue, so I'll need to consider other options.
--Ere
Jan Høydahl wrote on 29.4.2021 at 11.48:
I think you are hitting this bug
https://issues.apache.org/jira/browse/SOLR-12779
Jan
On 29 Apr 2021 at 08:51, Ere Maijala <ere.maij...@helsinki.fi> wrote:
Hello Markus,
Thanks for the reply. I'm not sure I understand. The docs state the
following:
"The default value of mm is 0% (all clauses optional), unless q.op is
specified as "AND", in which case mm defaults to 100% (all clauses
required)."
(https://solr.apache.org/guide/8_8/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter)
And it clearly has an effect. You can also replace q.op=AND with
mm=100%25 in my examples and get the same results. With the multi-word
synonym, the query shown by debugQuery=true seems wrong to me: it
requires all terms to match in the same field, whereas normally each
term can match in any of the fields listed in qf. For example, this is
the parsed query from my first example:
+(+DisjunctionMaxQuery((name:corsair | manu:corsair | cat:corsair))
+DisjunctionMaxQuery((name:microsystems | manu:microsystems |
cat:microsystems)) +DisjunctionMaxQuery((name:memory | manu:memory |
cat:memory)))
Using the synonym instead of `corsair microsystems` produces this:
+(+((+name:corsair +name:microsystems +name:memory) | (+manu:corsair
+manu:microsystems +manu:memory) | (+cat:cmi +cat:memory)))
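To make the difference between the two parsed queries concrete, here is a
small toy model in Python. It is not Solr code: documents are plain dicts of
field to term set, the function names are made up for illustration, and the
sample document is a hypothetical one only loosely modelled on the
techproducts data. It just mimics the boolean structure of the two queries
above.

```python
# Toy model of the two parsed queries shown by debugQuery.
# Assumption: a document is a dict mapping field name -> set of indexed terms.

def per_term_dismax(doc, terms, fields):
    """First query shape: every term is required, but each term
    may be found in any of the qf fields."""
    return all(any(term in doc.get(field, set()) for field in fields)
               for term in terms)

def per_field_conjunction(doc, terms_by_field):
    """Second query shape: at least one field must contain ALL of the
    terms that the query was expanded to for that field."""
    return any(all(term in doc.get(field, set()) for term in terms)
               for field, terms in terms_by_field.items())

# Hypothetical document: no single field holds all three terms.
doc = {
    "name": {"corsair", "valueselect", "memory"},
    "manu": {"corsair", "microsystems", "inc"},
    "cat": {"electronics", "memory"},
}

# q=corsair microsystems memory: each term matches in some field.
first = per_term_dismax(doc, ["corsair", "microsystems", "memory"],
                        ["name", "manu", "cat"])

# q=cmi memory: the synonym is expanded in name and manu, while cat keeps
# the unexpanded term "cmi", as in the second parsed query.
second = per_field_conjunction(doc, {
    "name": ["corsair", "microsystems", "memory"],
    "manu": ["corsair", "microsystems", "memory"],
    "cat": ["cmi", "memory"],
})

print(first, second)
```

In this model the first shape matches the document and the second does not,
which is the same pattern as the 2 results versus 0 results in my examples.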
We don't use stopwords. mm.autoRelax does not make a difference here.
Best,
Ere
Markus Jelsma wrote on 28.4.2021 at 16.20:
Hello Ere,
The q.op parameter is not a dismax parameter. Instead, I think you are
being bitten by the mm parameter [1], which by default is 100%, meaning
all terms must match. Multi-word synonym handling and mm are not a very
intuitive match and can lead to crazy problems. Also beware of mm and
stopword handling, and check out mm.autoRelax [2]. But it is best not to
use stopwords at all.
Check it out,
Markus
[1] https://solr.apache.org/guide/6_6/the-dismax-query-parser.html
[2] https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html
On Wed 28 Apr 2021 at 15:02, Ere Maijala <ere.maij...@helsinki.fi> wrote:
Hi,
Here's one that I can't wrap my head around. The main question is: why
are the search terms treated differently in eDisMax if the query expands
to a multi-word synonym, and there are different field types and
q.op=AND?
This gets complicated quickly, so I tried to reproduce the results with
the techproducts example:
1. Start with vanilla Solr 8.8.2
2. echo "cor => Corsair" >> server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt
3. echo "cmi => Corsair Microsystems" >> server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt
4. bin/solr start -e techproducts
Now, a basic query that works fine produces 2 results:
http://localhost:8983/solr/techproducts/select?q=corsair+microsystems+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
But if I use the synonym, I don't get any results:
http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
If I leave cat field out, however, I get 2 results:
http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu&q.op=AND
Also, if I leave q.op out and add AND between the terms, I get 2 results
even with the cat field:
http://localhost:8983/solr/techproducts/select?q=cmi+AND+memory&debugQuery=true&defType=edismax&qf=name+manu+cat
The single-word synonym works just fine:
http://localhost:8983/solr/techproducts/select?q=cor+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
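For anyone who wants to rebuild these request URLs from a script, here is a
minimal Python sketch. It only constructs the URLs and does not contact
Solr. The helper name select_url and its optional mm argument are my own
additions for illustration; the base URL and parameters are the ones used in
the examples above.

```python
from urllib.parse import urlencode

# Base URL of the techproducts example core used in this thread.
BASE = "http://localhost:8983/solr/techproducts/select"

def select_url(q, qf, q_op=None, mm=None):
    """Build an eDisMax select URL with debugQuery enabled.

    q_op and mm are optional and left out of the URL when None.
    """
    params = [
        ("q", q),
        ("debugQuery", "true"),
        ("defType", "edismax"),
        ("qf", qf),
    ]
    if q_op is not None:
        params.append(("q.op", q_op))
    if mm is not None:
        params.append(("mm", mm))  # "100%" is encoded as 100%25
    return BASE + "?" + urlencode(params)

# The five cases described above, in the same order.
cases = {
    "plain terms, q.op=AND": select_url("corsair microsystems memory", "name manu cat", "AND"),
    "multi-word synonym, q.op=AND": select_url("cmi memory", "name manu cat", "AND"),
    "multi-word synonym, no cat": select_url("cmi memory", "name manu", "AND"),
    "explicit AND, no q.op": select_url("cmi AND memory", "name manu cat"),
    "single-word synonym, q.op=AND": select_url("cor memory", "name manu cat", "AND"),
}

for label, url in cases.items():
    print(label)
    print("  " + url)
```

Comparing the parsedquery part of the debug output for each of these URLs
makes it easier to see where the query structure changes.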
Can anyone shed light on what's happening here?
Additional notes:
1. This is a simplified example, and the real-world case is much more
complicated. It involves a custom class of ours that creates synonyms
for Finnish compound words, and the queries come from users.
2. As far as I can see, mm doesn't affect the results in any meaningful
way, but I might just be doing something wrong.
3. I included the debugQuery parameter so that it's easy to see how
different the queries become.
Best Regards,
Ere
--
Ere Maijala
Kansalliskirjasto / The National Library of Finland