(Replying on behalf of my colleague Julian, who wrote the original question but is unable to reply for technical reasons.)

Hi David,

Thanks for the reply! I think your question may point to something we overlooked. We are actually using Solr 8.11 and want to use fuzzy search (https://solr.apache.org/guide/8_11/the-standard-query-parser.html#fuzzy-searches), i.e. find words that differ from the query by one or a few characters. Our understanding was that to get matches that differ by at most two characters from (on a separate line to avoid adding confusing quotation marks)

term-with-hyphens

we should send the following query (without any quotation marks):

term-with-hyphens~2

Our thinking was that the hyphenated term is one word, so there is no need to quote it.

We had a quick try quoting the hyphenated term in the query as you suggested, and it looks like it works (i.e. it returns matches). Since the standard tokenizer splits on hyphens, I'm wondering whether the unquoted query somehow gets converted to the *proximity search* query

"term with hyphens"~2

which then fails (though it looks like it should still match term-with-hyphens). It would be great to understand what is happening.

Best,
Morten

On Tue, 23 Aug 2022 at 16:30, David Hastings <hastings.recurs...@gmail.com> wrote:

> I’m not certain of course of your tokenizer but shouldn’t it be
> “terms-with-hyphens”~1
>
> ? Just a syntax thing that may not have translated over email but curious
>
> On Tue, Aug 23, 2022 at 10:12 AM Julian Hugo <julian.h...@data4life.care>
> wrote:
>
> > Hello,
> >
> > I am getting peculiar results when querying for a term containing hyphens
> > and add fuzzy search
> > <https://solr.apache.org/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-FuzzySearches>.
> >
> > I have indexed two items (1) "term-with-hyphens" and (2) "term with
> > hyphens". When I query ("q") for "term-with-hyphens" or "term with hyphens"
> > both items are returned as expected. The same is the case for escaped
> > hyphens "term\-with\-hyphens".
> >
> > The problem: When I add the fuzzy search parameter (i.e.,
> > "term-with-hyphens~1" or "term\-with\-hyphens~1"), I get zero results back.
> >
> > I struggle to understand the results, or how to solve this problem. My
> > intuition tells me that adding a fuzzy search parameter should surely
> > increase the size of the set of results. I am happy for any help on this!
> >
> > Our current setup is using the "Extended DisMax Query Parser"
> > <https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html>
> > however we observe the same behaviour using the "Standard Query Parser
> > <https://solr.apache.org/guide/6_6/the-standard-query-parser.html>". We are
> > using the "Standard Tokenizer
> > <https://solr.apache.org/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer>",
> > which splits at hyphens. Does this relate to this problem?
> >
> > Thank you!
> >
> > --
> >
> > *Julian Hugo*
> >
> > Working Student
> > Backend Development
> >
> > (he/his)
> >
> > julian.h...@data4life.care
> >
> > D4L data4life gGmbH
> > Charlottenstraße 109
> > 14467 Potsdam, Germany
> >
> > www.data4life.care
> >
> > Amtsgericht Potsdam, HRB 30667
> >
> > Managing Director: Christian-Cornelius Weiß
> >
> > We are Data4Life. We've been certified by the German Federal Office for
> > Information Security (BSI) in accordance with ISO 27001 on the basis of
> > "IT-Grundschutz".
> >
> > Diversity is the driving force behind our work towards a society where
> > digital health improves quality of life for everyone.
> > Data4Life warmly welcomes applicants from the LGBTQI+ community, people
> > with a migration background, People of Color, and individuals with
> > disabilities or chronic illnesses to the team.
> >
> > Climate neutral since 2019 <https://wtca.lfca.earth/e/data4life>
> >

--
*Morten Ernebjerg, Ph.D.*
Senior Developer
morten.ernebj...@data4life.care
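
P.S. A rough way to sanity-check the tokenizer hypothesis outside Solr is sketched below in Python. This is only an illustration, not Solr's actual analysis chain: rough_tokenize is a hypothetical stand-in for the standard tokenizer (it just splits on non-alphanumeric characters), and edit_distance is plain Levenshtein, which may differ in detail from the distance Solr uses. It assumes, without having confirmed it, that an unquoted fuzzy term is compared as a whole against the individual indexed tokens.

```python
import re


def rough_tokenize(text):
    """Hypothetical stand-in for the standard tokenizer:
    split on anything that is not a letter or digit, then lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_hits(query_term, indexed_tokens, max_edits):
    """Indexed tokens within max_edits of the (untokenized) query term."""
    return sorted(t for t in set(indexed_tokens)
                  if edit_distance(query_term, t) <= max_edits)


# The two indexed items from the original question.
indexed = rough_tokenize("term-with-hyphens") + rough_tokenize("term with hyphens")

# Assumption: the unquoted fuzzy term is matched whole against single tokens.
print(fuzzy_hits("term-with-hyphens", indexed, 2))  # prints []
# A single misspelled word is within reach of one token.
print(fuzzy_hits("hyphans", indexed, 2))            # prints ['hyphens']
```

Under these assumptions the whole hyphenated string is far more than two edits away from any single token, which would be consistent with the zero results we see, but we have not verified that this is what Solr does internally.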