OK, so from what I'm looking at, you have a proximity search, so the terms have to be within the distance value of each other. In my example that value is 2, which obviously won't work since there are three terms. A fuzzy search is based on a single term/token, so you need to add ~2 to each term if that's what you want. There's really good documentation about the difference, and why it's not working as you expected, here:
https://examples.javacodegeeks.com/apache-solr-fuzzy-search-example/

Also try to make use of phrase query fields and boosting them.

> On Aug 23, 2022, at 11:18 AM, Morten Ernebjerg
> <morten.ernebj...@data4life.care> wrote:
>
> (replying on behalf of my colleague Julius who wrote this question who is
> unable to reply for technical reasons)
>
> Hi David,
>
> Thanks for the reply! I think your question may point to something we
> overlooked. We are actually using Solr 8.11 and we want to use fuzzy search
> (https://solr.apache.org/guide/8_11/the-standard-query-parser.html#fuzzy-searches),
> i.e. find words that differ from the query by one or a few characters. Our
> understanding was that to get matches that differ by max two chars from
> (using separate line to avoid adding confusing quotation marks)
>
> term-with-hyphens
>
> we should send the following query (without any quotation marks):
>
> term-with-hyphens~2
>
> Our thinking was that the hyphenated term is one word so there is no need
> to quote it. We had a quick try quoting the hyphenated term in the query as
> you suggested and it looks like it works (i.e. returns matches). Since the
> standard tokenizer splits on hyphens, I'm wondering the unquoted query
> somehow gets converted to the *proximity search* query
>
> "term with hyphens"~2
>
> which then fails (though it looks like it should still match
> term-with-hyphens). Would be great to understand what is happening.
>
> Best,
>
> Morten
>
>> On Tue, 23 Aug 2022 at 16:30, David Hastings <hastings.recurs...@gmail.com>
>> wrote:
>>
>> I’m not certain of course of your tokenizer but shouldn’t it be
>> “terms-with-hyphens”~1
>>
>> ?
>> Just a syntax thing that may not have translated over email but curious
>>
>> On Tue, Aug 23, 2022 at 10:12 AM Julian Hugo <julian.h...@data4life.care>
>> wrote:
>>
>>> Hello,
>>>
>>> I am getting peculiar results when querying for a term containing hyphens
>>> and add fuzzy search
>>> <https://solr.apache.org/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-FuzzySearches>.
>>>
>>> I have indexed two items (1) "term-with-hyphens" and (2) "term with
>>> hyphens". When I query ("q") for "term-with-hyphens" or "term with hyphens"
>>> both items are returned as expected. The same is the case for escaped
>>> hyphens "term\-with\-hyphens".
>>>
>>> The problem: When I add the fuzzy search parameter (i.e.,
>>> "term-with-hyphens~1" or "term\-with\-hyphens~1"). I get zero results back.
>>>
>>> I struggle to understand the results, or how to solve this problem. My
>>> intuition tells me that adding a fuzzy search parameter should surely
>>> increase the size of the set of results. I am happy for any help on this!
>>>
>>> Our current setup is using the "Extended DisMax Query Parser"
>>> <https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html>
>>> however we observe the same behaviour using the "Standard Query Parser
>>> <https://solr.apache.org/guide/6_6/the-standard-query-parser.html>". We are
>>> using the "Standard Tokenizer
>>> <https://solr.apache.org/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer>",
>>> which splits at hyphens. Does this relate to this problem?
>>>
>>> Thank you!
>>>
>>> --
>>> *Julian Hugo*
>>> Working Student
>>> Backend Development
>>> (he/his)
>>>
>>> julian.h...@data4life.care
>>>
>>> D4L data4life gGmbH
>>> Charlottenstraße 109
>>> 14467 Potsdam, Germany
>>> www.data4life.care
>>>
>>> Amtsgericht Potsdam, HRB 30667
>>> Managing Director: Christian-Cornelius Weiß
>
> --
> *Morten Ernebjerg, Ph.D.*
> Senior Developer
> morten.ernebj...@data4life.care
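
For what it's worth, here is a rough sketch of the two query shapes discussed in this thread: a fuzzy query with ~N appended to every term, and a proximity query with ~N after a quoted phrase. The helper names (`split_terms`, `fuzzy_per_term`, `proximity`) are made up for illustration, and splitting on whitespace and hyphens is only an approximation of what a tokenizer that splits at hyphens would do. It just builds query strings; it does not talk to Solr.

```python
import re


def split_terms(text: str) -> list[str]:
    """Split on whitespace and hyphens (a rough stand-in for the tokenizer)."""
    return [t for t in re.split(r"[\s\-]+", text) if t]


def fuzzy_per_term(text: str, max_edits: int = 2) -> str:
    """Fuzzy search: append ~max_edits to each individual term."""
    return " ".join(f"{t}~{max_edits}" for t in split_terms(text))


def proximity(text: str, distance: int = 2) -> str:
    """Proximity search: quote the whole phrase, then append ~distance."""
    return '"' + " ".join(split_terms(text)) + f'"~{distance}'


if __name__ == "__main__":
    print(fuzzy_per_term("term-with-hyphens"))  # term~2 with~2 hyphens~2
    print(proximity("term-with-hyphens"))       # "term with hyphens"~2
```

The first string is what "add ~2 to each term" looks like in practice; the second is the quoted form, where the number is a distance between terms rather than an edit distance on a single term.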