(Replying on behalf of my colleague Julian, who wrote the original question
but is unable to reply for technical reasons.)
Hi David,

Thanks for the reply! I think your question may point to something we
overlooked. We are actually using Solr 8.11 and we want to use fuzzy search
(https://solr.apache.org/guide/8_11/the-standard-query-parser.html#fuzzy-searches),
i.e. find words that differ from the query by one or a few characters. Our
understanding was that, to get matches that differ by at most two characters
from (on a separate line to avoid adding confusing quotation marks)

term-with-hyphens

we should send the following query (without any quotation marks):

term-with-hyphens~2
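
In case it helps with reproducing this, here is a minimal sketch of how such
a query could be sent with Solr's debugQuery parameter switched on, so that
the parsed query appears in the response. The host, collection name and
helper function are placeholders for illustration, not our real setup.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; adjust to the actual installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_query_url(q):
    """Return a select URL for query string q with debug output enabled."""
    params = {"q": q, "debugQuery": "true"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# The unquoted fuzzy query exactly as written above.
fuzzy_url = build_query_url("term-with-hyphens~2")
print(fuzzy_url)
```

Fetching that URL and looking at the debug section of the response should
show how the query string was actually parsed.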

Our thinking was that the hyphenated term is one word, so there is no need
to quote it. We had a quick try at quoting the hyphenated term in the query,
as you suggested, and it looks like it works (i.e. it returns matches). Since
the standard tokenizer splits on hyphens, I'm wondering whether the unquoted
query somehow gets converted to the *proximity search* query

"term with hyphens"~2

which then fails (though it looks like it should still match
term-with-hyphens). It would be great to understand what is happening.
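
To make the tokenizer concern concrete, here is a small sketch of another
possibility. It assumes (this is not confirmed) that the index only holds
the split tokens and that the fuzzy term is compared against them as one
whole string. The tokenizer below is a rough stand-in, not the real Standard
Tokenizer, and the distance is plain Levenshtein, which may differ from what
Solr uses internally.

```python
import re


def tokenize(text):
    """Rough stand-in for a tokenizer that splits on hyphens and whitespace."""
    return [t.lower() for t in re.split(r"[-\s]+", text) if t]


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Tokens that the two indexed items would produce under this assumption.
indexed_terms = set(tokenize("term-with-hyphens")) | set(tokenize("term with hyphens"))


def fuzzy_matches(query_term, max_edits):
    """Indexed tokens within max_edits of the whole, untokenized query term."""
    return sorted(t for t in indexed_terms if edit_distance(query_term, t) <= max_edits)


print(fuzzy_matches("term-with-hyphens", 2))  # whole string vs. single tokens
print(fuzzy_matches("hyphen", 2))             # a single word near one token
```

Under these assumptions the whole hyphenated string is more than two edits
away from every individual token, which would be consistent with the zero
results we see.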

Best,

Morten



On Tue, 23 Aug 2022 at 16:30, David Hastings <hastings.recurs...@gmail.com>
wrote:

> I’m not certain of course of your tokenizer but shouldn’t it be
> “terms-with-hyphens”~1
>
> ? Just a syntax thing that may not have translated over email but curious
>
> On Tue, Aug 23, 2022 at 10:12 AM Julian Hugo <julian.h...@data4life.care>
> wrote:
>
> > Hello,
> >
> > I am getting peculiar results when querying for a term containing hyphens
> > and adding fuzzy search
> > <https://solr.apache.org/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-FuzzySearches>.
> >
> > I have indexed two items (1) "term-with-hyphens" and (2) "term with
> > hyphens". When I query ("q") for "term-with-hyphens" or "term with hyphens"
> > both items are returned as expected. The same is the case for escaped
> > hyphens "term\-with\-hyphens".
> >
> > The problem: when I add the fuzzy search parameter (i.e.,
> > "term-with-hyphens~1" or "term\-with\-hyphens~1"), I get zero results back.
> >
> > I struggle to understand the results, or how to solve this problem. My
> > intuition tells me that adding a fuzzy search parameter should surely
> > increase the size of the set of results. I would be grateful for any help
> > on this!
> >
> > Our current setup is using the "Extended DisMax Query Parser"
> > <https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html>;
> > however, we observe the same behaviour using the "Standard Query Parser"
> > <https://solr.apache.org/guide/6_6/the-standard-query-parser.html>. We are
> > using the "Standard Tokenizer"
> > <https://solr.apache.org/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer>,
> > which splits at hyphens. Does this relate to this problem?
> >
> > Thank you!
> >
> > --
> >
> > *Julian Hugo*
> >
> > Working Student
> > Backend Development
> >
> > (he/his)
> >
> >
> > julian.h...@data4life.care
> >
> >
> > D4L data4life gGmbH
> > Charlottenstraße 109
> > 14467 Potsdam, Germany
> >
> > www.data4life.care
> >
> >
> > Amtsgericht Potsdam, HRB 30667
> >
> > Managing Director: Christian-Cornelius Weiß
> >
> >
> > We are Data4Life. We've been certified by the German Federal Office for
> > Information Security (BSI) in accordance with ISO 27001 on the basis of
> > "IT-Grundschutz".
> >
> >
> > Diversity is the driving force behind our work towards a society where
> > digital health improves quality of life for everyone.
> > Data4Life warmly welcomes applicants from the LGBTQI+ community, people
> > with a migration background, People of Color, and individuals with
> > disabilities or chronic illnesses to the team.
> >
> >
> > Climate neutral since 2019 <https://wtca.lfca.earth/e/data4life>
> >
>


-- 

*Morten Ernebjerg, Ph.D.*

Senior Developer


morten.ernebj...@data4life.care

