Re: Terms with hyphens and fuzzy search

2022-08-24 Thread Morten Ernebjerg
Hi David & Markus Thanks for the input! - I think we should now have the tools to work out a solution for this. Best, Morten On Tue, 23 Aug 2022 at 18:19, David Hastings wrote: > And if you want to get really fun, use a natural language/entity > extraction, mix just those values into an index fi

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread David Hastings
And if you want to get really fun, use a natural language/entity extraction, mix just those values into an index field, with stop words killed, and then bring in shingles, up the shingle to about four, and boost it with the pf. I promise you won’t get bored. Your index size will grow but you should
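
To make the shingle idea above a bit more concrete, here is a minimal sketch in plain Python of what word shingles up to size four look like for a short token list. The function name `shingles` and its parameters are made up for illustration; this is not Solr's shingle filter, which is configured in the schema and by default also emits the single tokens.

```python
from typing import List


def shingles(tokens: List[str], max_size: int = 4, min_size: int = 2) -> List[str]:
    """Return all word n-grams (shingles) of min_size..max_size tokens.

    Hypothetical helper for illustration only; it mimics the idea of
    shingling, not the exact output of any Solr filter.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if it fits inside the token list.
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Tokens as they might look after the hyphens are treated as white space.
example = shingles(["term", "with", "hyphens"])
print(example)
```

Each shingle becomes its own indexed term, which is why the index grows as the maximum shingle size goes up.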

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Dave
Yea now I think you’re getting the concept. The dash is effectively white space and means nothing, like a period or comma. So it’s now three separate words. And to quote: Once the list of matching documents has been identified using the fq and qf parameters, the pf parameter can be used to "boo

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Markus Jelsma
It's been a while, but I seem to remember that fuzzy queries are not analyzed. That means you are looking for term-with-hyphens as a single token, with a maximum edit distance of 1. But because you use an analyzer that splits on hyphens, there is no term containing a hyphen in your index. If you move t
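
The point about the unanalyzed fuzzy term can be checked with a small sketch. It assumes the analyzer turned term-with-hyphens into the three tokens shown (an assumption about the index, not something taken from the actual schema) and uses plain Levenshtein distance as a stand-in for the fuzzy matching; the names `edit_distance` and `indexed_tokens` are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Assumed analyzer output: the hyphens were split away at index time.
indexed_tokens = ["term", "with", "hyphens"]

# The fuzzy query term is not analyzed, so it still contains the hyphens.
query_term = "term-with-hyphens"

matches = [t for t in indexed_tokens if edit_distance(query_term, t) <= 1]
print(matches)
```

No indexed token is within one edit of the full hyphenated string, so under these assumptions the fuzzy query has nothing to match.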

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Morten Ernebjerg
Hi again. OK, so I think this is starting to make sense. What was confusing us was that we indeed thought of a hyphenated term (like: term-with-hyphens) as just a single term, meaning that fuzzy search should apply as usual. However, if I understand you correctly, it sounds like the correct stateme

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Dave
Ok so from what I’m looking at you have a proximity search so the terms have to be within the distance value of each other. In my example, 2, which obviously won’t work since there are three terms. A fuzzy search is based on a single term/token. So you need to add ~2 to each term if that’s what
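
As a rough sketch of the difference described above, the following Python builds both query strings from a hyphenated input: a proximity query over the whole phrase, and a per-term fuzzy query with ~N appended to each term. The helper names are hypothetical, and splitting on hyphens and white space is an assumption about how the analyzer tokenizes.

```python
import re
from typing import List


def split_terms(phrase: str) -> List[str]:
    """Split on hyphens and white space (assumed tokenizer behaviour)."""
    return [t for t in re.split(r"[-\s]+", phrase.strip()) if t]


def per_term_fuzzy(phrase: str, max_edits: int = 1) -> str:
    """Append ~max_edits to every term, e.g. term~1 with~1 hyphens~1."""
    if not 0 <= max_edits <= 2:
        # Solr/Lucene fuzzy queries support an edit distance of at most 2.
        raise ValueError("max_edits must be 0, 1 or 2")
    return " ".join(f"{t}~{max_edits}" for t in split_terms(phrase))


def proximity(phrase: str, slop: int) -> str:
    """Quote the terms and append ~slop, which makes a proximity query."""
    return '"' + " ".join(split_terms(phrase)) + f'"~{slop}'


print(per_term_fuzzy("term-with-hyphens", 1))
print(proximity("term-with-hyphens", 2))
```

The first string asks for each term to match within the given edit distance; the second only constrains how far apart the exact terms may be.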

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Morten Ernebjerg
(Replying on behalf of my colleague Julius, who wrote this question and is unable to reply for technical reasons.) Hi David, Thanks for the reply! I think your question may point to something we overlooked. We are actually using Solr 8.11 and we want to use fuzzy search ( https://solr.apache.org/gu

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread David Hastings
I’m not certain of course of your tokenizer but shouldn’t it be “terms-with-hyphens”~1 ? Just a syntax thing that may not have translated over email but curious On Tue, Aug 23, 2022 at 10:12 AM Julian Hugo wrote: > Hello, > > I am getting peculiar results when querying for a term containing hyp