Thanks a lot Annika for your explanation with links. We'll check that out.
And thanks to Charlie too.
Best regards,
Elisabeth
On Mon, Mar 11, 2024 at 10:32, Charlie Hull wrote:
> Hi Annika,
>
> Glad you like Bertrand's Haystack presentation! My colleague Daniel
> Wrigley recently wrote an overview blog on synonyms here
>
> https://opensourceconnections.com/blog/2023/03/29/applying-synonyms-types-strategies-tools-and-a-glimpse-into-the-future/
> which links to several other synonym blogs on our site.
>
> Best
>
> Charlie
>
> On 08/03/2024 15:09, Annika Gable wrote:
> > Hi Mikhail, Elisabeth, and Atin,
> >
> > Thank you for your inputs. After working on this issue for days, I
> > finally found the main culprits, and I've taken the following steps:
> >
> > 1. Doing synonym expansion only at query time, not at index time, in
> > order to get correct multi-word synonyms.
> > 2. Using WhitespaceTokenizer instead of StandardTokenizer or
> > KeywordTokenizer; otherwise, hyphenated words like immuno-oncology will
> > always be split into immuno and oncology, which will not be found in the
> > synonym definitions!
> > 3. Using the SnowballPorterFilter for stemming only after
> > SynonymGraphFilter; otherwise, immuno-oncology will be stemmed into
> > immuno-oncolog, which does not match the immuno-oncology in the
> > synonyms.txt file.
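
A toy Python sketch of why the tokenizer choice and the filter order in the
steps above matter. Nothing here is Solr or Lucene code: the synonym entry,
toy_stem (which only strips a trailing "y"), and word_tokenize are
hypothetical stand-ins for synonyms.txt, SnowballPorterFilter, and a
tokenizer that breaks on hyphens.

```python
import re

# Illustrative synonym table standing in for synonyms.txt (made-up entry).
SYNONYMS = {"immuno-oncology": ["immuno-oncology", "cancer immunotherapy"]}


def whitespace_tokenize(text):
    # Split on whitespace only, so hyphenated terms stay in one piece.
    return text.lower().split()


def word_tokenize(text):
    # Rough stand-in for a tokenizer that also breaks on hyphens.
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(token):
    # NOT Snowball: strips a trailing "y" only, to mimic the effect
    # described above (immuno-oncology -> immuno-oncolog).
    return token[:-1] if token.endswith("y") else token


def expand(tokens):
    # Replace a token by the words of its synonyms when it is in the table.
    # Flattening to a word list ignores the token graph a real filter builds.
    out = []
    for token in tokens:
        for phrase in SYNONYMS.get(token, [token]):
            out.extend(phrase.split())
    return out


def synonyms_then_stem(text):
    return [toy_stem(t) for t in expand(whitespace_tokenize(text))]


def stem_then_synonyms(text):
    return expand([toy_stem(t) for t in whitespace_tokenize(text)])
```

With this sketch, stem_then_synonyms("immuno-oncology") never reaches the
synonym entry because the lookup key has already been stemmed, while
synonyms_then_stem("immuno-oncology") expands first and stems afterwards.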
> >
> > I found this presentation
> > https://www.slideshare.net/BertrandRigaldies/the-solr-multiterms-synonyms-maze-graphs
> > incredibly helpful, as well as setting up a minimal example of an index
> > containing only 5 documents.
> > It may turn out at a later point that I need to use synonyms at
> > index time for speed, in which case I would only index the single-word
> > synonyms there, as suggested by Bertrand Rigaldies in the above
> > presentation.
> >
> > @Mikhail "It's usually tough." I've noticed :)
> > @Elisabeth: Thank you for your suggestion. From the description, it seems
> > like this fixes query-time expansion of synonyms, which the
> > SynonymGraphFilter and the query parser handle correctly in newer Solr
> > versions.
> >
> > Best regards,
> > Annika
> >
> >
> >
> > On Thu, Mar 7, 2024 at 12:56 PM atin janki wrote:
> >
> >> Hi Annika,
> >>
> >> Can you please share a sample query and how it is being expanded?
> >> Also, share how you expect it to be expanded.
> >> It would help to replicate your scenario and understand the problem
> >> better.
> >>
> >> Best Regards,
> >> Atin Janki
> >>
> >>
> >> On Tue, Mar 5, 2024 at 4:21 PM elisabeth benoit <elisaelisael...@gmail.com>
> >> wrote:
> >>
> >>> Hello Annika,
> >>>
> >>> For multiword synonyms, we have been using the
> >>> https://github.com/healthonnet/hon-lucene-synonyms
> >>> jar, which we just rebuilt with Solr 9.2.1 (a modification is needed,
> >>> if you ever need details).
> >>>
> >>> It overrides the edismax query parser and expands multiword synonyms
> >>> at query time.
> >>>
> >>> We didn't want to expand synonyms at index time because we had this
> >>> problem:
> >>>
> >>> in the index: mairie
> >>> synonym: hotel de ville
> >>>
> >>> and then at query time, with query 'hotel', mairie would match.
> >>>
> >>> With hon-lucene, when a user asks for "hotel de ville", we match
> >>> mairie, but "hotel" doesn't match mairie.
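
The mairie / hotel de ville problem can be reproduced with a small Python
sketch. This is only an assumption-laden model, not hon-lucene or Solr code:
the index is a plain set of words, and all function names are hypothetical.

```python
# Two-way synonym table for the example above.
SYNONYMS = {"mairie": ["hotel de ville"], "hotel de ville": ["mairie"]}


def index_plain(doc):
    # Index only the words that are actually in the document.
    return set(doc.lower().split())


def index_with_expansion(doc):
    # Index-time expansion: every word of each synonym phrase ends up in
    # the index as a bare term. The substring check is deliberately crude.
    text = doc.lower()
    terms = set(text.split())
    for phrase, alternatives in SYNONYMS.items():
        if phrase in text:
            for alt in alternatives:
                terms.update(alt.split())
    return terms


def expand_query(query):
    # Query-time expansion: expand only when the whole query is a known
    # synonym phrase, so a lone "hotel" is left untouched.
    q = query.lower().strip()
    return [q] + SYNONYMS.get(q, [])


def matches(query_variants, index_terms):
    # A variant matches when all of its words are present in the index.
    return any(all(w in index_terms for w in v.split()) for v in query_variants)
```

In this model, matches(["hotel"], index_with_expansion("mairie")) is true,
which is the unwanted match, while with index_plain("mairie") only the
expanded query for "hotel de ville" matches.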
> >>>
> >>> You might have performance issues with hon-lucene if you have hundreds
> >>> of synonyms. But it's worth testing.
> >>>
> >>> Best regards,
> >>> Elisabeth
> >>>
> >>> On Mon, Mar 4, 2024 at 17:16, Mikhail Khludnev wrote:
> Hello Annika,
> You may use the Solr Admin Analysis page, and the debugQuery and
> explainOther params, to dig into a particular case. It's usually tough.
> I've found one clue in the ref guide:
> To get fully correct positional queries when your synonym replacements
> are multiple tokens, you should instead apply synonyms using this filter
> at query time.
> Probably you may start from something simple.
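
A minimal Python sketch of how the debugging requests mentioned above could
be assembled. The base URL, collection name, field type name, and document
id in the usage note are assumptions for illustration; the sketch only
builds URLs and does not contact a server.

```python
from urllib.parse import urlencode


def debug_query_url(base, query, explain_other=None):
    # /select request with debugQuery enabled; explainOther is optional.
    params = {"q": query, "debugQuery": "true"}
    if explain_other:
        params["explainOther"] = explain_other
    return f"{base}/select?{urlencode(params)}"


def field_analysis_url(base, field_type, index_text, query_text):
    # Field analysis request: shows how index-side and query-side text
    # are tokenized and filtered for the given field type.
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": index_text,
        "analysis.query": query_text,
    }
    return f"{base}/analysis/field?{urlencode(params)}"
```

For example, debug_query_url("http://localhost:8983/solr/demo",
"immuno-oncology", "id:42") builds a request whose debug section would show
the parsed query and why document 42 did or did not match.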
>
> On Mon, Mar 4, 2024 at 5:23 PM Annika Gable wrote:
>
> > Hello,
> >
> > I'm using Solr 9.1, and I'm trying to set up synonyms. I managed to get
> > synonyms to work for single-word synonyms, but not for multiword and
> > hyphenated synonyms.
> >
> > In the final state, I am planning on having a very extensive synonym
> > file (hundreds, if not thousands of lines) because I want to always find
> > results for all child terms and other synonyms of a given search term.
> > This is why I thought it may make sense to list all synonyms in the
> > index. But getting it to work with query-time synonym expansion would
> > also be grea