Re: Multi-word synonyms not working

2024-03-14 Thread elisabeth benoit
Thanks a lot, Annika, for your explanation and the links. We'll check them out.
And thanks to Charlie too.

Best regards,
Elisabeth

On Mon, Mar 11, 2024 at 10:32, Charlie Hull wrote:

> Hi Annika,
>
> Glad you like Bertrand's Haystack presentation! My colleague Daniel
> Wrigley recently wrote an overview blog on synonyms here
>
> https://opensourceconnections.com/blog/2023/03/29/applying-synonyms-types-strategies-tools-and-a-glimpse-into-the-future/
> which links to several other synonym blogs on our site.
>
> Best
>
> Charlie
>
> On 08/03/2024 15:09, Annika Gable wrote:
> > Hi Mikhail, Elisabeth, and Atin,
> >
> > Thank you for your inputs. After working on this issue for days, I finally
> > found the main culprits, and I've taken the following steps:
> >
> > 1. Doing synonym expansion only at query time, not at index time, in order
> >    to get correct multi-word synonyms.
> > 2. Using WhitespaceTokenizer instead of StandardTokenizer or
> >    KeywordTokenizer. With StandardTokenizer, hyphenated words like
> >    immuno-oncology are always split into immuno and oncology, which are
> >    then not found in the synonym definitions!
> > 3. Applying the SnowballPorterFilter for stemming only after the
> >    SynonymGraphFilter; otherwise, immuno-oncology is stemmed to
> >    immuno-oncolog, which does not match immuno-oncology in the
> >    synonyms.txt file.
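The filter-order point in step 3 can be illustrated with a toy Python model of a query-time analysis chain. This is a sketch under stated assumptions, not Lucene's SynonymGraphFilter (which emits a token graph rather than a list of paths): the synonym group "immuno-oncology, cancer immunotherapy" and the one-rule `toy_stem` function are hypothetical stand-ins, chosen only to reproduce the immuno-oncolog effect described above.

```python
# Toy model of a query-time analysis chain (NOT Solr/Lucene code).
# Hypothetical synonym group; "cancer immunotherapy" is made up for the demo.
SYNONYM_GROUPS = [("immuno-oncology", "cancer immunotherapy")]


def build_synonym_map(groups):
    """Map each phrase (tuple of tokens) to all phrases in its group."""
    mapping = {}
    for group in groups:
        phrases = [tuple(p.split()) for p in group]
        for phrase in phrases:
            mapping[phrase] = phrases
    return mapping


SYN_MAP = build_synonym_map(SYNONYM_GROUPS)
MAX_LEN = max(len(p) for p in SYN_MAP)


def toy_stem(token):
    # Crude stand-in for a real stemmer: it only drops a trailing "y",
    # which is enough to turn "immuno-oncology" into "immuno-oncolog".
    return token[:-1] if token.endswith("y") else token


def expand_synonyms(tokens):
    """Greedy longest-match expansion; returns every alternative token path."""
    paths, i = [[]], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SYN_MAP:
                alts, i = SYN_MAP[phrase], i + n
                break
        else:  # no synonym starts here: keep the token as-is
            alts, i = [(tokens[i],)], i + 1
        paths = [p + list(a) for p in paths for a in alts]
    return paths


def analyze(text, stem_before_synonyms):
    tokens = text.split()  # whitespace tokenization keeps "immuno-oncology" whole
    if stem_before_synonyms:
        return expand_synonyms([toy_stem(t) for t in tokens])
    return [[toy_stem(t) for t in path] for path in expand_synonyms(tokens)]


print(analyze("immuno-oncology", stem_before_synonyms=True))
# [['immuno-oncolog']]  -> the synonym entry is never matched
print(analyze("immuno-oncology", stem_before_synonyms=False))
# [['immuno-oncolog'], ['cancer', 'immunotherap']]
```

When stemming runs first, the stemmed token no longer equals the unstemmed synonyms.txt entry, so no expansion happens; stemming after the synonym step keeps the lookup working and still stems every alternative.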
> >
> > I found this presentation
> > https://www.slideshare.net/BertrandRigaldies/the-solr-multiterms-synonyms-maze-graphs
> > incredibly helpful, as was setting up a minimal example index containing
> > only 5 documents.
> > It may turn out later that I need to use synonyms at index time for speed,
> > in which case I would index only the single-word synonyms there, as
> > suggested by Bertrand Rigaldies in the above presentation.
> >
> > @Mikhail "It's usually tough." I've noticed :)
> > @Elisabeth: Thank you for your suggestion. From the description, it seems
> > like this fixes query-time expansion of synonyms, which the
> > SynonymGraphFilter and the query parser handle correctly in newer Solr
> > versions.
> >
> > Best regards,
> > Annika
> >
> >
> >
> > On Thu, Mar 7, 2024 at 12:56 PM atin janki  wrote:
> >
> >> Hi Annika,
> >>
> >> Can you please share a sample query and how it is being expanded?
> >> Also, share how you expect it to be expanded.
> >> It would help to replicate your scenario and understand the problem better.
> >>
> >> Best Regards,
> >> Atin Janki
> >>
> >>
> >> On Tue, Mar 5, 2024 at 4:21 PM elisabeth benoit <elisaelisael...@gmail.com>
> >> wrote:
> >>
> >>> Hello Annika,
> >>>
> >>> For multi-word synonyms, we have been using the
> >>> https://github.com/healthonnet/hon-lucene-synonyms jar, which we just
> >>> rebuilt with Solr 9.2.1 (a modification is needed, if you ever need
> >>> details).
> >>>
> >>> It overrides the edismax query parser and expands multi-word synonyms at
> >>> query time.
> >>>
> >>> We didn't want to expand synonyms at index time because we had this
> >>> problem:
> >>>
> >>> in the index: mairie
> >>> synonym: hotel de ville
> >>>
> >>> and then at query time, with the query 'hotel', mairie would match.
> >>>
> >>> With hon-lucene, when a user searches for "hotel de ville", we match
> >>> mairie, but "hotel" alone doesn't match mairie.
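The mairie / hotel de ville problem can be sketched with a toy Python model that contrasts flattened index-time expansion with whole-phrase query-time expansion. It assumes one-line documents and a single hypothetical synonym rule, and it models neither hon-lucene-synonyms nor Lucene internals. It only shows why a lone "hotel" matches once the synonym's tokens are indexed alongside "mairie".

```python
# Toy model only: hypothetical one-line documents and one synonym rule.
SYNONYMS = {
    ("mairie",): [("hotel", "de", "ville")],
    ("hotel", "de", "ville"): [("mairie",)],
}


def contains_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def index_time_match(doc_text, query):
    """Index-time expansion, flattened: the doc's synonym tokens are added to
    its indexed terms, so a single token of a multi-word synonym matches."""
    tokens = doc_text.split()
    terms = set(tokens)
    for alt in SYNONYMS.get(tuple(tokens), []):
        terms.update(alt)
    return all(t in terms for t in query.split())


def query_time_match(doc_text, query):
    """Query-time expansion of the whole query phrase: each alternative must
    occur in the document as a complete phrase."""
    q = tuple(query.split())
    alternatives = [q] + SYNONYMS.get(q, [])
    return any(contains_phrase(doc_text.split(), alt) for alt in alternatives)


print(index_time_match("mairie", "hotel"))           # True: the unwanted match
print(query_time_match("mairie", "hotel"))           # False
print(query_time_match("mairie", "hotel de ville"))  # True
```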
> >>>
> >>> You might have performance issues with hon-lucene if you have hundreds of
> >>> synonyms. But it's worth testing.
> >>>
> >>> Best regards,
> >>> Elisabeth
> >>>
> >>> On Mon, Mar 4, 2024 at 17:16, Mikhail Khludnev wrote:
> >>>> Hello Annika,
> >>>> You may use the Solr Admin Analysis page and the debugQuery and
> >>>> explainOther params to dig into a particular case. It's usually tough.
> >>>> I've found one clue in the ref guide:
> >>>> "To get fully correct positional queries when your synonym replacements
> >>>> are multiple tokens, you should instead apply synonyms using this filter
> >>>> at query time."
> >>>> Perhaps you could start from something simple.
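To follow Mikhail's and Atin's suggestions (seeing how a query is actually expanded), a request with debugQuery=true returns the parsed query, and explainOther explains how documents matching a second query score against the main one. The minimal Python sketch below only builds such a URL; the host, the collection name mycollection, the query text, and the id:doc1 selector are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, collection, query, explain_other=None):
    """Build a /select URL that asks Solr for debug output."""
    params = {"q": query, "debugQuery": "true"}
    if explain_other:
        # Also explain how docs matching this second query score against q.
        params["explainOther"] = explain_other
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, query, and document id.
url = build_debug_url("http://localhost:8983", "mycollection",
                      "immuno-oncology", explain_other="id:doc1")
print(url)
```

In the response, the parsedquery entry of the debug section shows how the synonyms were expanded; the Analysis screen in the Solr Admin UI shows the same analysis chain token by token.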
> >>>>
> >>>> On Mon, Mar 4, 2024 at 5:23 PM Annika Gable wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I'm using Solr 9.1, and I'm trying to set up synonyms. I managed to get
> >>>>> synonyms to work for single-word synonyms, but not for multi-word and
> >>>>> hyphenated synonyms.
> >>>>>
> >>>>> In the final state, I am planning on having a very extensive synonym file
> >>>>> (hundreds, if not thousands of lines) because I want to always find
> >>>>> results for all child terms and other synonyms of a given search term.
> >>>>> This is why I thought it may make sense to list all synonyms in the
> >>>>> index. But getting it to work with query-time synonym expansion would
> >>>>> also be grea

Re: highlighting performance

2024-03-14 Thread David Smiley
Hi Maria,

Did you read the reference guide?
https://solr.apache.org/guide/8_11/highlighting.html#schema-options-and-performance-considerations
In particular, you're going to get the best performance with
hl.method=unified and storeOffsetsWithPositions=true. I wouldn't
bother with the termPositions or termOffsets options, but termVectors is
great if you have wildcards in your queries; otherwise, remove it too,
as it's needless bloat.

Try setting hl.fragsize=0 which can short-circuit some expensive
sentence fragmentation that I suspect might not be useful if your
values are only 200 characters.
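David's request-side suggestions can be sketched as a Python helper that builds the query URL. It is only a sketch: the host, collection, field name, and query are hypothetical, and the schema-side storeOffsetsWithPositions="true" setting (a field attribute in the schema) is not shown.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, collection, query, field):
    """Build a /select URL using the unified highlighter."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.method": "unified",
        # 0 = don't fragment: highlight each value whole, skipping the
        # sentence fragmentation that short (~200 char) values don't need.
        "hl.fragsize": "0",
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, query, and field name.
print(build_highlight_url("http://localhost:8983", "mycollection",
                          "fever", "notes"))
```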

2-3M values -- wow that's something.  Solr has to pull all that data
back per doc that's matched.

Good luck,
  ~ David Smiley

On Mon, Mar 11, 2024 at 5:53 PM Maria Muslea  wrote:
>
> Hi,
>
> I am having trouble with highlighting being very slow. I tried all the
> suggestions I could find online, but it is still much slower than I
> would like it to be.
>
> This happens on multi-valued fields that can sometimes have ~2-3M values of
> ~200 characters each value.
>
> I am using hl.method=unified with Solr 8.11.3, and highlighting can
> take 10-20 seconds.
>
> I tried indexing all the fields that I highlight with termVectors="true"
> termPositions="true" termOffsets="true", but this didn't seem to help with
> hl.method=unified or hl.method=fastVector.
>
> I tried hl.fragAlignRatio=0.0 and also hl.snippets=1.
>
> Can you suggest anything else that I could try?
>
> I appreciate your help.
>
> Thank you,
> Maria


Re: Migrating from solr8.11.3 to solr 9.5.0

2024-03-14 Thread David Smiley
Also look at the logs for warnings.  Use of legacy/deprecated stuff is
often logged.
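A minimal Python sketch of that log check: it filters log lines for WARN entries that mention deprecated or legacy features. The assumption that the level appears as the literal token WARN, the keyword list, and the sample lines are all hypothetical; adjust them to your actual log format.

```python
import re

# Assumption: the log level appears as the word WARN, and deprecation
# messages contain "deprecat..." or "legacy" (case-insensitive).
DEPRECATION_WARNING = re.compile(r"\bWARN\b.*\b(deprecat|legacy)", re.IGNORECASE)


def find_deprecation_warnings(lines):
    """Return WARN lines that mention deprecated or legacy features."""
    return [line.rstrip("\n") for line in lines if DEPRECATION_WARNING.search(line)]


# Made-up sample lines; real Solr log lines will look different.
sample = [
    "2024-03-13 05:08:00 WARN  [core] SolrConfig: Deprecated setting in use\n",
    "2024-03-13 05:08:01 INFO  [core] CoreContainer: legacy mention at INFO\n",
    "2024-03-13 05:08:02 WARN  [core] DiskCheck: low disk space\n",
]
for hit in find_deprecation_warnings(sample):
    print(hit)
```

Since the function accepts any iterable of lines, an open file handle for a log file (whose path depends on your installation) can be passed to it directly.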

On Wed, Mar 13, 2024 at 5:08 AM Isabella Trevisan wrote:
>
> Hi,
> I am planning to migrate from Solr 8.11.3 to 9.5.0 in a SolrCloud
> architecture.
>
> The idea is to create a new SolrCloud 9.5 environment, create there the
> collections present in SolrCloud 8.11.3, and reindex all the content on the
> new platform.
> My doubt is about which config to use when creating the collections in
> Solr 9.5.
> Can I keep the config from Solr 8.11.3, or do I necessarily have to adopt
> the new default configs and carry over into them what we defined in
> Solr 8.11.3?
> Thank you
> regards
> --
> Isabella Trevisan