Re: join query parser performance

Deepak Goel Thu, 25 May 2023 06:30:25 -0700

Ron,

Please post actual figures:


1. CPU, memory, disk, and network utilisation
2. Response times
3. Load
4. Hardware config of server
5. Software config of server


On Thu, 25 May 2023, 18:51 Joel Bernstein, <[email protected]> wrote:

> One thing to understand about the topLevelDV approach is you'll need to
> warm both sides of the join. You can do this by adding a static warming
> query that facets on 'group_id_mv' and 'group_member_id' in both
> collections.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, May 25, 2023 at 9:17 AM Joel Bernstein <[email protected]> wrote:
>
> > If you are using a recent version of Solr try adding the parameter
> >
> > method=topLevelDV
> >
> > Let us know how this affects performance in your use case.
> >
> > What matters most here is the number of documents the 'from' side of the
> > join matches.
> >
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, May 25, 2023 at 8:52 AM Ron Haines <[email protected]> wrote:
> >
> >> I've been using the 'join' query parser to 'filter out' related documents
> >> that should not be part of the result set.  Functionally, it is working
> >> fine.  However, when we throw a 'real' level of customer traffic at it, it
> >> pretty much brings Solr to its knees.  CPU increases a lot, close to 3X,
> >> when I enable this feature in our system.  Solr response times shoot up,
> >> and thread counts shoot up.  Before I 'give up' on the join query parser,
> >> I thought I'd seek some advice here.
> >>
> >> So, when this feature is enabled, this negative &fq gets added:
> >> -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id
> >> score=none}${q}
> >>
> >> The 'local' collection size is about 27 million docs, but the number of
> >> docs that actually contain a 'group_member_id' is only about 125k.  And,
> >> in the 'fromIndex' collection, there are only 80k documents in that
> >> collection, and they all have the 'group_id_mv' field.  The 'fromIndex'
> >> collection is a single shard, with a replica on each shard of the local
> >> collection.  The local collection only has about 300k docs per shard, at
> >> 96 shards.
> >>
> >> I guess I'm just trying to understand why this appears to be causing such
> >> problems for Solr, as the amount of work (the # of documents involved)
> >> seems relatively small.
> >>
> >> I hope I'm missing something...
> >> Thanks for any input.
> >>
> >
>
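
For anyone following the thread, here is a minimal sketch of what the two suggestions above might look like as request parameters. It only builds strings and does not talk to Solr. The helper names `join_filter` and `warming_params` are hypothetical, and keeping `score=none` next to `method=topLevelDV` is an assumption carried over from the original filter, so check the join query parser documentation for your Solr version before relying on it.

```python
from urllib.parse import urlencode


def join_filter(from_index, from_field, to_field, method=None, query="${q}"):
    """Build the negated {!join} filter from the thread.

    `method` is optional; pass "topLevelDV" to try the suggestion above.
    """
    parts = [
        f"fromIndex={from_index}",
        f"from={from_field}",
        f"to={to_field}",
        "score=none",  # kept from the original filter (assumption, see lead-in)
    ]
    if method:
        parts.append(f"method={method}")
    return "-{!join " + " ".join(parts) + "}" + query


def warming_params(field):
    """Parameters for a facet request on one join field, usable as a warming query."""
    return {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}


fq = join_filter("primary_rollup", "group_id_mv", "group_member_id",
                 method="topLevelDV")
print(fq)

# One warming request per join field, as described in the reply above.
for field in ("group_id_mv", "group_member_id"):
    print(urlencode(warming_params(field)))
```

The warming parameters would normally be configured as a static warming query on the relevant collections rather than sent by hand; the URL-encoded form here is only meant to show which fields get faceted.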
