Fyi, I am finally getting back to this.  I apologize for the delay.


I am going to try using the ‘method=topLevelDV’ option to see if that makes
a difference.  I will run the same tests used below and follow up with the
results.



As far as more details about this scenario:

   - Per the ‘user query’: some of them are quite simple, e.g. edismax with
   q=Maricopa county ethel
   - From a content point of view, updates are not happening very
   frequently.  We typically get batches of updates spread out over the
   course of the day.
   - Not quite sure what you are asking for per the 'collection
   definitions'.  The main collection is about 27 million docs, across 96
   shards, 2 replicas.  The fromIndex 'join' collection is quite small, about
   80k docs, single shard, but replicated across the 96 shards.
   - In the table below are the qtimes and response times, run both
   with and without the ‘join’.  resultCount is also included, for reference.
   - It is a small test sample of 12 queries, run single-threaded.
      - Note that the qtimes, on average for this small query set, increase
      about 40% with the join.


search_qtime   responseTime   search_qtime   responseTime   resultCount
(no join)      (no join)      (with join)    (with join)
------------   ------------   ------------   ------------   -----------
1748           3179           2834           4292           471894
1557           2865           1794           3108           332
929            2278           1261           2654           541282
813            2107           1036           2322           15347
413            1730           539            1838           42
388            1725           678            2027           313
1095           2481           1453           2821           435627
829            2263           1310           2739           299
838            2103           1081           2358           86049
1236           2610           1911           3283           77881
950            2274           1313           2661           15160
763            2066           885            2184           738
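
As a quick sanity check of the 'about 40%' figure, here is a minimal Python
sketch that recomputes the increase from the two search_qtime columns above.
The function names are just for illustration; the numbers are copied straight
from the table.

```python
# (no-join qtime, with-join qtime) pairs, one per test query, from the table.
QTIMES = [
    (1748, 2834), (1557, 1794), (929, 1261), (813, 1036),
    (413, 539), (388, 678), (1095, 1453), (829, 1310),
    (838, 1081), (1236, 1911), (950, 1313), (763, 885),
]


def aggregate_increase(pairs):
    """Percent increase of the summed with-join qtime over the summed no-join qtime."""
    base = sum(no_join for no_join, _ in pairs)
    joined = sum(with_join for _, with_join in pairs)
    return 100.0 * (joined - base) / base


def mean_per_query_increase(pairs):
    """Average of the per-query percent increases (each query weighted equally)."""
    increases = [100.0 * (with_join - no_join) / no_join
                 for no_join, with_join in pairs]
    return sum(increases) / len(increases)


if __name__ == "__main__":
    print(f"aggregate increase:      {aggregate_increase(QTIMES):.1f}%")
    print(f"mean per-query increase: {mean_per_query_increase(QTIMES):.1f}%")
```

Both ways of averaging land between 39% and 40% for this sample, so the
'about 40%' estimate holds whether or not the slower queries are weighted
more heavily.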

What is most concerning is the CPU increase that we see in Solr.  Here is
a more 'concurrent' test, at about 12 qps, which is not a 'full' load,
maybe 50%.  This test 'held up', meaning we did not get into any
trouble.


Hope these images come through, but here is a CPU profile for a 1-hour test
with no 'join' being used:


[image: image.png]

And here is the same 1-hour test, using the 'join', run twice.  Note the
difference in the 'scale' of CPU in these 2 tests vs. the one above, from a
'cores' point of view:
[image: image.png]

Like I said, I'll run these same tests with the ‘method=topLevelDV’, and
see if it changes behavior.
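
For reference, this is roughly how I expect the modified filter to look. It is
only a sketch: it takes the negative fq quoted further down in this thread and
adds the method local param. The helper name is made up, and I am assuming
that 'method' can simply sit alongside the existing 'score=none' and that both
join fields have docValues, which still needs to be verified against our
schema.

```python
from urllib.parse import urlencode

# Local params from the existing negative join fq quoted later in this thread.
BASE_JOIN_PARAMS = [
    "fromIndex=primary_rollup",
    "from=group_id_mv",
    "to=group_member_id",
    "score=none",
]


def join_exclusion_fq(method=None):
    """Build the negative join fq, optionally adding a 'method' local param.

    Hypothetical helper for illustration; '${q}' is left as-is so Solr
    substitutes the main query, as in the original fq.
    """
    local_params = list(BASE_JOIN_PARAMS)
    if method is not None:
        local_params.append(f"method={method}")
    return "-{!join " + " ".join(local_params) + "}${q}"


def build_query_string(user_query, method=None):
    """URL-encode the request parameters for one of the simple edismax queries."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "fq": join_exclusion_fq(method),
    }
    return urlencode(params)


if __name__ == "__main__":
    print(join_exclusion_fq("topLevelDV"))
    print(build_query_string("Maricopa county ethel", method="topLevelDV"))
```

Passing method=None reproduces the current fq unchanged, so the same test
harness can run both variants back to back.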

Thx

Ron Haines

On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <m...@apache.org> wrote:

> Ron, how often are both indices updated? Presumably if they are static,
> filter cache may help.
> It's worth making sure that the app gives a chance to filter cache.
> To better understand the problem it is worth taking a few thread dumps under
> load: a deep stack gives a clue for hotspot (or just take a sampling
> profile). Once we know the hot spot we can think about a workaround.
> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
> "fromIndex"
> https://issues.apache.org/jira/browse/SOLR-16242 about keeping "local/to"
> index cache when fromIndex is updated.
>
> On Thu, May 25, 2023 at 5:01 PM Andy Lester <a...@petdance.com> wrote:
>
> >
> >
> > > On May 25, 2023, at 7:51 AM, Ron Haines <mickr...@gmail.com> wrote:
> > >
> > > So, when this feature is enabled, this negative &fq gets added:
> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id
> > > score=none}${q}
> >
> >
> > Can we see collection definitions of both the source collection and the
> > join? Also, a sample query, not just the one parameter? Also, how often
> > are either of these collections updated? One thing that killed off an
> > entire project that we were doing was that the join table was getting
> > updated about once a minute, and this destroyed all our caching, and
> > made the queries we wanted to do unusable.
> >
> >
> > Thanks,
> > Andy
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
