FYI, I am finally getting back to this. I apologize for the delay.
I am going to try the 'method=topLevelDV' option to see if it makes a difference. I will run the same tests described below and follow up with the results.

As for more details about this scenario:

- Per the 'user query': some of them are quite simple, e.g. edismax with q=Maricopa county ethel.
- From a content point of view, updates do not happen very frequently. We typically get batches of updates spread out over the course of the day.
- I'm not quite sure what you are asking for per the 'collection definitions'. The main collection is about 27 million docs, across 96 shards, with 2 replicas. The fromIndex 'join' collection is quite small (about 80k docs, single shard), but it is replicated across the 96 shards.
- The table below shows the QTimes and response times, run both with and without the 'join', plus resultCount for reference.
- It is a small, single-threaded test sample of 12 queries.
- Note that, on average, QTime for this small query set increases by about 40% with the join.

search_qtime  responseTime  search_qtime  responseTime  resultCount
(no join)     (no join)     (with join)   (with join)
1748          3179          2834          4292          471894
1557          2865          1794          3108          332
929           2278          1261          2654          541282
813           2107          1036          2322          15347
413           1730          539           1838          42
388           1725          678           2027          313
1095          2481          1453          2821          435627
829           2263          1310          2739          299
838           2103          1081          2358          86049
1236          2610          1911          3283          77881
950           2274          1313          2661          15160
763           2066          885           2184          738

What is most concerning is the CPU increase that we see in Solr. Here is a more 'concurrent' test, at about 12 qps. It is not a 'full' load (maybe 50%). This test 'held up', meaning we did not get into any trouble.

I hope these images come through. Here is a CPU profile for a 1-hour test with no 'join' being used:

[image: image.png]

And here is the same 1-hour test using the 'join', run twice. Note the difference in the scale of CPU in these two tests vs. the one above, in terms of cores:

[image: image.png]

Like I said, I'll run these same tests with 'method=topLevelDV' and see if it changes the behavior.

Thx
Ron Haines

On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <m...@apache.org> wrote:

> Ron, how often both indices are updated? Presumably if they are static,
> filter cache may help.
> It's worth making sure that the app gives a chance to filter cache.
> To better understand the problem it is worth taking a few thread dumps under
> load: a deep stack gives a clue for hotspot (or just take a sampling
> profile). Once we know the hot spot we can think about a workaround.
> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
> "fromIndex"
> https://issues.apache.org/jira/browse/SOLR-16242 about keeping "local/to"
> index cache when fromIndex is updated.
>
> On Thu, May 25, 2023 at 5:01 PM Andy Lester <a...@petdance.com> wrote:
>
> > > On May 25, 2023, at 7:51 AM, Ron Haines <mickr...@gmail.com> wrote:
> > >
> > > So, when this feature is enabled, this negative &fq gets added:
> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id score=none}${q}
> >
> > Can we see collection definitions of both the source collection and the
> > join? Also, a sample query, not just the one parameter? Also, how often are
> > either of these collections updated? One thing that killed off an entire
> > project that we were doing was that the join table was getting updated
> > about once a minute, and this destroyed all our caching, and made the
> > queries we wanted to do unusable.
> >
> > Thanks,
> > Andy
>
> --
> Sincerely yours
> Mikhail Khludnev
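For rerunning the with/without-join comparison, here is a minimal sketch of how the negative join filter quoted in this thread could be assembled, both as-is and with the 'method=topLevelDV' local param. It assumes the Python standard library only. The base URL and collection name (main_collection) are hypothetical placeholders, and the script only builds request URLs without sending anything. ${q} is left literal on the assumption that Solr's request-parameter macro expansion fills in the request's q value, as in the original fq. Whether 'method=topLevelDV' combines with 'score=none', and what docValues the from/to fields need, should be checked in the Solr Reference Guide for the version in use.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host and collection name are placeholders, not from the thread.
SOLR_SELECT = "http://localhost:8983/solr/main_collection/select"


def join_fq(method=None):
    """Return the negative join filter quoted in this thread.

    ${q} is left literal; Solr's macro expansion is assumed to substitute
    the request's q value. If method is given (e.g. "topLevelDV"), it is
    appended as an extra local param.
    """
    local_params = [
        "join",
        "fromIndex=primary_rollup",
        "from=group_id_mv",
        "to=group_member_id",
        "score=none",
    ]
    if method is not None:
        local_params.append(f"method={method}")
    return "-{!" + " ".join(local_params) + "}${q}"


def select_url(q, use_join, method=None):
    """Build (but do not send) an edismax /select URL, optionally adding the join fq."""
    params = [("defType", "edismax"), ("q", q)]
    if use_join:
        params.append(("fq", join_fq(method)))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    q = "Maricopa county ethel"
    # Baseline, join as quoted, and join with the method being tested.
    for use_join, method in [(False, None), (True, None), (True, "topLevelDV")]:
        print(select_url(q, use_join, method))
```

Running it prints three URLs: one without the join, one with the join as quoted, and one with the join plus 'method=topLevelDV'. Keeping the filter in a separate fq instead of folding it into q lets Solr cache it independently in the filterCache, which matters for the caching questions raised above.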