Note: images get stripped by the mailing list.

Well, if we apply a heavy operation (join), it's reasonable that it heats up the CPU. It should also affect the number of results. Does it?

Overall, the usage seems atypical: the query looks like role-based access control (or a group membership problem), but it has dismax as a sub-query. Can't the docs be remodelled in a more efficient manner?

It's worth understanding what keeps the CPU busy; usually a few thread dumps under load give a useful clue. Also, if the "to" side is huge and highly sharded, the "from" side is small, and updates are rare, an index-time join via {!parent} may work well. Caveat: it may be cumbersome.

PS: I suggested two JIRAs earlier; I don't think they are applicable here.
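One low-tech way to act on the thread-dump suggestion is to take several dumps a few seconds apart under load (for example with the JDK's `jstack <pid>`) and count which frames sit on top of RUNNABLE threads most often. The Python sketch below does only that counting. It assumes jstack's plain-text layout (thread entries separated by blank lines, a `java.lang.Thread.State:` line, then `at ...` frame lines); `hot_frames` is a hypothetical helper, not part of Solr or the JDK.

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable

RUNNABLE = "java.lang.Thread.State: RUNNABLE"


def hot_frames(dumps: Iterable[str], depth: int = 1) -> Counter:
    """Count the top `depth` stack frames of RUNNABLE threads across jstack dumps.

    Frames that keep appearing on top of busy threads in dumps taken a few
    seconds apart are hotspot candidates (a "poor man's profiler").
    """
    counts: Counter = Counter()
    for dump in dumps:
        # jstack separates thread entries with blank lines.
        for block in re.split(r"\n\s*\n", dump):
            lines = [ln.strip() for ln in block.splitlines()]
            # Skip waiting/parked threads. Caveat: threads blocked in native
            # socket reads also report RUNNABLE, so read the results with care.
            if not any(ln.startswith(RUNNABLE) for ln in lines):
                continue
            # Frame lines look like "at org.example.Foo.bar(Foo.java:42)".
            frames = [ln[len("at "):] for ln in lines if ln.startswith("at ")]
            counts.update(frames[:depth])
    return counts


if __name__ == "__main__":
    # Usage: python hot_frames.py dump1.txt dump2.txt dump3.txt
    texts = [Path(p).read_text() for p in sys.argv[1:]]
    for frame, n in hot_frames(texts, depth=3).most_common(15):
        print(f"{n:5d}  {frame}")
```

If join-related frames dominate the counts across dumps, that points at the join itself rather than at, say, the dismax sub-query; a sampling profiler gives the same signal with less manual work.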
On Wed, Jun 14, 2023 at 8:26 PM Ron Haines <mickr...@gmail.com> wrote:

> FYI, I am finally getting back to this. I apologize for the delay.
>
> I am going to try using the ‘method=topLevelDV’ option to see if that
> makes a difference. I will run the same tests used below and follow up
> with results.
>
> As far as more details about this scenario:
>
> - Per the ‘user query’: some of them are quite simple, e.g. edismax with
>   q=Maricopa county ethel
> - From a content point of view, updates are not happening very
>   frequently. Typically we get batches of updates spread out over the
>   course of the day.
> - Not quite sure what you are asking for per the 'collection
>   definitions'. The main collection is about 27 million docs, across 96
>   shards, 2 replicas. The fromIndex 'join' collection is quite small:
>   about 80k docs, single shard, but replicated across the 96 shards.
> - The table below shows the qtimes and response times, run both with and
>   without the ‘join’. I also included resultCount, for reference.
> - It is a small test sample of 12 queries, single-threaded.
> - Note: on average, for this small query set, the qtime increases by
>   about 40% with the join.
>
> search_qtime  responseTime  search_qtime  responseTime  resultCount
>    (no join)     (no join)   (with join)   (with join)
>         1748          3179          2834          4292       471894
>         1557          2865          1794          3108          332
>          929          2278          1261          2654       541282
>          813          2107          1036          2322        15347
>          413          1730           539          1838           42
>          388          1725           678          2027          313
>         1095          2481          1453          2821       435627
>          829          2263          1310          2739          299
>          838          2103          1081          2358        86049
>         1236          2610          1911          3283        77881
>          950          2274          1313          2661        15160
>          763          2066           885          2184          738
>
> What is most concerning is the CPU increase that we see in Solr.
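As a quick check of the "about 40%" figure, this Python snippet recomputes the mean increase from the 12 rows of the table. It assumes both timing columns are in milliseconds (Solr's QTime is; responseTime presumably is too) and simply compares column means; with only 12 single-threaded queries it is an indication, not a benchmark.

```python
from statistics import mean
from typing import Sequence

# Columns from the table, one entry per query (12 queries, single-threaded).
QTIME_NO_JOIN = [1748, 1557, 929, 813, 413, 388, 1095, 829, 838, 1236, 950, 763]
QTIME_JOIN = [2834, 1794, 1261, 1036, 539, 678, 1453, 1310, 1081, 1911, 1313, 885]
RESP_NO_JOIN = [3179, 2865, 2278, 2107, 1730, 1725, 2481, 2263, 2103, 2610, 2274, 2066]
RESP_JOIN = [4292, 3108, 2654, 2322, 1838, 2027, 2821, 2739, 2358, 3283, 2661, 2184]


def pct_increase(before: Sequence[int], after: Sequence[int]) -> float:
    """Relative increase of the mean of `after` over the mean of `before`, in percent."""
    return 100.0 * (mean(after) / mean(before) - 1.0)


def mean_delta(before: Sequence[int], after: Sequence[int]) -> float:
    """Absolute increase of the mean, in the table's units (assumed ms)."""
    return mean(after) - mean(before)


for name, before, after in [
    ("search_qtime", QTIME_NO_JOIN, QTIME_JOIN),
    ("responseTime", RESP_NO_JOIN, RESP_JOIN),
]:
    print(f"{name}: +{pct_increase(before, after):.1f}% "
          f"(+{mean_delta(before, after):.0f} ms per query)")
```

This prints +39.2% for search_qtime and +16.6% for responseTime. In absolute terms both grow by roughly 380 ms per query (378 vs. 384), so the added latency appears almost entirely as server-side QTime; the smaller relative increase in responseTime comes from its larger fixed base.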
> Here is a more 'concurrent' test, at about 12 qps, but it is not at a
> 'full' load... maybe 50%. This test 'held up', meaning we did not get
> into any trouble.
>
> Hope these images come through... but here is a CPU profile for a 1-hour
> test with no ‘join’ being used:
>
> [image: CPU profile, 1-hour test without the join]
>
> And here is the same 1-hour test using the ‘join’, run twice. Note the
> difference in the 'scale' of CPU of these 2 tests vs. the one above, from
> a 'cores' point of view:
>
> [image: CPU profile, same 1-hour test with the join, run twice]
>
> Like I said, I'll run these same tests with ‘method=topLevelDV’ and see
> if it changes behavior.
>
> Thx
>
> Ron Haines
>
> On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <m...@apache.org> wrote:
>
>> Ron, how often are both indices updated? Presumably, if they are static,
>> the filter cache may help.
>> It's worth making sure that the app gives the filter cache a chance.
>> To better understand the problem, it is worth taking a few thread dumps
>> under load: a deep stack gives a clue to the hotspot (or just take a
>> sampling profile). Once we know the hotspot, we can think about a
>> workaround.
>> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
>> "fromIndex"
>> https://issues.apache.org/jira/browse/SOLR-16242 about keeping the
>> "local/to" index cache when fromIndex is updated.
>>
>> On Thu, May 25, 2023 at 5:01 PM Andy Lester <a...@petdance.com> wrote:
>>
>> > > On May 25, 2023, at 7:51 AM, Ron Haines <mickr...@gmail.com> wrote:
>> > >
>> > > So, when this feature is enabled, this negative &fq gets added:
>> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id score=none}${q}
>> >
>> > Can we see collection definitions of both the source collection and the
>> > join? Also, a sample query, not just the one parameter? Also, how often
>> > are either of these collections updated?
>> > One thing that killed off an entire project that we were doing was that
>> > the join table was getting updated about once a minute, and this
>> > destroyed all our caching and made the queries we wanted to do unusable.
>> >
>> > Thanks,
>> > Andy
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>
--
Sincerely yours
Mikhail Khludnev
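For experimenting with the ‘method=topLevelDV’ variant Ron mentions, a small helper can build the exclusion filter exactly as quoted in the thread and optionally add the `method` local parameter. This is a Python sketch: `exclusion_fq` and `expand_macro` are hypothetical helpers, the collection and field names are the ones from the thread, and whether `method=topLevelDV` may be combined with `fromIndex` should be checked in the Join Query Parser section of the Solr Reference Guide for your version. `expand_macro` only imitates, client-side, the `${q}` macro expansion Solr performs on the server, to illustrate one reading of the filter-cache remark: because the user's q is embedded in the fq, each distinct q produces a distinct filter.

```python
from typing import Optional
from urllib.parse import urlencode

# Collection and field names as quoted in the thread.
JOIN_LOCAL_PARAMS = {
    "fromIndex": "primary_rollup",
    "from": "group_id_mv",
    "to": "group_member_id",
    "score": "none",
}


def exclusion_fq(method: Optional[str] = None) -> str:
    """Build the negative join filter from the thread, optionally adding `method`."""
    local = dict(JOIN_LOCAL_PARAMS)
    if method is not None:
        local["method"] = method  # e.g. "topLevelDV"
    params = " ".join(f"{k}={v}" for k, v in local.items())
    # ${q} is left for Solr's macro expansion to fill in with the request's q.
    return "-{!join " + params + "}${q}"


def expand_macro(fq: str, q: str) -> str:
    """Client-side imitation of the ${q} expansion, for illustration only."""
    return fq.replace("${q}", q)


if __name__ == "__main__":
    fq = exclusion_fq("topLevelDV")
    print(urlencode({"q": "Maricopa county ethel", "defType": "edismax", "fq": fq}))

    # Each distinct user query expands to a distinct filter, so essentially
    # only repeats of the same q can reuse a filterCache entry for this fq.
    queries = ["Maricopa county ethel", "Maricopa county", "Maricopa county ethel"]
    distinct = {expand_macro(fq, q) for q in queries}
    print(f"{len(distinct)} distinct filters for {len(queries)} requests")
```

Running the same query set once with `exclusion_fq()` and once with `exclusion_fq("topLevelDV")` keeps everything else constant, so any QTime or CPU difference can be attributed to the join method.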