Yes, we would return 'D'. So, are you asking why not just do the join in the main index? I started that way, then realized that a document and the doc it 'belongs' to both need to be on the same shard for the join to work. That's when I moved to the 'fromIndex' approach and created the small 'fromIndex' collection (under 200k docs), single-sharded and replicated across all of the shards of the main collection.
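To make the intended semantics concrete, here is a rough Python sketch of what the negative join filter is meant to do to a result set. It is only an illustration and not how Solr evaluates the join internally. The function name `apply_negative_join` and the `member_of` mapping are made up for this example, and it assumes the same user query matches the same group ids in the 'fromIndex' collection as in the main collection.

```python
def apply_negative_join(matched_ids, member_of):
    """Drop every matched doc that belongs to a group which itself matched.

    matched_ids: group_id values of the docs the user query found.
    member_of:   hypothetical mapping of group_id -> set of group_member_id
                 values (a doc can belong to more than one group).
    """
    matched = set(matched_ids)
    return [
        doc_id
        for doc_id in matched_ids
        # Keep the doc only if none of the groups it belongs to was matched.
        if not (member_of.get(doc_id, set()) & matched)
    ]


# Extended example from the thread: D belongs to G, and G is not matched.
member_of = {"A": {"C"}, "D": {"G"}, "E": {"B"}}
matched_ids = ["A", "B", "C", "D", "E", "F"]
print(apply_negative_join(matched_ids, member_of))  # ['B', 'C', 'D', 'F']
```

A and E are dropped because C and B are in the result set. D stays because G was not matched, which is also why a plain fq=-group_member_id:* would not give the same answer.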

On Thu, Jun 15, 2023 at 5:57 AM Mikhail Khludnev <m...@apache.org> wrote:

> Thanks for the clarification, Ron.
> Why is the membership extracted into a separate index?
> Join is heavy anyway, but running it cross core is even heavier.
>
> Example you give is not really specific. I can implement it via
> fq=-group_member_id:*
>
> Let's extend it
> doc#  group_id  group_member_id
> 1     A         C
> 2     B         -
> 3     C         -
> 4     D         *G*
> 5     E         B
> 6     F         -
> 7     G
>
> So, if a user runs a query that finds docs A,B,C,D,E,F. (not G)
> Should it return D?
>
>
> On Thu, Jun 15, 2023 at 6:01 AM Ron Haines <mickr...@gmail.com> wrote:
>
> > adding more context as to why we are using the 'join'.
> >
> > We have a collection of documents where all documents have a 'group_id'
> > (which is essentially the doc's id). And, some docs have a
> > 'group_member_id' that indicates if that doc belongs to a 'group_id'. For
> > example:
> >
> > doc#  group_id  group_member_id
> > 1     A         C
> > 2     B         -
> > 3     C         -
> > 4     D         C
> > 5     E         B
> > 6     F         -
> >
> > So, if a user runs a query that finds docs A,B,C,D,E,F we do not want to
> > include any of the documents that belong to any of the group_id's. So, for
> > this search we really want a result count of 3 (docs B, C, F).
> > We want to exclude:
> > A because it belongs to C
> > D because it belongs to C
> > E because it belongs to B
> >
> > This negative 'join' &fq is how we are excluding these docs. Note that a
> > document can 'belong' to more than 1 document. So, yes, it does affect the
> > result count, if that was a question.
> >
> > Thanks for the suggestions. I still have to run the test with the
> > 'method=topLevelDv', and I will pursue getting ThreadDumps. Thx. More to
> > come....
> >
> > On Wed, Jun 14, 2023 at 4:26 PM Mikhail Khludnev <m...@apache.org> wrote:
> >
> > > Note: images are shredded in the mailing list.
> > > Well, if we apply a heavy operation (join) it's reasonable that it warms
> > > the CPU.
> > > It should impact number of results.
> > > Does it?
> > > Overall, the usage seems non-typical: query looks like role based access
> > > control (or group membership problem), but has dismax as a sub-query. Can't
> > > docs be remodelled somehow in a more efficient manner?
> > > It's worth understanding what keeps CPU busy, usually a few thread dumps
> > > under load gives a useful clue.
> > > Also, if "to" side is huge and highly sharded, and "from" is small, and
> > > updates are rare, index-time join via {!parent} may work well. Caveat - it
> > > may be cumbersome..
> > > PS, I suggested two jiras earlier, I don't think they are applicable here.
> > >
> > > On Wed, Jun 14, 2023 at 8:26 PM Ron Haines <mickr...@gmail.com> wrote:
> > >
> > > > Fyi, I am finally getting back to this. I apologize for the delay.
> > > >
> > > > I am going to try using the ‘method=topLevelDV’ option to see if that
> > > > makes a difference. I will run same tests used below, and follow up with
> > > > results.
> > > >
> > > > As far as more details about this scenario:
> > > >
> > > > - Per the ‘user query’. Some of them are quite simple, edismax,
> > > >   q=Maricopa county ethel
> > > > - from a content point of view, updates are not happening very
> > > >   frequently. Typically get batches of updates spread out over the
> > > >   course of the day.
> > > > - not quite sure what you are asking for per the 'collection
> > > >   definitions'. The main collection is about 27 million docs, across 96
> > > >   shards, 2 replicas. The fromIndex 'join' collection is quite
> > > >   small...about 80k docs, single shard, but replicated across the 96
> > > >   shards.
> > > > - in the table below are the qtimes, response times, run both
> > > >   with/without using the ‘join’. Also have resultCount, for reference.
> > > > - it is a small test sample of 12 queries, single-threaded,
> > > > - Note, the qtimes…on average, for this small query set, increase
> > > >   about 40% with the join
> > > >
> > > > search_qtime  responseTime  search_qtime  responseTime  resultCount
> > > > - no join     - no join     - with join   - with join
> > > > 1748          3179          2834          4292          471894
> > > > 1557          2865          1794          3108          332
> > > > 929           2278          1261          2654          541282
> > > > 813           2107          1036          2322          15347
> > > > 413           1730          539           1838          42
> > > > 388           1725          678           2027          313
> > > > 1095          2481          1453          2821          435627
> > > > 829           2263          1310          2739          299
> > > > 838           2103          1081          2358          86049
> > > > 1236          2610          1911          3283          77881
> > > > 950           2274          1313          2661          15160
> > > > 763           2066          885           2184          738
> > > >
> > > > What is most concerning is the cpu increase that we see in Solr. Here is
> > > > a more ‘concurrent' test, at about 12 qps, but it is not at a 'full'
> > > > load...maybe 50%. This test 'held up', meaning we did not get into any
> > > > trouble.
> > > >
> > > > Hope these images come thru...but, here is a cpu profile for a 1 hour
> > > > test with no 'join' being used,
> > > >
> > > > [image: image.png]
> > > >
> > > > And, here is the same 1 hour test, using the 'join', run twice. Note the
> > > > difference in 'scale' of cpu of these 2 tests vs. the one above, from a
> > > > 'cores' point of view:
> > > > [image: image.png]
> > > >
> > > > Like I said, I'll run these same tests with the ‘method=topLevelDV’, and
> > > > see if it changes behavior.
> > > >
> > > > Thx
> > > >
> > > > Ron Haines
> > > >
> > > > On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <m...@apache.org> wrote:
> > > >
> > > >> Ron, how often are both indices updated? Presumably if they are static,
> > > >> filter cache may help.
> > > >> It's worth making sure that the app gives a chance to filter cache.
> > > >> To better understand the problem it is worth taking a few thread dumps
> > > >> under load: a deep stack gives a clue for hotspot (or just take a
> > > >> sampling profile). Once we know the hot spot we can think about a
> > > >> workaround.
> > > >> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
> > > >> "fromIndex"
> > > >> https://issues.apache.org/jira/browse/SOLR-16242 about keeping
> > > >> "local/to" index cache when fromIndex is updated.
> > > >>
> > > >> On Thu, May 25, 2023 at 5:01 PM Andy Lester <a...@petdance.com> wrote:
> > > >>
> > > >> > > On May 25, 2023, at 7:51 AM, Ron Haines <mickr...@gmail.com> wrote:
> > > >> > >
> > > >> > > So, when this feature is enabled, this negative &fq gets added:
> > > >> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id score=none}${q}
> > > >> >
> > > >> > Can we see collection definitions of both the source collection and
> > > >> > the join? Also, a sample query, not just the one parameter?
> > > >> > Also, how often are either of these collections updated? One thing
> > > >> > that killed off an entire project that we were doing was that the
> > > >> > join table was getting updated about once a minute, and this
> > > >> > destroyed all our caching, and made the queries we wanted to do
> > > >> > unusable.
> > > >> >
> > > >> > Thanks,
> > > >> > Andy
> > > >>
> > > >> --
> > > >> Sincerely yours
> > > >> Mikhail Khludnev
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev