Thanks again for your input, Mikhail. I need to look more into this debugOutput and the way we use `!parent which`.
Can you maybe elaborate on which part of the debug output I should look at in order to say "how is it parsed"? Is that output documented somewhere (other than the Solr source code)?

Best regards,
/Noah

--
Noah Torp-Smith (n...@dbc.dk)

________________________________
From: Mikhail Khludnev <m...@apache.org>
Sent: 3 January 2023 19:29
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Slowness when searching in child documents.

Hold on.

> I remove the first part of the filter (the one with parent which),

Noah, what's the performance of the child subquery alone?

q=pid.material_type:(\"lydbog\" \"artikel\")

What's qtime and how is it parsed?

On Tue, Jan 3, 2023 at 5:55 PM Noah Torp-Smith <n...@dbc.dk.invalid> wrote:

> Thanks for the response. Here is a more hands-on example with measures
> that maybe illustrates better:
>
> We are on Solr 9.0.1.
>
> We send this to Solr (it's an equals sign, not a colon after parent which,
> sorry for the confusion on my part):
>
> {
>   "query": "flotte huse",
>   "filter": [
>     "{!parent which='doc_type:work'}(pid.material_type:(\"lydbog\" \"artikel\"))",
>     "doc_type:work"
>   ],
>   "fields": "work.workid",
>   "offset": 0,
>   "limit": 10,
>   "params": {
>     "defType": "edismax",
>     "qf": [
>       "work.creator^100",
>       "work.creator_fuzzy^0.001",
>       "work.series^75",
>       "work.subject_bibdk",
>       "work.subject_fuzzy^0.001",
>       "work.title^100",
>       "work.title_fuzzy^0.001"
>     ],
>     "pf": [
>       "work.creator^200",
>       "work.fictive_character",
>       "work.series^175",
>       "work.title^1000"
>     ],
>     "pf2": [
>       "work.creator^200",
>       "work.fictive_character",
>       "work.series^175",
>       "work.title^1000"
>     ],
>     "pf3": [
>       "work.creator^200",
>       "work.fictive_character",
>       "work.series^175",
>       "work.title^1000"
>     ],
>     "mm": "2<80%",
>     "mm.autoRelax": "true",
>     "ps": 5,
>     "ps2": 5,
>     "ps3": 5
>   }
> }
>
> This fetches 21 workids and it takes more than 20 seconds.
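Mikhail's question above ("what's qtime and how is it parsed?") can be answered by sending the child subquery on its own with Solr's debug parameter. A minimal sketch of building such a request URL; the host and the core name `works` are placeholders, not taken from the thread:

```python
from urllib.parse import urlencode

# Build a request that runs only the child subquery, with debug output.
# QTime comes back in "responseHeader"; the parsed form of the query
# appears under "debug" -> "parsedquery" in the JSON response.
params = {
    "q": 'pid.material_type:("lydbog" "artikel")',
    "rows": 0,           # only numFound and timings matter here
    "debug": "query",    # ask Solr to include the parsed query
    "wt": "json",
}
url = "http://localhost:8983/solr/works/select?" + urlencode(params)
print(url)
```

Comparing QTime of this query against QTime of the full block-join request indicates whether the child clause alone, or the parent/child join around it, dominates the 20 seconds.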
> If I remove the first part of the filter (the one with parent which),
> it fetches 33 workids in less than 200 milliseconds. It does not matter
> if I do it with or without the filtering to material types first (as
> long as I come up with new examples, so the filter cache is not being used).
>
> So it does not seem to depend on the number of returned documents.
>
> Thanks again for your help, it is much appreciated.
>
> --
> Noah Torp-Smith (n...@dbc.dk)
>
> ________________________________
> From: Mikhail Khludnev <m...@apache.org>
> Sent: 3 January 2023 14:09
> To: users@solr.apache.org <users@solr.apache.org>
> Subject: Re: Slowness when searching in child documents.
>
> Hello, Noah.
>
> A few notes: Query time depends on the number of results. When one query
> is slower than another, we can find an excuse in a bigger number of
> enumerated docs.
> Examine how the query is parsed in the debugQuery output. There are many
> tricks and pitfalls in query parsers, e.g. I'm not sure why you put a
> colon after which, whether you put it like that into Solr, and how it
> interprets it.
> Which version of Solr/Lucene are you running? Some time ago Lucene had
> no two-phase iteration and was prone to redundant enumerations.
>
> > if there is some way to evaluate the search at the work level first, and
> > then do the filtering for those works that have manifestations matching the
> > child requirements afterwards?
> That's how it's expected to work. You can confirm your hypothesis by
> intersecting {!parent ..}.. with work_id:123, whether via fq or +. It
> should turn around in a moment.
>
> So, if everything is right, you might just run too large indices and
> have to break them into many shards.
>
> On Tue, Jan 3, 2023 at 1:12 PM Noah Torp-Smith <n...@dbc.dk.invalid> wrote:
>
> > We are facing a performance issue when searching in child documents. In
> > order to explain the issue, I will provide a very simplified excerpt
> > of our data model.
> >
> > We are making a search engine for libraries. What we want to deliver
> > to the users are "works". An example of a work could be Harry Potter
> > and the Goblet of Fire. Each work can have several manifestations; for
> > example, there is a book version of the work, an audiobook, and maybe
> > an e-book. Of course, there are properties at the work level (like
> > creator, title, subjects, etc.) and other properties at the
> > manifestation level (like publication year, material type, etc.).
> >
> > We have modelled this with parent documents and child documents in
> > Solr, and have built a search engine on it. The search engine can
> > search for things like creators, titles, and subjects at the work
> > level, but users should also be allowed to search for things from a
> > specific year or be able to specify that they are only interested in
> > things that are available as e-books.
> >
> > We have around 28 million works in the Solr index and 41 million
> > manifestations, indexed as child documents (so many works have only
> > one manifestation).
> >
> > As long as the user searches for things at the work level, the
> > performance is fine. But as you can imagine, when users search for
> > things at the manifestation level, the performance worsens. As an
> > example, if we make a search for a creator, the search executes in
> > less than 200 ms and results in maybe 30 hits. If we add a clause for
> > a material type (with a `{!parent which:'doc_type:work'}materialType:"book"`
> > construction), the search takes several seconds. In this case we want
> > the filtering to books to be part of the ranking, so putting it in a
> > filter query would pose a problem.
> >
> > I am wondering if there is some way to evaluate the search at the work
> > level first, and then do the filtering for those works that have
> > manifestations matching the child requirements afterwards?
> > I could try to do the search for work-level properties first and only
> > fetch IDs, and then do the full search with the manifestation-level
> > requirements afterwards and an added filter query with the IDs, but I
> > am wondering if there is a better way to do this.
> >
> > I have also looked at denormalizing (
> > https://blog.innoventsolutions.com/innovent-solutions-blog/2018/05/avoid-the-parentchild-trap-tips-and-tricks-for-denormalizing-solr-data.html
> > ) and it helps when doing it for a few child fields. But as said,
> > there are more properties in the real model than those I have
> > mentioned here, so that also involves some complications.
> >
> > Kind regards,
> >
> > /Noah
> >
> > --
> > Noah Torp-Smith (n...@dbc.dk)
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
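For completeness, Mikhail's sanity check from earlier in the thread (intersecting the `{!parent ...}` filter with a single known work) can be expressed against the JSON request body Noah posted. A sketch; the `work.workid` value is a made-up placeholder, not from the thread:

```python
import json

# Take the slow request from the thread and add one extra filter that
# pins it to a single work. If this version returns quickly, each
# parent/child join step is cheap and the cost lies in enumerating many
# candidate parents (pointing toward sharding, per Mikhail's advice).
body = {
    "query": "flotte huse",
    "filter": [
        "{!parent which='doc_type:work'}"
        '(pid.material_type:("lydbog" "artikel"))',
        "doc_type:work",
        "work.workid:some-known-id",  # placeholder id, not from the thread
    ],
    "fields": "work.workid",
    "limit": 10,
}
print(json.dumps(body, indent=2))
```

This body would be POSTed to the collection's select handler as before; only the extra `work.workid` clause differs from the original request.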