Hi Alessandro,

Thank you for your detailed answer. As I understand it, nested documents speed up searches on hierarchical documents compared to query-time joins and consume fewer resources, but could create problems during reindexing. Are there any studies on limits to the number of children or the size of documents within which indexing problems can be avoided?
Thank you

Regards,
Isabella

On Wed, 31 Jan 2024 at 12:18, Alessandro Benedetti <a.benede...@sease.io> wrote:

> Hi Isabella,
> back in the day I wrote a blog post about nested documents. It is not
> strictly about pros and cons, but it may be useful:
> https://sease.io/2019/06/apache-solr-childfilter-transformer.html
>
> Exploring nested documents in detail would take a bit of time, but I
> would summarise my considerations on pros and cons as follows:
>
> BLOCK JOIN (Index time Join)
> *PROs*
>
> - lets you map hierarchical relations between documents:
>   parent-children, but also multi-layered
> - decently fast
>
> *CONs*
>
> - you need to follow strict indexing rules and index/reindex in blocks
>   (parent + descendants)
> - behind the scenes, a nested document is still a Solr document
> - extra care is needed when handling unique ids (see the blog) and
>   deletions (no descendant should be left pending with no ancestor)
> - even if faster than the query-time approach, using nested documents
>   has performance implications and adds complexity in comparison to
>   standard document modelling in Solr
>
> Based on my experience, I always spend some time carefully assessing
> whether nested documents are really necessary and beneficial, or whether
> I could solve the problem using a standard flat representation +
> grouping/collapsing.
> Don't get me wrong, nested docs are an all-right feature in Apache Solr
> and I have used them both in experiments and in production solutions,
> but they introduce additional complexity and performance considerations
> that may not be ideal or worth it.
>
> Regarding Query time join, I'll be brief: it's more flexible because it
> doesn't require any particular indexing approach, but it is much more
> expensive in query time and resources.
>
> *Apache Solr versions*
> There have been changes to the nested documents implementation over the
> years, not massive but some happened:
> https://github.com/apache/lucene/labels/module%3Ajoin
>
> > SOLR-12768: *Improved nested document support*
> >
> > *Category*: Solr Standalone Feature
> >
> > *It is interesting for*: Nested documents/updates
> >
> > Enabled in the default schema with the presence of _nest_path_. When
> > this field is present, certain things happen automatically. An
> > internal URP is automatically used to populate it. The [child] (doc
> > transformer) will return a hierarchy with relationships; no params
> > needed. The relationship path is indexed for use in queries (can be
> > disabled if not needed). Also, child documents needn't provide a
> > uniqueKey value as Solr will supply one automatically by
> > concatenating a path to that of the parent document's key.
> >
> > SOLR-12638: *Nested Documents Atomic Updates*
> >
> > *Category*: Solr Standalone Feature
> >
> > *It is interesting for*: Nested documents/updates
> >
> > Partial/Atomic Updates for nested documents. This enables atomic
> > updates for nested documents, without the need to supply the whole
> > nested hierarchy (which would be overwritten if absent). This is done
> > by fetching the whole document hierarchy, updating the specific doc
> > in the path that is to be updated, removing the old document
> > hierarchy and indexing the new one with the atomic update merged into
> > it. Also, [child] Doc Transformer now works with RealTimeGet.
> >
> > LUCENE-8701: *Block Join Improvement*
> >
> > *Category*: Solr Internal Optimisation
> >
> > *It is interesting for*: Speeding up nested documents search
> >
> > ToParentBlockJoinQuery now creates a child scorer that disallows
> > skipping over non-competitive documents if the score of a parent
> > depends on the score of multiple children (avg, max, min).
> > Additionally, the score mode `none`, which assigns a constant score
> > to each parent, can early-terminate the collection of top scores.
>
> Possibly there have been other changes. I remember some stuff in Solr
> 9.x by Mikhail, but to list all of them in a nice report I would need to
> spend some time doing the proper homework.
>
> Hope this helps!
>
> Cheers
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
> On Wed, 31 Jan 2024 at 10:31, Isabella Trevisan
> <isabella.trevi...@infocamere.it.invalid> wrote:
>
>> Hi,
>> We are studying a solution that takes advantage of nested documents,
>> and we are therefore looking for information on the pros, cons and
>> limitations of this approach.
>> Furthermore, we wish to understand in which cases it is better to use
>> nested documents and in which query time joins.
>> Finally, have there been any changes from Solr 5 to Solr 8 or 9
>> regarding this topic?
>>
>> Thank you
>> Isabella Trevisan
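
To make the "index in blocks" rule and the block join query mentioned in this thread concrete, here is a minimal Python sketch. It only builds the JSON payload and the query parameters and does not talk to Solr. The field names (doc_type_s, sku_s, items) and the helper names are assumptions made up for illustration; the {!parent which=...} parser syntax and the [child] doc transformer are the standard Solr ones, and nesting children under a named field assumes a schema that contains _nest_path_.

```python
import json


def build_block(parent, children, child_field="items"):
    """Return one parent document with its children nested under child_field.

    The whole block (parent + descendants) is what gets sent to Solr in a
    single add, and what must be re-sent when the block is reindexed.
    child_field is a hypothetical field name.
    """
    block = dict(parent)
    block[child_field] = [dict(child) for child in children]
    return block


def parent_block_join(all_parents_filter, child_query):
    """Build a {!parent} query returning parents whose children match.

    all_parents_filter must match every parent document in the index,
    not just the parents you are interested in.
    """
    return '{!parent which="%s"}%s' % (all_parents_filter, child_query)


# Hypothetical order with two line items.
block = build_block(
    {"id": "order-1", "doc_type_s": "order"},
    [
        {"id": "order-1-item-1", "doc_type_s": "item", "sku_s": "A1"},
        {"id": "order-1-item-2", "doc_type_s": "item", "sku_s": "B2"},
    ],
)

# Body for a JSON update request: a list of blocks.
payload = json.dumps([block])

# Query parameters: find orders containing sku A1 and return their children.
params = {
    "q": parent_block_join("doc_type_s:order", "sku_s:A1"),
    "fl": "*,[child]",
}

print(payload)
print(params)
```

The payload would be posted to the collection's update handler and the parameters sent to its select handler. Keeping a dedicated field that marks every parent (doc_type_s here) is one common way to write the "which" filter safely.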