Hi Isabella,
a while ago I wrote a blog post about nested documents. It is not strictly
about pros and cons, but it may be useful:
https://sease.io/2019/06/apache-solr-childfilter-transformer.html

As for pros and cons, covering nested documents in detail would take a
while, so here is a summary of my considerations:

BLOCK JOIN (Index time Join)
*PROs*

   - lets you model hierarchical relations between documents:
   parent-children, but also multi-layered
   - decently fast at query time (faster than the query-time join)

*CONs*

   - you need to follow strict indexing rules and index/reindex in blocks
   (parent + descendants)
   - behind the scenes, a nested document is still a Solr document
   - extra care is needed when handling unique ids (see the blog) and
   deletions (no descendant should be left pending with no ancestor)
   - even if faster than the query-time approach, using nested documents
   brings performance implications and adds complexity in comparison to
   standard document modelling in Solr
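
To make the block rules above a bit more concrete, here is a minimal Python
sketch that only builds request payloads (it does not talk to Solr). The
field names (type_s, color_s), the "skus" child label and the ids are
hypothetical. The {!parent which=...} parser and the _root_ field are
standard Solr, but the delete assumes _root_ is populated on every document
of the block, so please check it against your version and schema.

```python
import json

def build_block(parent_id, colors):
    """One parent plus all its children: the unit to index and reindex together."""
    return {
        "id": parent_id,
        "type_s": "product",  # hypothetical marker field identifying parents
        "skus": [             # hypothetical child label
            {"id": f"{parent_id}-sku{i}", "type_s": "sku", "color_s": c}
            for i, c in enumerate(colors, start=1)
        ],
    }

def parents_matching(parent_filter, child_query):
    """Block join: return the parents whose children match child_query.
    parent_filter must match all parents and no children."""
    return f'{{!parent which="{parent_filter}"}}{child_query}'

def delete_block(root_id):
    """Delete the whole block via _root_, so no child is left without a parent."""
    return {"delete": {"query": f"_root_:{root_id}"}}

block = build_block("P1", ["red", "blue"])
print(json.dumps([block], indent=2))                      # body for /update
print(parents_matching("type_s:product", "color_s:red"))  # value for q
print(json.dumps(delete_block("P1")))                     # body for /update
```

The point of the sketch is that parent and children always travel together:
they are indexed in one request and removed in one request.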

In my experience, it is worth taking some time to carefully assess whether
nested documents are really necessary and beneficial, or whether the
problem can be solved with a standard flat representation +
grouping/collapsing.
Don't get me wrong, nested docs are a perfectly good feature in Apache Solr
and I have used them both in experiments and in production solutions, but
they introduce additional complexity and performance considerations that
may not be ideal or worth it.

Regarding Query time join, I'll be brief: it's more flexible because it
doesn't require any particular indexing approach, but it is much more
expensive at query time, both in latency and in resources.
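
For comparison, a minimal sketch of what a query-time join request could
look like, assuming flat documents where each SKU stores the id of its
product in a hypothetical product_id_s field. The {!join from=... to=...}
syntax is the standard Solr join parser; the field names and the query are
made up for illustration.

```python
from urllib.parse import urlencode

def join_query(from_field, to_field, inner_query):
    """Query-time join: run inner_query, collect from_field values of the
    matches, then return the documents whose to_field holds one of them."""
    return f"{{!join from={from_field} to={to_field}}}{inner_query}"

# Products that have at least one red SKU; nothing special at index time.
params = {"q": join_query("product_id_s", "id", "color_s:red"), "fl": "id"}
print(params["q"])
print(urlencode(params))  # query string for /select
```

No blocks are needed here: products and SKUs can be indexed and updated
independently, and the cost moves to query time.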

*Apache Solr versions*
There have been changes to the nested documents implementation over the
years; none massive, but a few are worth noting:
https://github.com/apache/lucene/labels/module%3Ajoin

SOLR-12768: *Improved nested document support*

*Category*: Solr Standalone Feature

*It is interesting for*: Nested documents/ updates

Enabled in the default schema with the presence of _nest_path_. When this
field is present, certain things happen automatically. An internal URP is
automatically used to populate it. The [child] (doc transformer) will
return a hierarchy with relationships; no params needed. The relationship
path is indexed for use in queries (can be disabled if not needed). Also,
child documents needn't provide a uniqueKey value as Solr will supply one
automatically by concatenating a path to that of the parent document's key.
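
As a rough illustration of the behaviour described above, the following
Python sketch builds a document whose children carry no id (Solr is expected
to supply one) and the request parameters that ask for the full hierarchy.
It assumes a schema that defines _nest_path_; the field names and the "skus"
label are hypothetical, and the "*:* -_nest_path_:*" filter is the usual way
to select root documents in such a schema.

```python
import json

# Root documents only: children carry a _nest_path_ value, roots do not.
ROOTS = "*:* -_nest_path_:*"

def build_doc():
    """Children omit the uniqueKey; Solr derives it from the parent's key."""
    return {
        "id": "P1",
        "title_s": "T-shirt",
        "skus": [{"color_s": "red"}, {"color_s": "blue"}],  # hypothetical label
    }

def hierarchy_params(child_query):
    """Find roots by a child condition and return them with nested children."""
    return {
        "q": f'{{!parent which="{ROOTS}"}}{child_query}',
        "fl": "*,[child]",  # [child] needs no params when _nest_path_ exists
    }

print(json.dumps([build_doc()], indent=2))  # body for /update
print(hierarchy_params("color_s:red"))      # params for /select
```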


SOLR-12638: *Nested Documents Atomic Updates*

*Category*: Solr Standalone Feature

*It is interesting for*: Nested documents/ updates

Partial/Atomic Updates for nested documents. This enables atomic updates
for nested documents, without the need to supply the whole nested
hierarchy (which would be overwritten if absent). This is done by fetching
the whole document hierarchy, updating the specific doc in the path that
is to be updated, removing the old document hierarchy and indexing the new
one with the atomic update merged into it. Also, [child] Doc Transformer
now works with RealTimeGet.
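
A hedged sketch of what such atomic update payloads could look like, built
in Python without contacting Solr. The "set" and "add" modifiers are the
standard atomic update operations; passing _root_ alongside the id, as well
as the ids, the field names and the "skus" label, are assumptions for
illustration, and the exact requirements differ between Solr versions.

```python
import json

def set_child_field(child_id, root_id, field, value):
    """Change one field of one child without resending the hierarchy."""
    return {"id": child_id, "_root_": root_id, field: {"set": value}}

def add_child(parent_id, root_id, label, child):
    """Attach a new child under the given label of an existing parent."""
    return {"id": parent_id, "_root_": root_id, label: {"add": child}}

updates = [
    set_child_field("P1-sku1", "P1", "color_s", "grey"),
    add_child("P1", "P1", "skus", {"id": "P1-sku3", "color_s": "green"}),
]
print(json.dumps(updates, indent=2))  # body for /update
```

Behind the scenes Solr still rewrites the whole block, so this saves work on
the client side rather than on the index side.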


LUCENE-8701: *Block Join Improvement*

*Category*: Solr Internal Optimisation

*It is interesting for*: Speeding up nested documents search

ToParentBlockJoinQuery now creates a child scorer that disallows skipping
over non-competitive documents if the score of a parent depends on the
score of multiple children (avg, max, min). Additionally, the score mode
`none`, which assigns a constant score to each parent, can early terminate
the collection of top scores.
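
On the Solr side, the score mode is chosen on the block join parent query.
The Python sketch below assumes the `score` local param of the {!parent}
parser with the values none, avg, max, min and total; please verify this
against the reference guide for your version. The filter and the child
query are hypothetical.

```python
SCORE_MODES = {"none", "avg", "max", "min", "total"}

def scored_parent_query(parent_filter, child_query, score="none"):
    """Build a block join parent query with an explicit score mode."""
    if score not in SCORE_MODES:
        raise ValueError(f"unsupported score mode: {score}")
    return f'{{!parent which="{parent_filter}" score={score}}}{child_query}'

# Parent score taken from its best-matching child.
print(scored_parent_query("type_s:product", "color_s:red", score="max"))
# Constant parent score: the mode that can benefit from early termination.
print(scored_parent_query("type_s:product", "color_s:red"))
```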



There have possibly been other changes (I remember some work in Solr 9.x
by Mikhail), but listing all of them in a proper report would require some
homework on my side.

Hope this helps!

Cheers

--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Wed, 31 Jan 2024 at 10:31, Isabella Trevisan
<isabella.trevi...@infocamere.it.invalid> wrote:

> Hi,
> We are studying a solution that takes advantage of nested documents and
> therefore we are looking for information on the pros and cons and
> limitations that this solution offers.
> Furthermore, we wish to understand in which case is better to use nested
> documents or query time joins.
> Further Have there been any evolutions from solr 5 to solr8 or 9 regarding
> this topic?
>
> Thank you
> Isabella Trevisan
>
