Hello Alain,

If I was going to implement queries on deeply nested documents, I can see
quite a few examples here :
https://solr.apache.org/guide/8_8/searching-nested-documents.html#parent-query-parser


In a hierarchy like below the following query with a criteria on
_nest_path_ suppose to retrieve parent documents (Products) based on
attributes of Manual: * q={!parent which="*:*
-_nest_path_:*"}(+_nest_path_:\/skus\/manuals +pages_i:1)*
Product
     SKU
        manuals

In the "fl" parameter then you would probably need to initiate *[child] -
ChildDocTransformerFactory* to retrieve the hierarchy with essentially
repeating the query chain to avoid revealing child documents that are not
matching with parameters sent in "q".

Sorry if that comes off as clumsy or trivial, but in our implementation, we
do not index documents in a Parent / Child way. In order to generate nested
JSON responses, we use [subquery] document transformer , that does not go
with BlockJoin Parsers, so when it comes to filtering we rely on some
denormalized set of attributes available for all objects (not in RDBMS
source system but in SOLR), alternatively at query time the hierarchy would
change say Steps -> Interactions -> Customers (to use your analogy) if
filtering needed for (a leaf) object specific attribute.

Norbert

On Tue, 4 May 2021 at 21:01, Timothy Potter <thelabd...@gmail.com> wrote:

> Did you try routing all docs for the same customer ID to the same
> shard (not block join), just docs of different types in the same shard
> based on customer ID and then using Joins?
>
> On Tue, May 4, 2021 at 4:06 AM Alain Rogister <alain.rogis...@gmail.com>
> wrote:
> >
> > It sounds like nobody knows. Just one last try. Up !
> >
> > Tx
> >
> > On 2021/04/21 10:08:16, Alain Rogister <alain.rogis...@gmail.com> wrote:
> > > Hi,
> > >
> > > Imagine that you have been tasked with designing a Solr-based system
> that indexes and queries data aggregated from several relational databases.
> All the data items are ultimately related to a common object type, so the
> logical data schema looks like :
> > >
> > > Customer
> > >      Profile
> > >      Events
> > >      Interactions
> > >            Steps
> > >      ActionPlans
> > >            Objectives
> > >                     Actions
> > > (and quite a few more, with up to 3 or 4 levels of nesting)
> > >
> > > Each of these has a number of attributes. What you need to do is the
> Solr equivalent of a SQL query that would find all Customers where
> Profile.item1='...' and there is some Event where event.item1='..' and
> event.item2='..' and there is some ActionPlan that has an Objective where
> Objective.item3='..' and this Objective has an Action where
> Action.item1='..' and Action.item2='..'
> > >
> > > This is fairly trivial in a relational DB. But is it achievable in
> Solr ? Needless to say, there is no way a flat schema can work here. So I
> have assumed that nested documents were the only way to go. I have created
> a schema that includes these prerequisites, where "docType" is meant to
> contain the document type i.e. Profile, Event, Interaction etc :
> > >
> > >  <field name="_version_" type="plong" indexed="false" stored="false"/>
> > >  <field name="_nest_path_" type="_nest_path_" indexed="true"
> stored="true" docValues="true"/>
> > >  <fieldType name="_nest_path_" class="solr.NestPathField" />
> > >  <field name="_nest_parent_" type="string" indexed="true"
> stored="true" docValues="true"/>
> > >  <field name="_root_" type="string" indexed="true" stored="true"
> docValues="true" />
> > >  <field name="docType" type="string" indexed="true" stored="true"
> docValues="true" required="true"/>
> > >  <field name="id" type="string" indexed="true" stored="true"
> docValues="true"/>
> > >  <uniqueKey>id</uniqueKey>
> > >
> > > and I have used naming conventions on the other fields to reflect the
> structure, e.g.
> > >
> > > <field name="customerId" type="string" indexed="true" stored="true"
> docValues="true"/>
> > > <field name="customerName" type="string" indexed="true" stored="true"
> docValues="true"/>
> > > <field name="interaction.code" type="string" indexed="true"
> stored="true" docValues="true"/>
> > > <field name="interaction.result" type="string" indexed="true"
> stored="true" docValues="true"/>
> > > <field name="step.code" type="string" indexed="true" stored="true"
> docValues="true"/>
> > > <field name="step.result" type="string" indexed="true" stored="true"
> docValues="true"/>
> > > <field name="event.type" type="string" indexed="true" stored="true"
> docValues="true"/>
> > > <field name="event.date" type="pdate" indexed="true" stored="true"
> docValues="true"/>
> > >
> > > I have been able to index my documents correctly, i.e. retrieving a
> Customer by id with fl=*,[child limit=999] returns the customer and all
> dependent objects in the correct hierarchy.
> > >
> > > I have also been able to express queries on two levels, such as: find
> all Customers that have an Interaction with code='..' and result='..' and
> that have an Event with type='..' and date before '...'. The syntax looks
> like this, using two filter queries and block joins :
> > >
> > > q= docType:Customer AND (<simple filtering clauses>)
> > > fq= {!parent which=docType:Customer} +docType:Interaction
> +interaction.code:C01 +interaction.result:COMPLETE
> > > fq= {!parent which=docType:Customer} +docType:Event
> +event.type:EVTTYPE1 +event.date:[NOW-1YEAR TO NOW]
> > >
> > > That seems to return correct results. However, I can't quite figure
> out the proper way to take it one level deeper e.g. add filtering
> conditions on the Steps of the Interactions selected by the first filter
> query.
> > >
> > > For the record, I am able to do so using streaming expressions, but
> these bring their own complexities so I am trying to find out first if
> something simpler can save me.
> > >
> > > Thanks !
> > >
> > >
>

Reply via email to