Read this before going deeper
https://lucidworks.com/post/solr-and-joins/

Good luck

On Wed, 21 Apr 2021, 13:55 Alain Rogister, <alain.rogis...@gmail.com> wrote:

> Hi,
>
> Imagine that you have been tasked with designing a Solr-based system that
> indexes and queries data aggregated from several relational databases. All
> the data items are ultimately related to a common object type, so the
> logical data schema looks like :
>
> Customer
>      Profile
>      Events
>      Interactions
>            Steps
>      ActionPlans
>            Objectives
>                     Actions
> (and quite a few more, with up to 3 or 4 levels of nesting)
>
> Each of these has a number of attributes. What you need to do is the Solr
> equivalent of a SQL query that would find all Customers where
> Profile.item1='...' and there is some Event where event.item1='..' and
> event.item2='..' and there is some ActionPlan that has an Objective where
> Objective.item3='..' and this Objective has an Action where
> Action.item1='..' and Action.item2='..'
>
> This is fairly trivial in a relational DB. But is it achievable in Solr ?
> Needless to say, there is no way a flat schema can work here. So I have
> assumed that nested documents were the only way to go. I have created a
> schema that includes these prerequisites, where "docType" is meant to
> contain the document type i.e. Profile, Event, Interaction etc :
>
>  <field name="_version_" type="plong" indexed="false" stored="false"/>
>  <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true"
> docValues="true"/>
>  <fieldType name="_nest_path_" class="solr.NestPathField" />
>  <field name="_nest_parent_" type="string" indexed="true" stored="true"
> docValues="true"/>
>  <field name="_root_" type="string" indexed="true" stored="true"
> docValues="true" />
>  <field name="docType" type="string" indexed="true" stored="true"
> docValues="true" required="true"/>
>  <field name="id" type="string" indexed="true" stored="true"
> docValues="true"/>
>  <uniqueKey>id</uniqueKey>
>
> and I have used naming conventions on the other fields to reflect the
> structure, e.g.
>
> <field name="customerId" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="customerName" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="interaction.code" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="interaction.result" type="string" indexed="true"
> stored="true" docValues="true"/>
> <field name="step.code" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="step.result" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="event.type" type="string" indexed="true" stored="true"
> docValues="true"/>
> <field name="event.date" type="pdate" indexed="true" stored="true"
> docValues="true"/>
>
> I have been able to index my documents correctly, i.e. retrieving a
> Customer by id with fl=*,[child limit=999] returns the customer and all
> dependent objects in the correct hierarchy.
>
> I have also been able to express queries on two levels, such as: find all
> Customers that have an Interaction with code='..' and result='..' and that
> have an Event with type='..' and date before '...'. The syntax looks like
> this, using two filter queries and block joins :
>
> q= docType:Customer AND (<simple filtering clauses>)
> fq= {!parent which=docType:Customer} +docType:Interaction
> +interaction.code:C01 +interaction.result:COMPLETE
> fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1
> +event.date:[NOW-1YEAR TO NOW]
>
> That seems to return correct results. However, I can't quite figure out
> the proper way to take it one level deeper e.g. add filtering conditions on
> the Steps of the Interactions selected by the first filter query.
>
> For the record, I am able to do so using streaming expressions, but these
> bring their own complexities so I am trying to find out first if something
> simpler can save me.
>
> Thanks !
>
>

Reply via email to