Read this before going deeper https://lucidworks.com/post/solr-and-joins/
Good luck On Wed, 21 Apr 2021, 13:55 Alain Rogister, <alain.rogis...@gmail.com> wrote: > Hi, > > Imagine that you have been tasked with designing a Solr-based system that > indexes and queries data aggregated from several relational databases. All > the data items are ultimately related to a common object type, so the > logical data schema looks like : > > Customer > Profile > Events > Interactions > Steps > ActionPlans > Objectives > Actions > (and quite a few more, with up to 3 or 4 levels of nesting) > > Each of these has a number of attributes. What you need to do is the Solr > equivalent of a SQL query that would find all Customers where > Profile.item1='...' and there is some Event where event.item1='..' and > event.item2='..' and there is some ActionPlan that has an Objective where > Objective.item3='..' and this Objective has an Action where > Action.item1='..' and Action.item2='..' > > This is fairly trivial in a relational DB. But is it achievable in Solr ? > Needless to say, there is no way a flat schema can work here. So I have > assumed that nested documents were the only way to go. I have created a > schema that includes these prerequisites, where "docType" is meant to > contain the document type i.e. Profile, Event, Interaction etc : > > <field name="_version_" type="plong" indexed="false" stored="false"/> > <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true" > docValues="true"/> > <fieldType name="_nest_path_" class="solr.NestPathField" /> > <field name="_nest_parent_" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="_root_" type="string" indexed="true" stored="true" > docValues="true" /> > <field name="docType" type="string" indexed="true" stored="true" > docValues="true" required="true"/> > <field name="id" type="string" indexed="true" stored="true" > docValues="true"/> > <uniqueKey>id</uniqueKey> > > and I have used naming conventions on the other fields to reflect the > structure, e.g. > > <field name="customerId" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="customerName" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="interaction.code" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="interaction.result" type="string" indexed="true" > stored="true" docValues="true"/> > <field name="step.code" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="step.result" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="event.type" type="string" indexed="true" stored="true" > docValues="true"/> > <field name="event.date" type="pdate" indexed="true" stored="true" > docValues="true"/> > > I have been able to index my documents correctly, i.e. retrieving a > Customer by id with fl=*,[child limit=999] returns the customer and all > dependent objects in the correct hierarchy. > > I have also been able to express queries on two levels, such as: find all > Customers that have an Interaction with code='..' and result='..' and that > have an Event with type='..' and date before '...'. The syntax looks like > this, using two filter queries and block joins : > > q= docType:Customer AND (<simple filtering clauses>) > fq= {!parent which=docType:Customer} +docType:Interaction > +interaction.code:C01 +interaction.result:COMPLETE > fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1 > +event.date:[NOW-1YEAR TO NOW] > > That seems to return correct results. However, I can't quite figure out > the proper way to take it one level deeper e.g. add filtering conditions on > the Steps of the Interactions selected by the first filter query. > > For the record, I am able to do so using streaming expressions, but these > bring their own complexities so I am trying to find out first if something > simpler can save me. > > Thanks ! > >