Did you try routing all docs for the same customer ID to the same shard (not block join), just docs of different types in the same shard based on customer ID and then using Joins?
On Tue, May 4, 2021 at 4:06 AM Alain Rogister <alain.rogis...@gmail.com> wrote: > > It sounds like nobody knows. Just one last try. Up ! > > Tx > > On 2021/04/21 10:08:16, Alain Rogister <alain.rogis...@gmail.com> wrote: > > Hi, > > > > Imagine that you have been tasked with designing a Solr-based system that > > indexes and queries data aggregated from several relational databases. All > > the data items are ultimately related to a common object type, so the > > logical data schema looks like : > > > > Customer > > Profile > > Events > > Interactions > > Steps > > ActionPlans > > Objectives > > Actions > > (and quite a few more, with up to 3 or 4 levels of nesting) > > > > Each of these has a number of attributes. What you need to do is the Solr > > equivalent of a SQL query that would find all Customers where > > Profile.item1='...' and there is some Event where event.item1='..' and > > event.item2='..' and there is some ActionPlan that has an Objective where > > Objective.item3='..' and this Objective has an Action where > > Action.item1='..' and Action.item2='..' > > > > This is fairly trivial in a relational DB. But is it achievable in Solr ? > > Needless to say, there is no way a flat schema can work here. So I have > > assumed that nested documents were the only way to go. I have created a > > schema that includes these prerequisites, where "docType" is meant to > > contain the document type i.e. Profile, Event, Interaction etc : > > > > <field name="_version_" type="plong" indexed="false" stored="false"/> > > <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true" > > docValues="true"/> > > <fieldType name="_nest_path_" class="solr.NestPathField" /> > > <field name="_nest_parent_" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="_root_" type="string" indexed="true" stored="true" > > docValues="true" /> > > <field name="docType" type="string" indexed="true" stored="true" > > docValues="true" required="true"/> > > <field name="id" type="string" indexed="true" stored="true" > > docValues="true"/> > > <uniqueKey>id</uniqueKey> > > > > and I have used naming conventions on the other fields to reflect the > > structure, e.g. > > > > <field name="customerId" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="customerName" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="interaction.code" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="interaction.result" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="step.code" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="step.result" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="event.type" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="event.date" type="pdate" indexed="true" stored="true" > > docValues="true"/> > > > > I have been able to index my documents correctly, i.e. retrieving a > > Customer by id with fl=*,[child limit=999] returns the customer and all > > dependent objects in the correct hierarchy. > > > > I have also been able to express queries on two levels, such as: find all > > Customers that have an Interaction with code='..' and result='..' and that > > have an Event with type='..' and date before '...'. The syntax looks like > > this, using two filter queries and block joins : > > > > q= docType:Customer AND (<simple filtering clauses>) > > fq= {!parent which=docType:Customer} +docType:Interaction > > +interaction.code:C01 +interaction.result:COMPLETE > > fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1 > > +event.date:[NOW-1YEAR TO NOW] > > > > That seems to return correct results. However, I can't quite figure out the > > proper way to take it one level deeper e.g. add filtering conditions on the > > Steps of the Interactions selected by the first filter query. > > > > For the record, I am able to do so using streaming expressions, but these > > bring their own complexities so I am trying to find out first if something > > simpler can save me. > > > > Thanks ! > > > >