Hi, Thanks, I knew about that one, but note the date : 2012. A lot has changed in the meantime, specifically block joins and improved nested documents support. But I wonder about what can be expressed, even if we leave performance issues aside for another post. I can't find a comprehensive example matching my own anywhere. Let's hope some brave soul steps in !
On 2021/04/22 19:54:38, Eran Buchnick <buchni...@gmail.com> wrote: > Read this before going deeper > https://lucidworks.com/post/solr-and-joins/ > > Good luck > > On Wed, 21 Apr 2021, 13:55 Alain Rogister, <alain.rogis...@gmail.com> wrote: > > > Hi, > > > > Imagine that you have been tasked with designing a Solr-based system that > > indexes and queries data aggregated from several relational databases. All > > the data items are ultimately related to a common object type, so the > > logical data schema looks like : > > > > Customer > > Profile > > Events > > Interactions > > Steps > > ActionPlans > > Objectives > > Actions > > (and quite a few more, with up to 3 or 4 levels of nesting) > > > > Each of these has a number of attributes. What you need to do is the Solr > > equivalent of a SQL query that would find all Customers where > > Profile.item1='...' and there is some Event where event.item1='..' and > > event.item2='..' and there is some ActionPlan that has an Objective where > > Objective.item3='..' and this Objective has an Action where > > Action.item1='..' and Action.item2='..' > > > > This is fairly trivial in a relational DB. But is it achievable in Solr ? > > Needless to say, there is no way a flat schema can work here. So I have > > assumed that nested documents were the only way to go. I have created a > > schema that includes these prerequisites, where "docType" is meant to > > contain the document type i.e. Profile, Event, Interaction etc : > > > > <field name="_version_" type="plong" indexed="false" stored="false"/> > > <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true" > > docValues="true"/> > > <fieldType name="_nest_path_" class="solr.NestPathField" /> > > <field name="_nest_parent_" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="_root_" type="string" indexed="true" stored="true" > > docValues="true" /> > > <field name="docType" type="string" indexed="true" stored="true" > > docValues="true" required="true"/> > > <field name="id" type="string" indexed="true" stored="true" > > docValues="true"/> > > <uniqueKey>id</uniqueKey> > > > > and I have used naming conventions on the other fields to reflect the > > structure, e.g. > > > > <field name="customerId" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="customerName" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="interaction.code" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="interaction.result" type="string" indexed="true" > > stored="true" docValues="true"/> > > <field name="step.code" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="step.result" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="event.type" type="string" indexed="true" stored="true" > > docValues="true"/> > > <field name="event.date" type="pdate" indexed="true" stored="true" > > docValues="true"/> > > > > I have been able to index my documents correctly, i.e. retrieving a > > Customer by id with fl=*,[child limit=999] returns the customer and all > > dependent objects in the correct hierarchy. > > > > I have also been able to express queries on two levels, such as: find all > > Customers that have an Interaction with code='..' and result='..' and that > > have an Event with type='..' and date before '...'. The syntax looks like > > this, using two filter queries and block joins : > > > > q= docType:Customer AND (<simple filtering clauses>) > > fq= {!parent which=docType:Customer} +docType:Interaction > > +interaction.code:C01 +interaction.result:COMPLETE > > fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1 > > +event.date:[NOW-1YEAR TO NOW] > > > > That seems to return correct results. However, I can't quite figure out > > the proper way to take it one level deeper e.g. add filtering conditions on > > the Steps of the Interactions selected by the first filter query. > > > > For the record, I am able to do so using streaming expressions, but these > > bring their own complexities so I am trying to find out first if something > > simpler can save me. > > > > Thanks ! > > > > >