Hi, Imagine that you have been tasked with designing a Solr-based system that indexes and queries data aggregated from several relational databases. All the data items are ultimately related to a common object type, so the logical data schema looks like :
Customer Profile Events Interactions Steps ActionPlans Objectives Actions (and quite a few more, with up to 3 or 4 levels of nesting) Each of these has a number of attributes. What you need to do is the Solr equivalent of a SQL query that would find all Customers where Profile.item1='...' and there is some Event where event.item1='..' and event.item2='..' and there is some ActionPlan that has an Objective where Objective.item3='..' and this Objective has an Action where Action.item1='..' and Action.item2='..' This is fairly trivial in a relational DB. But is it achievable in Solr ? Needless to say, there is no way a flat schema can work here. So I have assumed that nested documents were the only way to go. I have created a schema that includes these prerequisites, where "docType" is meant to contain the document type i.e. Profile, Event, Interaction etc : <field name="_version_" type="plong" indexed="false" stored="false"/> <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true" docValues="true"/> <fieldType name="_nest_path_" class="solr.NestPathField" /> <field name="_nest_parent_" type="string" indexed="true" stored="true" docValues="true"/> <field name="_root_" type="string" indexed="true" stored="true" docValues="true" /> <field name="docType" type="string" indexed="true" stored="true" docValues="true" required="true"/> <field name="id" type="string" indexed="true" stored="true" docValues="true"/> <uniqueKey>id</uniqueKey> and I have used naming conventions on the other fields to reflect the structure, e.g. <field name="customerId" type="string" indexed="true" stored="true" docValues="true"/> <field name="customerName" type="string" indexed="true" stored="true" docValues="true"/> <field name="interaction.code" type="string" indexed="true" stored="true" docValues="true"/> <field name="interaction.result" type="string" indexed="true" stored="true" docValues="true"/> <field name="step.code" type="string" indexed="true" stored="true" docValues="true"/> <field name="step.result" type="string" indexed="true" stored="true" docValues="true"/> <field name="event.type" type="string" indexed="true" stored="true" docValues="true"/> <field name="event.date" type="pdate" indexed="true" stored="true" docValues="true"/> I have been able to index my documents correctly, i.e. retrieving a Customer by id with fl=*,[child limit=999] returns the customer and all dependent objects in the correct hierarchy. I have also been able to express queries on two levels, such as: find all Customers that have an Interaction with code='..' and result='..' and that have an Event with type='..' and date before '...'. The syntax looks like this, using two filter queries and block joins : q= docType:Customer AND (<simple filtering clauses>) fq= {!parent which=docType:Customer} +docType:Interaction +interaction.code:C01 +interaction.result:COMPLETE fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1 +event.date:[NOW-1YEAR TO NOW] That seems to return correct results. However, I can't quite figure out the proper way to take it one level deeper e.g. add filtering conditions on the Steps of the Interactions selected by the first filter query. For the record, I am able to do so using streaming expressions, but these bring their own complexities so I am trying to find out first if something simpler can save me. Thanks !