Hi,

Imagine that you have been tasked with designing a Solr-based system that 
indexes and queries data aggregated from several relational databases. All the 
data items are ultimately related to a common object type, so the logical data 
schema looks like :

Customer
     Profile
     Events
     Interactions
           Steps
     ActionPlans
           Objectives
                    Actions
(and quite a few more, with up to 3 or 4 levels of nesting)

Each of these has a number of attributes. What you need to do is the Solr 
equivalent of a SQL query that would find all Customers where 
Profile.item1='...' and there is some Event where event.item1='..' and 
event.item2='..' and there is some ActionPlan that has an Objective where 
Objective.item3='..' and this Objective has an Action where Action.item1='..' 
and Action.item2='..'

This is fairly trivial in a relational DB. But is it achievable in Solr ? 
Needless to say, there is no way a flat schema can work here. So I have assumed 
that nested documents were the only way to go. I have created a schema that 
includes these prerequisites, where "docType" is meant to contain the document 
type i.e. Profile, Event, Interaction etc :

 <field name="_version_" type="plong" indexed="false" stored="false"/>
 <field name="_nest_path_" type="_nest_path_" indexed="true" stored="true" 
docValues="true"/>
 <fieldType name="_nest_path_" class="solr.NestPathField" />
 <field name="_nest_parent_" type="string" indexed="true" stored="true" 
docValues="true"/>
 <field name="_root_" type="string" indexed="true" stored="true" 
docValues="true" />
 <field name="docType" type="string" indexed="true" stored="true" 
docValues="true" required="true"/> 
 <field name="id" type="string" indexed="true" stored="true" docValues="true"/> 
 <uniqueKey>id</uniqueKey> 

and I have used naming conventions on the other fields to reflect the 
structure, e.g.

<field name="customerId" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="customerName" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="interaction.code" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="interaction.result" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="step.code" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="step.result" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="event.type" type="string" indexed="true" stored="true" 
docValues="true"/>
<field name="event.date" type="pdate" indexed="true" stored="true" 
docValues="true"/>

I have been able to index my documents correctly, i.e. retrieving a Customer by 
id with fl=*,[child limit=999] returns the customer and all dependent objects 
in the correct hierarchy.

I have also been able to express queries on two levels, such as: find all 
Customers that have an Interaction with code='..' and result='..' and that have 
an Event with type='..' and date before '...'. The syntax looks like this, 
using two filter queries and block joins :

q= docType:Customer AND (<simple filtering clauses>)
fq= {!parent which=docType:Customer} +docType:Interaction +interaction.code:C01 
+interaction.result:COMPLETE
fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1 
+event.date:[NOW-1YEAR TO NOW]

That seems to return correct results. However, I can't quite figure out the 
proper way to take it one level deeper e.g. add filtering conditions on the 
Steps of the Interactions selected by the first filter query.

For the record, I am able to do so using streaming expressions, but these bring 
their own complexities so I am trying to find out first if something simpler 
can save me.

Thanks !

Reply via email to