I have a requirement to index a new document only when certain fields change. 
I implemented it using Solr’s deduplication feature: 
https://solr.apache.org/guide/8_4/de-duplication.html

An example document:
{
  "userID": "userid",
  "userName": "usernameA",
  "userLoginCount": 123,
  "date": "2020-03-03T13:41:01.104Z"
}

These documents are sent to Solr regularly for indexing. The requirement is to 
index a new document if a user changes their username, and otherwise to update 
the “userLoginCount” and “date” fields of the existing document. My 
configuration is:

     <updateRequestProcessorChain name="dedupe"
         processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
       <processor class="solr.processor.SignatureUpdateProcessorFactory">
         <bool name="enabled">true</bool>
         <str name="signatureField">id</str>
         <bool name="overwriteDupes">true</bool>
         <str name="fields">userID,userName</str>
         <str name="signatureClass">solr.processor.Lookup3Signature</str>
       </processor>
       <processor class="solr.LogUpdateProcessorFactory" />
       <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>

The following document causes a new document to be indexed, because its 
userName differs:
{
  "userID": "userid",
  "userName": "usernameB",
  "userLoginCount": 128,
  "date": "2020-04-03T13:41:01.104Z"
}

This works well, with one problem. If a user changes their username from 
usernameA to usernameB and later back to usernameA, the older usernameA 
document is updated instead of a new one being created, presumably because 
the signature computed from userID and userName is the same as before. What I 
am trying to capture is the history of a user’s username changes over time, 
preserving older states, so in this case there should be 3 documents.
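
To make the behaviour concrete, here is a minimal Python sketch of what I 
believe is happening. It is not Solr code: a dict stands in for the index, 
an MD5 hash stands in for Lookup3Signature (any deterministic hash of the 
same fields behaves the same way for this purpose), and the third document 
with its values is made up for illustration.

```python
import hashlib

# Fields used for the signature, mirroring <str name="fields"> in my config.
SIGNATURE_FIELDS = ("userID", "userName")


def signature(doc):
    # Stand-in for Lookup3Signature (assumption): a deterministic hash
    # of the configured fields.
    joined = "\x00".join(str(doc[field]) for field in SIGNATURE_FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def index(store, doc):
    # The signature is written to "id" and duplicates are overwritten,
    # so a document with an already-seen signature replaces the old one.
    store[signature(doc)] = doc


store = {}
index(store, {"userID": "userid", "userName": "usernameA",
              "userLoginCount": 123, "date": "2020-03-03T13:41:01.104Z"})
index(store, {"userID": "userid", "userName": "usernameB",
              "userLoginCount": 128, "date": "2020-04-03T13:41:01.104Z"})
# Hypothetical third document: the user switches back to usernameA.
index(store, {"userID": "userid", "userName": "usernameA",
              "userLoginCount": 130, "date": "2020-05-03T13:41:01.104Z"})

print(len(store))  # 2 documents, not the 3 I want
```

The third call lands on the same key as the first, so the original 
usernameA state is replaced rather than kept alongside the new one.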

Is there a way to achieve this in Solr, or should I find a solution outside 
Solr?

--ufuk yilmaz
