Hi Koji, thank you so much for the details.
At first glance, looking at the Javadoc, I didn't realize two things: I can use
SignatureUpdateProcessorFactory with a signatureField different from the 'id'
and also, very important, that there is an "overwriteDupes" parameter.
In my current schema I cannot change the id field, and there are also
other fields I need to take into account to calculate the document
signature.
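
To make the idea concrete, here is a minimal sketch of what "calculate a signature over some fields and skip the update when nothing changed" could look like on the client side. It is not how SignatureUpdateProcessorFactory is implemented; the field names, the sample document, and the helpers document_signature and needs_update are all hypothetical and only illustrate the comparison.

```python
import hashlib

def document_signature(doc, fields):
    """Return an MD5 hex digest computed over the selected fields only."""
    digest = hashlib.md5()
    # Sort the field names so the signature does not depend on argument order.
    for name in sorted(fields):
        value = doc.get(name, "")
        part = f"{name}={value}".encode("utf-8")
        # Length-prefix each part so adjacent values cannot run together.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()

def needs_update(stored_signature, incoming_doc, fields):
    """True when the incoming document differs in at least one signature field."""
    return stored_signature != document_signature(incoming_doc, fields)

# Hypothetical fields and document, for illustration only.
SIGNATURE_FIELDS = ["title", "price", "category"]
stored = {"id": "42", "title": "Solr in Action", "price": "39.90", "category": "books"}
stored_sig = document_signature(stored, SIGNATURE_FIELDS)

unchanged = dict(stored)
changed = dict(stored, price="35.00")

print(needs_update(stored_sig, unchanged, SIGNATURE_FIELDS))
print(needs_update(stored_sig, changed, SIGNATURE_FIELDS))
```

Because 'id' is not among the signature fields in this sketch, two documents with different ids but identical content would get the same signature, which is the reason the signature has to live in a field other than 'id' in my case.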

Again, in my case I have to set overwriteDupes="false", but reading the
Solr guide I see a lot of caveats when overwriteDupes="true". When is there
a need to calculate a signature and still overwrite the document? This
should be a niche behavior.

At this point it would be interesting to see how much this processor
improves indexing performance when there are many duplicates.

I think this is the part of the Solr Reference Guide you were looking for:
https://solr.apache.org/guide/8_11/de-duplication.html
There is also a very useful example that explains how to implement
deduplication with all SolrCloud caveats (my case).

Thanks again for sharing this with me, best regards
Vincenzo

On Thu, 4 Aug 2022 at 08:31, Koji Sekiguchi <koji.sekigu...@rondhuit.com>
wrote:

> Hi Vincenzo,
>
> I see. then I still think SignatureUpdateProcessorFactory is the one you
> are looking for.
> I tried to look for the explanation how it works in its javadoc and Solr
> Ref Guide, but no luck.
> Then I found the good one which was written by the contributor when
> SignatureUpdateProcessorFactory
> was contributed.
>
> Please read:
>
> Add support for hash based exact/near duplicate document handling
> https://issues.apache.org/jira/browse/SOLR-799
>
> Deduplication
> https://cwiki.apache.org/confluence/display/solr/Deduplication
>
> Koji
>
> On 2022/08/03 23:40, Vincenzo D'Amore wrote:
> > I mean, the problem I need to solve is how to avoid a second update when
> > there are no changes in the document, in other words to update a document
> > only if one or more fields differs from the stored document.
> >
> > On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi <koji.sekigu...@rondhuit.com>
> > wrote:
> >
> >> Hi Vincenzo,
> >>
> >> I cannot understand what "the second update" means...
> >>
> >> Koji
> >>
> >> On 2022/08/02 0:39, Vincenzo D'Amore wrote:
> >>> Koji, on second thought, this SignatureUpdateProcessorFactory does not
> >>> avoid the second update...
> >>>
> >>> On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v.dam...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Koji, thanks! It is exactly what I was looking for!
> >>>>
> >>>> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <koji.sekigu...@rondhuit.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Vincenzo,
> >>>>>
> >>>>> I think SignatureUpdateProcessor is what you are looking for.
> >>>>>
> >>>>> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
> >>>>>
> >>>>> Koji
> >>>>>
> >>>>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> As far as I know it is not possible, but just to be sure I'm asking from
> >>>>>> your experience, do you know if there is any way, on Solr side, to update a
> >>>>>> document only if one or more fields differs from the stored document?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Vincenzo
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Vincenzo D'Amore
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
-- 
Vincenzo D'Amore
