Hi Koji,

thank you so much for the details. At first glance, looking at the Javadoc, I didn't realize two things: I can use SignatureUpdateProcessorFactory with a signatureField other than 'id', and, very importantly, there is an "overwriteDupes" parameter. In my current schema I cannot change the id field, and there are also other fields I need to take into account to calculate the document signature.
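To make this concrete, here is a minimal sketch of the kind of update chain I have in mind for solrconfig.xml. The chain name "dedupe", the "signature" field, and the field list "name,features,cat" are placeholder assumptions for illustration, not my real schema, and the signature field would also have to be declared in the schema.

```xml
<!-- Sketch only: chain name, signature field and field list are placeholders. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Store the computed hash in a dedicated field instead of 'id' -->
    <str name="signatureField">signature</str>
    <!-- Do not delete/overwrite other documents that share the same signature -->
    <bool name="overwriteDupes">false</bool>
    <!-- Only these fields contribute to the signature -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

As far as I understand, this only computes and stores the signature; it does not by itself skip an update whose signature is unchanged.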
Again, in my case I have to set overwriteDupes="false", but reading the Solr guide I see a lot of caveats when overwriteDupes="true". When is there a need to calculate a signature and still overwrite the document? That seems like a niche behavior. At this point it would be interesting to see how much this processor improves indexing performance when there are many duplicates.

I think this is the part of the Solr Reference Guide you were looking for:
https://solr.apache.org/guide/8_11/de-duplication.html

It also includes a very useful example that explains how to implement deduplication with all the SolrCloud caveats (my case).

Thanks again for sharing this with me,
best regards
Vincenzo

On Thu, 4 Aug 2022 at 08:31, Koji Sekiguchi <koji.sekigu...@rondhuit.com> wrote:

> Hi Vincenzo,
>
> I see. then I still think SignatureUpdateProcessorFactory is the one you
> are looking for.
> I tried to look for the explanation how it works in its javadoc and Solr
> Ref Guide, but no luck.
> Then I found the good one which was written by the contributor when
> SignatureUpdateProcessorFactory
> was contributed.
>
> Please read:
>
> Add support for hash based exact/near duplicate document handling
> https://issues.apache.org/jira/browse/SOLR-799
>
> Deduplication
> https://cwiki.apache.org/confluence/display/solr/Deduplication
>
> Koji
>
> On 2022/08/03 23:40, Vincenzo D'Amore wrote:
> > I mean, the problem I need to solve is how to avoid a second update when
> > there are no changes in the document, in other words to update a document
> > only if one or more fields differs from the stored document.
> >
> > On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi <koji.sekigu...@rondhuit.com> wrote:
> >
> >> Hi Vincenzo,
> >>
> >> I cannot understand what "the second update" means...
> >>
> >> Koji
> >>
> >> On 2022/08/02 0:39, Vincenzo D'Amore wrote:
> >>> Koji, on second thought, this SignatureUpdateProcessorFactory does not
> >>> avoid the second update...
> >>>
> >>> On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> >>>
> >>>> Hi Koji, thanks! It is exactly what I was looking for!
> >>>>
> >>>> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <koji.sekigu...@rondhuit.com> wrote:
> >>>>
> >>>>> Hi Vincenzo,
> >>>>>
> >>>>> I think SignatureUpdateProcessor is what you are looking for.
> >>>>>
> >>>>> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
> >>>>>
> >>>>> Koji
> >>>>>
> >>>>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> As far as I know it is not possible, but just to be sure I'm asking from
> >>>>>> your experience, do you know if there is any way, on Solr side, to update a
> >>>>>> document only if one or more fields differs from the stored document?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Vincenzo
> >>>>
> >>>> --
> >>>> Vincenzo D'Amore

--
Vincenzo D'Amore