Re: Solr update only if field differs

2022-08-05 Thread Vincenzo D'Amore
Unfortunately, in my architecture I cannot rely on a database or on an updated/created time field. There is a potentially infinite stream of documents with a possibly huge amount of duplication. So avoiding the indexing of duplicate documents should (I suppose) improve performance. On Fri, 5 A

Re: Solr update only if field differs

2022-08-04 Thread Dave
"At this point it would be interesting to see how this Processor would increase the indexing performance when you have many duplicates." When it comes to indexing performance with duplicates, there is no difference from indexing a new document. The original is marked as deleted, and the new one replaces

Re: Solr update only if field differs

2022-08-04 Thread Vincenzo D'Amore
Hi Koji, thank you so much for the details. At first glance, looking at the Javadoc, I didn't realize two things: I can use SignatureUpdateProcessorFactory on a signatureField different from the 'id' and also, very important, that there was an “overwriteDupes” parameter. In my current schema I cannot ch

Re: Solr update only if field differs

2022-08-03 Thread Koji Sekiguchi
Hi Vincenzo, I see. Then I still think SignatureUpdateProcessorFactory is the one you are looking for. I tried to find an explanation of how it works in its Javadoc and the Solr Ref Guide, but had no luck. Then I found a good one, which was written by the contributor when SignatureUpdateProcessorFac

Re: Solr update only if field differs

2022-08-03 Thread Vincenzo D'Amore
I mean, the problem I need to solve is how to avoid a second update when there are no changes in the document, in other words to update a document only if one or more fields differ from the stored document. On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi wrote: > Hi Vincenzo, > > I cannot underst
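
The "update only if a field differs" idea in this message can be sketched on the client side as a content signature comparison. This is only an illustration under stated assumptions, not how SignatureUpdateProcessorFactory is implemented: the function names `content_signature` and `needs_update`, the sample field names, and the choice of SHA-256 are all hypothetical, and the sketch assumes the previously computed signature is available to the indexing client.

```python
import hashlib
import json


def content_signature(doc, fields):
    """Return a stable hex digest over the selected fields of a document.

    Hypothetical helper: SHA-256 is an arbitrary choice for this sketch.
    """
    # Serialize only the chosen fields, with sorted keys, so that the
    # order of keys in the input dict does not change the digest.
    payload = json.dumps(
        {name: doc.get(name) for name in fields},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_update(incoming, stored_signature, fields):
    """True when at least one selected field differs from the stored document."""
    return content_signature(incoming, fields) != stored_signature


if __name__ == "__main__":
    compared_fields = ["title", "price"]  # hypothetical field names
    stored = {"id": "doc-1", "title": "Solr in practice", "price": 10}
    stored_sig = content_signature(stored, compared_fields)

    unchanged = dict(stored)
    changed = {**stored, "price": 11}

    print(needs_update(unchanged, stored_sig, compared_fields))  # False
    print(needs_update(changed, stored_sig, compared_fields))    # True
```

Only the documents for which `needs_update` returns True would then be sent to Solr; whether this saves anything in practice depends on how cheaply the stored signature can be looked up.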

Re: Solr update only if field differs

2022-08-01 Thread Koji Sekiguchi
Hi Vincenzo, I cannot understand what "the second update" means... Koji On 2022/08/02 0:39, Vincenzo D'Amore wrote: Koji, on second thought, this SignatureUpdateProcessorFactory does not avoid the second update... On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore wrote: Hi Koji, thanks! It i

Re: Solr update only if field differs

2022-08-01 Thread Mikhail Khludnev
Sorry, Vincenzo. Have no idea. Don't hesitate to post the answer if you find it out. On Tue, Aug 2, 2022 at 1:50 AM Vincenzo D'Amore wrote: > Thanks for sharing this Mikhail. > Do you know how big is the overhead for Solr in handling documents that do > not have a new version? > For example, we

Re: Solr update only if field differs

2022-08-01 Thread Vincenzo D'Amore
Thanks for sharing this, Mikhail. Do you know how big the overhead is for Solr in handling documents that do not have a new version? For example, we have to update ten thousand documents, but only 100 of them have a newer version. How does Solr behave? On Sun, Jul 31, 2022 at 2:16 AM Mikhail Khludn
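
To make the scenario in this question concrete, here is a minimal in-memory sketch of the rule behind document-centric versioning: an incoming document is accepted only when its version value is greater than the one already stored. It does not call Solr and says nothing about Solr's actual overhead; the function name `filter_newer` and the field name `my_version_l` are hypothetical, chosen only for the illustration.

```python
def filter_newer(incoming_docs, current_versions,
                 id_field="id", version_field="my_version_l"):
    """Keep only documents whose version is greater than the stored one.

    current_versions maps a document id to the version currently stored.
    Documents with an unknown id are treated as new and accepted.
    """
    accepted = []
    for doc in incoming_docs:
        existing = current_versions.get(doc[id_field])
        if existing is None or doc[version_field] > existing:
            accepted.append(doc)
    return accepted


if __name__ == "__main__":
    # Ten thousand stored documents, all at version 1.
    current = {str(i): 1 for i in range(10_000)}
    # Ten thousand incoming documents, of which only the first 100 are newer.
    incoming = [
        {"id": str(i), "my_version_l": 2 if i < 100 else 1}
        for i in range(10_000)
    ]
    print(len(filter_newer(incoming, current)))  # 100
```

The other 9,900 documents are dropped before any write happens, which is the behaviour the question asks about; how expensive the equivalent check is inside Solr remains unanswered in the thread.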

Re: Solr update only if field differs

2022-08-01 Thread Vincenzo D'Amore
Koji, on second thought, this SignatureUpdateProcessorFactory does not avoid the second update... On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore wrote: > Hi Koji, thanks! It is exactly what I was looking for! > > On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi > wrote: > >> Hi Vincenzo, >> >> I

Re: Solr update only if field differs

2022-08-01 Thread Vincenzo D'Amore
Hi Koji, thanks! It is exactly what I was looking for! On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi wrote: > Hi Vincenzo, > > I think SignatureUpdateProcessor is what you are looking for. > > > https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/Signatur

Re: Solr update only if field differs

2022-07-31 Thread Koji Sekiguchi
Hi Vincenzo, I think SignatureUpdateProcessor is what you are looking for. https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java Koji On 2022/07/30 18:41, Vincenzo D'Amore wrote: Hi all, As far as I know it is not po

Re: Solr update only if field differs

2022-07-30 Thread Mikhail Khludnev
Hi, Vincenzo. I can only remember version control via checking a particular field. https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#document-centric-versioning-constraints On Sun, Jul 31, 2022 at 2:52 AM Vincenzo D'Amore wrote: > Hi all, > > As far as I know