> On Aug 4, 2022, at 6:41 PM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>
> At this point it would be interesting to see how this Processor would
> increase the indexing performance when you have many duplicates
When it comes to indexing performance with duplicates, there isn't any difference from indexing a new document. The original is marked as deleted and the new document replaces it. An in-place update isn't really a thing: the delete side is trivially cheap speed-wise, the add side is as fast as any other indexing, and Solr will merge the segments as needed when it decides to do so.

Your best bet is to manage this in your indexing code. Keep an updated/created timestamp field and, on each scheduled run, only index the rows that have changed since the last run according to those fields. Against a database this takes maybe five minutes to wire into your indexer, and I can promise it will be faster than trying to get a built-in Solr operation to figure it out for you. If I'm wrong I would love to know, but indexing logic in your own code will always be faster than relying on a built-in server function for these sorts of things.
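For what it's worth, here is a rough SolrJ sketch of the delta approach I mean. The JDBC URL, the "documents" table, the id/title/body/updated_at columns, the collection name and the hard-coded lastRun value are all placeholders for illustration, not a drop-in implementation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DeltaIndexer {

    public static void main(String[] args) throws Exception {
        // Placeholder settings -- substitute your own database, collection
        // and persisted schedule state.
        String jdbcUrl  = "jdbc:postgresql://localhost:5432/mydb";
        String solrUrl  = "http://localhost:8983/solr/mycollection";
        Instant lastRun = Instant.parse("2022-08-04T00:00:00Z"); // normally loaded from the previous run

        try (Connection db = DriverManager.getConnection(jdbcUrl, "user", "password");
             SolrClient solr = new HttpSolrClient.Builder(solrUrl).build()) {

            // Only pull rows that changed since the last scheduled run,
            // using the updated/created timestamp field mentioned above.
            PreparedStatement ps = db.prepareStatement(
                "SELECT id, title, body, updated_at FROM documents WHERE updated_at > ?");
            ps.setTimestamp(1, Timestamp.from(lastRun));

            List<SolrInputDocument> batch = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));   // same id => Solr does delete-then-add
                    doc.addField("title", rs.getString("title"));
                    doc.addField("body", rs.getString("body"));
                    doc.addField("updated_at", rs.getTimestamp("updated_at"));
                    batch.add(doc);

                    if (batch.size() == 1000) {               // send in reasonable batches
                        solr.add(batch);
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }
            solr.commit();
            // Persist Instant.now() as the new lastRun for the next scheduled pass.
        }
    }
}

The whole point is the WHERE clause: each run only touches rows that actually changed, so Solr never has to work out the duplicates for you.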