On Thu, 31 Mar 2022 at 18:05, dmitri maziuk <dmitri.maz...@gmail.com> wrote:
> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> > Hi Eric
> >
> > Many thanks for the answer.
> > I noticed that reindexcollection seems to be SLOWER than DIH import.
>
> (Warning: there be python there)
>
> This is trimmed down from a working script:
> https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
>
> It is slower than DIH. It commits every document, that's likely part of
> it. I think in your case, if both cores reside on the same server, you
> will have contention and extra slow-down from that -- compared to
> pulling from one server and pushing to another. So I wouldn't expect it
> to be blazing fast.
>
> The part where it pulls IDs from Solr is trivially modified to pull
> whole records from your source index, so if you can write python, you
> can adjust it for your use and see how it goes.
>
> Dima
>

You can speed that up significantly by sending multiple documents in the
same request and only committing once:
https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/

If you prefer elephants over pythons, have a look at Solarium's
BufferedAdd plugin that does just that:
https://solarium.readthedocs.io/en/latest/plugins/#bufferedadd-plugin

Thomas
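
For the python side, the batching idea above could look roughly like the sketch below. It is not taken from Dima's gist: the names batches, post_json and buffered_add, the batch size of 500, and the update URL are all made up for illustration. It assumes the target core accepts JSON on its /update handler (a JSON array of documents, and {"commit": {}} as an explicit commit command), so adjust to your setup.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def buffered_add(docs, update_url, batch_size=500, send=post_json):
    """Send docs in batches, then commit once.

    Returns the total number of requests made (batches plus the commit).
    `send` is injectable so the batching logic can be tried without Solr.
    """
    requests_made = 0
    for chunk in batches(docs, batch_size):
        send(update_url, chunk)       # one request carries many documents
        requests_made += 1
    send(update_url, {"commit": {}})  # single explicit commit at the end
    return requests_made + 1
```

Usage would be something like buffered_add(my_docs, "http://localhost:8983/solr/target_core/update"), where the host and core name are placeholders. With 1050 documents and the default batch size that is 4 requests in total instead of 1050 adds each followed by a commit.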