On Thu, 31 Mar 2022 at 18:05, dmitri maziuk <dmitri.maz...@gmail.com> wrote:

> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> > Hi Eric
> >
> > Many thanks for the answer.
> > I noticed that reindexcollection seems to be SLOWER than DIH import.
>
> (Warning: there be python there)
>
> This is trimmed down from a working script:
> https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
>
> It is slower than DIH. It commits every document, that's likely part of
> it. I think in your case, if both cores reside on the same server, you
> will have contention and extra slow-down from that -- compared to
> pulling from one server and pushing to another. So I wouldn't expect it
> to be blazing fast.
>
> The part where it pulls IDs from Solr is trivially modified to pull
> whole records from your source index, so if you can write python, you
> can adjust it for your use and see how it goes.
>
> Dima
>

You can speed that up significantly by sending multiple documents in the
same request and only committing once:
https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/
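
For the python route, a rough sketch of what I mean. This is not taken from
Dima's gist; the names (index_in_batches, batches, http_post, update_url,
batch_size) are made up for illustration, and the URL in the usage note is a
placeholder for your own core. It assumes the standard JSON update handler:
POST an array of documents to /update, then send a single commit command at
the end.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def http_post(url, payload):
    """POST `payload` as JSON to `url` and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_in_batches(docs, update_url, batch_size=500, post=http_post):
    """Send docs in batches, then commit once. Returns the number sent.

    `update_url` is assumed to be the core's /update endpoint.
    `post` is injectable so the batching can be tried without a server.
    """
    sent = 0
    for chunk in batches(docs, batch_size):
        post(update_url, chunk)  # one request carries many documents
        sent += len(chunk)
    post(update_url, {"commit": {}})  # single explicit commit at the end
    return sent
```

Usage would be something like
index_in_batches(my_docs, "http://localhost:8983/solr/mycore/update"),
with my_docs being any iterable of dicts. The batch size of 500 is an
arbitrary starting point, not a recommendation; tune it for your documents.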

If you prefer elephants over pythons, have a look at Solarium's BufferedAdd
plugin that does just that:
https://solarium.readthedocs.io/en/latest/plugins/#bufferedadd-plugin

Thomas
