Hello Noah

I remember a trick, though I have not tried it myself. Turn off all soft and 
hard autocommits and do a single manual commit at the end. I don’t know if it 
can work for the whole 40 million documents, but it might speed up indexing when 
done in large chunks. 
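
Roughly what I have in mind, as an untested sketch in Python. The base URL, 
collection name and batch size are made up for illustration, and I am assuming 
the Config API accepts the updateHandler.autoCommit.* and 
updateHandler.autoSoftCommit.* properties with -1 meaning "disabled", so please 
check that against your Solr version. The same settings can also be changed 
directly in solrconfig.xml.

```python
import json
import urllib.request


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def disable_autocommit_payload():
    """Config API body that turns off hard and soft autocommit.

    Assumption: -1 disables each trigger; verify for your Solr version.
    """
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": -1,
            "updateHandler.autoCommit.maxDocs": -1,
            "updateHandler.autoSoftCommit.maxTime": -1,
            "updateHandler.autoSoftCommit.maxDocs": -1,
        }
    }


def post_json(url, payload):
    """POST `payload` as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_with_single_commit(base_url, docs, batch_size=10000, post=post_json):
    """Send all documents without committing, then commit once at the end.

    `base_url` is the collection URL, e.g. the hypothetical
    "http://localhost:8983/solr/mycollection".
    """
    post(f"{base_url}/config", disable_autocommit_payload())
    for batch in batches(docs, batch_size):
        post(f"{base_url}/update", batch)  # no commit parameter here
    post(f"{base_url}/update", {"commit": {}})  # the one manual commit
```

One thing to watch: as far as I know the transaction log keeps growing until a 
hard commit happens, so with 40 million documents you may want to keep an eye 
on disk usage, or fall back to a few commits if it gets too large.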



> On Nov 26, 2024, at 22:05, Noah Torp-Smith <n...@dbc.dk.invalid> wrote:
> Hello,
> We have a setup where we periodically index a solr “offline” and then copy 
> the data folder to a storage location. When we then deploy our solrs to 
> production, the containers then download that data folder to the right place 
> in the file system before the solr server is started. After the solr is 
> started, it is never updated, we just tear it down and replace on the next 
> cycle.
> This works ok, but I was wondering if there are any tweaks one could apply to 
> make the indexing go faster, when we know that there will be no searches 
> during the time we are indexing? The corpus we are indexing is around 40 
> million documents, and most of the time is spent on waiting for commits. We 
> commit every 5 million documents. Does that sound reasonable? Should we 
> commit more often? Or should we just commit at the end?
> I am aware that there is a lot of context I have not provided here. I am just 
> looking for any advice I can get for this kind of setup.
> Kind regards,
> /Noah
