Hello Noah

I remember a trick, though I haven't tried it myself: turn off all soft and 
hard commits and do a single manual commit at the end. I don't know whether it 
works for all 40 million documents, but it might speed up indexing when the 
documents are sent in large chunks.
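Below is a rough, untested sketch of the client side of that trick in Python. It assumes Solr's JSON update handler at a hypothetical URL (`http://localhost:8983/solr/mycollection`) and a made-up batch size. It sends every batch without any commit parameter and finishes with one `{"commit": {}}` command. The helper names (`index_then_commit_once`, `batches`, `http_post_json`) are illustrative, not part of any Solr client library. The automatic commits themselves (`autoCommit` / `autoSoftCommit` in solrconfig.xml) still have to be switched off on the server; client code can't do that part.

```python
import json
import urllib.request
from itertools import islice
from typing import Callable, Iterable, Iterator

# Hypothetical core URL; adjust to your own Solr host and collection.
SOLR_URL = "http://localhost:8983/solr/mycollection"


def http_post_json(url: str, payload: object) -> None:
    """POST a JSON body to Solr's update handler; urlopen raises on HTTP errors."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def batches(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_then_commit_once(
    base_url: str,
    docs: Iterable[dict],
    batch_size: int = 10_000,  # assumed value; tune for your documents
    send: Callable[[str, object], None] = http_post_json,
) -> int:
    """Send all documents with no commit parameter, then commit exactly once.

    Returns the number of document batches sent (excluding the final commit).
    """
    update_url = f"{base_url}/update"
    n_batches = 0
    for chunk in batches(docs, batch_size):
        # No commit / commitWithin here, so nothing becomes searchable yet.
        send(update_url, chunk)
        n_batches += 1
    # The single hard commit at the very end.
    send(update_url, {"commit": {}})
    return n_batches
```

Usage would be something like `index_then_commit_once(SOLR_URL, my_doc_generator())`. The `send` parameter exists only so the batching can be checked without a running Solr. One thing to watch: if the update log is enabled, it keeps growing until that final hard commit, so disk space is worth keeping an eye on with 40 million documents.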

—ufuk

—

> On Nov 26, 2024, at 22:05, Noah Torp-Smith <n...@dbc.dk.invalid> wrote:
> 
> Hello,
> 
> We have a setup where we periodically index a Solr “offline” and then copy 
> the data folder to a storage location. When we deploy our Solr instances to 
> production, the containers download that data folder to the right place 
> in the file system before the Solr server is started. After Solr has 
> started, the index is never updated; we just tear it down and replace it on 
> the next cycle.
> This works OK, but I was wondering if there are any tweaks one could apply to 
> make the indexing go faster when we know that there will be no searches 
> during the time we are indexing. The corpus we are indexing is around 40 
> million documents, and most of the time is spent waiting for commits. We 
> commit every 5 million documents. Does that sound reasonable? Should we 
> commit more often? Or should we just commit at the end?
> 
> I am aware that there is a lot of context I have not provided here. I am just 
> looking for any advice I can get for this kind of setup.
> 
> Kind regards,
> /Noah
