I agree with ufuk.  Don't commit until the end.  Commits are only about
visibility of changes, not about durability (in SolrCloud and sometimes
standalone mode).  Thus there's no point in sending explicit commits from
the client until the end.  In solrconfig.xml, it may help to disable auto
soft commit, but as Walter hints, it might not make a big difference since
it mostly doesn't block indexing.  Still, each commit gives Solr needless
work to do and may kick off a background merge, and only one background
merge can happen at a time.
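
A rough sketch of what "commit only at the end" could look like from the
client side, in Python with only the standard library.  The URL, core name
("mycore"), and batch size are placeholders I made up, and the `send`
parameter exists only so the loop can be tried without a running Solr.

```python
import json
import urllib.request

# Assumption: host, port, and core name are placeholders for illustration.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_add_request(docs):
    """Build a JSON update request that adds docs WITHOUT asking for a commit."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_commit_request():
    """Build the single explicit commit that is sent once, at the very end."""
    return urllib.request.Request(SOLR_UPDATE_URL + "?commit=true")


def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_all(docs, batch_size=1000, send=urllib.request.urlopen):
    """Send every batch without a commit, then issue one commit at the end.

    Returns the number of batches sent.
    """
    sent = 0
    for batch in batches(docs, batch_size):
        send(build_add_request(batch))  # no commit parameter on these
        sent += 1
    send(build_commit_request())  # the only commit in the whole run
    return sent
```

Passing something like `send=print` shows the requests that would be made
without contacting a server.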

To speed up indexing further, set the mergeFactor to something like 20,
then issue an "optimize" down to 10 segments at the end.
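
For the final step, a minimal sketch of building that optimize request in
Python.  It uses the update handler's `optimize` and `maxSegments` request
parameters; the URL and core name ("mycore") are placeholders I made up.
The merge settings themselves live in solrconfig.xml and are not shown.

```python
import urllib.parse
import urllib.request

# Assumption: host, port, and core name are placeholders for illustration.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_optimize_request(max_segments=10, update_url=SOLR_UPDATE_URL):
    """Build a request asking Solr to merge down to at most max_segments."""
    query = urllib.parse.urlencode(
        {"optimize": "true", "maxSegments": max_segments}
    )
    return urllib.request.Request(f"{update_url}?{query}")


if __name__ == "__main__":
    # Print the URL instead of sending it, so this runs without a server.
    print(build_optimize_request().full_url)
```

Sending it would be `urllib.request.urlopen(build_optimize_request())`,
once, after the final commit.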

On Tue, Nov 26, 2024 at 8:47 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
wrote:

> Hello Noah
>
> I remember a trick but I didn’t try it myself before. Turn off all soft
> and hard commits and do a single manual commit at the end. I don’t
> know if it can work for the whole 40 million documents but it might speed
> up indexing when done in large chunks.
>
> —ufuk
>
> —
>
> > On Nov 26, 2024, at 22:05, Noah Torp-Smith <n...@dbc.dk.invalid> wrote:
> >
> > Hello,
> >
> > We have a setup where we periodically index a solr “offline” and then
> copy the data folder to a storage location. When we then deploy our solrs
> to production, the containers then download that data folder to the right
> place in the file system before the solr server is started. After the solr
> is started, it is never updated, we just tear it down and replace on the
> next cycle.
> > This works ok, but I was wondering if there are any tweaks one could
> apply to make the indexing go faster, when we know that there will be no
> searches during the time we are indexing? The corpus we are indexing is
> around 40 million documents, and most of the time is spent on waiting for
> commits. We commit every 5 million documents. Does that sound reasonable?
> Should we commit more often? Or should we just commit at the end?
> >
> > I am aware that there is a lot of context I have not provided here. I am
> just looking for any advice I can get for this kind of setup.
> >
> > Kind regards,
> > /Noah
>
>
