Another way to handle this is to have your indexing code fork out to as many cores as the Solr indexing server has. It's far less work to have the code run itself that many times in parallel, and as long as your SQL queries and the tables they hit are properly indexed, the database shouldn't be a bottleneck. You just need to make sure the indexing server has the resources it needs, and obviously you never index against a query server: a query server is just a copy, tuned for fast reads rather than writes, unlike the indexer.
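Here's a minimal sketch of that fork-per-core idea, not anything from this thread: each worker takes a disjoint modulo slice of the source table and posts batches to Solr's JSON update handler with commits disabled. The host, core name, table, and column names are all hypothetical placeholders, and sqlite3 is just a stand-in for whatever SQL client you actually use.

import json
import multiprocessing
import sqlite3  # stand-in for your real SQL client

import requests

SOLR_UPDATE = "http://indexer:8983/solr/mycore/update"  # placeholder URL
NUM_WORKERS = multiprocessing.cpu_count()  # match the indexing server's cores
BATCH_SIZE = 1000

def post_batch(docs):
    # commit=false: defer the commit until every worker has finished
    resp = requests.post(
        SOLR_UPDATE,
        params={"commit": "false"},
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

def index_slice(worker_id):
    # Each worker owns the rows whose id falls in its modulo slice,
    # so no coordination between workers is needed.
    db = sqlite3.connect("source.db")
    cur = db.execute(
        "SELECT id, title, body FROM documents WHERE id % ? = ?",
        (NUM_WORKERS, worker_id),
    )
    batch = []
    for row_id, title, body in cur:
        batch.append({"id": row_id, "title": title, "body": body})
        if len(batch) >= BATCH_SIZE:
            post_batch(batch)
            batch = []
    if batch:
        post_batch(batch)

if __name__ == "__main__":
    with multiprocessing.Pool(NUM_WORKERS) as pool:
        pool.map(index_slice, range(NUM_WORKERS))
    # a single commit is issued separately once all workers are done

The modulo partitioning is what keeps this cheap: every worker runs the same query with different parameters, and the slices never overlap.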
> On Sep 29, 2022, at 2:21 PM, Andy Lester <a...@petdance.com> wrote:
>
>> On Sep 29, 2022, at 4:17 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>>
>> * Index with multiple threads on the client, experiment to find a good
>>   number based on the number of CPUs on receiving side
>
> That may also mean having multiple clients. We went from taking about 8 hours
> to index our entire 42M rows to about 1.5 hours because we ran 10 indexer
> clients at once. Each indexer takes roughly 1/10th of the data and churns
> away. We don't have any of the clients do a commit. After the indexers are
> done, we run one more time through the queue with a commit at the end.
>
> As Jan says, make sure it's not your database that is the bottleneck, and
> experiment with how many clients you want to have going at once.
>
> Andy
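To make Andy's commit-at-the-end step concrete: since none of the clients commit, nothing becomes visible until one final request after the last pass through the queue. A sketch, again with a placeholder URL:

import requests

SOLR_UPDATE = "http://indexer:8983/solr/mycore/update"  # placeholder URL

# After every indexer client has finished (and the queue has been drained
# one last time), a single commit makes all the new documents visible.
requests.get(SOLR_UPDATE, params={"commit": "true"}).raise_for_status()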