Oooooh, look into Perl's fork manager module, Parallel::ForkManager:
https://metacpan.org/pod/Parallel::ForkManager

The only trick is that each time it spawns a process you have to re-create the dbh and any prepared statements, but that's a small price to pay for being able to simply adjust the number of parallel jobs in one script. Want 25? Sure, run 25! The other gotcha: if you do incremental commits based on doc count, set that in Solr itself, because once a process spawns, outside variables, like a doc counter, can't be modified across the workers and persist.

> On Jul 22, 2022, at 2:31 PM, Andy Lester <a...@petdance.com> wrote:
>
>>> On Jul 22, 2022, at 1:19 PM, dmitri maziuk <dmitri.maz...@gmail.com> wrote:
>>>
>>> The DIH does not yet support Solr 9 but I don't think it'll be long before it does.
>>
>> FWIW I've been gradually migrating our DIH imports to little python scripts; with all the extra things you can do in those, and less bloat in the main JVM, you gotta wonder how much interest there's gonna be in keeping that alive long-term.
>
> And I'm sure the DIH is slower, too.
>
> We used to have the DIH pull from our Oracle database. It took about 10 hours to do all 45M records.
>
> I migrated to a simple Perl program that pulled from Oracle, created JSON and sent it to the update handlers. We can easily run 10 in parallel and finish it off in about 45 minutes.
>
> Andy
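On the commit-on-doc-count point: Solr's autoCommit block in solrconfig.xml is where that threshold lives. A minimal sketch, with example values only (tune maxDocs/maxTime for your own load):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard-commit after this many docs, whichever worker sent them -->
    <maxDocs>10000</maxDocs>
    <!-- or after this many milliseconds, whichever comes first -->
    <maxTime>60000</maxTime>
    <!-- keep hard commits cheap; rely on softCommit for visibility -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Because Solr counts the incoming documents itself, it doesn't matter that the parallel workers can't share a counter.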
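The fork-per-worker pattern described at the top of the thread can be sketched in Python as well (the thread already mentions little python scripts). This is a rough analog of what Parallel::ForkManager does, not the module itself: each child re-creates its own database handle after the fork, and a counter bumped in the children never reaches the parent. sqlite3 stands in for the real Oracle/Solr plumbing, and all names here are illustrative.

```python
import os
import sqlite3

doc_count = 0  # parent-side counter; each forked child gets its own copy

def run_worker(batch):
    """Index one batch in a freshly forked child; return docs indexed."""
    global doc_count
    # Re-create the database handle inside the child: a handle opened
    # in the parent must not be reused across a fork.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER)")
    conn.executemany("INSERT INTO docs VALUES (?)", [(i,) for i in batch])
    doc_count += len(batch)  # visible only inside this child
    n = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    conn.close()
    return n

batches = [range(10)] * 4  # want 25 workers? make 25 batches
children = []
for batch in batches:
    pid = os.fork()
    if pid == 0:                     # child process
        os._exit(run_worker(batch))  # exit status = docs indexed
    children.append(pid)

indexed = []
for pid in children:
    _, status = os.waitpid(pid, 0)
    indexed.append(os.WEXITSTATUS(status))

print(indexed)    # each child indexed its own 10 docs
print(doc_count)  # still 0 in the parent: children's increments are lost
```

The final print is the whole point of the doc-counter warning: the parent's `doc_count` stays at 0 no matter what the children do, which is why commit thresholds belong in Solr rather than in a script-side variable.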