Thanks, Shawn. DIH full-import by itself works very well. It's a bummer that my incremental load alone runs into the millions of rows. When batchSize is specified on the data source, delta-import honors that batch size only once, for the first fetch, and then loops through the rest at only hundreds of rows per second. That doesn't let me get all the indexing done within a day, which is what I need.
I hope this finding helps the maintainers improve the code. It took me days to figure it out. Thanks again.

On Thu, Dec 7, 2023, 4:49 PM Shawn Heisey <apa...@elyograg.org.invalid> wrote:
> On 12/7/23 07:56, Vince McMahon wrote:
> > {
> >   "responseHeader": {
> >     "status": 0,
> >     "QTime": 0
> >   },
> >   "initArgs": [
> >     "defaults",
> >     [
> >       "config",
> >       "db-data-config.xml"
> >     ]
> >   ],
> >   "command": "status",
> >   "status": "idle",
> >   "importResponse": "",
> >   "statusMessages": {
> >     "Total Requests made to DataSource": "1",
> >     "Total Rows Fetched": "915000",
> >     "Total Documents Processed": "915000",
> >     "Total Documents Skipped": "0",
> >     "Full Dump Started": "2023-12-07 02:54:29",
> >     "": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
> >     "Committed": "2023-12-07 02:54:51",
> >     "Time taken": "0:0:21.831"
> >   }
> > }
>
> There's no way Solr can index 915000 docs in 21 seconds without a LOT of
> threads in the indexing program, and DIH is single-threaded. As you've
> already noted, it didn't actually index most of the documents. I don't
> have an answer as to why it didn't work.
>
> DIH lacks decent logging, error handling, and multi-threading. It is
> not the most reliable way to index. This is why it was deprecated a
> while back and then removed from 9.x. You would be far better off
> writing your own indexing program rather than using DIH.
>
> I have an idea for a multi-threaded database->solr indexing program, but
> haven't had much time to spend on it. If I can ever get it done, it
> will be freely available.
>
> On the entity, "rows" is not a valid attribute. To control how many DB
> rows are fetched at a time, set batchSize on the dataSource element.
> The default batchSize is 500.
>
> Thanks,
> Shawn
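For anyone hitting the same wall, here is a minimal Python sketch of the kind of standalone, multi-threaded indexer Shawn suggests in place of DIH: rows are streamed in batches and POSTed as JSON arrays to Solr's /update handler from several threads, with one commit at the end. The core URL `http://localhost:8983/solr/mycore/update`, the helper names (`batches`, `post_to_solr`, `commit`, `index_all`), and the default batch and thread counts are assumptions for illustration only. They are not part of DIH or any Solr client library, and the rows are assumed to already be dicts mapping Solr field names to values.

```python
import json
import urllib.request
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed core URL for illustration; point this at your own Solr core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"

Doc = Dict[str, object]


def batches(rows: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def _post(url: str, payload: object) -> None:
    """POST a JSON payload to Solr; urlopen raises on HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def post_to_solr(docs: List[Doc]) -> int:
    """Send one batch as a JSON array of documents, without committing."""
    _post(SOLR_UPDATE_URL, docs)
    return len(docs)


def commit() -> None:
    """Issue a single explicit commit once all batches have been sent."""
    _post(SOLR_UPDATE_URL, {"commit": {}})


def index_all(
    rows: Iterable[Doc],
    send: Callable[[List[Doc]], int] = post_to_solr,
    batch_size: int = 1000,
    workers: int = 4,
) -> int:
    """Send batches from several threads; return the number of docs sent.

    At most `workers * 2` batches are in flight at once, so rows stream
    from the source instead of being loaded into memory all at once.
    An exception raised by `send` propagates when its result is collected.
    """
    total = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in batches(rows, batch_size):
            pending.add(pool.submit(send, chunk))
            if len(pending) >= workers * 2:
                # Block until at least one batch finishes before reading more rows.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                total += sum(f.result() for f in done)
        for f in pending:
            total += f.result()
    return total
```

Usage would be something like `index_all(({"id": str(r[0]), "name_s": r[1]} for r in cursor))` over a DB-API cursor, followed by one `commit()`. The `send` parameter is injectable so the batching and threading can be exercised without a running Solr, and committing once at the end avoids paying for a commit on every batch.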