On 12/7/23 07:56, Vince McMahon wrote:
{
  "responseHeader": { "status": 0, "QTime": 0 },
  "initArgs": [ "defaults", [ "config", "db-data-config.xml" ] ],
  "command": "status",
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "1",
    "Total Rows Fetched": "915000",
    "Total Documents Processed": "915000",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2023-12-07 02:54:29",
    "": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
    "Committed": "2023-12-07 02:54:51",
    "Time taken": "0:0:21.831"
  }
}
There's no way Solr can index 915000 docs in 21 seconds (roughly 42,000 docs per second) without a LOT of threads in the indexing program, and DIH is single-threaded. As you've already noted, it didn't actually index most of the documents. I don't have an answer as to why it didn't work.
DIH lacks decent logging, error handling, and multi-threading. It is not the most reliable way to index. This is why it was deprecated in 8.6 and then removed in 9.0. You would be far better off writing your own indexing program rather than using DIH.
I have an idea for a multi-threaded database-to-Solr indexing program, but haven't had much time to spend on it. If I can ever get it done, it will be freely available.
On the entity, "rows" is not a valid attribute. To control how many DB rows are fetched at a time, set batchSize on the dataSource element. The default batchSize is 500.
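As a rough sketch of where batchSize goes in db-data-config.xml: everything below except the batchSize attribute itself is a placeholder I made up for illustration (a PostgreSQL driver and URL, the credentials, the entity name, the query, and the field names), so substitute whatever your existing config already has.

```xml
<dataConfig>
  <!-- batchSize belongs on the dataSource element, not on the entity.
       The driver, url, user, and password values are placeholders. -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/exampledb"
              user="solr"
              password="changeme"
              batchSize="1000"/>
  <document>
    <!-- Hypothetical entity: note there is no "rows" attribute here. -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

The value 1000 is only an example; leaving batchSize out gives you the default of 500.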
Thanks, Shawn