On 12/7/23 07:56, Vince McMahon wrote:
{
   "responseHeader": {
     "status": 0,
     "QTime": 0
   },
   "initArgs": [
     "defaults",
     [
       "config",
       "db-data-config.xml"
     ]
   ],
   "command": "status",
   "status": "idle",
   "importResponse": "",
   "statusMessages": {
     "Total Requests made to DataSource": "1",
     "Total Rows Fetched": "915000",
     "Total Documents Processed": "915000",
     "Total Documents Skipped": "0",
     "Full Dump Started": "2023-12-07 02:54:29",
     "": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
     "Committed": "2023-12-07 02:54:51",
     "Time taken": "0:0:21.831"
   }
}

There's no way Solr can index 915000 docs in 21 seconds without a LOT of threads in the indexing program, and DIH is single-threaded. As you've already noted, it didn't actually index most of the documents. I don't have an answer as to why it didn't work.
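
To put a number on that, here is a quick calculation using only the figures from the status response above. It works out to roughly 42,000 documents per second, which is not a plausible rate for a single indexing thread.

```python
# Throughput implied by the DIH status response quoted above.
docs = 915_000                 # "Total Documents Processed"
time_taken = "0:0:21.831"      # "Time taken", formatted as h:m:s

hours, minutes, seconds = time_taken.split(":")
elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)

rate = docs / elapsed
print(f"{rate:,.0f} docs/sec")
```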

DIH lacks decent logging, error handling, and multi-threading. It is not the most reliable way to index. This is why it was deprecated in 8.6 and then removed in 9.0. You would be far better off writing your own indexing program rather than using DIH.
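
As a rough idea of what such a program can look like, here is a minimal single-threaded sketch in Python. It is only an illustration under some assumptions: the core name "mycore" and the URL are hypothetical, the database connection is anything that behaves like Python's sqlite3 connection, and the column names returned by the query are assumed to match field names in your schema. It sends documents to Solr's JSON update handler in batches and reports progress as it goes, so a failure shows up immediately instead of being hidden.

```python
import json
import urllib.request

# Hypothetical core name and host; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def post_json(payload, url=SOLR_UPDATE_URL):
    """POST a JSON payload to Solr's update handler."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises HTTPError on a 4xx/5xx response, so a rejected
    # batch stops the run instead of being silently dropped.
    with urllib.request.urlopen(req) as resp:
        resp.read()


def commit(url=SOLR_UPDATE_URL):
    """Ask Solr to commit what has been sent so far."""
    post_json({"commit": {}}, url)


def index_rows(conn, query, batch_size=500, send=post_json):
    """Read rows from the database and hand them to `send` in batches.

    Each row becomes one document keyed by column name.
    Returns the number of documents handed to `send`.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    total = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        docs = [dict(zip(columns, row)) for row in rows]
        send(docs)
        total += len(docs)
        print(f"sent {total} documents")
    return total
```

A run would call index_rows(conn, "SELECT ...") and then commit(). Because the sender is passed in as a parameter, the batching logic can be tried against a list before it is pointed at a real Solr instance.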

I have an idea for a multi-threaded database-to-Solr indexing program, but haven't had much time to spend on it. If I can ever get it done, it will be freely available.

On the entity, "rows" is not a valid attribute. To control how many DB rows are fetched at a time, set batchSize on the dataSource element. The default batchSize is 500.
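
For illustration, a minimal sketch of where batchSize goes in db-data-config.xml. The driver class, URL, credentials, entity name, query, and field names are all placeholders, not taken from your config:

```xml
<dataConfig>
  <!-- batchSize belongs on the dataSource element, not on the entity -->
  <dataSource type="JdbcDataSource"
              driver="com.example.jdbc.Driver"
              url="jdbc:example://dbhost/dbname"
              user="dbuser"
              password="dbpass"
              batchSize="500"/>
  <document>
    <!-- no "rows" attribute here -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```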

Thanks,
Shawn
