On 11/11/2023 12:32, Vince McMahon wrote:
> What is the fastest way to load and index this large and wide CSV file? It
> is taking too long, 20+ hours, now.
I am assuming here that you are sending the CSV data directly to Solr
and letting Solr parse it into documents. If that is incorrect, please
fully describe your indexing software.
How many total documents are being indexed in those 20 hours?
How many threads do you have indexing simultaneously? How many CSV
lines are you sending in each batch?
When I was maintaining large-ish Solr installs, I indexed with a single
thread and got about 1000 docs per second. Even so, indexing with
multiple threads is the secret to making Solr index quickly.
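To illustrate what batching plus multiple threads could look like, here is a minimal Python sketch. It assumes the CSV has a header line and is sent to Solr's /update handler as application/csv. The URL, collection name, batch size, thread count, and all function names are placeholders made up for this example, not values from your setup, so treat it as a starting point rather than a tuned indexer.

```python
import csv
import io
import itertools
import urllib.request
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Assumed values for illustration only; adjust for your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"
BATCH_SIZE = 1000   # CSV rows per request
NUM_THREADS = 4     # concurrent indexing threads


def csv_batches(lines, batch_size):
    """Yield CSV payloads: the header line plus up to batch_size rows each."""
    reader = csv.reader(lines)
    header = next(reader)
    while True:
        rows = list(itertools.islice(reader, batch_size))
        if not rows:
            return
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)   # every batch must carry the field names
        writer.writerows(rows)
        yield buf.getvalue()


def post_batch(payload, url=SOLR_UPDATE_URL):
    """POST one CSV batch to the update handler and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/csv; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_csv(lines, send=post_batch, batch_size=BATCH_SIZE, threads=NUM_THREADS):
    """Send batches from several threads; return the number of batches sent."""
    sent = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for payload in csv_batches(lines, batch_size):
            # Cap the number of queued batches so a huge file is not
            # read into memory all at once.
            if len(pending) >= threads * 2:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    fut.result()  # re-raises any error from the worker
                    sent += 1
            pending.add(pool.submit(send, payload))
        for fut in pending:
            fut.result()
            sent += 1
    return sent
```

Usage would be something like `with open("data.csv", newline="", encoding="utf-8") as f: index_csv(f)`, where the file name is hypothetical. The `send` parameter exists so you can swap in a different client or a stub for testing. The sketch never sends a commit, so the new documents only become searchable once you commit separately or autoCommit/autoSoftCommit kicks in.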
Thanks,
Shawn