Hi,

If you want to index fast, you should:
* Make sure you have enough hardware on the Solr side to handle the bulk load
* Index with multiple threads on the client; experiment to find a good number 
based on the number of CPUs on the receiving side
* If using Java on the client, use CloudSolrClient, which is smart enough to 
send docs to the correct shard
* Do NOT commit during the bulk load; wait until the end
* Experiment with batch size, e.g. try sending 500 docs in each update request, 
then 1000 etc. until you find the best compromise
* Use JavaBin if you can; it should be slightly faster than JSON, but probably 
not by much
* Remember that your RDBMS may be the bottleneck at the end of the day. How 
many rows can it deliver? You may need to partition the data set with SELECT 
... WHERE clauses so that each client can read in parallel.
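
To make the partitioning, batching and commit-at-the-end points concrete, here 
is a rough Python sketch. It is only an illustration under assumptions, not a 
ready-made indexer: it assumes your table has a numeric id column you can split 
on, and fetch_rows, send_batch and commit are hypothetical callables you would 
supply yourself (fetch_rows would run something like a SELECT ... WHERE id 
BETWEEN lo AND hi, send_batch would post one update request to Solr, commit 
would issue the final commit).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def id_ranges(min_id, max_id, parts):
    """Split [min_id, max_id] into up to `parts` contiguous, non-overlapping ranges."""
    total = max_id - min_id + 1
    step, extra = divmod(total, parts)
    ranges, lo = [], min_id
    for i in range(parts):
        size = step + (1 if i < extra else 0)
        if size == 0:
            continue  # more partitions requested than ids available
        ranges.append((lo, lo + size - 1))
        lo += size
    return ranges


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_partition(fetch_rows, send_batch, id_range, batch_size):
    """Read one partition and send it in batches, without committing."""
    sent = 0
    lo, hi = id_range
    # fetch_rows is a hypothetical reader for one slice of the source table
    for chunk in batches(fetch_rows(lo, hi), batch_size):
        send_batch(chunk)  # hypothetical: one update request per batch
        sent += len(chunk)
    return sent


def bulk_index(fetch_rows, send_batch, commit, min_id, max_id,
               threads=4, batch_size=500):
    """Index all partitions in parallel, then commit exactly once."""
    ranges = id_ranges(min_id, max_id, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        counts = list(pool.map(
            lambda r: index_partition(fetch_rows, send_batch, r, batch_size),
            ranges))
    commit()  # single commit after every partition has finished
    return sum(counts)
```

The defaults for threads and batch_size are just starting points; they are the 
two numbers to experiment with as described above.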

Jan

> On 29 Sep 2022, at 10:06, Shankar R <iamrav...@gmail.com> wrote:
> 
> Hi,
> We have nearly 70-80 million records that need to be indexed in
> Solr 8.6.1.
> We want to choose between the Java binary format and direct JSON format.
> Our source data is structured data from a DBMS.
> 
> Regards
> Ravi
