Shawn,

Thanks for helping me out. The Solr documentation has a lot of bells and whistles, and I am overwhelmed.
The total number of documents is 200 million. Each line of the CSV will be a document, so the file has 200 million lines. I have two options for loading and indexing. The current way of getting data in is through the DataImportHandler API, e.g.:

https://.../200mmCsvCore/dataimport?command=full-import&clean=true&commit=true&optimize=true&wt=json&indent=true&verbose=false&debug=false

I am considering CSV because another remote location also wants to use Solr, and my gut feeling is that fetching a single large CSV file over the network will keep the data consistent between the two places. I had not thought about the parsing of the CSV file (double quotes and delimiters). Would a JSON file be faster?

I am not aware of a way to split the 200-million-line CSV into batches for loading. Would smaller batches be faster? Could you give me an example of how to split the file?

Also, from the Solr UI, how can I tell how many threads are set for indexing?
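To make my question concrete, here is the kind of batching I have in mind, just a sketch with a toy file standing in for the real 200-million-line export. The core name is the one from my URL above, and the host, chunk size, and file names are all made up; please correct me if this is the wrong approach:

```shell
# Tiny stand-in file playing the role of the real 200M-line CSV export;
# for the real file the chunk size would be much larger (e.g. split -l 1000000).
printf 'id,name\n1,alpha\n2,beta\n3,gamma\n4,delta\n' > big.csv

# Keep the header row aside, split the body into 2-line chunks,
# then re-attach the header to each chunk so every file is valid CSV.
head -n 1 big.csv > header.csv
tail -n +2 big.csv | split -l 2 - chunk_
for f in chunk_??; do
  cat header.csv "$f" > "$f.csv" && rm "$f"
done

ls chunk_*.csv

# Each chunk could then be posted on its own, committing only once at the end:
# curl 'http://localhost:8983/solr/200mmCsvCore/update?commit=false' \
#      --data-binary @chunk_aa.csv -H 'Content-Type: text/csv;charset=utf-8'
# curl 'http://localhost:8983/solr/200mmCsvCore/update?commit=true'
```

Is deferring the commit until the last batch the right way to do it, or should each batch commit on its own?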