Shawn,

Thanks for helping me out.  The Solr documentation has a lot of bells and
whistles, and I am overwhelmed.

The total number of documents is 200 million: each line of the CSV will
become one document, and there are 200 million lines.

I see two options for the load-and-index step.

The current way of getting data in is an API call like https://
.../200mmCsvCore/dataimport?
command=full-import
&clean=true
&commit=true
&optimize=true
&wt=json
&indent=true
&verbose=false
&debug=false

I am leaning toward CSV because another remote location also wants to use
Solr, and my gut feeling is that fetching a single large CSV file over the
network will keep the data consistent between the two sites.

I hadn't thought about parsing a CSV file that contains double quotes and
embedded delimiters.  Would a JSON file be faster?

I am not aware of a way to split the 200-million-line CSV into smaller
batches for loading.  Would smaller batches be faster?  Could you give me
an example of how to split the file?
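The closest thing I've found so far is the Unix split command.  Here is a
rough sketch of what I had in mind (file names, chunk size, and the core
name are just placeholders; the bin/post line is commented out since I
haven't tried it yet) -- would something along these lines work?

```shell
# Make a tiny sample CSV so this sketch runs end to end; in reality
# big.csv would be the 200-million-line file.
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > big.csv

head -n 1 big.csv > header.csv              # save the header row
tail -n +2 big.csv | split -l 2 - chunk_    # 2 data rows per chunk here;
                                            # something like 1000000 for real

for f in chunk_*; do
  cat header.csv "$f" > "$f.csv"            # re-attach the header to each chunk
  rm "$f"
  # Each chunk could then be loaded on its own, e.g. (untested guess):
  # bin/post -c 200mmCsvCore "$f.csv"
done
ls chunk_*.csv
```

My thinking was that keeping the header on every chunk makes each one an
independently loadable CSV, so a failed batch could be retried alone.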

From the Solr UI, how can I tell how many threads are set for indexing?
