If you need to upload CSV files directly into Solr (and they have a reasonable number of rows, i.e. not so many that they cause an OOM in Solr), I usually load them directly with curl from a bash script. It's something like this:

curl "http://solr.server:8983/solr/collection/update?commit=true" --data-binary @file.csv -H 'Content-type:application/csv'

The first row of the CSV file must contain the names of the fields in your Solr collection, something like this:

"id","code","description","field1","field2","field3"
1,"code1","description 1","xxxx","yyy","zzz"
2,"code2","description 2","20","129","M"

On Fri, Feb 10, 2023 at 9:28 PM Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : @Chris can you provide a sample Java code using ContentStreamUpdateRequest
> : class?
>
> I mean ... it's a SolrRequest like any other...
>
> 1) create an instance
>
> 2) add the File you want to add (or pass in some other ContentStream --
> maybe StringStream if your CSV is already in memory)
>
> 3) process() it using your SolrClient
>
>
> As with most classes in solrj, looking at the test cases is probably
> the best way to see "sample" code. (although some of them are explicitly
> convoluted to test edge cases in the underlying implementation.)
>
>
> This is probably the simplest one...
>
> hossman@slate:~/lucene/solr [j11] [branch_9_1] $ grep -A5 'new ContentStreamUpdateRequest' solr/solrj/src/test/org/apache/solr/client/solrj/request/json/JsonQueryRequestIntegrationTest.java
>     ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update");
>     up.setParam("collection", COLLECTION_NAME);
>     up.addFile(getFile("solrj/books.csv"), "application/csv");
>     up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>     UpdateResponse updateResponse = up.process(cluster.getSolrClient());
>     assertEquals(0, updateResponse.getStatus());
>
>
> :
> : On Fri, Feb 10, 2023 at 19:22, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> :
> : >
> : > : what is a common use case then if it is not the csv type?
> : > : how do you index massive amounts of data into Solr using SolrJ?
> : > : You can't just read line by line each dataset you want to index.
> : >
> : > There are lots of use cases for using SolrJ that involve programmatically
> : > generating the SolrInputDocuments you want to index in solr -- frequently
> : > after reading from some normalized / authoritative data store.
> : >
> : > If you already have data "on disk" in a format that solr can parse (csv,
> : > solr's xml, a PDF file you want Solr's extraction module to parse, etc...)
> : > then that's what the ContentStreamUpdateRequest is for...
> : >
> : > https://solr.apache.org/docs/9_1_0/solrj/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html
> : >
> : > :
> : > : On Mon, Jan 30, 2023 at 14:11, Jan Høydahl <jan....@cominvent.com> wrote:
> : > :
> : > : > It's not a common use case for SolrJ to post plain CSV content to Solr.
> : > : > SolrJ is used to push SolrInputDocument objects. Maybe there's a way to do
> : > : > it by using some Generic request type and overriding content type. Can you
> : > : > explain more what your app will do, where that CSV file comes from in the
> : > : > first place and why you'd want to use SolrJ to move it to Solr, rather than
> : > : > curl or some other http client lib?
> : > : >
> : > : > Jan
> : > : >
> : > : > > On Jan 29, 2023 at 20:44, marc nicole <mk1853...@gmail.com> wrote:
> : > : > >
> : > : > > The Java code should perform the post. Any piece of code to show to better
> : > : > > explain this?
> : > : > >
> : > : > > thanks
> : > : > >
> : > : > > On Sun, Jan 29, 2023 at 20:29, Jan Høydahl <jan....@cominvent.com> wrote:
> : > : > >
> : > : > >> Read csv in your app, create a Solr doc from each line and ingest to Solr
> : > : > >> in fitting batches. You can use a csv library or just parse each line
> : > : > >> yourself if the format is fixed.
> : > : > >>
> : > : > >> If you need to post csv directly to Solr you’d use a plain http post with
> : > : > >> content-type csv, but in most cases your app would do that.
> : > : > >>
> : > : > >> Jan Høydahl
> : > : > >>
> : > : > >>> On Jan 29, 2023 at 20:21, marc nicole <mk1853...@gmail.com> wrote:
> : > : > >>>
> : > : > >>> Hi guys,
> : > : > >>>
> : > : > >>> I can't find a reference on how to index a dataset.csv file into Solr using
> : > : > >>> SolrJ.
> : > : > >>> https://solr.apache.org/guide/6_6/using-solrj.html
> : > : > >>>
> : > : > >>> Thanks.
> : > : > >>
> : > : >
> : > :
> : >
> : > -Hoss
> : > http://www.lucidworks.com/
> : >
>
> -Hoss
> http://www.lucidworks.com/

--
Vincenzo D'Amore
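
P.S. If the post has to come from Java rather than a bash script, the same HTTP request as the curl command can probably be sketched with the JDK's built-in java.net.http client (Java 11 or later), without SolrJ. Take this as a rough sketch, not a tested recipe: the class name CsvPost and the methods buildRequest and post are made up for illustration, and the host, port and collection name are the same placeholders as in the curl command.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper class, named for illustration only.
final class CsvPost {

    // Builds the equivalent of the curl call: POST the raw CSV file to
    // <base>/<collection>/update?commit=true with a CSV content type.
    static HttpRequest buildRequest(String solrBaseUrl, String collection, Path csv)
            throws IOException {
        URI uri = URI.create(solrBaseUrl + "/" + collection + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/csv")
                // Streams the file from disk; throws if the file does not exist.
                .POST(HttpRequest.BodyPublishers.ofFile(csv))
                .build();
    }

    // Sends the request and returns the HTTP status code of Solr's response.
    static int post(String solrBaseUrl, String collection, Path csv)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(solrBaseUrl, collection, csv),
                      HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java CsvPost <file.csv>");
            return;
        }
        // Placeholder host and collection, same as in the curl example.
        int status = post("http://solr.server:8983/solr", "collection", Path.of(args[0]));
        System.out.println("HTTP status: " + status);
    }
}
```

As with curl, the first row of the file still has to carry the field names, and very large files are better split up before posting.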