If you need to upload CSV files directly into Solr, and they have a
reasonable number of rows (i.e. not so many that the upload leads to an
OOM in Solr), I usually load them with curl from a bash script.
It's something like this:

curl "http://solr.server:8983/solr/collection/update?commit=true" \
  --data-binary @file.csv -H 'Content-type:application/csv'
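
If the upload has to happen from Java but SolrJ is not a requirement, the
same POST can probably be done with the JDK's built-in HTTP client (Java 11+).
This is only a minimal sketch and has not been run against a live Solr: the
class name CsvPost and the helper buildRequest are made up for illustration,
and the URL is the same placeholder as in the curl call.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class CsvPost {

    // Builds the equivalent of the curl call: POST the raw CSV bytes to the
    // collection's update handler with commit=true and a CSV content type.
    // updateUrl is assumed to be e.g. http://solr.server:8983/solr/collection/update
    static HttpRequest buildRequest(String updateUrl, Path csv) throws FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(updateUrl + "?commit=true"))
                .header("Content-Type", "application/csv")
                .POST(HttpRequest.BodyPublishers.ofFile(csv))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: CsvPost <update-url> <file.csv>");
            return;
        }
        HttpRequest request = buildRequest(args[0], Path.of(args[1]));
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // Solr answers with an HTTP status and a response body describing the update.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the request construction in its own method makes it possible to check
the URL and header without a running server.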

The first row of the CSV file must contain the names of the fields in your
Solr collection. It should look something like this:

"id","code","description","field1","field2","field3"
1,"code1","description 1","xxxx","yyy","zzz"
2,"code2","description 2","20","129","M"
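
Since the header row has to match the collection's field names, it can be
worth checking it before uploading. Below is a rough sketch of such a check.
CsvHeaderCheck, headerFields and unknownFields are hypothetical names, the
list of schema fields is assumed to be supplied by the caller, and the
parsing assumes simple field names with no embedded commas or escaped quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvHeaderCheck {

    // Splits a simple header row on commas and strips optional surrounding
    // double quotes. Assumes names contain no commas or escaped quotes.
    static List<String> headerFields(String headerLine) {
        return Arrays.stream(headerLine.split(","))
                .map(String::trim)
                .map(f -> f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")
                        ? f.substring(1, f.length() - 1)
                        : f)
                .collect(Collectors.toList());
    }

    // Returns the header names that do not appear in the expected field list.
    static List<String> unknownFields(String headerLine, List<String> schemaFields) {
        return headerFields(headerLine).stream()
                .filter(f -> !schemaFields.contains(f))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Field names taken from the sample CSV above; the schema list is an assumption.
        List<String> schema = List.of("id", "code", "description", "field1", "field2", "field3");
        String header = "\"id\",\"code\",\"description\",\"field1\",\"field2\",\"field3\"";
        System.out.println("unknown fields: " + unknownFields(header, schema));
    }
}
```

An empty result means every column name in the header is one of the expected
fields. Collections that rely on dynamic fields would need a looser check.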



On Fri, Feb 10, 2023 at 9:28 PM Chris Hostetter <hossman_luc...@fucit.org>
wrote:

> : @Chris can you provide a sample Java code using
> : ContentStreamUpdateRequest class?
>
> I mean ... it's a SolrRequest like any other...
>
> 1) create an instance
>
> 2) add the File you want to add (or pass in some other ContentStream --
> maybe StringStream if your CSV is already in memory)
>
> 3) process() it using your SolrClient
>
>
> As with most classes in solrj, looking at the test cases is probably
> the best way to see "sample" code.  (although some of them are explicitly
> convoluted to test edge cases in the underlying implementation.)
>
>
> This is probably the simplest one...
>
> hossman@slate:~/lucene/solr [j11] [branch_9_1] $ grep -A5 'new ContentStreamUpdateRequest' \
>     solr/solrj/src/test/org/apache/solr/client/solrj/request/json/JsonQueryRequestIntegrationTest.java
>     ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update");
>     up.setParam("collection", COLLECTION_NAME);
>     up.addFile(getFile("solrj/books.csv"), "application/csv");
>     up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>     UpdateResponse updateResponse = up.process(cluster.getSolrClient());
>     assertEquals(0, updateResponse.getStatus());
>
>
>
>
>
> :
> : On Fri, Feb 10, 2023 at 19:22, Chris Hostetter <hossman_luc...@fucit.org>
> : wrote:
> :
> : >
> : > : what is a common use case then if it is not the csv type?
> : > : how to index massively data into Solr using SolrJ
> : > : You can't just read line by line each dataset you want to index.
> : >
> : > There are lots of use cases for using SolrJ that involve programmatically
> : > generating the SolrInputDocuments you want to index in Solr, frequently
> : > after reading from some normalized/authoritative data store.
> : >
> : > If you already have data "on disk" in a format that Solr can parse (CSV,
> : > Solr's XML, a PDF file you want Solr's extraction module to parse, etc...)
> : > then that's what the ContentStreamUpdateRequest is for...
> : >
> : >
> : > https://solr.apache.org/docs/9_1_0/solrj/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html
> : >
> : > :
> : > : On Mon, Jan 30, 2023 at 14:11, Jan Høydahl <jan....@cominvent.com> wrote:
> : > :
> : > : > It's not a common use case for SolrJ to post plain CSV content to Solr.
> : > : > SolrJ is used to push SolrInputDocument objects. Maybe there's a way to
> : > : > do it by using some generic request type and overriding the content
> : > : > type. Can you explain more about what your app will do, where that CSV
> : > : > file comes from in the first place, and why you'd want to use SolrJ to
> : > : > move it to Solr, rather than curl or some other HTTP client lib?
> : > : >
> : > : > Jan
> : > : >
> : > : > > On Jan 29, 2023 at 20:44, marc nicole <mk1853...@gmail.com> wrote:
> : > : > >
> : > : > > The Java code should perform the post. Any piece of code to show to
> : > : > > better explain this?
> : > : > >
> : > : > > thanks
> : > : > >
> : > : > > On Sun, Jan 29, 2023 at 20:29, Jan Høydahl <jan....@cominvent.com>
> : > : > > wrote:
> : > : > >
> : > : > >> Read the CSV in your app, create a Solr doc from each line and ingest
> : > : > >> to Solr in fitting batches. You can use a CSV library or just parse
> : > : > >> each line yourself if the format is fixed.
> : > : > >>
> : > : > >> If you need to post CSV directly to Solr you’d use a plain HTTP POST
> : > : > >> with content-type csv, but in most cases your app would do that.
> : > : > >>
> : > : > >> Jan Høydahl
> : > : > >>
> : > : > >>> On Jan 29, 2023 at 20:21, marc nicole <mk1853...@gmail.com> wrote:
> : > : > >>>
> : > : > >>> Hi guys,
> : > : > >>>
> : > : > >>> I can't find a reference on how to index a dataset.csv file into
> : > : > >>> Solr using SolrJ.
> : > : > >>> https://solr.apache.org/guide/6_6/using-solrj.html
> : > : > >>>
> : > : > >>> Thanks.
> : > : > >>
> : > : >
> : > : >
> : > :
> : >
> : > -Hoss
> : > http://www.lucidworks.com/
> :
>
> -Hoss
> http://www.lucidworks.com/



-- 
Vincenzo D'Amore
