Today I tried indexing some PDF files to see how that goes.

After a few hours of unsuccessful attempts with the SolrCloud Docker image and 
other setups, I decided I should follow the exact path in the documentation first.

I downloaded the Solr 8.11.2 archive from: 
https://www.apache.org/dyn/closer.lua/lucene/solr/8.11.2/solr-8.11.2.zip?action=download

I extracted the archive to a folder, then ran the command from the docs: 
https://solr.apache.org/guide/8_11/uploading-data-with-solr-cell-using-apache-tika.html

bin/solr -e schemaless

It started without problems and created a core named “gettingstarted”. So far 
everything works!

Then I tried out the curl command from the docs:
curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' \
  -F "myfile=@example/exampledocs/solr-word.pdf"

It gives me a 404 error. I was also getting 404s with the various other methods I 
tried today. It seems the “schemaless” example doesn’t configure 
ExtractingRequestHandler in solrconfig.xml at all for me.
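
One way to confirm that suspicion is to search the core's solrconfig.xml for the handler class. The following is only a minimal sketch: the function name check_extract_handler is hypothetical (it is not part of Solr), and the path in the example call is an assumption about where the schemaless example keeps the core's configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): reports whether a solrconfig.xml
# mentions the ExtractingRequestHandler class at all.
# Limitation: a plain text search also matches a handler that is commented out.
check_extract_handler() {
  config_file="$1"
  if [ ! -f "$config_file" ]; then
    echo "no such file: $config_file" >&2
    return 2
  fi
  if grep -q 'ExtractingRequestHandler' "$config_file"; then
    echo "configured"
  else
    echo "missing"
    return 1
  fi
}

# Example call; the path is an assumption about the schemaless example's layout:
# check_extract_handler example/schemaless/solr/gettingstarted/conf/solrconfig.xml
```

If this prints "missing" for the gettingstarted core, that would be consistent with the 404 on /update/extract.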

Am I doing something wrong?

--ufuk
