Hi:

I'm new to the list and to Apache Solr. I'm trying version 9.7.0 on 
Ubuntu 22.04 and I want to index multiple PDF files and analyze them with Tika.

I've created a CORE from the Solr admin panel at the following path:

  * "/home/myuser/APPS/solr-9.7.0/server/solr/configsets/CORE"

The PDF documents to import are located at the path:

  * "/home/myuser/documentos/Doc_solr/"

And I have configured 3 files:

  * managed-schema.xml: definition of the PDF metadata fields
      * <field name=> Metadata and Text

  * solrconfig.xml
      * <requestHandler name="/select" class="solr.SearchHandler">
      * <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">

  * tika-data-config.xml:
      * <entity name="pdf" processor="TikaEntityProcessor"
      * <entity name="file" processor="FileListEntityProcessor"
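
For reference, the two entities in tika-data-config.xml are meant to be nested 
roughly as in the sketch below. This is not my exact file: the fileName 
pattern, the data source name "bin", and the field names author, title and 
text are illustrative assumptions, and the fields would have to match what is 
declared in managed-schema.xml.

```xml
<dataConfig>
  <!-- Binary data source that Tika reads each file from; the name "bin" is an assumption -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists the files under baseDir; it produces no Solr documents itself -->
    <entity name="file" processor="FileListEntityProcessor"
            baseDir="/home/myuser/documentos/Doc_solr/"
            fileName=".*\.pdf" recursive="true"
            rootEntity="false" dataSource="null">
      <!-- Inner entity: runs Tika on each listed file, one Solr document per PDF -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text" dataSource="bin">
        <!-- meta="true" maps a Tika metadata key; the field names are hypothetical -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- "text" is the extracted body of the document -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```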

Do I need to create another file for the dataimporter.xml configuration? If 
so, what should I include in it?

Thanks
