Thank you all. I hope my reply will be sent to the correct
address/people/thread (--> Ignore unknown fields when indexing PDFs).
I tried these lines (as mentioned by Jeremy Buckley) for my schema:
or
But an error still occurs during the indexing process:
C:\SOLR\solr-9.2.1>java -Dc=mycore -Da
Try this. In your schema, explicitly define all the fields that you want
in your collection. Then, as the last field entry, add:
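One common way to write such a catch-all entry, offered only as a sketch: it assumes a field type named `ignored` (a hypothetical name here; reuse it if your schema already defines one) that neither indexes nor stores its values, plus a wildcard dynamic field that matches any field name not explicitly defined.

```xml
<!-- Assumed field type: values sent to it are neither indexed nor stored -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" docValues="false" multiValued="true"/>

<!-- Catch-all: any field name not matched elsewhere is silently dropped -->
<dynamicField name="*" type="ignored" multiValued="true"/>
```

Explicitly defined fields take precedence over dynamic fields, so only the names you did not declare (for example, the extra Tika metadata) should end up in the catch-all.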
On Tue, Jun 4, 2024 at 1:06 PM Thomas Corthals wrote:
> When you extract text from a PDF with Tika, it includes additional metadata
> fields. This is the document I ge
Walter Underwood,
thanks for this clear description of the PDF world - I love it!
Walter Claassen
From: "Walter Underwood"
To: users@solr.apache.org
Date: June 4, 2024, 18:20
Subject: Re: Ignore unknown fields when indexing PDFs
PDFs don’t have fields. PDFs are instru
When you extract text from a PDF with Tika, it includes additional metadata
fields. This is the document I get after executing the example from the ref
guide at
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html#trying-out-solr-cell
{
"responseHeader":{
"status":0,
PDFs don’t have fields. PDFs are instructions for a monkey with rubber stamps
to make a printed page. They have instructions to move to a location and put a
character there.
As an XML developer friend said, turning a PDF document into structured text is
like turning hamburger back into a cow.