Re: Ignore unknown fields when indexing PDFs

2024-06-06 Thread Uwe Amberger
Thank you all. I hope my reply will be sent to the correct address/people/thread (--> Ignore unknown fields when indexing PDFs). I tried these lines (as mentioned by Jeremy Buckley) for my schema: or But an error still occurs during the indexing process: C:\SOLR\solr-9.2.1>java -Dc=mycore -Da

Re: Ignore unknown fields when indexing PDFs

2024-06-04 Thread Jeremy Buckley - IQS-C
Try this. In your schema, explicitly define all the fields that you want in your collection. Then, as the last field entry, add: On Tue, Jun 4, 2024 at 1:06 PM Thomas Corthals wrote: > When you extract text from PDF with Tika, it includes additional metadata > fields. This is the document I ge

Re: Ignore unknown fields when indexing PDFs - thanks to wunder

2024-06-04 Thread solr
Walter Underwood, thanks for this clear description of the PDF world - I love it! Walter Claassen From: "Walter Underwood" To: users@solr.apache.org Date: 2024-06-04 18:20 Subject: Re: Ignore unknown fields when indexing PDFs PDFs don’t have fields. PDFs are instru

Re: Ignore unknown fields when indexing PDFs

2024-06-04 Thread Thomas Corthals
When you extract text from PDF with Tika, it includes additional metadata fields. This is the document I get after executing the example from the ref guide at https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html#trying-out-solr-cell { "responseHeader":{ "status":0,

Re: Ignore unknown fields when indexing PDFs

2024-06-04 Thread Walter Underwood
PDFs don’t have fields. PDFs are instructions for a monkey with rubber stamps to make a printed page. They have instructions to move to a location and put a character there. As an XML developer friend said, turning a PDF document into structured text is like turning hamburger back into a cow.