Walter Underwood,
Thanks for this clear description of the PDF world. I love it!
Walter Claassen
From: "Walter Underwood" <wun...@wunderwood.org>
To: users@solr.apache.org
Date: 4 June 2024, 18:20
Subject: Re: Ignore unknown fields when indexing PDFs
PDFs don’t have fields. PDFs are instructions for a monkey with rubber
stamps to make a printed page. They have instructions to move to a
location and put a character there.
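
To make the rubber-stamp picture concrete, here is a minimal sketch in Python. It is a toy model, not a real PDF parser: the (x, y, glyph) tuples, the STAMPS list, and the rebuild_text function are all made up for illustration, and it assumes that glyphs on one line share exactly the same y value and have a fixed width. Real PDFs guarantee neither, which is a large part of why extraction is unreliable.

```python
from collections import defaultdict

# Hypothetical "rubber stamp" instructions: (x, y, glyph).
# They are deliberately out of reading order, because a page
# description only has to place glyphs, not list them in order.
STAMPS = [
    (40, 100, "t"), (0, 100, "P"), (0, 80, "o"),
    (20, 100, "F"), (50, 100, "e"), (10, 100, "D"),
    (10, 80, "k"), (60, 100, "x"), (70, 100, "t"),
]


def rebuild_text(stamps, glyph_width=10):
    """Guess the reading order of positioned glyphs.

    Assumptions (not true of real PDFs in general): every glyph on a
    line has an identical y, and every glyph is glyph_width units wide.
    """
    # Group glyphs into lines by their y coordinate.
    lines = defaultdict(list)
    for x, y, glyph in stamps:
        lines[y].append((x, glyph))

    rebuilt = []
    # Default PDF user space has y growing upward, so the top line
    # of the page is the one with the largest y.
    for y in sorted(lines, reverse=True):
        text = ""
        prev_x = None
        for x, glyph in sorted(lines[y]):
            # A horizontal gap wider than one glyph is guessed to be a space.
            if prev_x is not None and x - prev_x > glyph_width:
                text += " "
            text += glyph
            prev_x = x
        rebuilt.append(text)
    return "\n".join(rebuilt)


print(rebuild_text(STAMPS))
```

Even in this toy, the word break and the line order are guesses made from geometry. Nothing in the input says "this is a word" or "this is a field".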
As an XML developer friend said, turning a PDF document into structured
text is like turning hamburger back into a cow.
I dealt with PDF documents in search for over twenty years. You are
lucky to get searchable text out of them.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)