Walter Underwood,

Thanks for this clear description of the PDF world. I love it!

Walter Claassen



From:    "Walter Underwood" <wun...@wunderwood.org>
To:      users@solr.apache.org
Date:    04.06.2024 18:20
Subject: Re: Ignore unknown fields when indexing PDFs

PDFs don’t have fields. PDFs are instructions for a monkey with rubber stamps to make a printed page. They have instructions to move to a location and put a character there.
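
A toy sketch of what that means for text extraction, not code from this thread: the glyph list, the function name glyphs_to_text, and the spacing threshold are all made up for illustration, and real PDF content streams are far messier. The stamps arrive as positions plus characters, in no particular order, so lines and word breaks have to be guessed from geometry.

```python
from collections import defaultdict

# Hypothetical glyph placements (x, y, character), in the arbitrary order
# a page description might emit them. This is not real PDF syntax.
GLYPHS = [
    (30, 700, "l"), (10, 700, "S"), (20, 700, "o"), (40, 700, "r"),
    (10, 680, "P"), (60, 700, "r"), (20, 680, "D"), (30, 680, "F"),
    (70, 700, "o"), (80, 700, "c"), (90, 700, "k"), (100, 700, "s"),
]


def glyphs_to_text(glyphs, char_width=10, gap_factor=1.5):
    """Guess lines and word breaks from glyph positions alone."""
    # Group glyphs that share a baseline (same y) into one line.
    lines = defaultdict(list)
    for x, y, ch in glyphs:
        lines[y].append((x, ch))

    out = []
    # PDF y coordinates grow upward by default, so the top line has the largest y.
    for y in sorted(lines, reverse=True):
        row = sorted(lines[y])  # left to right
        text = row[0][1]
        for (prev_x, _), (x, ch) in zip(row, row[1:]):
            # A gap wider than the assumed threshold is treated as a space.
            if x - prev_x > char_width * gap_factor:
                text += " "
            text += ch
        out.append(text)
    return "\n".join(out)


print(glyphs_to_text(GLYPHS))
```

Nothing in the input says where a word or a field starts; the space only appears because of an assumed gap threshold, which is the kind of guess any extractor has to make.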

As an XML developer friend said, turning a PDF document into structured text is like turning hamburger back into a cow.

I dealt with PDF documents in search for over twenty years. You are lucky to get searchable text out of them.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
