> On Dec 2, 2019, at 11:12 PM, Grant Taylor via cctalk <cctalk@classiccmp.org>
> wrote:
>
> On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote:
>> In my opinion, PDFs are the last place that computer usable data goes.
>> Because getting anything out of a PDF as a data source is next to impossible.
>> Sure, you, a human, can read it and consume the data.
>> Try importing a simple table from a PDF and working with the data in
>> something like a spreadsheet. You can't do it. The raw data is there. But
>> you can't readily use it.
>> This is why I say that a PDF is the end of the line for data.
>> I view it as effectively impossible to take data out of a PDF and do
>> anything with it without first needing to reconstitute it before I can use
>> it.
>
> I'll add this:
>
> PDF is a decent page layout format. But trying to view the contents in any
> different layout is problematic (at best).
>
> Trying to use the result of a page layout as a data source is ... problematic.

That's hardly surprising. These properties are precisely the intent of PDF.
It's basically a portable variant of PostScript, with some cleanups (relatively
sane Unicode support, transparency, hyperlinks, a few other things). Its
specific purpose is to encode page images, just as they appear on actual paper.
Indeed, PDF is often used as a "camera ready copy" format for material going
to a print shop. It works quite well for that.

For scanned documents, where each page is just an image, PDF is a decent
container format. For documents with actual text, it's far more problematic.

Using PDF as an intermediate form is every bit as inappropriate as using JPEG
for line art or any other application where artefacts are impermissible. The
trouble (for both of these) is that many of the users don't know the
limitations and blindly use the wrong tools.
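
To make the "page image" point concrete, here is a rough Python sketch of
what a consumer is up against. The content stream is a hypothetical,
hand-written toy (not taken from any real file), and the parser understands
only the BT, Td and Tj text operators. Real PDFs add compressed streams,
font-specific encodings and more operators (Tm, TJ, T*, ...), so treat this
as an illustration, not as a PDF reader. The names STREAM, placed_strings
and rows are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical content stream: a 2x2 table painted column by column.
# Nothing in it says "table", "row" or "cell"; it only places strings.
STREAM = """
BT
/F1 10 Tf
72 700 Td (Part) Tj
0 -14 Td (7404) Tj
100 14 Td (Qty) Tj
0 -14 Td (12) Tj
ET
"""

# A token is either a (literal string) without nested parentheses,
# or a run of characters that are neither whitespace nor parentheses.
TOKEN = re.compile(r"\((?P<string>[^()]*)\)|(?P<word>[^\s()]+)")


def placed_strings(stream):
    """Return (x, y, text) for every Tj, tracking position via Td only."""
    x = y = 0.0
    operands = []
    placed = []
    for match in TOKEN.finditer(stream):
        if match.group("string") is not None:
            operands.append(match.group("string"))
            continue
        word = match.group("word")
        if word == "BT":
            # A text object starts at the origin.
            x = y = 0.0
            operands.clear()
        elif word == "Td":
            # Td moves relative to the start of the current line.
            x += float(operands[-2])
            y += float(operands[-1])
            operands.clear()
        elif word == "Tj":
            placed.append((x, y, operands[-1]))
            operands.clear()
        elif word in ("Tf", "ET"):
            # Font selection and end of text object: ignored in this toy.
            operands.clear()
        else:
            # Anything else is an operand for a later operator.
            operands.append(word)
    return placed


def rows(placed):
    """Guess table rows: group by y (top of page first), sort cells by x."""
    by_y = defaultdict(list)
    for x, y, text in placed:
        by_y[y].append((x, text))
    ordered = sorted(by_y.items(), key=lambda item: -item[0])
    return [[text for _, text in sorted(cells)] for _, cells in ordered]


if __name__ == "__main__":
    placed = placed_strings(STREAM)
    print([text for _, _, text in placed])  # ['Part', '7404', 'Qty', '12']
    print(rows(placed))  # [['Part', 'Qty'], ['7404', '12']]
```

Reading the strings in stream order gives the columns interleaved; the rows
only come back after guessing the layout from the coordinates. That guess is
the "reconstitute it" step Grant describes, and it gets shakier once baselines
differ slightly or a single cell is split across several strings.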
paul