Re: PDF

Mark Waddingham via use-livecode Tue, 15 May 2018 01:23:39 -0700

On 2018-05-14 20:50, Richard Gaskin via use-livecode wrote:

They are indeed for very different purposes, and we've been using PDF
for so long that it's become the hammer that makes everything look
like a nail, applied to so much while it's only truly best for a much
smaller subset.

Of course the subtle detail here is the use of 'best' - 'best' relative to what requirements?

PDF is a very general format - it models the notion of printed matter which we still grow up with (although, admittedly, increasingly less as time goes by). This suggests the problem is not with PDF, nor with PDF being used - it is with the ties humans have to 'printed matter'.

In the course of my work I often go through periods of research, which
inevitably has me reading a lot of academic research papers and
corporate white papers.  Nearly all of them are published as PDF, many
exclusively in that format.


Two things to point out here:

Academic research papers are generally written using the format which the journal publishers require - typically LaTeX/TeX for anything beyond the normal written word and embedded figures.

Corporate white papers are usually written using word processors, in a format / layout defined by the company they are coming from. In many cases they will also go through some sort of 'design' phase afterwards, particularly if they are to be published widely - and often that will be using some page layout tool (such as InDesign).

In both these cases, the author/designer is designing at a fixed width (the joy of the rise of WYSIWYG in the 80's / 90's perhaps?).

The circumstances in which I'm immersed in such focus vary, and the
devices I have with me vary as well.  With reflowing content it
doesn't matter which device I happen to be using at the time, the work
continues unabated.

But when I encounter a PDF while using a screen less than 8.5" wide, the
need to constantly zoom in and out and scroll back and forth so slows
progress that it kills the joy of research, bringing the work to a
halt until I can get to a device that happens to emulate the size
characteristics of paper, even though I'll never print anything I'm
reading.

Curious if I'm alone with the time I spend on smaller screens led me
to research that as well.  And it turns out I'm far from alone; it's
where people are spending most of their computing time these days.
And since this trend is driven largely by people younger than me it
seems unlikely to slow down, at least until the next displacing form
factor comes along (but then we'll be doing something entirely
different still).

Right, so the problem has nothing to do with PDF; it is to do with the fact that humans work better designing things at fixed width, and the general tools which people learn to use, and continue to use, support this frame of mind.

If a document is any more than 'just text' (as in something which can be rendered using a single font independent of page width) then requiring documents to work at any layout width means the author has to abstract the layout intent and then instruct a tool to preserve it.

Certainly for many individual cases of 'document type' you can mechanize and assist; however, then the authors need to be aware of precisely what document type they are producing, and learn how to instruct a tool to encode content for that document type.

I'd like to be optimistic here, but I honestly don't think this is a problem with tooling - semantic representation of content has been around for as long as I have (probably longer), I was playing with systems which offered it when I was in my teens; and yet in my entire life since then I still see the majority of documents produced using word processors, or similar 'unconstrained' tools.

The problem, I think, is that humans don't like to be constrained when writing - any tool which appears to constrain what they can do in what they think (at the time) is an unreasonable way tends to be considered 'bad'. However, to achieve the goal of representing content in a contextual manner (relative to some abstract pattern which can be processed in the ways necessary to free us of fixed width layout, in this case) constraints are absolutely necessary.

Admittedly the rise of the web, and particularly HTML/CSS, means we have an ever increasing body of practitioners who do have to think about the patterns of content, rather than just the content, but the knowledge they have and are able to apply has been hard won and learned by them (just like any other domain specific endeavour).

Different tools for different jobs indeed.  Not everything is a nail,
but through the combination of technological inertia and an
acceptance among the majority of people who are not inventors of
making do with whatever tool is handed to them, we keep using hammers
to drive screws.

Ideally all content would be represented at a semantic level requisite to its context.

e.g. Why use anything other than ASCII text, if your text can be entirely represented using ASCII?

... in exactly the same way as the author intended.


This is the only part of what you wrote I disagree with, if we were to
try it on as a general rule.

Writing is the flow of ideas from one mind to another, encoded in
streams of text.

Line breaks are often a meaningful part of that communication, and on
occasion page breaks as well.

But for most writing, aside from perhaps code and poetry, column width
is rarely a semantic consideration at all.  Even printed books come in
different sizes.


By general do you mean either:

  - for a 'high' percentage of cases

  - for all cases

I'm guessing you meant the former - I was talking about the latter.

The point is that there is no general rule - I can guarantee that for every constraint which you add to a system for representation of content, there will be numerous existing examples (entire families in fact) which cannot fit into it. Similarly, what you will find is that if a system is required to be used, then people will find a way to 'work around' the constraints - leaving you back where you started - i.e. your system will work exceptionally well for things written precisely to work with it; but poorly for the rest, and over time the poor cases will start to become a noticeable percentage of content.

As people who write software, we have the ability to create abstract representations of content but the problem is mapping the concrete form to the abstract - particularly when we live in a world where concrete forms abound in their billions, and entire workflows are centered around them. Any system which can't deal with the concrete or interoperate with it is unlikely to ever gain a huge amount of traction.

From that point of view, I do think ePub is a bit of a 'red herring' here - it isn't really anything 'more' than a container format, with a reasonable way to encode indices/document structure. Internally it uses the web technologies, which are good for reflowing text, certainly, but you still need to generate the HTML/CSS etc. and it is the mapping from 'what I want to say' to 'how do I encode it in a way which works in all the ways other people want it to' which is the hard part.

I'm sure things like ePub will help a bit - at least it is trying to instigate some bounds on communication of such things - however, I do strongly suspect it will become a technical detail which is largely irrelevant at some point.

After all, what the world perhaps needs (rather than another file format) is a way to take the existing forms of how we communicate and turn them into a form which is more amenable to modern usage patterns mechanically. (i.e. A system which turns a PDF into a re-flowable document).
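
As a rough illustration of the easiest slice of that mechanical conversion (and nothing like a real PDF converter): the sketch below assumes the text has already been extracted from the fixed-width document as hard-wrapped lines, that blank lines separate paragraphs, and that the content is 'just text'. The function name reflow is hypothetical. Anything beyond plain paragraphs (figures, columns, tables, equations) is exactly the hard mapping problem described above and is not handled.

```python
import textwrap


def reflow(fixed_width_text: str, width: int) -> str:
    """Re-wrap hard-wrapped paragraphs to an arbitrary column width.

    Assumption: blank lines separate paragraphs, and line breaks inside
    a paragraph carry no meaning (not true for code or poetry).
    """
    paragraphs = []
    current = []
    for line in fixed_width_text.splitlines():
        if line.strip():
            # Part of the current paragraph: keep the words, drop the break.
            current.append(line.strip())
        elif current:
            # Blank line: the paragraph is complete.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # Wrap each recovered paragraph to the width of the target device.
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


if __name__ == "__main__":
    sample = "PDF is a very general\nformat.\n\nIt models printed\nmatter."
    print(reflow(sample, 40))
```

The same recovered paragraphs can be wrapped for a phone-sized column or a wide monitor simply by changing width; the lossy step is deciding which line breaks were meaningful in the first place.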


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

