R.H. wrote:

> To extract text from a PDF document, I am using a command line tool on
> Windows which is available also for Linux based systems called Xpdf.
>
> It was working well, using shell() on LiveCode Community 8x, but
> tested only in the IDE on Windows.

A good tool.  Thanks.
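
For anyone wanting to script the same extraction outside LiveCode, here is a minimal sketch in Python. It assumes Xpdf's pdftotext is installed and on the PATH. The function names are mine, for illustration only, and the -layout and -enc options are the ones I'd reach for with columns of numbers, not necessarily what R.H. used.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for Xpdf's pdftotext command line tool."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout tries to preserve the physical layout of the page,
        # which tends to keep columns of numbers lined up.
        command.append("-layout")
    # Ask for UTF-8 output so the result can be read back predictably.
    command += ["-enc", "UTF-8", str(pdf_path), str(txt_path)]
    return command


def extract_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext on pdf_path and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on the PATH")
    command = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(command, check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which matters on Windows in particular.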


> I needed this since some people had sent huge lists of numerical data
> in PDF which were impossible to extract, and the manual method could
> have taken weeks.

Given PDF's role as a delivery vehicle, it's most commonly an extra step added to the end of a publishing process.

Have you asked that data provider if they have the data available in the format it was in before they went to that extra final step to convert it to PDF?


> Nevertheless, I can not see that PDF will lose ground as the standard
> for many years to come. There are possibly billions of documents in
> PDF around?

Postel's Law is worth quoting here:

       "Be liberal in what you accept,
        and conservative in what you send."

The need to *read* PDFs will remain for a very long time. Adobe's power and influence have made the cumbersome, inflexible, and expensively complex format almost ubiquitous during the advent of the PC era, leaving a vast collection of legacy documents that will continue to encumber consumers and developers alike for at least a decade to come, likely longer.

A great many households still have a VHS player. Old formats take a long time to die, and never completely go away.

But that's for reading.

Choosing what our apps output offers us an opportunity to consider modern workflows in a world where the majority of time spent with computing devices is on screens too small to read PDFs comfortably.

Computing has taken us to a place where device size is varied and usability is often a far more significant product differentiator than algorithms.

It seems useful to encourage developers looking to distinguish their apps for modern audiences to consider output formats that integrate well across the full mix of devices we use.

EPub likely won't be a de facto requirement for years. But market differentiation isn't about waiting to play catch-up.


> What should replace it? And people are still printing.

EPub is printable for the ever-smaller number of documents requiring tree death just to be read.

EPub uses HTML, tucked inside a common Zip container like so many other formats (docx, xlsx, odt, GarageBand, APKs, etc.). The developer expense in dealing with the format is a small fraction of what's required for dealing with PDF.
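
To give a sense of how little is involved, here is a rough sketch in Python that writes a one-page EPub using nothing but the standard zipfile module. It assumes the EPUB 3 layout as I understand it: a "mimetype" entry stored first and uncompressed, a META-INF/container.xml pointing at the package file, and XHTML for the content. The file names, the placeholder identifier and date, and the write_epub function are all illustrative, and I have not run the output through a validator.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The identifier and modified date below are placeholders, not real values.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2000-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="page" href="page.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <ol><li><a href="page.xhtml">{title}</a></li></ol>
  </nav>
</body>
</html>
"""

PAGE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>
"""


def write_epub(path, title, body_html):
    """Write a single-page EPub; body_html must already be valid XHTML."""
    safe_title = escape(title)
    parts = [
        ("META-INF/container.xml", CONTAINER_XML),
        ("content.opf", PACKAGE_OPF.format(title=safe_title)),
        ("nav.xhtml", NAV_XHTML.format(title=safe_title)),
        ("page.xhtml", PAGE_XHTML.format(title=safe_title, body=body_html)),
    ]
    with zipfile.ZipFile(path, "w") as epub:
        # The mimetype entry goes first and is stored without compression.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for name, text in parts:
            epub.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
```

Calling write_epub("report.epub", "Report", "<p>Hello</p>") produces a Zip file you can open with any archive tool to see the HTML inside.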

Nearly everything you can do in a browser can be done in EPub. Indeed, we're beginning to see EPub reader extensions for browsers, and I suspect it won't be long until we see native EPub support directly in most popular browsers.

Like PDF, EPub files are normally readable by all, and like PDF, EPubs can be protected when DRM is needed.

But unlike PDF, EPub inherits HTML's ability to reflow content for dynamic page rendering.

Many millennials don't have a laptop or desktop computer at all. Mobile-exclusive workflows are common across all age groups throughout much of the world. And the over-40 crowd everywhere appreciates the ease with which text can be dynamically resized so they don't need to reach for their reading glasses quite so often.

Given the breadth of HTML tools and experience available, along with the common acceptance of Zip as a wrapper format for delivering multi-part documents, EPub seems well placed to serve modern multi-device needs with far less expense for developers and a much better experience for end users.

So sure, we'll be *reading* PDFs even longer than VHS players will continue taking up space in our living rooms.

But if you're looking to distinguish your app from lackluster competitors, adding EPub support for both reading and writing is worth considering.

--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
