Thanks, Richard. This cuts my search down considerably. I had already set up an Ubuntu VM on my Unraid server, so I should be able to get something going. Much appreciated.
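
For anyone following along, here is a minimal sketch of how the extract-and-index step might look on a Linux box. It assumes Python 3 and the `pdftotext` utility from the poppler-utils package are installed; the function names and the folder path in the usage note are placeholders, and the word-to-files index is just one simple way to do it, not a tested solution.

```python
import re
import shutil
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of one PDF by running `pdftotext <file> -`.

    The trailing "-" asks pdftotext to write the text to stdout
    instead of creating a .txt file next to the PDF.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def build_index(texts):
    """Map each lowercased word to the sorted file names that contain it.

    `texts` is a dict of {file name: extracted text}.
    """
    index = {}
    for name, text in texts.items():
        # set() so a file is listed once per word, however often it occurs
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, []).append(name)
    return {word: sorted(names) for word, names in index.items()}


def index_folder(folder):
    """Extract text from every *.pdf in `folder` and build the word index."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError(
            "pdftotext not found; on Ubuntu try: sudo apt install poppler-utils"
        )
    texts = {p.name: pdf_to_text(p) for p in sorted(Path(folder).glob("*.pdf"))}
    return build_index(texts)
```

Calling `index_folder("/path/to/pdfs")` (a placeholder path) returns a dict, so `index.get("livecode", [])` would list the files mentioning that word. Scanned PDFs with no text layer will come back empty and would need OCR instead.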

On Sat, May 12, 2018 at 2:08 PM, Richard Gaskin via use-livecode <use-livecode@lists.runrev.com> wrote:

> Mike Bonner wrote:
>
> > I haven't needed to do this before, but is there a (relatively) easy
> > way to extract the text from a bunch of pdf files? I'm hoping I can
> > build some indexes for the boatload of files I want to go through
> > (Though, I guess I could bypass LC and just grep my heart out)
> >
> > Any suggestions?
>
> Long term:
>
> Per Postel's Law, reduce the stockpile of PDFs littering humanity's
> infosphere by generating none except in the increasingly rare cases where
> no other format is a better choice.
>
> PDF is an archaic format held over from the days when nearly all display
> devices had screens at least as wide as a printed page. Back in the '90s,
> when it was popularized, a fixed-size format emulating a printed piece of
> paper was not an unreasonable thing to do.
>
> But times have changed. We rarely kill trees just to read anymore, so the
> bounds of a printed page are approaching meaninglessness.
>
> This becomes critically important for delivering an enjoyable reading
> experience when we consider that an ever-smaller minority of our time is
> spent on screens large enough to accommodate that size.
>
> Many of our screens are much smaller, and moreover they vary enough to
> make any single fixed size needlessly cumbersome.
>
> Attempting to read PDFs on a phone ranges from mildly annoying to
> prohibitively frustrating.
>
> That unnecessary pain is easily replaced these days with modern formats
> that reflow text to fit any of the many devices we might be using at any
> given moment.
>
> There's a good argument for using EPub as that foundation.
>
> But that's a long-term solution, and while I believe it's an inevitability
> as mobile use continues to grow, it won't solve your need in the
> here-and-now, so:
>
>
> Short term:
>
> The Linux universe has many good command-line solutions available for
> extracting text from PDFs easily and efficiently, like this one:
> https://www.howtogeek.com/228531/how-to-convert-a-pdf-file-to-editable-text-using-the-command-line-in-linux/
>
> For those Win10 Pro users who can be convinced to tick a checkbox, the
> entire universe of the Ubuntu shell is now available.
>
> macOS also includes utilities for this, but I don't believe the same ones
> (at least not without installing an independent package manager like
> Homebrew).
>
> --
> Richard Gaskin
> Fourth World Systems
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode