Might read this one too: http://stackoverflow.com/questions/1554280/extract-text-from-pdf-in-javascript
On Fri, Jul 8, 2016 at 10:48 AM, Mike Bonner <bonnm...@gmail.com> wrote:
> It's ugly, but could you use pdf.js to extract the text in a browser
> widget showing the pdf?
> http://git.macropus.org/2011/11/pdftotext/example/
>
> Not sure what else is in pdf.js but it looks interesting.
>
> On Fri, Jul 8, 2016 at 10:30 AM, Paul Dupuis <p...@researchware.com>
> wrote:
>
>> On 7/8/2016 11:55 AM, Colin Holgate wrote:
>> > I was trying an export as spreadsheet from Acrobat Pro, but that didn’t
>> work. Doing a Save as Text from Acrobat Reader was more successful, but the
>> columns come out in a different order, and some columns get combined into a
>> single string.
>>
>> Over the past few years, I have spent a ridiculous amount of time exploring
>> PDF access via LiveCode in every way possible. Ultimately, for our needs
>> we created the XPDF external and transferred it to LiveCode, but we also
>> explored JavaScript extraction from a browser, interapplication
>> communication, shell command line tools, etc., etc.
>>
>> The reality is the PDF format is great for visually representing a
>> printed page and totally sucks for text content, that is, for actually
>> getting the characters of the document rather than an image of the
>> characters.
>>
>> There is NO real mapping of characters to their appearance in the PDF
>> other than geometric position on the page. You get no font information,
>> no size, no styles, zip. You get line breaks at the end of every visible
>> line and you can get line breaks in what appears to be the middle of
>> content depending upon how the original source document was rendered
>> into a PDF. Headers and footers end up in the middle of paragraphs. You
>> have no real way to tell a line break from a paragraph break, and more.
>>
>> In truth a NEW portable document format needs to be invented that
>> connects content to its appearance and preserves both, but I suspect that
>> people who want to keep both intact and portable are just using HTML5
>> and CSS3.
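Paul's point that the only thing tying characters together is their position on the page can be made concrete. Below is a minimal sketch, in Python rather than LiveCode script, of the sort of heuristic a text extractor ends up applying: group positioned text fragments into lines by their y coordinate, order each line by x, and guess paragraph breaks from unusually large vertical gaps. The Fragment type, the function names, and the thresholds (y_tolerance, gap_factor) are hypothetical and chosen for illustration. It assumes you already have fragments with x/y positions, roughly what pdf.js's getTextContent() reports per text item (a str field plus a transform matrix whose last two entries give the position). It is not meant to describe how pdf.js, XPDF or Acrobat order text internally.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    """A run of text and the point where it is drawn on the page.

    Hypothetical type. Coordinates are assumed to use the PDF convention:
    origin at the bottom-left of the page, y increasing upward.
    """
    text: str
    x: float
    y: float


# A reconstructed line: its baseline y and its fragments in left-to-right order.
Line = Tuple[float, List[Fragment]]


def group_into_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[Line]:
    """Cluster fragments into lines, top of the page first.

    A fragment joins the current line if its y is within y_tolerance of the
    y of the first fragment placed in that line; otherwise it starts a new
    line. Within each line fragments are sorted by x, so the order in which
    the file happens to draw them does not matter.
    """
    lines: List[Line] = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [(y, sorted(frags, key=lambda f: f.x)) for y, frags in lines]


def join_paragraphs(lines: List[Line], gap_factor: float = 1.5) -> List[str]:
    """Guess paragraph breaks from vertical spacing alone.

    The lower median of the gaps between consecutive lines is taken as the
    normal line spacing; any gap larger than gap_factor times that is treated
    as a paragraph break. Only a heuristic: headings, footers and uneven
    spacing can fool it.
    """
    if not lines:
        return []
    gaps = [lines[i][0] - lines[i + 1][0] for i in range(len(lines) - 1)]
    typical = sorted(gaps)[(len(gaps) - 1) // 2] if gaps else 0.0
    paragraphs: List[List[str]] = [[]]
    for i, (_, frags) in enumerate(lines):
        paragraphs[-1].append(" ".join(f.text for f in frags))
        if i < len(gaps) and gaps[i] > gap_factor * typical:
            paragraphs.append([])
    return [" ".join(p) for p in paragraphs]


if __name__ == "__main__":
    # Fragments listed in arbitrary drawing order, as they might come out of a file.
    page = [
        Fragment("world", 60, 700), Fragment("Hello", 10, 700),
        Fragment("second line", 10, 688),
        Fragment("paragraph", 40, 660.5), Fragment("New", 10, 660),
    ]
    for paragraph in join_paragraphs(group_into_lines(page)):
        print(paragraph)
    # prints "Hello world second line" then "New paragraph"
```

Sorting by x within a line is what would keep table cells like Colin's in left-to-right order, but only for simple layouts; multi-column pages, rotated text, and headers or footers sitting between paragraphs still defeat a heuristic like this, which is the problem Paul describes.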
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode