Hello,
I'm not sure if that is the same utility I used on Linux, but isn't there a
-raw switch for that one?
Gena
On 25/01/2013 23:26, Esther wrote:
Hi Alex, Dan, Gena, Annie, and Others,
I can adapt Dan's suggestion of using pdftotext to an AppleScript by putting a wrapper
around the command line instruction, so you can select files in Finder and run the
AppleScript in order to create .html versions of the selected files in the same
directory. This might work for his purposes, but the problem is the requirement
"I'd need line spacing preserved, and navigation could be made into headings."
It is actually difficult to find a program that maintains formatting in converting back
to HTML, especially of table data in PDF files, although that might depend on how the PDF
file was generated. The test case I use for this is the HTML page for Appendix A of the
"VoiceOver Getting Started Guide" at:
http://help.apple.com/voiceover/info/guide/10.8/English.lproj/index.html
You'll notice that the columns of shortcut keys and associated actions read
correctly when you use the web page. If you print out a PDF version of the
page, VoiceOver reads the entries under the first column of shortcuts and then
the entries for the column of associated actions. This does not get fixed if
you convert back to HTML with pdftotext (or many other programs).
The best solution I could find, in response to a question from a member of the
mac-access list whose bank statement was delivered in PDF format, was to get
the trial version of Wondershare PDF Converter or PDF Converter Pro for Mac
from the developer's web site. This had a limit of 5 pages for conversion in
the trial version, which was suitable for bank statements, but not in general.
Note, this is not a general recommendation for that software, but it worked for
that specific purpose. To read more about this and usage tests, see the mail
archive link for my mac-access list post at:
• Re: Tables in PDF documents
http://www.mail-archive.com/mac-access%40mac-access.net/msg11985.html
To get back to this specific suggestion of pdftotext, I'll post the link to my
earlier recommendation of pdftotext to Dan, which gave some suggestions for
alternatively using this either as an AppleScript or Automator action as well
as other notes about the application:
• pdftotext utility [was Re: Xpdf for mac]
http://www.mail-archive.com/macvisionaries%40googlegroups.com/msg61916.html
The AppleScript described there for HTML conversions can be adapted for similar
use (e.g., highlight files in Finder, then run the AppleScript to batch convert
files without having to use Terminal.)
I'll paste in the AppleScript below my signature starting below the line "---Cut Here---" and
ending with the line "end run". You can save it from the AppleScript editor under a name of your
choice, like "PDF to HTML".
HTH. Cheers,
Esther
---Cut Here---
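(*
Use pdftotext to create an HTML version of each selected PDF file
Created 25 January 2013; modified from PDF to Text AppleScript of 17 May 2011
*)
on run
  tell application "Finder"
    -- "as alias list" copes with one selected file or several
    set chosenFiles to the selection as alias list
  end tell
  repeat with chosenFile in chosenFiles
    -- pdftotext writes the .html file next to the original PDF
    do shell script "/usr/local/bin/pdftotext -htmlmeta " & quoted form of POSIX path of (chosenFile as alias)
  end repeat
end run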
On Jan 25, 2013, at 8:32 AM, - wrote:
In Terminal, one can use the pdftotext program found at:
http://www.bluem.net/en/mac/packages/
The command to convert to html is:
pdftotext file.pdf -htmlmeta
The converted file has an .html extension. The original files are retained as
.pdf.
This can be put in a script with a loop to convert all pdf files in a directory.
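For anyone who wants to try the loop idea, here is a minimal sketch of what such a script might look like. The function name convert_pdfs and the PDFTOTEXT variable are my own inventions for illustration, not part of pdftotext; the sketch assumes pdftotext is on your PATH (or that PDFTOTEXT points to it, e.g. /usr/local/bin/pdftotext) and only picks up files ending in lowercase .pdf.

```shell
#!/bin/sh
# Sketch: convert every .pdf file in a directory to HTML using
# "pdftotext -htmlmeta". convert_pdfs and PDFTOTEXT are hypothetical
# names used for this example only.
convert_pdfs() {
    dir=${1:-.}                        # default to the current directory
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # skip when no .pdf files matched
        # Quoting "$pdf" keeps file names with spaces intact
        "${PDFTOTEXT:-pdftotext}" -htmlmeta "$pdf"
    done
}
```

To use it, paste the function into a script or your Terminal session and then run something like: convert_pdfs ~/Documents/statements. With no argument it works on the current directory. The original PDF files are left in place.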
XB
--
"If you want someone who thinks outside the box, hire someone who lives
outside the box" Barbara Otto