Hi Alex, Dan, Gena, Annie, and others,

I can adapt Dan's suggestion of using pdftotext to an AppleScript by putting a 
wrapper around the command line instruction, so you can select files in Finder 
and run the AppleScript in order to create .html versions of the selected files 
in the same directory.  This might work for his purposes, but the problem is 
the requirement "I'd need line spacing preserved, and navigation could be made 
into headings."

It is actually difficult to find a program that preserves formatting when 
converting a PDF back to HTML, especially for table data, although that 
might depend on how the PDF file was generated.  The test case I use for this 
is the HTML page for Appendix A of the "VoiceOver Getting Started Guide" at:
http://help.apple.com/voiceover/info/guide/10.8/English.lproj/index.html
You'll notice that the columns of shortcut keys and associated actions read 
correctly when you use the web page.  If you print out a PDF version of the 
page, VoiceOver reads all the entries in the first column of shortcuts and only 
then the entries in the column of associated actions, so each shortcut is 
separated from its action.  This does not get fixed if you convert back to 
HTML with pdftotext (or many other programs).

The best solution I could find, in response to a question from a member of the 
mac-access list whose bank statement was delivered in PDF format, was to get 
the trial version of Wondershare PDF Converter or PDF Converter Pro for Mac 
from the developer's web site.  This had a limit of 5 pages for conversion in 
the trial version, which was suitable for bank statements, but not in general.  
Note, this is not a general recommendation for that software, but it worked for 
that specific purpose.  To read more about this and the usage tests, see the 
mail archive link for my mac-access list post at:
• Re: Tables in PDF documents
http://www.mail-archive.com/mac-access%40mac-access.net/msg11985.html

To get back to this specific suggestion of pdftotext, here is the link to my 
earlier recommendation of pdftotext to Dan, which gave some suggestions for 
using it either from an AppleScript or as an Automator action, as well as 
other notes about the application:
• pdftotext utility [was Re: Xpdf for mac]
http://www.mail-archive.com/macvisionaries%40googlegroups.com/msg61916.html

The AppleScript described there for HTML conversions can be adapted for similar 
use (e.g., highlight files in Finder, then run the AppleScript to batch convert 
files without having to use Terminal.)

I'll paste in the AppleScript below my signature starting below the line 
"---Cut Here---" and ending with the line "end run". You can save it from 
AppleScript Editor under a name of your choice, like "PDF to HTML".

HTH.  Cheers,

Esther

---Cut Here---
(*
Use pdftotext to create an HTML version of the selected PDF file
     Created 25 January 2013; modified from PDF to Text AppleScript of 17 May 2011
*)
on run
        tell application "Finder"
                set chosenFile to the selection as alias
        end tell
        -- Run pdftotext on the chosen file; "quoted form" protects spaces in the path
        do shell script "/usr/local/bin/pdftotext -htmlmeta " & quoted form of POSIX path of chosenFile
end run




On Jan 25, 2013, at 8:32 AM, - wrote:

> 
> In terminal one can use the pdftotext program found at:
> 
> http://www.bluem.net/en/mac/packages/
> 
> The command to convert to html is:
> 
> pdftotext file.pdf -htmlmeta
> 
> The converted file has a html extension.  The original files are retained as 
> pdf.
> 
> This can be put in a script with a loop to convert all pdf files in a 
> directory.
> 
> XB
> 
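
For anyone who prefers Terminal, the loop XB mentions could look roughly like 
the sketch below.  It assumes pdftotext is on your PATH and writes the .html 
file next to each PDF, as described above.  The function name convert_pdfs and 
the PDFTOTEXT override variable are my own inventions for illustration, not 
part of pdftotext.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory to HTML with "pdftotext -htmlmeta".
# PDFTOTEXT is a hypothetical override for the program location, e.g.
# PDFTOTEXT=/usr/local/bin/pdftotext
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

convert_pdfs() {
    # Directory to scan; defaults to the current directory.
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # If there are no PDFs the pattern stays unexpanded, so skip it.
        [ -e "$pdf" ] || continue
        # Quoting "$pdf" keeps file names with spaces intact.
        "$PDFTOTEXT" -htmlmeta "$pdf" || echo "failed: $pdf" >&2
    done
}
```

After loading the function into your shell, "convert_pdfs ~/Documents" would 
convert every PDF in that folder and leave the originals in place.  A file 
that fails to convert is reported and the loop carries on with the rest.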
