Hello,

I'm not sure if that is the same utility I used on Linux, but isn't there a -raw switch for it?

Gena

On 25/01/2013 23:26, Esther wrote:
Hi Alex, Dan, Gena, Annie, and Others,

I can adapt Dan's suggestion of using pdftotext to an AppleScript by putting a wrapper 
around the command line instruction, so you can select files in Finder and run the 
AppleScript in order to create .html versions of the selected files in the same 
directory.  This might work for his purposes, but the problem is the requirement 
"I'd need line spacing preserved, and navigation could be made into headings."

It is actually difficult to find a program that maintains formatting in converting back 
to HTML, especially of table data in PDF files, although that might depend on how the PDF 
file was generated.  The test case I use for this is the HTML page for Appendix A of the 
"VoiceOver Getting Started Guide" at:
http://help.apple.com/voiceover/info/guide/10.8/English.lproj/index.html
You'll notice that the columns of shortcut keys and associated actions read 
correctly when you use the web page.  If you print out a PDF version of the 
page, VoiceOver reads the entries under the first column of shortcuts and then 
the entries for the column of associated actions.  This does not get fixed if 
you convert back to HTML with pdftotext (or many other programs).

The best solution I could find, in response to a question from a member of the 
mac-access list whose bank statement was delivered in PDF format, was to get 
the trial version of Wondershare PDF Converter or PDF Converter Pro for Mac 
from the developer's web site.  This had a limit of 5 pages for conversion in 
the trial version, which was suitable for bank statements, but not in general.  
Note, this is not a general recommendation for that software, but it worked for 
that specific purpose.  To read more about this and usage tests, see the mail 
archive link for my mac-access list post at:
• Re: Tables in PDF documents
http://www.mail-archive.com/mac-access%40mac-access.net/msg11985.html

To get back to this specific suggestion of pdftotext, I'll post the link to my 
earlier recommendation of pdftotext to Dan, which gave some suggestions for 
using it as either an AppleScript or an Automator action, along with other 
notes about the application:
• pdftotext utility [was Re: Xpdf for mac]
http://www.mail-archive.com/macvisionaries%40googlegroups.com/msg61916.html

The AppleScript described there for HTML conversions can be adapted for similar 
use (e.g., highlight files in Finder, then run the AppleScript to batch convert 
files without having to use Terminal.)

I'll paste in the AppleScript below my signature starting below the line "---Cut Here---" and 
ending with the line "end run". You can save it from the AppleScript editor under a name of your 
choice, like "PDF to HTML".

HTH.  Cheers,

Esther

---Cut Here---
(*
Use pdftotext to create an HTML version of the selected PDF file
      Created 25 January 2013; modified from PDF to Text AppleScript of 17 May 2011
*)
on run
        tell application "Finder"
                set chosenFile to the selection as alias
        end tell
        do shell script "/usr/local/bin/pdftotext -htmlmeta " & quoted form of POSIX path of chosenFile
end run




On Jan 25, 2013, at 8:32 AM, - wrote:


In terminal one can use the pdftotext program found at:

http://www.bluem.net/en/mac/packages/

The command to convert to html is:

pdftotext file.pdf -htmlmeta

The converted file has an .html extension.  The original PDF files are retained.

This can be put in a script with a loop to convert all pdf files in a directory.
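
A minimal sketch of such a loop, assuming pdftotext is installed and on your 
PATH.  The function name convert_all_pdfs and the PDFTOTEXT override variable 
are made up for illustration; they are not part of pdftotext itself.

```shell
#!/bin/sh
# Convert every .pdf file in a directory to HTML with pdftotext.
# PDFTOTEXT is a hypothetical override, e.g. /usr/local/bin/pdftotext
# if the program is not on your PATH.
convert_all_pdfs() {
    dir="${1:-.}"    # default to the current directory
    for pdf in "$dir"/*.pdf; do
        # If no PDF matches, the pattern is left as-is; skip it.
        [ -e "$pdf" ] || continue
        # Quoting "$pdf" keeps file names with spaces intact.
        "${PDFTOTEXT:-pdftotext}" -htmlmeta "$pdf"
    done
}
```

After loading the function, something like convert_all_pdfs ~/Documents should 
leave an .html file next to each PDF in that folder, with the PDFs untouched.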

XB



--
"If you want someone who thinks outside the box, hire someone who lives outside the box" Barbara Otto
