On Thu, Sep 22, 2011 at 11:01 AM, Sharon Kimble <skimbl...@gmail.com> wrote:
> I have a 96 page pdf file that I need to convert to text in one run.
> I've imported it into inkscape but that only converts one page at a
> time. I've tried using pdftotext but I can't work out the syntax for
> that so am unable to test it out properly. I've tried pdfedit but that
> only works on one page at a time and doesn't convert it to text.
>
> Can anyone help me out with suggestions for converting the pdf in one
> go to text please?
>
> Many thanks
> Sharon.

Use pdftotext if you want it converted to plain text. It converts all pages in one run by default. Like this:

    pdftotext -layout /path/to/pdffile.pdf /path/to/textfile.txt

or, if you want it to be HTML (text only):

    pdftotext -htmlmeta /path/to/pdffile.pdf /path/to/textonlyHTMLfile.html

If you want to keep images, colors and other formatting as well, then HTML is the only output that will do it. Use pdftohtml for that.

To convert to a single HTML file for the content:

    pdftohtml -p -nodrm /path/to/pdffile.pdf /path/to/htmlfile.html

This actually creates 3 HTML files:

    htmlfile.html      - the main file to view
    htmlfiles.html     - the full converted single HTML file
    htmlfile_ind.html  - the navigation page

To convert to multiple HTML files (one HTML file for each page):

    pdftohtml -c -p -nodrm /path/to/pdffile.pdf /path/to/htmlfile.html

This creates 2 main files along with one HTML page for each page in the book:

    htmlfile.html      - the main file to view
    htmlfile_ind.html  - the navigation page

Keep in mind that pdftohtml is memory intensive, and creating a single-page HTML file is extremely memory intensive.
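
If the syntax is the sticking point, the plain-text command can be wrapped in a couple of small shell functions. This is only a rough sketch: the names txt_name_for and pdf_to_txt are made up for this example, and it assumes pdftotext comes from the poppler-utils package, as it does on Debian.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers,
# not part of pdftotext or poppler.

# Derive the output name by swapping a trailing .pdf for .txt,
# e.g. /path/to/pdffile.pdf -> /path/to/pdffile.txt
txt_name_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every page of one PDF into a single text file.
# Assumes pdftotext (poppler-utils on Debian) is on the PATH.
pdf_to_txt() {
    pdf=$1
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; try installing poppler-utils" >&2
        return 1
    fi
    # -layout keeps the original physical layout as far as possible
    pdftotext -layout "$pdf" "$(txt_name_for "$pdf")"
}
```

After sourcing that file, running pdf_to_txt /path/to/pdffile.pdf should leave /path/to/pdffile.txt next to the PDF. If you only ever want part of the document, pdftotext also takes -f and -l for the first and last page to convert.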