Quoth Chad Perrin on Monday, 06 September 2010:
> On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> > On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > > able to find in ports?  I went looking for pdf2html, expecting to find
> > > > that there, but no luck.  Before I spend hours sifting through, still
> > > > without knowing whether I missed something that should be obvious,
> > >
> > > Yes, you did. :-)
> >
> > Apparently not.  See below.
> >
> > >
> > > > I
> > > > figured I'd ask here whether anyone knows of something off the top of
> > > > his/her head.
> > >
> > > Try textproc/pdftohtml
> >
> > Uhm, he said "other than pdftohtml" so I suspect he already knew about
> > that one.
>
> This is indeed the case.
>
> I appreciate the several suggestions I've received, though I see in
> retrospect that I haven't been sufficiently specific, since I have not
> gotten any suitable answers.
>
> I have "inherited" a Perl script that wraps pdftohtml.  The reason a
> wrapper is needed is that a substantial amount of cleanup work is needed
> to produce HTML suitable to our final needs.  The output of pdftohtml is
> sufficiently far from "perfect" that I would like to test the output of a
> few other possible "back ends" for the script to see if a significant
> amount of work being done by the script can be eliminated.
>
> Toward that end, the simpler the tool the better -- and the tool on the
> "back end" should not be something that must be contacted across a
> network, or that cannot be redistributed freely.  I wanted to start with
> things I have in the base system on my FreeBSD laptop (where I'm doing my
> development) or through ports.  OpenOffice.org is quite a bit larger and
> more unwieldy than we would really want to deal with at this point.
> Using Google or Adobe tools online is well outside the range of what we
> need (requiring network access for the tool to work).
>
> I've started looking at the Xpdf tools as well as pdftohtml.  Other
> suggestions from within ports would be appreciated.  Additional options
> other than what can be found in ports might also be useful, understanding
> the needs I sketched out above.  The script itself is Perl, in case that
> matters.
>
> To everyone who has replied so far: thank you for your time.
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]
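
The arrangement described above, one wrapper script trying several
converters as back ends, might be sketched roughly as follows.  This is
only an illustration under assumptions: it is written in Python rather
than Perl, the names BACKENDS, build_command, convert and compare are
hypothetical, and the flags shown (pdftohtml -stdout -noframes -i,
pdftotext -htmlmeta) should be checked against the man pages of whatever
versions ports installs, since they vary between releases.

```python
import shutil
import subprocess

# Hypothetical back-end table: name -> function that builds the argv
# for converting one PDF and writing the result to stdout.
# The flags are assumptions; verify them against the installed man pages.
BACKENDS = {
    "pdftohtml": lambda pdf: ["pdftohtml", "-stdout", "-noframes", "-i", pdf],
    "pdftotext": lambda pdf: ["pdftotext", "-htmlmeta", pdf, "-"],
}


def build_command(backend, pdf_path):
    """Return the argv list for the named back end."""
    try:
        return BACKENDS[backend](pdf_path)
    except KeyError:
        raise ValueError("unknown back end: %s" % backend)


def convert(backend, pdf_path):
    """Run one back end locally; return its raw output, or None if the
    tool is not installed."""
    argv = build_command(backend, pdf_path)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout


def compare(pdf_path):
    """Map each back end to the size of its output in bytes (None if the
    tool is missing), as a first rough comparison."""
    sizes = {}
    for name in BACKENDS:
        output = convert(name, pdf_path)
        sizes[name] = None if output is None else len(output)
    return sizes
```

Calling compare("sample.pdf") gives one entry per back end.  The existing
cleanup pass could then be run on each output to see which converter
leaves the least work for the wrapper.  Everything runs locally, so no
network access is involved.
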
How about print/p5-PDFLib and print/pecl-pdflib to roll your own?  Maybe
that's more work than you wanted.

-- 
Sterling (Chip) Camden | sterl...@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com | http://chipsquips.com