Here's a Linux (Ubuntu) system administration question for you. What is the best HTML -> LaTeX converter? And what is the best approach to help LyX users access the htmltolatex program?

Explanation: I never tried to import an HTML file until yesterday, and when I did, LyX reported that java could not find htmltolatex.jar. While tracking that down, I was surprised to find that the LyX converter preferences assumed I had htmltolatex installed. The preference was set to

    java htmltolatex.jar --input $$i --output $$o

I don't know why it is set that way! I don't have htmltolatex.jar installed, so obviously it should fail.

I'm reading the LyX configure script, and I see it checks for three possible converters: "html2latex", then "gnuhtml2latex", and then "htmltolatex". Are these supposed to be in order of quality?

I tried gnuhtml2latex first because there is an Ubuntu package for it. The imported file didn't look great in LyX. It appears to me that html2latex is a Perl script, and its homepage was last edited in 1998. Is htmltolatex better? Its web page is more up to date. Either that means it is not done yet or that it has great new features :)

The SourceForge page for htmltolatex is http://htmltolatex.sourceforge.net. The download link there points to a tarball that has no information for system administrators. How is a person supposed to install this?

    $ ls
    build.xml  config.xml  gpl.txt      htmltolatex.jar  LICENCE.txt  README.txt  src
    classes    config.xsd  htmltolatex  javadoc          manual       samples

The file "htmltolatex" is a shell script that runs java on htmltolatex.jar and passes along the arguments it was given. Here's what I mean:

=======================
$ cat htmltolatex
#!/bin/sh

if [ $# -lt 1 ]; then
    echo "Usage: $0 -input <input-HTML-file> -output <output-LaTeX-file> [-css <css-file-assigned-to-input file>] [-config <configuration-file>]"
    exit 1
fi

java -jar htmltolatex.jar $@
======================

I tested the C programmer's approach: put the htmltolatex script somewhere in the path, put the htmltolatex.jar file somewhere like /usr/share/htmltolatex, and then edit the htmltolatex script to use the full path to the jar:

    java -jar /usr/share/htmltolatex/htmltolatex.jar $@

I tried that approach and it failed, because the program can't find the other files it wants.

    Error: Cannot convert file
    ----------------------------------------
    An error occurred whilst running htmltolatex -input 'News.html' -output 'News.tex'

    Fatal error: Can't load configuration.
    /home/pauljohn/config.xml (No such file or directory)

    Error: Cannot convert file
    ----------------------------------------
    An error occurred whilst running htmltolatex -input 'New.html' -output 'New.tex'

So it looks for config.xml in the directory it is run from (here, my home directory) and not next to the jar. Then I copied config.xml into /usr/share/htmltolatex and modified the script by adding a -config option:

    java -jar /usr/share/htmltolatex/htmltolatex.jar $@ -config /usr/share/htmltolatex/config.xml

Hooray, it runs from LyX with no crash. I have no way of knowing if it will work in other test cases. The LaTeX markup does include images, which is encouraging. HTML enumerated and bulleted lists do come into LyX correctly.

However, LyX can't compile the document. It complains about an undefined option in this ERT:

    \href{http://pj.freefaculty.org}{thing}

I see that the LyX Document->Settings->PDF Properties menu has a hyperref option, and once I enable that support, the document compiles. That's awesome. It's a little encouraging, but still troublesome.

Am I taking the best approach? It reminds me of a time about 5 years ago when I was trying to generate HTML from LyX. The default converters were tex4ht or latex2html or something like that, and we were debating how to configure those programs, and somebody spoke up to say that "hevea" works much better than either of those.

Anyway, I wonder if people who have wrestled with HTML -> LaTeX will speak up and let us know which HTML to LaTeX converter works best, and, if it is the Java one, htmltolatex, can we hear how you install that on a multiuser system?

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
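
P.S. For anyone who wants to try the same setup, here is a minimal sketch of the edited wrapper described above. It assumes htmltolatex.jar and config.xml have both been copied to /usr/share/htmltolatex. The HTMLTOLATEX_HOME and JAVA variables and the htmltolatex_run function name are made up for illustration; they are not part of the htmltolatex distribution. Compared with the one-line edit, it also quotes "$@" so that file names containing spaces stay in one piece. I have not checked what htmltolatex does if the caller passes a -config option of their own in addition to the one appended here.

```shell
#!/bin/sh
# Sketch of a system-wide htmltolatex wrapper (illustrative, not from upstream).
# Assumed layout: htmltolatex.jar and config.xml both live in $HTMLTOLATEX_HOME.
HTMLTOLATEX_HOME="${HTMLTOLATEX_HOME:-/usr/share/htmltolatex}"
JAVA="${JAVA:-java}"   # hypothetical override hook, handy for dry runs

htmltolatex_run() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: htmltolatex -input <input-HTML-file> -output <output-LaTeX-file> [-css <css-file>]" >&2
        return 1
    fi
    # "$@" is quoted so arguments containing spaces stay intact.
    # -config is always appended, matching the manual fix described above.
    "$JAVA" -jar "$HTMLTOLATEX_HOME/htmltolatex.jar" "$@" \
        -config "$HTMLTOLATEX_HOME/config.xml"
}
```

To use it as the installed script, add a final line that calls htmltolatex_run "$@" and put the file somewhere in the path, for example /usr/local/bin/htmltolatex. Running it with JAVA=echo in the environment prints the arguments that would be passed to java instead of starting Java, which is a cheap way to check the paths before pointing LyX at it.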