Hi Jim,

> Do any of you know of a pdf-to-text converter that is better than
> pdftotxt? pdftotxt does not preserve line breaks, table formatting,
> displayed code, etc. Even the official .txt version of the previous
> release of POSIX had many conversion-artifact errors.

Since the standard is also published as HTML, I would convert from that
instead, using a good html-to-text converter. The best I know of is 'w3m'.
Example:

  $ w3m -dump http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html

Bruno
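
P.S. To convert more than one chapter, the command above could be wrapped in a small shell function. This is only a minimal sketch: the names `BASE` and `dump_chapter` are hypothetical, the 80-column width is an assumed preference, and the chapter file names you pass in must actually exist on the server.

```sh
#!/bin/sh
# Sketch: dump one HTML chapter to a plain-text file with w3m.
# BASE is the directory part of the example URL above (assumption:
# other chapters live in the same directory).
BASE=http://www.opengroup.org/onlinepubs/9699919799/basedefs

# dump_chapter is a hypothetical helper, not part of w3m.
dump_chapter() {
    # $1 = HTML file name, e.g. V1_chap07.html
    # -dump writes the rendered page to stdout; -cols sets the line width.
    w3m -dump -cols 80 "$BASE/$1" > "${1%.html}.txt"
}
```

For example, `dump_chapter V1_chap07.html` would write the rendered text to V1_chap07.txt in the current directory.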