RiverWind wrote:
> The idea was to concat a large html file and then convert it to
> text. The pdf can be converted to text, and it so far seems like a
> pretty viable translation.

If I were doing this for myself, I would convert each individual HTML file to text first and then concatenate the resulting text files. The reason is that each individual HTML file is, at that moment, a complete and self-consistent document, so it should convert to text cleanly with no problems. The text files can then simply be concatenated. Once you concatenate the HTML, though, you have created a Frankenstein HTML file that is almost certainly going to be problematic to convert to text.

Also, my naive experience is that converting HTML to text is a lot easier than converting PDF to text. HTML is already a text type; its MIME type is "text/html", after all. PDF has been less accessible for conversions for me. Its MIME type is "application/pdf", which is not a text type, and that leaves more room for errors to creep in.

Bob
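
A minimal sketch of the convert-first, concatenate-second approach described above, using only Python's standard library. The names TextExtractor, html_to_text, and convert_and_concat are hypothetical, the input files are assumed to be UTF-8, and the extractor is deliberately crude (it keeps text nodes and drops script and style contents). A dedicated converter would likely give nicer output; this only illustrates the ordering of the steps.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Convert one complete HTML document to plain text."""
    parser = TextExtractor()  # fresh parser per document, no shared state
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_and_concat(html_paths, out_path):
    """Convert each HTML file on its own, then concatenate the text results."""
    texts = [
        html_to_text(Path(p).read_text(encoding="utf-8"))  # assumes UTF-8 input
        for p in html_paths
    ]
    Path(out_path).write_text("\n\n".join(texts) + "\n", encoding="utf-8")
```

Something like convert_and_concat(sorted(Path(".").glob("*.html")), "all.txt") would process the files in name order. Each file goes through its own parser, so a problem in one file cannot affect the conversion of the next.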