up.
Finding converters from HTML/DOC to plain text shouldn't be too hard.
You could also try to find a commercial document conversion vendor, or
try to convert HTML and DOC both to PDF so you'll only have to deal with
PDF-to-text extraction in the end.
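For the HTML side, Python's standard library is enough for a rough conversion. Below is a minimal sketch using `html.parser.HTMLParser`. The names `html_to_text` and `_TextExtractor` are made up for illustration. It only handles HTML, ignores layout and tables, and does not cover DOC or PDF input, which would still need an external converter:

```python
from html.parser import HTMLParser

# Hypothetical helper set: tags after which a line break is implied,
# so adjacent paragraphs don't run together.
_BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &amp; etc. into characters.
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse all whitespace (including the inserted breaks) to single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

For example, `html_to_text("<p>One</p><p>Two</p>")` gives `"One Two"` rather than `"OneTwo"`, because block-level tags are treated as breaks.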
Good luck!
Artjom
--
Artjom Simon
Just this week I was researching the current market of queuing
solutions, and PgQ showed up on the radar with slides from a distant
past [0]. Unfortunately, the Wiki article [1] that links to [2]
needs an update, since that link doesn't work anymore :/
I'd love it if somebody could give a bit