On Apr 21, 2009, at 6:25 AM, Rui Carneiro wrote:
Does anyone know of some good libraries to handle the content of files like pdf, ppt, doc, etc.? I am already indexing attachments; all I need now is to extract the text from them.
I've no idea, but you could at least look at some of the other full-text search engines. I remember them advertising indexing support for all kinds of formats. Maybe they're using some specific library, or maybe it would be easy to extract their parsing code.