On Apr 21, 2009, at 6:25 AM, Rui Carneiro wrote:

Does anyone know some good libraries for handling the content of files like PDF, PPT, DOC, etc.? I am already indexing attachments; all I need now is to extract
the text from them.

I've no idea, but you could at least look at some of the other full-text search engines. I remember them advertising indexing support for all kinds of formats. Maybe they're using a specific library, or maybe it would be easy to extract their parsing code.
