Antiword would be hard to integrate into Nutch because it is not Java-based. It
would require native calls.
Alexander
2008/11/12 Sertic Mirko, Bedag <[EMAIL PROTECTED]>
> Hi
>
> You can also use a tool called "antiword" to extract the text from a .doc
> file, and then give the text to Lucene.
>
> See he
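For illustration, one way to use antiword from Java without JNI-style native
calls is to run it as an external process and read its standard output. The
following is only a minimal sketch under stated assumptions: an "antiword"
binary is installed and on the PATH, it prints the extracted text to stdout
when given a .doc path, and the output is decoded as UTF-8 here. The class and
method names (AntiwordExtractor, buildCommand, readAll, extractText) are
hypothetical and not part of Lucene, Nutch, or antiword. It needs Java 9 or
later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs the external "antiword" program on a .doc file.
class AntiwordExtractor {

    // Builds the command line. Assumes "antiword" is on the PATH.
    static List<String> buildCommand(String docPath) {
        return List.of("antiword", docPath);
    }

    // Drains a stream into a String. UTF-8 is an assumption; antiword's
    // actual output encoding depends on how it is configured.
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts antiword, captures its stdout, and fails on a non-zero exit code.
    static String extractText(String docPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(docPath));
        // Discard stderr so a full error pipe cannot block the child process.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        // Read all output before waiting, so the stdout pipe never fills up.
        String text = readAll(process.getInputStream());
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("antiword exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: AntiwordExtractor <file.doc>");
            return;
        }
        System.out.println(extractText(args[0]));
    }
}
```

The returned string could then be added to a Lucene document as a text field.
Starting a process per file has a cost, so this may suit small batches better
than a large crawl.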
Thank you,
It was really helpful. I also found some similar work being done in the
Nutch project.
Regards,
Dipesh
On Wed, Nov 12, 2008 at 12:52 PM, Dave Newton <[EMAIL PROTECTED]> wrote:
> --- On Tue, 11/11/08, dipesh wrote:
> > I wanted to know if there are classes in Lucene that support
> > pa
Dipesh,
Start here.
http://poi.apache.org/
John G.
-----Original Message-----
From: dipesh [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 11, 2008 8:38 PM
To: java-user@lucene.apache.org
Subject: Parsing MSWord
Hello,
I wanted to know if there are classes in Lucene that support parsing MSW
--- On Tue, 11/11/08, dipesh wrote:
> I wanted to know if there are classes in Lucene that support
> parsing MSWord documents.
Searching the web might help:
http://www.google.com/search?q=lucene+%2Bword
The Apache Tika project (http://incubator.apache.org/tika/) might also be of
interest.
Dav