7 02:48
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Hi Sachin, the link you gave me offers only a zip file and an exe file for
download, and the zip file contains no class files. Wouldn't we need a jar
file or class files?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECT
-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 13:07
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Hi again,
Do we have to download any jar files to run this program? If so, can you
give me the link, please?
ashwin
On 3/8/07, Kainth, Sachin <[E
Hi,
Here it is:
http://www.seekafile.org/
t in-memory.
The only other way I have heard of is to use IFilters. I believe
SeekAFile indexes PDFs.
Sachin
-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 11:35
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Is the only way to index PDFs to convert them to text first and then
index the text?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Hi Aswin,
>
> You can tr
Is the only way to index PDFs to convert them to text first and then index
the text?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
Hi Aswin,
You can try PDFBox to convert the PDF documents to text and then use
Lucene to index the text. The code for turning a PDF into text is very
simple:
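A minimal sketch of the extract-then-index flow described here, written in
plain Java so it runs without any extra jars. The names TextExtractor,
TinyIndex and indexAll are hypothetical stand-ins, not part of PDFBox or
Lucene: in a real setup PDFBox would produce the text and Lucene would
build the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExtractThenIndex {

    /** Hypothetical stand-in for the PDF-to-text step (PDFBox would do this in practice). */
    interface TextExtractor {
        String extractText(String filename) throws Exception;
    }

    /** Toy inverted index (assumption: stands in for Lucene). Maps each word to the files containing it. */
    static class TinyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String filename, String text) {
            // Lower-case the text and split on anything that is not a letter or digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(filename);
            }
        }

        List<String> search(String word) {
            Set<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
            return hits == null ? Collections.<String>emptyList() : new ArrayList<>(hits);
        }
    }

    /** Step 1: extract the text of each file. Step 2: index that text. */
    static TinyIndex indexAll(List<String> filenames, TextExtractor extractor) throws Exception {
        TinyIndex index = new TinyIndex();
        for (String name : filenames) {
            index.add(name, extractor.extractText(name));
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        // Fake extractor returning canned text; a real one would parse the PDF file.
        Map<String, String> canned = new HashMap<>();
        canned.put("a.pdf", "Lucene indexes plain text.");
        canned.put("b.pdf", "PDF files must be converted to text first.");
        TextExtractor fake = canned::get;

        TinyIndex index = indexAll(Arrays.asList("a.pdf", "b.pdf"), fake);
        System.out.println(index.search("text"));   // prints [a.pdf, b.pdf]
        System.out.println(index.search("lucene")); // prints [a.pdf]
    }
}
```

The point of the split is that the indexer only ever sees plain text, so
the same indexing code can serve PDF, DOC or any other format once an
extractor for that format is plugged in.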
For DOC files you can use the Jakarta POI library. Text extraction is
outlined here: http://jakarta.apache.org/poi/hwpf/quick-guide.html
Ulf
On 08.03.2007, at 10:37, ashwin kumar wrote:
Hi, can someone help me by giving some sample programs for indexing
PDF and .doc files?
---
Hi Aswin,
You can try PDFBox to convert the PDF documents to text and then use
Lucene to index the text. The code for turning a PDF into text is very
simple:
private static string parseUsingPDFBox(string filename)
{
// document reader
PDDocument doc = PDDocument.loa