RE: indexing pdfs

2007-03-09 Thread Kainth, Sachin
7 02:48
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Hi Sachin, the link you gave me offers only a zip file and an exe file for download, and the zip file contains no class files. But wouldn't we need a jar file or class file?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECT

Re: indexing pdfs

2007-03-08 Thread ashwin kumar
---Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 13:07
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Hi again, do we have to download any jar files to run this program? If so, can you give me the link, please?
ashwin
On 3/8/07, Kainth, Sachin <[E

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
Hi,
Here it is: http://www.seekafile.org/
-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 13:07
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Hi again, do we have to download any jar files to run this program? If so, can you give me the

Re: indexing pdfs

2007-03-08 Thread ashwin kumar
t in-memory. The only other way I have heard of is to use IFilters. I believe SeekAFile does indexing of PDFs.
Sachin
-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 11:35
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Is the only way to index

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 11:35
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs
Is the only way to index PDFs to convert them to text and then index the text?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> Hi Aswin,
>
> You can tr

Re: indexing pdfs

2007-03-08 Thread ashwin kumar
Is the only way to index PDFs to convert them to text and then index the text?
On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> Hi Aswin,
> You can try PDFBox to convert the PDF documents to text and then use Lucene to index the text. The code for turning a PDF to text is very simple:

Re: indexing pdfs

2007-03-08 Thread Ulf Dittmer
For DOC files you can use the Jakarta POI library. Text extraction is outlined here: http://jakarta.apache.org/poi/hwpf/quick-guide.html
Ulf
On 08.03.2007, at 10:37, ashwin kumar wrote:
> Hi, can someone help me by giving any sample programs for indexing PDFs and .doc files?
---

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
Hi Aswin,
You can try PDFBox to convert the PDF documents to text and then use Lucene to index the text. The code for turning a PDF to text is very simple:
private static string parseUsingPDFBox(string filename) {
    // document reader
    PDDocument doc = PDDocument.loa
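
The two-step approach discussed in this thread (first extract plain text from the PDF or DOC file, then index that text) can be sketched in plain Java. This is only a toy illustration under stated assumptions: it uses neither PDFBox, POI, nor Lucene, and the names `ExtractThenIndexSketch` and `TextExtractor` are hypothetical. The extractor is a placeholder where a real PDF or DOC text extractor would be plugged in, and a small in-memory inverted index stands in for the Lucene index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for the "extract text, then index it" pipeline. Not Lucene. */
class ExtractThenIndexSketch {

    /** Hypothetical hook: a real implementation would wrap a PDF or DOC text extractor. */
    interface TextExtractor {
        String extractText(byte[] rawFileBytes);
    }

    // Inverted index: lower-cased term -> ids of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Step 1 and 2 together: extract text with the supplied extractor, then index it. */
    void addDocument(String docId, byte[] rawFileBytes, TextExtractor extractor) {
        addDocument(docId, extractor.extractText(rawFileBytes));
    }

    /** Step 2 only: tokenize already-extracted plain text and record each term. */
    void addDocument(String docId, String extractedText) {
        String[] tokens = extractedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of all documents containing the term (case-insensitive). */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The point of the design is that the indexer only ever sees plain text, so the same indexing code serves PDF, DOC, or any other format once a suitable extractor exists for it.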