Hi Ashwin,

Well, in that case you might need to use IFilters some other way instead of through SeekAFile. I don't know how, since I haven't used them myself. Perhaps someone else here has.

Sachin

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 09 March 2007 02:48
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs

hi sachin
the link you gave me has only a zip file and an exe file for download, and the zip file contains no class files. but wouldn't we need a jar file or class files?

On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Here it is:
>
> http://www.seekafile.org/
>
> -----Original Message-----
> From: ashwin kumar [mailto:[EMAIL PROTECTED]
> Sent: 08 March 2007 13:07
> To: java-user@lucene.apache.org
> Subject: Re: indexing pdfs
>
> hi again
> do we have to download any jar files to run this program? if so, can
> you give me the link please
>
> ashwin
>
> On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> >
> > Well, you don't need to actually save the text to disk and then
> > index the saved text file; you can index that text directly
> > in memory.
> >
> > The only other way I have heard of is to use IFilters. I believe
> > SeekAFile does indexing of pdfs.
> >
> > Sachin
> >
> > -----Original Message-----
> > From: ashwin kumar [mailto:[EMAIL PROTECTED]
> > Sent: 08 March 2007 11:35
> > To: java-user@lucene.apache.org
> > Subject: Re: indexing pdfs
> >
> > Is the only way to index pdfs to convert them into text and only
> > then index it?
> >
> > On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi Ashwin,
> > >
> > > You can try pdfbox to convert the pdf documents to text and then
> > > use Lucene to index the text.
> > > The code for turning a pdf to text is very simple:
> > >
> > > // needs PDDocument and PDFTextStripper from PDFBox, plus java.io.IOException
> > > private static String parseUsingPDFBox(String filename) throws IOException
> > > {
> > >     // document reader
> > >     PDDocument doc = PDDocument.load(filename);
> > >     try {
> > >         // create stripper (wish I had the power to do that -
> > >         // wouldn't leave the house)
> > >         PDFTextStripper stripper = new PDFTextStripper();
> > >         // get text from doc using stripper
> > >         return stripper.getText(doc);
> > >     } finally {
> > >         // release the resources held by the document
> > >         doc.close();
> > >     }
> > > }
> > >
> > > Sachin
> > >
> > > -----Original Message-----
> > > From: ashwin kumar [mailto:[EMAIL PROTECTED]
> > > Sent: 08 March 2007 09:37
> > > To: java-user@lucene.apache.org
> > > Subject: indexing pdfs
> > >
> > > hi, can someone help me by giving any sample programs for indexing
> > > pdfs and .doc files
> > >
> > > thanks
> > > regards
> > > ashwin

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
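
The thread's point that extracted text can be indexed directly in memory, without first saving it to disk, can be illustrated with a small sketch. This is not Lucene's or PDFBox's API: the class InMemoryTextIndex and its methods are hypothetical stand-ins written with the Java standard library only, and the strings passed to add() are assumed to be whatever a PDF text extractor (such as the PDFBox stripper mentioned above) returned. A real setup would hand that same string to the search library instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Stand-in for a real search library: a tiny in-memory inverted index.
// Nothing here is Lucene's or PDFBox's API; all names are hypothetical.
class InMemoryTextIndex {
    // term -> ids of the documents containing that term, in insertion order
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split text into lower-case terms on any run of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index already-extracted text under a document id; no file is written.
    void add(String docId, String extractedText) {
        for (String term : tokenize(extractedText)) {
            postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(docId);
        }
    }

    // Return the ids of documents containing the given single term (case-insensitive).
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new LinkedHashSet<>() : new LinkedHashSet<>(hits);
    }

    public static void main(String[] args) {
        InMemoryTextIndex index = new InMemoryTextIndex();
        // In practice these strings would come from a PDF text extractor.
        index.add("report.pdf", "Indexing PDFs with a text stripper");
        index.add("notes.pdf", "Plain notes about indexing");
        System.out.println(index.search("indexing"));
        System.out.println(index.search("stripper"));
    }
}
```

The extracted string goes straight from the extractor into add(), so no intermediate text file is needed; only the index itself has to live somewhere, here in a HashMap.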