Lucene parsing for PDF

2005-12-29 Thread Shyam Bhaskaran
Hi, I am working on a search project using Lucene and currently I am working on parsing PDF documents. I was successful in implementing my parser using Lucene and PDFBox. I have a question about how to exclude (or maybe delete) pages from the index. I am not sure how to do this. I mean when exactly it

Re: Lucene parsing for PDF

2005-12-29 Thread Erik Hatcher
Shyam - I moderated your message through, so please subscribe to the list to send to it in the future. Please provide us with some details - a standalone RAMDirectory-using JUnit TestCase is the most ideal way to share an issue like this and have someone else take a look at it. And frequen

AW: Lucene parsing for PDF

2005-12-29 Thread Klaus
Hi, I think the easiest way is to exclude the pages while you are parsing the PDF document, so that you provide just the necessary pages to Lucene. Another solution is to create a separate document for each page; it should have a field such as "pagenumber", and you can delete the document from the index
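
The first suggestion above (drop unwanted pages before anything reaches the index) could be sketched roughly as follows. This is only an illustration in plain Java: the class and method names are hypothetical, it assumes the page texts have already been extracted from the PDF one string per page, and it does not call Lucene or PDFBox at all.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: none of these names come from Lucene or PDFBox.
final class PageFilter {

    /**
     * Returns the pages that should be indexed, keyed by 1-based page number,
     * skipping every page number listed in {@code excluded}.
     * Assumption: pageTexts.get(0) holds the text of page 1.
     */
    static Map<Integer, String> pagesToIndex(List<String> pageTexts, Set<Integer> excluded) {
        Map<Integer, String> kept = new LinkedHashMap<>(); // preserves page order
        for (int i = 0; i < pageTexts.size(); i++) {
            int pageNumber = i + 1;
            if (!excluded.contains(pageNumber)) {
                kept.put(pageNumber, pageTexts.get(i));
            }
        }
        return kept;
    }
}
```

Each remaining entry could then become its own index document, with the map key stored in the "pagenumber" field mentioned in the message, so that a single page can later be deleted by that field.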

QueryParser over multiple fields

2005-12-29 Thread Gaston
Hello, in my index every document consists of multiple fields like url, contents, description etc. I want to search for documents in the url and the contents field. My problem is that the constructor of QueryParser only provides one field like "Query query=QueryParser.parse("query",field1,analyzer

Re: QueryParser over multiple fields

2005-12-29 Thread Erik Hatcher
On Dec 29, 2005, at 7:42 AM, Gaston wrote: in my index every document consists of multiple fields like url, contents, description etc. I want to search for documents in the url and the contents field. My problem is that the constructor of QueryParser only provides one field like "Query query=Que

RE: QueryParser over multiple fields

2005-12-29 Thread Daan de Wit
Hi Gaston, Have a look at MultiFieldQueryParser. Greetings, Daan -Original Message- From: Gaston [mailto:[EMAIL PROTECTED] Sent: Thursday, December 29, 2005 13:42 To: java-user@lucene.apache.org Subject: QueryParser over multiple fields Hello, in my index every document consists of mu

Re: QueryParser over multiple fields

2005-12-29 Thread Gaston
Hello Erik and Daan, thank you for the help. MultiFieldQueryParser is described in chapter 5; I was searching in chapter 3. Sorry. Greetings and best wishes for the New Year 2006! Gaston Daan de Wit wrote: Hi Gaston, Have a look at MultiFieldQueryParser. Greetings, Daan -Original

RE: QueryParser over multiple fields

2005-12-29 Thread Steven Pannell
Hi, You can also do it like this: QueryParser.parse("(summary:DOG OR title:DOG)", "title", getAnalyzer()); QueryParser.parse("(summary:DOG AND title:DOG)", "title", getAnalyzer()); The "title" is the default field, so there is no need to reference it in the query string, e.g.: QueryParser.parse("(summary:DOG
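
A query string of the shape shown above can be assembled for any list of fields. The following is a minimal sketch in plain Java; the class and method names are made up for illustration, and it assumes the term is a single word with no query-syntax characters that would need escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; it only builds a string and does not call Lucene.
final class FieldQueryStrings {

    /**
     * Builds "(f1:term OR f2:term ...)" for the given fields.
     * Assumption: term is a plain single word, so no escaping is done here.
     */
    static String anyField(List<String> fields, String term) {
        return fields.stream()
                .map(field -> field + ":" + term)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }
}
```

For example, `FieldQueryStrings.anyField(List.of("summary", "title"), "DOG")` yields `(summary:DOG OR title:DOG)`, which could then be passed as the first argument to QueryParser.parse as in the message.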

Re: QueryParser over multiple fields

2005-12-29 Thread euw
> Two options - MultiFieldQueryParser or building an aggregate single field to search. I use the aggregate field option, which entails building an additional field for each document, I call it "contents", and index _all_ of the searchable text into that field.
>
> Erik

How about a
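
The aggregate-field option quoted above amounts to concatenating every searchable value into one extra field before indexing. A rough sketch in plain Java follows; the class name and the use of a Map to hold field values are assumptions for illustration, and the actual Lucene document construction is left out.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; field values are modelled as a simple name-to-text map.
final class AggregateField {

    /**
     * Joins all non-empty field values with single spaces. The result is
     * the text that would be indexed into the catch-all "contents" field.
     */
    static String contents(Map<String, String> fields) {
        StringJoiner all = new StringJoiner(" ");
        for (String value : fields.values()) {
            if (value != null && !value.trim().isEmpty()) {
                all.add(value.trim());
            }
        }
        return all.toString();
    }
}
```

With this approach a query needs only the one default field, at the cost of storing the text in the index a second time.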

Re: QueryParser over multiple fields

2005-12-29 Thread Erik Hatcher
On Dec 29, 2005, at 9:31 AM, [EMAIL PROTECTED] wrote: Two options - MultiFieldQueryParser or building an aggregate single field to search. I use the aggregate field option, which entails building an additional field for each document, I call it "contents", and index _all_ of the searchable tex

Correlating best fragments back to native documents - ?

2005-12-29 Thread Dmitry Goldenberg
Hello, I was wondering if anyone has seen or implemented the kind of solution where the best fragments generated by Lucene's Highlighter are correlated back to the native documents such as PDF or MS Word. Basically, I want to be able to use native (or any other) APIs to highlight Lucene's

Re: QueryParser over multiple fields

2005-12-29 Thread euw
> That's a perfectly good approach as well. I didn't mean to imply that there were only "two options", just that the two I suggested were the most commonly used ones.
>
> Erik

Ah, for a moment I thought I had overlooked a "canonical" solution; I'm quite new to Lucene. Thanks. B