Re: SourceForge.net Lucene based search announcement

2005-05-25 Thread Erik Hatcher
Congrats, Chris! This was no doubt a big effort. I didn't do much but lend some moral support. There was an odd issue you had with certain types of queries at one point - did you get that resolved? Erik On May 25, 2005, at 3:40 PM, Chris Conrad wrote: Hello, I just wanted to let e

Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
Thanks for your help. I was using 0.7.0. However, I installed 0.7.1 and get the same result with the ExtractText utility. I will post an issue with the PDFBox sourceforge site. Tom - Original Message - From: "Ben Litchfield" <[EMAIL PROTECTED]> To: Sent: Wednesday, May 25, 2005 5:

Re: Lucene - PDFBox

2005-05-25 Thread 田春峰
Hi, I agree with Ben Litchfield. Before feeding extracted text into the Lucene indexer, you should check the extracted text. For my part, I am now using java org.pdfbox.ExtractText to get the text from PDFs. [quote] "Ben Litchfield" <[EMAIL PROTECTED]> Can you run the following command line applica

Re: Lucene - PDFBox

2005-05-25 Thread Ben Litchfield
There were some fixes around extra spaces in the 0.7.1 version of PDFBox, if you are not using that version please try it, otherwise post an issue on the PDFBox sourceforge site. http://sourceforge.net/tracker/?group_id=78314&atid=552832 Thanks, Ben On Wed, 25 May 2005, Thomas X Hoban wrote:

Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
In creating the index, the code passes StandardAnalyzer to the IndexWriter constructor. - Original Message - From: "Chris Fraschetti" <[EMAIL PROTECTED]> To: Sent: Wednesday, May 25, 2005 4:53 PM Subject: Re: Lucene - PDFBox Also, which analyzer are you using when indexing your docu

Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
Thanks for replying. When I run the command, it generates a file with a "txt" extension. The text in this file has spaces interspersed in odd spots. Here is output from a file I ran the command on... Marc h 29, 2005 Hello t here m y good friend. HELLO Legal Soft w are is GOOD. I woul
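
The stray spaces in the extracted text above are enough to explain the missing phrase hits: once "Software" is extracted as "Soft w are", the tokens no longer line up with the phrase being searched. A minimal sketch in plain Java follows, assuming a simplified tokenizer (lowercase, split on anything that is not a letter or digit). This is a stand-in for illustration, not what StandardAnalyzer actually does, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseCheck {
    /** Simplified tokenizer (an assumption): lowercase, split on non-alphanumerics. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** True if the phrase tokens occur consecutively, in order, in the text tokens. */
    static boolean containsPhrase(String text, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        if (phraseTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(text), phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        // Text as extracted in the message above, with the stray spaces.
        System.out.println(containsPhrase("HELLO Legal Soft w are is GOOD.", "Legal Software"));
        // The same text as it presumably appears in the PDF.
        System.out.println(containsPhrase("HELLO Legal Software is GOOD.", "Legal Software"));
    }
}
```

Running the same check on the ExtractText output before indexing is a cheap way to tell an extraction problem apart from an analyzer or query problem.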

Re: Lucene - PDFBox

2005-05-25 Thread Chris Fraschetti
Also, which analyzer are you using when indexing your documents? On 5/25/05, Ben Litchfield <[EMAIL PROTECTED]> wrote: > > Can you run the following command line application on the PDF to verify > that the extracted text is correct > > java org.pdfbox.ExtractText > > Ben > > > > On Wed, 25

Re: Lucene - PDFBox

2005-05-25 Thread Ben Litchfield
Can you run the following command line application on the PDF to verify that the extracted text is correct java org.pdfbox.ExtractText Ben On Wed, 25 May 2005, Thomas X Hoban wrote: > > > First, I am new to Lucene. > > Is there anyone out there who has had trouble getting hits when running

Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
First, I am new to Lucene. Is there anyone out there who has had trouble getting hits when running phrase queries against an index that contains content from PDF files? For PDF documents, I create the document using LucenePDFDocument.getDocument(file) and then add it to the index. For n

SourceForge.net Lucene based search announcement

2005-05-25 Thread Chris Conrad
Hello, I just wanted to let everyone know that we've officially announced that the new SourceForge.net search system is based on Lucene. It's been in operation for over a month now and we're very happy with it. I'd also like to personally thank Erik Hatcher for helping me out during dev

Re: search optimization - help

2005-05-25 Thread Paul Elschot
On Wednesday 25 May 2005 11:21, Kapil Chhabra wrote: > 1. My application requires documents to be sorted on one of my indexed > fields every time. > I use the hits.setSort() method to specify the field. > In short my application will never use the scores generated by lucene > search. > Is calculatin

Re: Finding docs which contain at least x of the queryterms

2005-05-25 Thread Paul Elschot
On Wednesday 25 May 2005 13:00, Barbara Krausz wrote: > > > > Hi, > > Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to retrieve all > documents which contain at least e.g. 3 of the queryterms. How can I > implement this? > The first idea is to use BooleanQueries such as > (t1 and t2

Re: *term (SuffixQeuries)

2005-05-25 Thread sergiu gordea
Hi all, I am sending this email to correct the solution that enables SuffixQueries. The definition of WILDTERM was buggy: it split a term into two terms, e.g. "term:te*st" was parsed to "term:te* term:st", which is of course wrong. Here is the right way to do it ...

Re: How to navigate through indexed terms

2005-05-25 Thread Grant Ingersoll
This isn't totally what you want, but an intermediate step, short of going through all terms, is something like what is in Luke. In Luke, on the Documents tab, you can put a single letter in the "Browse by term" field and then hit "next term" and it will give you the next term, which you could th

Re: Using Highlighter to highlight entire HTML documents?

2005-05-25 Thread Dan Funk
I wrote a very simple SAX parser for our XML content. I check for the search tokens (analyzer.tokenStream) in the text and place a span tag around each found token. This process could work well with XHTML as well. In other words, I could never get the highlighter to do what I wanted to

Re: Finding docs which contain at least x of the queryterms

2005-05-25 Thread Erik Hatcher
On May 25, 2005, at 7:00 AM, Barbara Krausz wrote: Hi, Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to retrieve all documents which contain at least e.g. 3 of the queryterms. How can I implement this? The first idea is to use BooleanQueries such as (t1 and t2 and t3 and t4) or

Finding docs which contain at least x of the queryterms

2005-05-25 Thread Barbara Krausz
Hi, Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to retrieve all documents which contain at least e.g. 3 of the queryterms. How can I implement this? The first idea is to use BooleanQueries such as (t1 and t2 and t3 and t4) or (t1 and t2 and t3) or(t1 and t2 and t4) or (t1 and
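
The "at least 3 of 4 terms" requirement in the question above can also be stated as a count, which avoids enumerating every combination of terms. The sketch below shows only that counting logic in plain Java over in-memory term sets. It does not use the Lucene API, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinTermMatch {
    /** Counts how many distinct query terms occur in the document's term set. */
    static int countMatches(Set<String> docTerms, List<String> queryTerms) {
        int matches = 0;
        // De-duplicate the query so a repeated term is counted only once.
        for (String term : new HashSet<>(queryTerms)) {
            if (docTerms.contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    /** True if the document contains at least minMatches of the query terms. */
    static boolean matchesAtLeast(Set<String> docTerms, List<String> queryTerms, int minMatches) {
        return countMatches(docTerms, queryTerms) >= minMatches;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("t1", "t2", "t3", "t4");
        Set<String> docA = new HashSet<>(Arrays.asList("t1", "t2", "t4", "other"));
        Set<String> docB = new HashSet<>(Arrays.asList("t1", "t3"));
        System.out.println(matchesAtLeast(docA, query, 3)); // docA holds t1, t2 and t4
        System.out.println(matchesAtLeast(docB, query, 3)); // docB holds only t1 and t3
    }
}
```

Expressed this way the condition is one comparison per document, whereas the OR-of-ANDs form needs one clause for every 3-term combination and grows quickly as the number of terms increases.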

RE: Query.toString(0 does not escape special characters

2005-05-25 Thread Peter Gelderbloem
Yeah, That works Thanks Peter Gelderbloem -Original Message- From: Chris Lamprecht [mailto:[EMAIL PROTECTED] Sent: 24 May 2005 18:16 To: java-user@lucene.apache.org Subject: Re: Query.toString(0 does not escape special characters Hi Peter, See the method escape(String s) of QueryParser,
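
The reply above points to the escape(String s) method of QueryParser. For readers who want to see the idea, here is a hand-rolled sketch that puts a backslash in front of each query-syntax character. The character set is an assumption taken from the documented Lucene query syntax and may not match what a given Lucene version escapes, so this is an illustration of the technique, not the library's implementation.

```java
public class QueryEscaper {
    // Assumed set of query-syntax characters: \ + - ! ( ) { } [ ] ^ " ~ * ? : & |
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    /** Returns the input with a backslash inserted before each special character. */
    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2"));
    }
}
```

In real code, prefer the library method named in the thread, since it stays in step with the parser's own grammar.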

search optimization - help

2005-05-25 Thread Kapil Chhabra
1. My application requires documents to be sorted on one of my indexed fields every time. I use the hits.setSort() method to specify the field. In short, my application will never use the scores generated by the Lucene search. Is calculating scores an overhead? Can I skip the process somehow? 2. let C

RE: tf=0 while lucene is finding matches?

2005-05-25 Thread M. Mokotov
Got you :-) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter Sent: Wednesday, May 25, 2005 8:48 AM To: java-user@lucene.apache.org Subject: RE: tf=0 while lucene is finding matches? : I believe I do use the index number for the explain(),

How to navigate through indexed terms

2005-05-25 Thread Antoine Brun
Hello, I am currently looking for a way to navigate forward and backward among the indexed terms. For example, given a Term t, I would like to be able to get the next 10 terms or the previous 10 ones. Getting the next terms is quite straightforward, using the terms(Term t) method from IndexRea
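
The forward and backward navigation asked about above can be pictured with a sorted set. The sketch below keeps the terms in a java.util.TreeSet, an in-memory stand-in for the index's term dictionary and not the IndexReader API. The class and method names are hypothetical, and loading every term into memory is an assumption that only holds for small indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public class TermNavigator {
    private final TreeSet<String> terms;

    public TermNavigator(Collection<String> indexedTerms) {
        this.terms = new TreeSet<>(indexedTerms);
    }

    /** Up to n terms strictly after 'from', in ascending order. */
    public List<String> next(String from, int n) {
        return take(terms.tailSet(from, false).iterator(), n);
    }

    /** Up to n terms strictly before 'from', nearest term first. */
    public List<String> previous(String from, int n) {
        return take(terms.headSet(from, false).descendingIterator(), n);
    }

    private static List<String> take(Iterator<String> it, int n) {
        List<String> result = new ArrayList<>();
        while (it.hasNext() && result.size() < n) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        TermNavigator nav = new TermNavigator(
                Arrays.asList("apple", "banana", "cherry", "date", "fig"));
        System.out.println(nav.next("banana", 2));
        System.out.println(nav.previous("date", 2));
    }
}
```

The backward direction is the awkward one for a forward-only term enumeration, which is why the sketch relies on a structure that can iterate in both directions.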