Re: Sort by relevance+distance

2005-09-19 Thread James Huang
Cool! Only one question: if we have class RelevanceAndDistanceCollector extends HitCollector { public ScoreDoc[] getMatches(int start, int size) { ... } } and a call of getMatches(1, 25); would not cache as many as 1+ docs, would it? Remember this is the whole point o

"Best-practice" in a web application

2005-09-19 Thread Magne Skjeret
Hi, I am using Lucene to index all my data, and it is working just great. I will now add search to a web application, so the index can actually be used, not just sit there. I know how to do this, but I have been going around thinking about what the best practice is. Speed is essential for me. 1. Ca

Re: large index sizes

2005-09-19 Thread Richard Littin
Hi Edward, We have indexed the MedLine data. We used the default StopAnalyzer on the full text fields (fields that are more than just dates or ids) and the default Keyword for the other fields. So the index has the short fields stored in it and just indexing for the larger fields. In our a

large index sizes

2005-09-19 Thread Edward Summers
I'm investigating possible alternatives for indexing/searching a very large dataset (2TB) of xml data from the pubmed database[1]. Does anyone have any experience working with indexes of this size? Granted the actual index size would be smaller than the source files, but I'm just curious h

storing inverted document as a field

2005-09-19 Thread jian chen
Hi, I am playing with Lucene source code and have this somewhat stupid question, so please bear with me ;-) Basically, I want to implement a custom ranking algorithm. That is, iterating through the documents that contain all the search keywords, for each document, retrieve its inverted docum

Re: Sort by relevance+distance

2005-09-19 Thread markharw00d
Here's an example I put together to illustrate the point. package distance; import java.io.IOException; import java.util.ArrayList; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import org.apache.lu

Invalid handle?

2005-09-19 Thread Daniel Russo
I'm trying to run a Lucene (1.4.3) index through an RMI server on a Windows machine, but I'm getting the following error when I try to read some (but not all) documents from the Hits object: SEVERE: java.io.IOException: The handle is invalid java.io.IOException: The handle is invalid

Relevance Feedback

2005-09-19 Thread Gusenbauer Stefan
Does anyone have experience with relevance feedback and Lucene, or just know some good websites? thx stefan

live update of index used by Tomcat

2005-09-19 Thread Daniel Naber
Hi, I need to merge two indexes into one which is accessed by a Searcher in Tomcat. Tomcat keeps the searcher (or reader) open for good performance. However, on Windows you cannot delete a file when it's opened for reading, so I cannot do the merge while Tomcat is running and the reader is open

Re: Some problem with prefix wilcard search

2005-09-19 Thread Daniel Naber
On Monday 19 September 2005 18:24, Erik Hatcher wrote: > So what's the deal with this? It looks like something is wrong with your environment if it cannot resolve java.io.Reader. There once was a problem that the import statement for this was missing in the .jj file and thus it's missing in

Re: Sort by relevance+distance

2005-09-19 Thread Jeff Rodenburg
This is interesting, one I had not considered. Mark - are there any code samples that implement this approach? Or maybe something similar in approach? thanks, jeff On 9/19/05, mark harwood <[EMAIL PROTECTED]> wrote: > > I think the HitCollector approach was fine but needed > a couple of changes

Re: Some problem with prefix wilcard search

2005-09-19 Thread Erik Hatcher
On Sep 19, 2005, at 11:03 AM, tirupathi reddy wrote: C:\LUCENE-CURRENT\SOURCE\lucene-1.4.3>ant -Djavacc.home=c:/javacc javacc Buildfile: build.xml init: javacc-check: javacc-StandardAnalyzer: invoke-javacc: [java] Java Compiler Compiler Version 3.2 (Parser Generator) [java] (type "ja

Re: Sort by relevance+distance

2005-09-19 Thread James Huang
I think this is probably the closest thing I like to/am able to do now. If I ever get to do this, I'll share the idea/code and seek review and suggestions. Thank you very much, Mark, and all others that have helped! -James mark harwood <[EMAIL PROTECTED]> wrote: I think the HitCollector appro

Re: Some problem with prefix wilcard search

2005-09-19 Thread tirupathi reddy
Hello Erik, The output from ant command is : C:\LUCENE-CURRENT\SOURCE\lucene-1.4.3>ant Buildfile: build.xml init: [mkdir] Created dir: C:\LUCENE-CURRENT\SOURCE\lucene-1.4.3\build [mkdir] Created dir: C:\LUCENE-CURRENT\SOURCE\lucene-1.4.3\dist compile-core: [mkdir] Created dir: C:

Re: How do I implement "find documents like document x."

2005-09-19 Thread Grant Ingersoll
I believe there are several ways of doing it. You can use the MoreLikeThis contribution at http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/similarity or you can roll your own using the TermVector implementation. Basically, do your first search, get the term vector from the document you ar

How do I implement "find documents like document x."

2005-09-19 Thread Peter Gelderbloem
Hi I was wondering how would you search for documents similar to a specified document using Lucene? The context would be that I categorise document A manually, and then search for documents with similar terms. Hopefully the documents returned would be in the same category/theme as document A. The

Re: Sort by relevance+distance

2005-09-19 Thread mark harwood
I think the HitCollector approach was fine but needed a couple of changes: 1) use a PriorityQueue subclass in place of the SortedSet to keep only the top n scoring docs 2) multiply lucene score by a distance measurement based on the current doc's location (doc location being read from a cached arra
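The two changes listed above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it uses java.util.PriorityQueue rather than Lucene's own PriorityQueue class, it does not extend HitCollector, and the class name TopDistanceCollector, the flat x/y coordinate arrays, and the 1 / (1 + distance) weighting are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: keeps only the top n hits, ranked by
 * relevance score multiplied by a distance factor.
 * Not Lucene's HitCollector; collect() only mirrors its shape.
 */
final class TopDistanceCollector {

    /** A document id paired with its combined (score * distance factor) value. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int maxSize;
    private final double[] docX;   // cached x coordinate per doc id (assumed layout)
    private final double[] docY;   // cached y coordinate per doc id (assumed layout)
    private final double originX;
    private final double originY;
    // Min-heap on combined score: the head is the weakest hit currently kept.
    private final PriorityQueue<ScoredDoc> queue;

    TopDistanceCollector(int maxSize, double[] docX, double[] docY,
                         double originX, double originY) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
        this.docX = docX;
        this.docY = docY;
        this.originX = originX;
        this.originY = originY;
        Comparator<ScoredDoc> byScore = (a, b) -> Float.compare(a.score, b.score);
        this.queue = new PriorityQueue<>(maxSize, byScore);
    }

    /** Called once per matching document with its raw relevance score. */
    void collect(int doc, float score) {
        double dx = docX[doc] - originX;
        double dy = docY[doc] - originY;
        double distance = Math.sqrt(dx * dx + dy * dy);
        // Assumed distance factor: 1 at the origin, falling off with distance.
        float combined = (float) (score / (1.0 + distance));

        if (queue.size() < maxSize) {
            queue.add(new ScoredDoc(doc, combined));
        } else if (combined > queue.peek().score) {
            queue.poll();                       // evict the weakest kept hit
            queue.add(new ScoredDoc(doc, combined));
        }
    }

    /** Returns the kept hits, best combined score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(queue);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

Memory stays bounded by n regardless of how many documents match, which is the reason for preferring a bounded queue over a SortedSet holding every hit.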

Re: Some problem with prefix wilcard search

2005-09-19 Thread Erik Hatcher
On Sep 19, 2005, at 4:41 AM, tirupathi reddy wrote: Hello, I am using Lucene for searching in my application. My application needs prefix wildcard search also. But Lucene doesn't support this. So I changed in the QueryParser.jj file FROM: | (<_TERM_CHAR> | ( [ "*", "?"

Re: Lucene database bindings

2005-09-19 Thread mark harwood
>> does it deal w/ aggregate functions and group by clauses? Yes, it is basically *all* the normal SQL functionality but with the added option to mix in scores from Lucene queries to the criteria. From the example code: select top 10 count(*) as numAds,pricePounds from ads where pricePounds

Some problem with prefix wilcard search

2005-09-19 Thread tirupathi reddy
Hello, I am using Lucene for searching in my application. My application needs prefix wildcard search also. But Lucene doesn't support this. So I changed in the QueryParser.jj file FROM: | (<_TERM_CHAR> | ( [ "*", "?" ] ))* > To: | | ( [ "*", "?" ] ))* > And then I build

Re: Sort by relevance+distance

2005-09-19 Thread Paul Elschot
On Sep 18, 2005, at 3:39 PM, James Huang wrote: > So the question is, is there a way to override score calculation at runtime? In the lucene/search package, I see interfaces like Scorer, Weight and methods like Query.createWeight(). This looks promising. You indeed need to override the fol