term boosting has no effect with span queries?

2005-05-10 Thread Vincent Le Maout
Hi, I'm trying to boost the score of documents containing some specific terms using the setBoost() method of the class SpanTermQuery (actually inherited from the class Query), but this seems to have no effect on the hit scores. Checking the way the scores are computed (by calling the explain met

How to get values that produced hits

2005-05-10 Thread Steve Rajavuori
I have some cases where a user submits a Boolean query that could have many terms -- e.g. "A=1 OR B=2 OR C=3", etc. When I retrieve the hits I want to be able to retrieve the value that produced the hit. In other words, I want to know if it was "A=1" that produced this hit, or was it "B=2". How c

Real time indexing with RAMDirectory

2005-05-10 Thread Rifflard Mickaël
Hi all, Is it possible, with the RAMDirectory (or another Directory), to "flush" information after each Document is indexed? I tried this, but the "flush" seems to happen only after every two indexed documents at best. What do you think? Did I miss a configuration setting? Thanks, Mickaël

Strange results using QueryParser (?)

2005-05-10 Thread Lilja, Bjorn
Hi, We have implemented a lucene search like this: registry = LocateRegistry.getRegistry(RMIAddress, RMIPort); searchables = new Searchable[] { (Searchable) registry.lookup(RMIIndexName)}; queryParser = new QueryParser(defaultField, new StandardAnalyzer()); Query query = queryParser.parse(queryS

only getting Hits with score >= threshold

2005-05-10 Thread Kai Gülzau
Hi, I'm trying to collect Documents whose (normalized) score is greater than a given threshold, but I don't know the smartest way to do so :) Do I have to subclass (Index)Searcher and override search(Query query, Filter filter, final int nDocs) to achieve this? Kai Gülzau

Re: Real time indexing with RAMDirectory

2005-05-10 Thread Otis Gospodnetic
Hi Mickaël, Have you tried using minMergeDocs=1 ? Will that do what you want? Otis --- Rifflard Mickaël <[EMAIL PROTECTED]> wrote: > Hi all, > > Is it possible, with the RAMDirectory (or another Directory), to > "flush" informations after each Document indexing ? > I tried this but this "flush

Re: Strange results using QueryParser (?)

2005-05-10 Thread Otis Gospodnetic
Hi, My guess is that the analyzer you use for indexing keeps the / (or perhaps documenttype is a Keyword field), while the StandardAnalyzer and QueryParser combination removes the / from the query string. Wildcards work because they are not analyzed: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#

Re: only getting Hits with score >= threshold

2005-05-10 Thread Otis Gospodnetic
Hi Kai, You could use HitCollector for this: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/HitCollector.html Here are some bits about HitCollector: http://www.lucenebook.com/search?query=hitcollector+score A custom HitCollector comes with the book, and you can download the sourc

Re: Distribution Strategies?

2005-05-10 Thread Doug Cutting
Steven J. Owens wrote: A friend just asked me for advice about synchronizing lucene indexes across a very large number of servers. I haven't really delved that deeply into this sort of stuff, but I've seen a variety of comments here about similar topics. Are there any well-known approach

Re: Indexing in multi-threaded environment

2005-05-10 Thread Doug Cutting
Chris Lamprecht wrote: I've done exactly what you describe, using N threads where N is the number of processors on the machine, plus one more thread that writes to the file system index (since that is I/O-bound anyway). Since most of the CPU time is tokenizing/stemming/etc, the method works well.
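The arrangement Chris Lamprecht describes (N analysis threads feeding one writer thread) can be sketched roughly as follows. This is a minimal model in plain Java, not Lucene code: the class name IndexingPipeline, the analyze() helper and the in-memory "index" list are all hypothetical stand-ins, and a real setup would hand the analyzed documents to the actual index writer instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class IndexingPipeline {
    // Sentinel object that tells the writer thread no more work is coming.
    private static final List<String> POISON = new ArrayList<>();

    // Hypothetical stand-in for the CPU-bound step (tokenizing, stemming, ...).
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Runs `workers` analysis threads plus one writer thread and
    // returns everything the writer stored (order is not guaranteed).
    static List<List<String>> run(List<String> docs, int workers) {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
        List<List<String>> index = new ArrayList<>(); // touched only by the writer

        Thread writer = new Thread(() -> {
            try {
                for (List<String> d = queue.take(); d != POISON; d = queue.take()) {
                    index.add(d); // assumption: real code writes to the on-disk index here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String doc : docs) {
            pool.execute(() -> queue.add(analyze(doc)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
            queue.add(POISON);  // all analysis tasks are done; stop the writer
            writer.join();      // join() makes the writer's updates visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return index;
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(run(Arrays.asList("Hello World", "Lucene in Action"), n));
    }
}
```

Keeping the writer on its own thread means only one thread ever touches the index, so the analysis threads need no locking beyond the queue.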

sanity check - large, long running index updates and concurrent read-only service

2005-05-10 Thread Naomi Dushay
Context: our index is currently around 6 gig and takes about an hour just to optimize. Updating it, even in batches, can involve active updating for 15 or more minutes. Index updates are done with two different batch processes as there are currently two different workflows to update the index

Re: sanity check - large, long running index updates and concurrent read-only service

2005-05-10 Thread Yonik Seeley
Could you explain why you need to copy the index? It doesn't seem like that buys you anything (except maybe if the copy is to a physically separate disk) -Yonik On 5/10/05, Naomi Dushay <[EMAIL PROTECTED]> wrote: > Context: our index is currently around 6 gig and takes about an hour just to >

Re: only getting Hits with score >= threshold

2005-05-10 Thread Yonik Seeley
But only Hits normalizes scores AFAIK, so going the route of using a HitCollector means doing your own score normalization. The easiest approach, if you don't need sorting, is to use Hits and iterate over the docs until hits.score() is less than the threshold. Note that it may not make sense filtering by
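The "iterate until the score drops below the threshold" idea relies on the hits arriving in descending score order, so the scan can stop at the first miss. A minimal sketch in plain Java, assuming the scores have already been read into an array (for example via hits.score(), as mentioned above); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class ThresholdScan {
    // Returns the ranks (0-based positions) of all hits whose score is
    // >= threshold. Assumes `scores` is sorted in descending order, as a
    // relevance-ranked hit list is, so the loop can stop at the first miss.
    static List<Integer> ranksAtOrAbove(float[] scores, float threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] < threshold) {
                break; // every later hit scores lower still
            }
            kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 0.8f, 0.5f, 0.2f};
        System.out.println(ranksAtOrAbove(scores, 0.5f));
    }
}
```

If the results are sorted by anything other than relevance, the early exit is no longer valid and every hit has to be checked.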

Re: How to get values that produced hits

2005-05-10 Thread Yonik Seeley
a) if you don't care about scoring, split up the boolean query into term queries and do them individually. b) do term queries after the fact (or use a termdoc enumerator for a faster check). -Yonik On 5/10/05, Steve Rajavuori <[EMAIL PROTECTED]> wrote: > I have some cases where a user submits a
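Option (a) can be sketched as follows: run each clause of "A=1 OR B=2 OR C=3" as its own term query and record, per document, which clauses matched. This is a plain-Java model rather than Lucene code; termQuery() is a hypothetical stand-in for a real term search, and documents are modeled as simple field-to-value maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MatchedClauses {
    // Hypothetical stand-in for one term query: ids of docs where field == value.
    static List<Integer> termQuery(List<Map<String, String>> docs, String field, String value) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (value.equals(docs.get(id).get(field))) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Runs every {field, value} clause separately and maps each matching
    // doc id to the list of clauses ("field=value") that produced the hit.
    static Map<Integer, List<String>> explainHits(List<Map<String, String>> docs, String[][] clauses) {
        Map<Integer, List<String>> matched = new TreeMap<>();
        for (String[] clause : clauses) {
            for (int id : termQuery(docs, clause[0], clause[1])) {
                matched.computeIfAbsent(id, k -> new ArrayList<>())
                       .add(clause[0] + "=" + clause[1]);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> doc = new TreeMap<>();
        doc.put("A", "1");
        doc.put("B", "2");
        docs.add(doc);
        String[][] clauses = {{"A", "1"}, {"B", "2"}, {"C", "3"}};
        System.out.println(explainHits(docs, clauses));
    }
}
```

As the reply notes, this loses the combined relevance score of the original Boolean query, so it only fits when scoring does not matter.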

Re: expert question: concurrent, asynchronous batch updates and real-time reads on very large, heavily used index

2005-05-10 Thread Yonik Seeley
Once an IndexReader is opened on an index, its view of that index never changes. Reuse the same IndexReader for all query requests and only reopen it after you do your optimize. -Yonik

RE: Splitting index into indexed fields and stored fields for performance

2005-05-10 Thread Monsur Hossain
> -Original Message- > From: Chris Lamprecht [mailto:[EMAIL PROTECTED] > Sent: Thursday, April 28, 2005 7:53 PM > > Since the "stored fields" index would basically just be a > database, perhaps this is better served using a traditional > relational database (or even use the OS's file s

RE: Real time indexing with RAMDirectory

2005-05-10 Thread Rifflard Mickaël
Hi Otis, My question was too terse. Here is a sample: import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexReader; import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.analysis.standard.StandardAnalyzer; import java.io.IOException; public