Re: Lucene Search result (scoring )

2007-06-15 Thread Chris Hostetter
the "explain" method on a Searcher, and the Explanation classes can explain everything about how/why a particular document in a particular index gets a particular score for a particular search. The only tricky thing about it is understanding that it refers to the "raw" scores (what you seem to be

Re: negative queries

2007-06-15 Thread Chris Hostetter
: The mailing list has already answered this question dozens of times. I've : been wondering lately, does this list have a FAQ? If so, is this question on : it? The wiki is open to editing by all. Here are a couple choice threads related to this topic which should be of interest both to the th

Re: How to Use ParallelReader

2007-06-15 Thread Chris Hostetter
: My question is: If I just want to update the small fields in one index : and do not want to update the large fields in another index, how can I : make sure these two indexes are synchronized and have the same document : number? the short answer: build them in the same order, use the exact same

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Compass uses a trick to manage parent-child indexing. What if you index each "collection" with a Date field holding the date of its newest picture, and put all of its pictures' keywords into the collection document? Then, with a keyword search, you will find the collection with the highest number of tag occurrences and date s

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
On 15 Jun 2007, at 19:07, Walt Stoneburner wrote: Antoine Baudoux writes: I want to be able to give a score to each collection. Keep in mind, Lucene is computing a score based on quite a number of things from how often a term is used in a document, how often it appears in the collection of doc

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
Well, maybe I didn't explain my problem very well. I have a database with over 3 million images, with each image belonging to one out of 300 possible collections. A query could return more than 100,000 images (for example if they search for a popular image keyword). I want to sort my result

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Walt explained differently what I said. Lucene can be used efficiently for selecting objects, without sorting or scoring anything; then, with the id stored in Lucene, you can sort yourself with a simple Sortable implementation. The only limit is that Lucene should not give you too many results, with your

Re: negative queries

2007-06-15 Thread Paul Elschot
On Friday 15 June 2007 03:07, Antony Sequeira wrote: > Hi > I am aware that with Lucene I can not do negative only queries such as > -foo:bar > > But today I ran into an issue where I realized even queries such as > +foo:bar +(-goobly:doo) > also never return any results. Could you try this:
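To see why a purely negative sub-clause yields nothing, it can help to model each clause as the set of document ids it matches. The sketch below is plain Java, not Lucene code: the class NegativeClauseSketch, the method bool, and the sample ids are all hypothetical, and it does not try to reconstruct the suggestion that is cut off above. It assumes that a boolean query with only prohibited clauses matches no documents, because there is no positive set to subtract from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model (not Lucene code): each clause is the set of doc ids it matches.
final class NegativeClauseSketch {

    // Intersect all required sets, then remove every prohibited set.
    // Assumption: with no required clause there is nothing to subtract from,
    // so the result is empty.
    static Set<Integer> bool(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        if (must.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(must.get(0));
        for (Set<Integer> s : must) {
            result.retainAll(s);
        }
        for (Set<Integer> s : mustNot) {
            result.removeAll(s);
        }
        return result;
    }

    // Small helper to build a sorted set of made-up doc ids.
    static Set<Integer> ids(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<Integer> fooBar = ids(1, 2, 3); // docs matching foo:bar (made up)
        Set<Integer> gooblyDoo = ids(2);    // docs matching goobly:doo (made up)

        // +foo:bar +(-goobly:doo): the inner group is purely negative, so it is
        // empty, and requiring an empty set empties the whole query.
        Set<Integer> inner = bool(Collections.<Set<Integer>>emptyList(),
                                  Arrays.asList(gooblyDoo));
        Set<Integer> nested = bool(Arrays.asList(fooBar, inner),
                                   Collections.<Set<Integer>>emptyList());

        // +foo:bar -goobly:doo: the prohibited clause sits next to a positive one.
        Set<Integer> flat = bool(Arrays.asList(fooBar), Arrays.asList(gooblyDoo));

        System.out.println("nested = " + nested);
        System.out.println("flat   = " + flat);
    }
}
```

In this model the nested form is always empty, while the flattened form keeps the foo:bar documents that do not match goobly:doo.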

Re: Lucene Search result (scoring )

2007-06-15 Thread Donna L Gresh
Your examples are a little confusing to read. However, I think one thing that you need to know is that the score (by "default") depends on more than just the number of hits. It also depends on the length of the document the hits are in. For example, matching two words in a two-word-long documen
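The effect of document length can be sketched with a little arithmetic. This is a simplified illustration in plain Java, not Lucene code: it assumes the classic default Similarity, where the length norm is 1/sqrt(number of terms in the field) and the term-frequency factor is sqrt(frequency), and it ignores idf, coord, query norm, boosts, and the lossy one-byte encoding of stored norms. The class and method names are hypothetical.

```java
// Simplified length-normalization arithmetic (assumed formulas, not Lucene code).
final class LengthNormSketch {

    // Assumed default: shorter fields get a larger norm.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Assumed default: term frequency is dampened by a square root.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Contribution of one query term, with every other scoring factor left out.
    static double termContribution(int freq, int fieldLength) {
        return tf(freq) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // One occurrence of a term in a 2-term field vs. a 200-term field.
        System.out.println("short doc: " + termContribution(1, 2));
        System.out.println("long doc:  " + termContribution(1, 200));
    }
}
```

Under these assumptions the same single hit contributes ten times more in the two-term document than in the 200-term one, which is why documents with the same number of hits can receive different scores.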

RE: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Walt Stoneburner
Antoine Baudoux writes: I want to be able to give a score to each collection. Keep in mind, Lucene is computing a score based on quite a number of things from how often a term is used in a document, how often it appears in the collection of documents, how long the query is, etc. If your concep

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Your need is: from a request you find images; from images you get collections; collections are sorted; collections are returned. You've got a lot of images, and 300 collections, right? Antoine Baudoux wrote: > I am very sorry, but I don't understand at all what you mean in > terms of Lucene

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
I am very sorry, but I don't understand at all what you mean in terms of the Lucene API. Could you drop a few lines of concrete code to help me understand? I'm quite new to Lucene. Thanks! You sort only the "collections", which number 300. First step: you run the search query with Lucene. Map collecs, which com

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
Hi, Another possibility is to re-think this a bit. You are "displaying documents one page at a time", which I take to mean you are displaying some number (say 50) document summaries per page. I'm also assuming that you want to display ALL documents from, say, collection 32 and then (and only th

Re: efficient way to filter out unwanted results

2007-06-15 Thread Jiye Yu
Thanks Antony for the idea. The only thing that may prevent it from working well is that the index is updated frequently, so the docid to ext id map or cache needs to be updated frequently, which may affect the performance. Thanks again for your help. Antony Bowesman wrote: yu wrote: Thanks Sawan for

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Erick Erickson
Another possibility is to re-think this a bit. You are "displaying documents one page at a time", which I take to mean you are displaying some number (say 50) document summaries per page. I'm also assuming that you want to display ALL documents from, say, collection 32 and then (and only then) di

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
You sort only the "collections", which number 300. First step: you run the search query with Lucene. Map collecs, which comes from any persisted stuff. Collection implements Sortable. Set bags = new HashSet(); iterate over hits: bags.add(collecs.get(hit.getTheIdOfTheCollection)); you've got a bag with at most 300 elemen
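The two-step idea in this message can be sketched in plain Java. This is a hedged illustration, not the poster's actual code and not Lucene API: ImageCollection, its score field, and sortedCollections are hypothetical names, the collection ids of the hits are passed in as a plain int array instead of being read from Lucene hits, and sorting by a per-collection score (highest first) is an assumption about the desired order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a persisted collection record; not a Lucene class.
final class ImageCollection {
    final int id;
    final int score; // assumed per-collection score used for ordering

    ImageCollection(int id, int score) {
        this.id = id;
        this.score = score;
    }
}

final class CollectionSortSketch {

    // hitCollectionIds holds the collection id stored on each matching image.
    static List<ImageCollection> sortedCollections(Map<Integer, ImageCollection> collecs,
                                                   int[] hitCollectionIds) {
        // Step 1: gather the distinct collections touched by the hits.
        Set<ImageCollection> bag = new HashSet<>();
        for (int id : hitCollectionIds) {
            ImageCollection c = collecs.get(id);
            if (c != null) {
                bag.add(c);
            }
        }
        // Step 2: sort the (at most a few hundred) collections in memory,
        // highest score first, ties broken by id.
        List<ImageCollection> sorted = new ArrayList<>(bag);
        sorted.sort(Comparator.comparingInt((ImageCollection c) -> c.score)
                .reversed()
                .thenComparingInt(c -> c.id));
        return sorted;
    }

    public static void main(String[] args) {
        Map<Integer, ImageCollection> collecs = new HashMap<>();
        collecs.put(1, new ImageCollection(1, 10));
        collecs.put(2, new ImageCollection(2, 30));
        collecs.put(3, new ImageCollection(3, 20));

        // Made-up collection ids for five matching images.
        int[] hits = {1, 3, 1, 2, 3};
        for (ImageCollection c : sortedCollections(collecs, hits)) {
            System.out.println("collection " + c.id + " (score " + c.score + ")");
        }
    }
}
```

Because the set never holds more than the number of collections, the in-memory sort stays cheap even when the query matches a very large number of images; iterating over all of those hits is the part that can become expensive.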

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
The problem is that I want Lucene to do the sorting, because the query would return thousands of results, and I'm displaying documents one page at a time. -- Antoine Baudoux On 15 Jun 2007,

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
The first step is to feed a Set with "collection". The second step is to sort it. With a SortedSet, you can do that, can't you? M. Antoine Baudoux wrote: > Could you be more precise? I don't understand what you mean. > > On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote: > >> Your request seems to be

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
Could you be more precise? I don't understand what you mean. On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote: Your request seems to be a two-step query. First step: you select the images, and then the collections. Second step: you sort the collections. Can BitVector help you? M. Antoine Baudoux wrote:

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Your request seems to be a two-step query. First step: you select the images, and then the collections. Second step: you sort the collections. Can BitVector help you? M. Antoine Baudoux wrote: > Hi, > > I'm developing an image database. Each Lucene document > representing an image contains (among ot

Re: FW: Lucene indexing vs RDBMS insertion.

2007-06-15 Thread Erick Erickson
From my perspective, this is an irrelevant question. The real question is "is Lucene indexing fast enough for my application?". Which nobody can answer for you, you have to experiment. If you're building an index that's only updated every 6 months, Lucene is certainly "fast enough". If you're re

Re: Wildcard query with untokenized punctuation (again)

2007-06-15 Thread Erick Erickson
On 6/14/07, Renaud Waldura <[EMAIL PROTECTED]> wrote: Thank you for this crystal-clear explanation Mark! > Are you sure you need a PhraseQuery and not a Boolean > query of Should clauses? Excellent question. What's the requirement, hey? Well, the requirement is to find documents referring to "

Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Antoine Baudoux
Hi, I'm developing an image database. Each Lucene document representing an image contains (among other fields): - a date field - a collection field containing the ID of the collection the image belongs to. I want to be able to give a score to each collection. Collecti

Re: negative queries

2007-06-15 Thread Steven Rowe
Hi Antony, Antony Sequeira wrote: > In the attached test file I am using string queries and showing the > failure case. The attachment didn't make it for some reason. > Basically I get the impression that I can not have a clause like > +(-x:y) anywhere in my query. What follows assumes that the

Re: negative queries

2007-06-15 Thread Steven Rowe
Daniel Noll wrote: > On Friday 15 June 2007 11:07:25 Antony Sequeira wrote: >> Hi >> I am aware that with Lucene I can not do negative only queries such as >> -foo:bar > > The mailing list has already answered this question dozens of times. I've > been wondering lately, does this list have a F

Fwd: Call for Papers Opens for OS Summit Asia 2007

2007-06-15 Thread Erik Hatcher
Begin forwarded message: From: J Aaron Farr <[EMAIL PROTECTED]> Call for Papers Opens for OS Summit Asia 2007 The call for papers is now open for OS Summit Asia, to be held November 26-30 at the Cyberport in Hong Kong. This joint conference between the Apache Software Foundation and the Ecl

Re: efficient way to filter out unwanted results

2007-06-15 Thread Antony Bowesman
yu wrote: Thanks Sawan for the suggestion. I guess this will work for statically known doc ids. In my case, I know only external ids that I want to exclude from the result set for each search. Of course, I can always exclude these docs in a post search process. I am curious if there are oth

Re: FW: Lucene indexing vs RDBMS insertion.

2007-06-15 Thread Chris Lu
It's better to first understand the computational difference between Lucene indexing and database insertion. Lucene indexing needs to stem all the words, sort them, and save them to disk. And since Lucene uses an incremental merge model, saved documents may need to be merged and saved again. There

Re: efficient way to filter out unwanted results

2007-06-15 Thread yu
Thanks Sawan for the suggestion. I guess this will work for statically known doc ids. In my case, I know only external ids that I want to exclude from the result set for each search. Of course, I can always exclude these docs in a post search process. I am curious if there are other more eff

FW: Lucene indexing vs RDBMS insertion.

2007-06-15 Thread Chew Yee Chuang
Hi, I’m a new user of Lucene, and I have heard that it is a powerful tool for full text search, so I’m planning to use it in my project for data storage purposes. Before the implementation, I would like to know whether there is a performance issue with the Lucene indexing process. I have no doubt on the retrievin