Re: analyser

2006-04-11 Thread Daniel Noll
Raghavendra Prabhu wrote: > While indexing, I use a different Analyzer. While searching, I use a simple standard Analyzer. Will this prevent me from getting the same best fragments when I search for two terms, say term1 and term2? It depends on the differences, but in general you will always g
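
To make "it depends on the differences" concrete, here is a minimal sketch in plain Python rather than Lucene code. It assumes a hypothetical index-time analyzer that lowercases and strips a trailing "s", and a simpler search-time analyzer that only lowercases. All function names are invented for illustration. A query term matches only when both analyzers produce the same token for it.

```python
def stemming_analyzer(text):
    """Hypothetical index-time analyzer: lowercase, then drop one trailing 's'."""
    tokens = []
    for token in text.lower().split():
        if len(token) > 1 and token.endswith("s"):
            token = token[:-1]
        tokens.append(token)
    return tokens


def simple_analyzer(text):
    """Hypothetical search-time analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs, analyzer):
    """Map each analyzed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in analyzer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents containing any analyzed query token."""
    hits = set()
    for token in analyzer(query):
        hits |= index.get(token, set())
    return hits


docs = ["Fragments of indexes"]
index = build_index(docs, stemming_analyzer)

# Same analyzer on both sides: "fragments" becomes "fragment" and matches.
same = search(index, "fragments", stemming_analyzer)
# Different analyzer at search time: "fragments" stays as-is and finds nothing.
different = search(index, "fragments", simple_analyzer)
```

In this toy, terms the two analyzers treat identically (for example "of") still match, so the impact is limited to the terms where their output differs.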

analyser

2006-04-11 Thread Raghavendra Prabhu
While indexing, I use a different Analyzer. While searching, I use a simple standard Analyzer. Will this prevent me from getting the same best fragments when I search for two terms, say term1 and term2? Rgds, Prabhu

Re: Lucene Seaches VS. Relational database Queries

2006-04-11 Thread Chris Hostetter
1) An inverted full text index is not a replacement for a relational database. 2) Many people think they need a relational database, when all they really need is a well designed full text index. To get to some of your specific questions... : them in one field). One of the problems I see would b

Lucene Seaches VS. Relational database Queries

2006-04-11 Thread Ananth T. Sarathy
Hi, We have made documents out of the rows in our database, and one member of the team is suggesting that we abandon some of our database queries and use Lucene instead. I think there are some fundamental problems with this, especially when it comes to association tables (where there is a one-to-many rela

Re: MultiReader and MultiSearcher

2006-04-11 Thread Doug Cutting
Peter Keegan wrote: Oops. I meant to say: Does this mean that an IndexSearcher constructed from a MultiReader doesn't merge the search results and sort the results as if there was only one index? It doesn't have to, since a MultiReader *is* a single index. A quick test indicates that it does

Re: MultiReader and MultiSearcher

2006-04-11 Thread Yonik Seeley
On 4/11/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Oops. I meant to say: Does this mean that an IndexSearcher constructed from > a MultiReader doesn't merge the search results and sort the results as if > there was only one index? That's how I answered it. A single search is done... the "mergin

Re: MultiReader and MultiSearcher

2006-04-11 Thread Peter Keegan
Oops. I meant to say: Does this mean that an IndexSearcher constructed from a MultiReader doesn't merge the search results and sort the results as if there was only one index? A quick test indicates that it does merge the results properly, however there is a difference in the order of documents wi

Re: MultiReader and MultiSearcher

2006-04-11 Thread Yonik Seeley
On 4/11/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Does this mean that MultiReader doesn't merge the search results and sort > the results as if there was only one index? Correct, it doesn't. It supports the lower level primitives like TermEnum and TermDocs that searches use to run. A term qu

Re: MultiReader and MultiSearcher

2006-04-11 Thread Peter Keegan
Does this mean that MultiReader doesn't merge the search results and sort the results as if there was only one index? If not, does it simply concatenate the results? Peter On 4/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 4/11/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Could you

Re: search.Similarity

2006-04-11 Thread Erik Hatcher
In case anyone misses the smiley, I'm just teasing. Yes, there are years of research and heavy duty experience that are behind Lucene. There are quite a number of research documents and books that describe information retrieval theory and practice, several of them linked here: <

Re: search.Similarity

2006-04-11 Thread Erik Hatcher
On Apr 11, 2006, at 1:46 PM, miki sun wrote: > Is there any theory behind the similarity measure of Lucene? http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html No, Doug just made it up with some random mathematical formulas, just for fun :) Erik --

Re: getting frequency of a phrase within documents

2006-04-11 Thread Chris Hostetter
If you use a custom Similarity class, the tf(float) function is used for phrases to decide how the score depends on the number of times the phrase appears in the document. If you make it an identity function, and modify the other functions in the Similarity to be (mostly) c
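
The idea can be sketched in plain Python under stated assumptions. This is a toy model, not Lucene's Similarity API: the class names, `phrase_freq`, and `phrase_scores` are invented here, and the toy score is just tf × idf × length norm with none of the other factors a real scorer applies. It shows that when tf is the identity and the remaining factors are constant, the score of each hit equals the number of times the phrase occurs in that document.

```python
import math


def phrase_freq(tokens, phrase):
    """Count how many times `phrase` (a list of tokens) occurs in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


class DampedSimilarity:
    """Toy similarity: dampens frequency and favours rare phrases and short docs."""

    def tf(self, freq):
        return math.sqrt(freq)

    def idf(self, doc_freq, num_docs):
        return math.log(num_docs / (doc_freq + 1)) + 1.0

    def length_norm(self, num_tokens):
        return 1.0 / math.sqrt(num_tokens)


class CountingSimilarity:
    """Toy similarity: tf is the identity, every other factor is constant."""

    def tf(self, freq):
        return float(freq)

    def idf(self, doc_freq, num_docs):
        return 1.0

    def length_norm(self, num_tokens):
        return 1.0


def phrase_scores(docs, phrase, similarity):
    """Score every document containing the phrase; returns {doc_id: score}."""
    tokenized = [doc.lower().split() for doc in docs]
    freqs = [phrase_freq(tokens, phrase) for tokens in tokenized]
    doc_freq = sum(1 for freq in freqs if freq > 0)
    scores = {}
    for doc_id, (tokens, freq) in enumerate(zip(tokenized, freqs)):
        if freq == 0:
            continue  # documents without the phrase are not hits
        scores[doc_id] = (similarity.tf(freq)
                          * similarity.idf(doc_freq, len(docs))
                          * similarity.length_norm(len(tokens)))
    return scores


docs = [
    "it avoids deadlock and avoids deadlock again",
    "nothing relevant here",
    "this avoids deadlock",
]
phrase = ["avoids", "deadlock"]
counts = phrase_scores(docs, phrase, CountingSimilarity())
damped = phrase_scores(docs, phrase, DampedSimilarity())
```

With `CountingSimilarity` the scores are the raw per-document phrase counts. With `DampedSimilarity` they are not, which is why the other functions have to be neutralised as well.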

search.Similarity

2006-04-11 Thread miki sun
Hi there Is there any theory behind the similarity measure of Lucene? http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html Thanks Miki

Re: What is the retrieval modle for lucene?

2006-04-11 Thread Chris Lamprecht
It uses a combination of boolean, to get the set of matching documents, and vector space (by default) to rank them. Or one might say it uses the vector space model, and only returns nonzero scoring documents. On 4/10/06, hu andy <[EMAIL PROTECTED]> wrote: > I have seen in some documents that ther
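
A minimal sketch of that two-step model, in plain Python and under simplifying assumptions: the weighting below is a generic tf-idf with length normalisation chosen for illustration, not Lucene's actual scoring formula, and `tokenize`, `build_index`, and `search` are hypothetical names. The boolean step decides which documents are returned at all; the vector-space step only orders them.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, docs):
    """Boolean OR match to select documents, tf-idf weighting to rank them."""
    term_counts, doc_freq = build_index(docs)
    num_docs = len(docs)
    query_terms = tokenize(query)

    def idf(term):
        return 1.0 + math.log(num_docs / doc_freq[term])

    results = []
    for doc_id, counts in enumerate(term_counts):
        # Boolean step: skip documents that contain none of the query terms.
        matched = [t for t in query_terms if t in counts]
        if not matched:
            continue
        # Vector-space step: dot product of query and document weights,
        # normalised by the length of the document vector.
        dot = sum(idf(t) * math.sqrt(counts[t]) * idf(t) for t in matched)
        norm = math.sqrt(sum((math.sqrt(c) * idf(t)) ** 2 for t, c in counts.items()))
        results.append((doc_id, dot / norm))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Because only matching documents are scored, every returned score is nonzero, which is the sense in which the two descriptions in the message above coincide.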

getting frequency of a phrase within documents

2006-04-11 Thread Vishal Bathija
Hi, I am using PhraseQuery to get the number of documents that the query appears in, using the hits. I would like to know if there is any way in which I can get the number of times a phrase appears within each document. I am currently searching for the phrase "avoids deadlock" phraseQuery q

Re: MultiReader and MultiSearcher

2006-04-11 Thread Yonik Seeley
On 4/11/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Could you explain why an IndexSearcher constructed from multiple readers is > faster than a MultiSearcher constructed from same readers? The "convergence layer" is a level lower for a MultiReader vs a MultiSearcher. A MultiReader is an IndexRe
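
The difference in where the merge happens can be sketched in plain Python. This is a toy model, not Lucene code: the dictionaries standing in for sub-indexes, the raw term frequency used as a score, and all function names are assumptions made for illustration. In the reader-level version the posting lists are merged and a single search runs over them; in the searcher-level version each sub-index is searched separately and the result lists are merged afterwards.

```python
import heapq


def top_k(hits, k):
    """Best k (doc_id, score) pairs: highest score first, ties by lower doc id."""
    return heapq.nsmallest(k, hits, key=lambda hit: (-hit[1], hit[0]))


def merged_postings(sub_indexes, term):
    """Reader-level merge: one posting list with global document ids."""
    offset = 0
    for index in sub_indexes:
        for local_id, freq in index["postings"].get(term, []):
            yield (offset + local_id, freq)
        offset += index["max_doc"]


def search_reader_level(sub_indexes, term, k):
    """One search and one result queue over the merged posting list."""
    return top_k(merged_postings(sub_indexes, term), k)


def search_searcher_level(sub_indexes, term, k):
    """One search per sub-index, then an extra pass to merge the result lists."""
    merged = []
    offset = 0
    for index in sub_indexes:
        local_hits = top_k(index["postings"].get(term, []), k)
        merged.extend((offset + doc_id, score) for doc_id, score in local_hits)
        offset += index["max_doc"]
    return top_k(merged, k)


# Two hypothetical sub-indexes: term -> list of (local doc id, term frequency).
index_a = {"max_doc": 3, "postings": {"lucene": [(0, 2), (2, 1)]}}
index_b = {"max_doc": 2, "postings": {"lucene": [(1, 5)]}}

reader_hits = search_reader_level([index_a, index_b], "lucene", 2)
searcher_hits = search_searcher_level([index_a, index_b], "lucene", 2)
```

Both paths return the same top hits in this toy; the searcher-level path simply does more bookkeeping, with one result queue per sub-index plus a final merge.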

RE: Distributed Lucene.. - clustering as a requirement

2006-04-11 Thread Dmitry Goldenberg
I guess Compass is probably the way to go - http://www.opensymphony.com/compass/ From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED] Sent: Tue 4/11/2006 2:45 AM To: java-user@lucene.apache.org Subject: Re: Distributed Lucene.. - clustering as a requirement Agre

Re: MultiReader and MultiSearcher

2006-04-11 Thread Peter Keegan
Yonik, Could you explain why an IndexSearcher constructed from multiple readers is faster than a MultiSearcher constructed from same readers? Thanks, Peter On 4/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 4/10/06, oramas martín <[EMAIL PROTECTED]> wrote: > > Is there any performance

Clusterization of searching

2006-04-11 Thread anton
What would be a way to cluster searching?

Re: Small field indexing and ranking

2006-04-11 Thread Daniel Naber
On Tuesday 11 April 2006 10:33, Nadav Har'El wrote: > This sort of proximity-influenced scoring is missing from > Lucene's QueryParser, and I've been wondering recently > on how it is best to add it, and whether it is possible to > easily do it with existing Lucene machinery, like the > SpanQuery

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
"Maxym Mykhalchuk" <[EMAIL PROTECTED]> wrote on 11/04/2006 11:52:07 AM: > As for improving multi-word queries, Doug Cutting recently posted a link to > his presentation, > http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf, > just scroll down to Nutch N-Grams there, and you'l

Re: Small field indexing and ranking

2006-04-11 Thread Maxym Mykhalchuk
Hi Nadav, Thanks for suggestions. As for improving multi-word queries, Doug Cutting recently posted a link to his presentation, http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf, just scroll down to Nutch N-Grams there, and you'll see the answer. Basically, "Buffy the V

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
"Maxym Mykhalchuk" <[EMAIL PROTECTED]> wrote on 10/04/2006 09:46:16 PM: > Here's the issue: All my "documents" will be having a few (2-3: > title, short description) short fields. You see, it's rare that the > same word is repeated several times in a title, so will Lucene be > able to give me a dec