Re: Query String for a phrase?

2007-03-07 Doron Cohen
Most likely the string jakarta&apache is analyzed as a single word, both at indexing time and at search time. See also "AnalysisParalysis" in Lucene Wiki. "ruchi thakur" <[EMAIL PROTECTED]> wrote on 07/03/2007 20:39:27: > Thanks Patrick. One more question. The info in link says to use the belo

Re: how to define a pool for Searcher?

2007-03-07 Mohammad Norouzi
Erick and Mark, thank you very much; you really gave me good information. So I decided to try HitCollector and see how it works. But as for storing document IDs, I don't think it is a good idea because the result may exceed 50,000 and I was just being optimistic when I gave that number ;) Anyway, I wil

Re: Query String for a phrase?

2007-03-07 ruchi thakur
Thanks Patrick. One more question. The info in the link says to use the query below for the phrase "jakarta apache". It works fine. But when I run jakarta&apache, it has the same effect, i.e., it behaves like a phrase. It works fine too. Though it is working, I am still a little doubtful as I could n

Re: sharing my experience for upgrading from Lucene 1.9 to Lucene 2.2-dev

2007-03-07 Chris Lu
I understand the NPE could be caused by entirely reasonable changes; those are changes that Lucene API users may need to pay attention to. The "java.io.IOException:read past EOF" is pretty consistent in my case. I have run it on two computers and got the same error. After changing back to indexWriter.addIndexes(direc

Re: sharing my experience for upgrading from Lucene 1.9 to Lucene 2.2-dev

2007-03-07 Doron Cohen
Hi Chris, thanks for sharing this info (see below) "Chris Lu" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:32:22: > I would like to share my experience for upgrading from Lucene 1.9 to > Lucene 2.2, build 515893. > > I have been working on a product called DBSight. It has both a > designing web UI

Re: score

2007-03-07 Doron Cohen
http://lucene.apache.org/java/docs/scoring.html "ashwin kumar" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:54:49: > hi all when i search using lucene i am getting the path of the documents in > which the search string is found along with this > > i am also getting a score. my question is > > what
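For a rough feel of what the scoring page describes, the sketch below reproduces, outside Lucene, the per-factor formulas of Lucene 2.x's DefaultSimilarity, assumed here to be: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms), coord = overlap / maxOverlap. The class name ScoreFactors is hypothetical and nothing here calls Lucene; the real score also multiplies in boosts and a query normalization factor, so check the Similarity javadoc for your version before relying on these numbers.

```java
// Hypothetical stand-alone class: mirrors the factor formulas assumed for
// Lucene 2.x DefaultSimilarity, for illustration only (no Lucene calls).
final class ScoreFactors {
    private ScoreFactors() {}

    // Term frequency factor: sqrt(freq), so repeated terms help with diminishing returns.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: terms that occur in fewer documents weigh more.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Field length normalization: matches in shorter fields count for more.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Coordination factor: fraction of the query terms that the document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

Read together: higher tf and idf raise a document's score, longer fields lower it, and documents matching more of the query terms get a larger coord.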

score

2007-03-07 ashwin kumar
Hi all, when I search using Lucene I am getting the path of the documents in which the search string is found. Along with this I am also getting a score. My question is: what is this score? What's the use of the score? How is the score given for each document? Thanks, regards, Ashwin

Re: Negative Filtering (such as for profanity)

2007-03-07 Doron Cohen
Also, QueryParser would generate that combination with: *:* -naughty1 -naughty2 > Thanks! I was not aware of that class, for some reason. >> http://lucene.apache. >> org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html >> >> You can use that Query in front of a NOT query clause.

sharing my experience for upgrading from Lucene 1.9 to Lucene 2.2-dev

2007-03-07 Chris Lu
I would like to share my experience for upgrading from Lucene 1.9 to Lucene 2.2, build 515893. I have been working on a product called DBSight. It has both a designing web UI for configuring database crawl, and also capabilities to serve search requests like later-emerged SOLR. So DBSight can do

Re: Negative Filtering (such as for profanity)

2007-03-07 Greg Gershman
I thought about this, as I think overall the resources required would be less than creating a filter. Ultimately I decided against it for a few reasons: 1) I'm working with an existing index of ~50 million documents, I don't want to reindex the whole thing, or even just the documents that contai

Re: Term Frequency within Hits

2007-03-07 Erick Erickson
See TermFreqVector, HitCollector, perhaps TopDocs, perhaps TermEnum. Make sure you create your index such that frequencies are stored (see the FAQ). Erick On 3/7/07, teramera <[EMAIL PROTECTED]> wrote: So after I execute a search I end up with a 'Hits' object. The number of Hits is the order
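As a sketch of Erick's pointers, assuming the fields were indexed with term vectors: in Lucene 2.x, IndexReader.getTermFreqVector(doc, field) returns a TermFreqVector whose getTerms() and getTermFrequencies() are parallel arrays. The hypothetical TermFreqAggregator below only sums such arrays, so it runs without Lucene; in practice you would feed it from a HitCollector's collect(doc, score) callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: sums per-document term frequencies into one map.
// Each document arrives as two parallel arrays, the same shape that
// TermFreqVector.getTerms() / getTermFrequencies() return in Lucene 2.x.
final class TermFreqAggregator {
    private final Map<String, Integer> totals = new HashMap<String, Integer>();

    // Adds one document's vector: terms[i] occurred freqs[i] times in it.
    void add(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have the same length");
        }
        for (int i = 0; i < terms.length; i++) {
            Integer old = totals.get(terms[i]);
            totals.put(terms[i], (old == null ? 0 : old) + freqs[i]);
        }
    }

    // Total frequency of a term over all documents added so far (0 if unseen).
    int total(String term) {
        Integer t = totals.get(term);
        return t == null ? 0 : t;
    }
}
```

With about a million hits, loading one vector per hit is likely to be slow, so sampling the hits or restricting the fields may be worth considering.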

Re: alternative scoring algorithm for PhraseQuery

2007-03-07 Paul Elschot
On Wednesday 07 March 2007 18:12, Philipp Nanz wrote: > Thanks for your answers. Your input is really appreciated :-) > > @Paul Elschot: > Thanks for the hint. I guess I could use coord() to penalize missing > terms like this: > > Query: a b c d > Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1

Re: Negative Filtering (such as for profanity)

2007-03-07 Paul Elschot
On Wednesday 07 March 2007 16:07, Greg Gershman wrote: > I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I have run into is that, as a pure negative query is not supported (a query for (-term

Term Frequency within Hits

2007-03-07 teramera
So after I execute a search I end up with a 'Hits' object. The number of hits is on the order of a million. What I want to do is extract term frequencies from these Hits for a few known fields. I don't have a global list of terms for any of the fields but want to generate the term frequency based

Re: Negative Filtering (such as for profanity)

2007-03-07 Grant Ingersoll
Not sure if this is helpful given your proposed solution, but could you do something on the indexing side, such as: 1. Remove the profanity from the token stream, much like a stopword. This would also mean stripping it from the display text. 2. If your TokenFilter comes across a profanity, some
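Grant's first suggestion, handling profanity like a stopword at index time, would live in a TokenFilter (for example something like a StopFilter given the profanity list). The sketch below only models the idea on a plain list of tokens; ProfanityStripper is a hypothetical name, and the lower-casing stands in for whatever normalization your analyzer applies. As Greg notes in the thread, if the analyzer stems, the banned list has to be stemmed the same way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of a stopword-style profanity filter over a token list.
final class ProfanityStripper {
    private final Set<String> banned = new HashSet<String>();

    ProfanityStripper(String... bannedWords) {
        // Normalize banned words exactly like tokens are normalized in strip().
        for (String w : bannedWords) {
            banned.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the tokens with banned words dropped, keeping the original order.
    List<String> strip(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!banned.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```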

Re: Query String for a phrase?

2007-03-07 Patrick Turcotte
Hi, Please suggest what should be the query String for a phrase search. Did you take a look at: http://lucene.apache.org/java/docs/queryparsersyntax.html ? Patrick
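Per the query parser syntax page, a phrase is written as words in double quotes, and a proximity search adds ~N after the closing quote (for example "jakarta apache"~10). The hypothetical helper below just builds such strings; escaping embedded quotes with a backslash is an assumption about what your input might contain. That jakarta&apache seems to behave like a phrase is, as Doron suggests, most likely because the analyzer keeps it as a single token, not because & means "phrase".

```java
// Hypothetical helper that builds query-parser strings for a phrase:
// double quotes for a phrase, plus an optional ~N suffix for proximity.
final class PhraseQueryStrings {
    private PhraseQueryStrings() {}

    // Wraps the words in double quotes; embedded quotes are backslash-escaped.
    static String phrase(String... words) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i].replace("\"", "\\\""));
        }
        return sb.append('"').toString();
    }

    // Proximity (sloppy) phrase: lets the words appear up to roughly `slop` positions apart.
    static String proximity(int slop, String... words) {
        return phrase(words) + "~" + slop;
    }
}
```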

Re: alternative scoring algorithm for PhraseQuery

2007-03-07 Chris Hostetter
: Query: a b c d : Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1 : Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75 : : Doc A would score higher. I guess that might be a valid solution. : There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0.5 : So a perfect match with one insertion

Re: Negative Filtering (such as for profanity)

2007-03-07 Greg Gershman
One point: if you use stemming, or some other modification of the terms before indexing, you'll need to make sure the terms you create to match against are also stemmed. Greg - Original Message From: Greg Gershman <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, Mar

Query String for a phrase?

2007-03-07 ruchi thakur
Hello, Please suggest what the query String for a phrase search should be. Thanks and Regards, Ruchi

Re: alternative scoring algorithm for PhraseQuery

2007-03-07 Philipp Nanz
Thanks for your answers. Your input is really appreciated :-) @Paul Elschot: Thanks for the hint. I guess I could use coord() to penalize missing terms like this: Query: a b c d Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1 Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75 Doc A would score
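The arithmetic quoted above can be reproduced with the DefaultSimilarity formulas of Lucene 2.x, assumed here to be sloppyFreq(d) = 1 / (d + 1) and coord(o, m) = o / m. The stand-alone class below (hypothetical name) only recomputes those products; it is not how PhraseQuery combines them internally.

```java
// Stand-alone recomputation of the thread's products, assuming
// sloppyFreq(d) = 1 / (d + 1) and coord(o, m) = o / m.
final class PhraseScoreArithmetic {
    private PhraseScoreArithmetic() {}

    // Penalty for the edit distance of a sloppy phrase match.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Fraction of query terms present in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Proposed combined factor: penalizes both distance and missing terms.
    static float combined(int distance, int matchedTerms, int queryTerms) {
        return sloppyFreq(distance) * coord(matchedTerms, queryTerms);
    }
}
```

With these formulas a full match with one insertion (0.5) scores below a match that misses one of four terms (0.75), which is the drawback raised in the thread.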

Re: Using ParallelReader over large immutable index and small updatable index

2007-03-07 Joe Shaw
Hi, On Tue, 2007-03-06 at 15:34 -0500, Andy Liu wrote: > Is there a working solution out there that would let me use ParallelReader > to search over a large, immutable index and a smaller, auxiliary index that > is updated frequently? Currently, from my understanding, the ParallelReader > fails

Re: Using ParallelReader over large immutable index and small updatable index

2007-03-07 Andy Liu
From my understanding, MultiSearcher is used to combine two indexes that have the same fields but different documents. ParallelReader is used to combine two indexes that have the same documents but different fields. I'm trying to do the latter. Is my understanding correct? For example, what I'm t
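To make the distinction concrete, here is a toy model of document numbering in the two setups, without Lucene. MultiSearcher-style combining concatenates indexes, so each sub-index owns a contiguous range of global document numbers; ParallelReader-style combining lays indexes side by side, which requires every index to contain the same number of documents, added in the same order, so that doc n is the same document everywhere. DocNumbering and its methods are hypothetical names.

```java
// Hypothetical toy model of document numbering (not Lucene code).
final class DocNumbering {
    private DocNumbering() {}

    // MultiSearcher-style: sub-index i owns global docs [starts[i], starts[i + 1]).
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Index of the sub-index holding global doc n; empty sub-indexes are skipped.
    static int subIndex(int[] starts, int n) {
        if (n < 0 || n >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("doc " + n);
        }
        for (int i = 0; i + 1 < starts.length; i++) {
            if (n < starts[i + 1]) {
                return i;
            }
        }
        throw new AssertionError("unreachable");
    }

    // Document number of global doc n inside its own sub-index.
    static int localDoc(int[] starts, int n) {
        return n - starts[subIndex(starts, n)];
    }

    // ParallelReader-style: all indexes must have the same document count.
    static boolean sameDocCount(int[] maxDocs) {
        for (int m : maxDocs) {
            if (m != maxDocs[0]) {
                return false;
            }
        }
        return true;
    }
}
```

This is also why a frequently updated auxiliary index is awkward with ParallelReader: any change that shifts document numbers in only one of the indexes is likely to break the one-to-one alignment.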

Re: Negative Filtering (such as for profanity)

2007-03-07 mark harwood
Ah. Sorry. Last post was a ProfanitySelector rather than ProfanityFilter! - this fixes it anyway naughty1 naughty2 xxx - Original Message From: mark

Re: Negative Filtering (such as for profanity)

2007-03-07 mark harwood
Sounds like the sort of filter that could be usefully cached. You can do all this in Java code or the XML query parser (in contrib) might be a quick and simple way to externalize the profanity settings in a stylesheet which is actually used at query time e.g. <
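If the profanity filter is cached as mark suggests, the usual shape is to compute the filter's bits once per IndexReader and reuse them until the reader is replaced; Lucene 2.x ships CachingWrapperFilter for this purpose. The stand-alone sketch below shows the idea with a WeakHashMap keyed by any object standing in for a reader; PerReaderBitsCache and BitsComputer are hypothetical names.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache: computes a filter's bit set once per reader object and
// reuses it; entries can disappear once the reader is garbage-collected.
final class PerReaderBitsCache {
    interface BitsComputer {
        BitSet compute(Object reader);
    }

    private final Map<Object, BitSet> cache = new WeakHashMap<Object, BitSet>();
    private final BitsComputer computer;
    private int computations = 0;

    PerReaderBitsCache(BitsComputer computer) {
        this.computer = computer;
    }

    // Returns the cached bits for this reader, computing them on first use.
    synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = computer.compute(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    // How many times the underlying (expensive) computation actually ran.
    synchronized int computations() {
        return computations;
    }
}
```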

Re: Negative Filtering (such as for profanity)

2007-03-07 Greg Gershman
Thanks! I was not aware of that class, for some reason. I tried creating a NegativeQueryFilter, it works just fine. Can you think of any reason why one approach would be better than the other? If there's interest, I'm happy to post the NegativeQueryFilter. Greg - Original Message Fr

Re: Negative Filtering (such as for profanity)

2007-03-07 Mark Miller
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html You can use that Query in front of a NOT query clause. Greg Gershman wrote: I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] A
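A purely negative query matches nothing because there is no positive set to subtract from; MatchAllDocsQuery supplies that set (QueryParser's *:* does the same). The toy model below, which does not use Lucene, computes the resulting document set as a java.util.BitSet, the type a Lucene 2.x Filter returns from bits(IndexReader). NegativeFilterModel is a hypothetical name, and documents are modeled as plain term sets.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Toy model of "all documents minus those containing a banned term",
// i.e. what MatchAllDocsQuery plus MUST_NOT clauses select. Not Lucene code.
final class NegativeFilterModel {
    private NegativeFilterModel() {}

    static BitSet allowed(List<Set<String>> docs, Set<String> banned) {
        BitSet bits = new BitSet(docs.size());
        bits.set(0, docs.size()); // start from every document
        for (int doc = 0; doc < docs.size(); doc++) {
            for (String term : docs.get(doc)) {
                if (banned.contains(term)) {
                    bits.clear(doc); // drop documents with any banned term
                    break;
                }
            }
        }
        return bits;
    }
}
```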

Re: how to define a pool for Searcher?

2007-03-07 Mark Miller
You should not be returning 50 thousand documents. Since you are implementing paging, you should only return enough to cover your page size. If a user is viewing page 1 with documents 1-10, you send back information for 10 of the docs. On page 2, 11-20, you send back information for 10 of the d
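The paging advice boils down to computing which slice of the ranked hits a page needs. The hypothetical PageWindow helper below does that arithmetic for 1-based page numbers and returns 0-based, end-exclusive hit indexes.

```java
// Hypothetical helper: which ranked hits (0-based, end exclusive) a page needs.
final class PageWindow {
    final int start; // index of the first hit to load (inclusive)
    final int end;   // index just past the last hit to load (exclusive)

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    static PageWindow of(int page, int pageSize, int totalHits) {
        if (page < 1 || pageSize < 1 || totalHits < 0) {
            throw new IllegalArgumentException("page and pageSize must be >= 1, totalHits >= 0");
        }
        long first = (long) (page - 1) * pageSize; // long avoids int overflow
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);
        return new PageWindow(start, end);
    }

    // Number of hits on this page (0 past the last page).
    int size() {
        return end - start;
    }
}
```

With Lucene 2.x one could then request only the top window.end hits, for example via searcher.search(query, filter, window.end), and read scoreDocs[start] through scoreDocs[end - 1] of the returned TopDocs.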

Negative Filtering (such as for profanity)

2007-03-07 Greg Gershman
I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I have run into is that, as a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)), I b

Re: how to define a pool for Searcher?

2007-03-07 Erick Erickson
You may not be able to store all the documents, but what about just storing the document IDs in a list? And remember that a Hits object re-queries the index every 100 documents or so when you iterate through it, so if you're really using a Hits object, you're re-executing the query anyway. You m
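Storing only document numbers is cheap: one int per hit, so roughly 200 KB of raw ints for 50,000 hits. The sketch below collects them through a callback shaped like Lucene 2.x's HitCollector.collect(int doc, float score). DocCollector is a local, hypothetical stand-in so the code compiles without Lucene; with the real library you would extend HitCollector instead.

```java
import java.util.Arrays;

// Hypothetical stand-in with the same callback shape as Lucene 2.x HitCollector.
abstract class DocCollector {
    abstract void collect(int doc, float score);
}

// Hypothetical collector that keeps only document numbers in a growable int array.
final class DocIdCollector extends DocCollector {
    private int[] docs = new int[16];
    private int size = 0;

    @Override
    void collect(int doc, float score) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2); // grow geometrically
        }
        docs[size++] = doc;
    }

    int size() {
        return size;
    }

    // Trimmed copy of the collected document numbers, in collection order.
    int[] docIds() {
        return Arrays.copyOf(docs, size);
    }
}
```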

Re: how to define a pool for Searcher?

2007-03-07 Mohammad Norouzi
Yes, I am very concerned about this because we have a big project with many users and I am responsible for it. What preoccupies me is application performance, because there are more than 500 thousand records (documents). A single search may return about 50 thousand documents and i

Re: how to define a pool for Searcher?

2007-03-07 Mark Miller
To address your hits question: I wouldn't keep hits around, but would re-search instead. It is often more of a headache than a time savings to keep around all of the Hits objects and to have to manage them. I made my own Hits object that does no caching because of this. Pagination is often best

Re: how to define a pool for Searcher?

2007-03-07 Mark Miller
You only want a single IndexSearcher for EVERY user searching an index. IndexAccessor will manage this for you. The only reason you might have more than one IndexSearcher is if you have more than one index to search or some additional MultiSearchers. You always want ONE IndexSearcher, ONE Index
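A minimal sketch of the "one shared searcher" pattern, assuming the searcher can be closed through java.io.Closeable (a real IndexSearcher may need a small wrapper around its close()). SharedSearcher is a hypothetical class, not IndexAccessor's API: all requests borrow the same instance, and when the index is updated a new instance is swapped in, while the old one is closed only after its last user releases it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical holder sharing ONE searcher-like object among all users.
final class SharedSearcher<S extends Closeable> {
    private S current;
    // How many users currently hold each live instance (identity-based keys).
    private final Map<S, Integer> users = new IdentityHashMap<S, Integer>();

    SharedSearcher(S initial) {
        current = initial;
        users.put(initial, 0);
    }

    // All callers get the same current instance.
    synchronized S acquire() {
        users.put(current, users.get(current) + 1);
        return current;
    }

    // Gives back an instance obtained from acquire().
    synchronized void release(S searcher) throws IOException {
        Integer n = users.get(searcher);
        if (n == null || n == 0) {
            throw new IllegalStateException("searcher was not acquired");
        }
        users.put(searcher, n - 1);
        closeIfRetired(searcher);
    }

    // Installs a searcher over the updated index; the previous one is closed
    // now if unused, otherwise when its last user calls release().
    synchronized void swap(S fresh) throws IOException {
        if (users.containsKey(fresh)) {
            throw new IllegalArgumentException("searcher is already managed");
        }
        S old = current;
        current = fresh;
        users.put(fresh, 0);
        closeIfRetired(old);
    }

    private void closeIfRetired(S searcher) throws IOException {
        if (searcher != current && users.get(searcher) == 0) {
            users.remove(searcher);
            searcher.close();
        }
    }
}
```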