Most likely the string jakarta&apache is analyzed as a single word,
both at indexing time and at search time.
See also "AnalysisParalysis" on the Lucene Wiki.
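Whether the analyzer collapses jakarta&apache into one token or splits it in two is easy to check from QueryParser's output. A minimal sketch, assuming Lucene 2.x, StandardAnalyzer, and a hypothetical field name "content" (StandardAnalyzer drops the '&', so the term becomes two tokens and QueryParser builds the same PhraseQuery as the quoted form):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class AnalysisCheck {
    public static void main(String[] args) throws ParseException {
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query quoted = parser.parse("\"jakarta apache\"");
        Query amp = parser.parse("jakarta&apache");
        // If both render as content:"jakarta apache", the '&' form is being
        // analyzed into the same two tokens and behaves like a phrase.
        System.out.println(quoted);
        System.out.println(amp);
    }
}
```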
"ruchi thakur" <[EMAIL PROTECTED]> wrote on 07/03/2007 20:39:27:
> Thanks Patrick. One more question. The info in link says to use the below
Erick and Mark, thank you very much; you really gave me good information. So
I decided to try HitCollector and see how it works. But as for storing
document IDs, I don't think that is a good idea, because the result may exceed
50,000, and I was just being optimistic when I quoted that number ;)
Anyway, I wil
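The approach described above (collecting only document IDs instead of materializing tens of thousands of Hits) could be sketched like this, assuming the Lucene 2.x HitCollector API and an existing `searcher` and `query`:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.HitCollector;

// Collects only document IDs; scores are ignored, so nothing heavier
// than an Integer per hit is kept in memory.
public class IdCollector extends HitCollector {
    public final List ids = new ArrayList();

    public void collect(int doc, float score) {
        ids.add(new Integer(doc));
    }
}

// Usage (assuming `searcher` is an IndexSearcher and `query` a Query):
//   IdCollector collector = new IdCollector();
//   searcher.search(query, collector);
//   System.out.println(collector.ids.size() + " matches");
```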
Thanks, Patrick. One more question. The info in the link says to use the below
query for a phrase:
"jakarta apache". It works fine.
But when I run jakarta&apache, it also has the same effect, i.e. like
a phrase. It works fine too. Though it is working, I am still a little
doubtful, as I could n
I understand the NPEs could all be due to reasonable changes; those are
changes that users of the Lucene API may need to pay attention to.
The "java.io.IOException: read past EOF" is pretty consistent in my
case. I have run it on two computers and got the same error. After
changing back to indexWriter.addIndexes(direc
Hi Chris, thanks for sharing this info (see below)
"Chris Lu" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:32:22:
> I would like to share my experience for upgrading from Lucene 1.9 to
> Lucene 2.2, build 515893.
>
> I have been working on a product called DBSight. It has both a
> designing web UI
http://lucene.apache.org/java/docs/scoring.html
"ashwin kumar" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:54:49:
> hi all when i search using lucene i am getting the path of the documents
in
> which the search string is found along with this
>
> i am also gettin a score . my question is
>
> what
Hi all, when I search using Lucene I get the path of the documents in
which the search string is found, and along with this
I also get a score. My questions are:
What is this score?
What is the use of the score?
How is the score computed for each document?
thanks
regards
ashwin
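The scoring document linked above describes the formula; to see exactly how a particular document's score was computed, Lucene can print a breakdown. A minimal sketch, assuming Lucene 2.x and existing `searcher`, `query`, and `docId` values:

```java
import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ScoreBreakdown {
    // Prints the tf/idf/boost/norm factors behind one document's score.
    static void print(IndexSearcher searcher, Query query, int docId)
            throws IOException {
        Explanation exp = searcher.explain(query, docId);
        System.out.println(exp.toString());
    }
}
```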
Also, QueryParser would generate that combination with:
*:* -naughty1 -naughty2
> Thanks! I was not aware of that class, for some reason.
>> http://lucene.apache.
>> org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html
>>
>> You can use that Query in front of a NOT query clause.
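Programmatically, the same "match everything except the naughty terms" query could be built like this (a sketch assuming Lucene 2.x and a hypothetical field name "content"):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ProfanityQuery {
    // Equivalent of: *:* -naughty1 -naughty2
    static Query build() {
        BooleanQuery q = new BooleanQuery();
        q.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("content", "naughty1")),
              BooleanClause.Occur.MUST_NOT);
        q.add(new TermQuery(new Term("content", "naughty2")),
              BooleanClause.Occur.MUST_NOT);
        return q;
    }
}
```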
I would like to share my experience for upgrading from Lucene 1.9 to
Lucene 2.2, build 515893.
I have been working on a product called DBSight. It has both a
designing web UI for configuring database crawls, and also the capability
to serve search requests like the later-emerged Solr. So DBSight can do
I thought about this, as I think the overall resources required would be less
than creating a filter. Ultimately I decided against it for a few reasons:
1) I'm working with an existing index of ~50 million documents; I don't want to
reindex the whole thing, or even just the documents that contai
See TermFreqVector, HitCollector, perhaps TopDocs, perhaps
TermEnum. Make sure you create your index such that frequencies
are stored (see the FAQ).
Erick
On 3/7/07, teramera <[EMAIL PROTECTED]> wrote:
So after I execute a search I end up with a 'Hits' object. The number of
Hits
is the order
On Wednesday 07 March 2007 18:12, Philipp Nanz wrote:
> Thanks for your answers. Your input is really appreciated :-)
>
> @Paul Elschot:
> Thanks for the hint. I guess I could use coord() to penalize missing
> terms like this:
>
> Query: a b c d
> Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
On Wednesday 07 March 2007 16:07, Greg Gershman wrote:
> I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run
into is that, as a pure negative query is not supported (a query for (-term
So after I execute a search I end up with a 'Hits' object. The number of Hits
is on the order of a million.
What I want to do from these Hits is extract term frequencies for a few
known fields. I don't have a global list of terms for any of the fields, but
want to generate the term frequency based
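Erick's TermFreqVector suggestion could look roughly like this, assuming Lucene 2.x, an open IndexReader, and a field that was indexed with term vectors enabled (Field.TermVector.YES); the field name "body" is hypothetical:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermFreqDump {
    // Prints per-document term frequencies for one field.
    static void dump(IndexReader reader, int docId) throws IOException {
        TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
        if (tfv == null) {
            return; // field was not indexed with term vectors
        }
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " -> " + freqs[i]);
        }
    }
}
```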
Not sure if this helpful given your proposed solution, but could you
do something on the indexing side, such as:
1. Remove the profanity from the token stream, much like a
stopword. This would also mean stripping it from the display text
2. If your TokenFilter comes across a profanity, some
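Option 1 above (stripping profanity from the token stream, like a stopword) might be sketched as follows, assuming the Lucene 2.x TokenStream API; the ProfanityFilter name and the banned-word set are hypothetical:

```java
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Drops banned tokens from the stream, much like StopFilter drops stopwords.
public class ProfanityFilter extends TokenFilter {
    private final Set banned;

    public ProfanityFilter(TokenStream in, Set banned) {
        super(in);
        this.banned = banned;
    }

    public Token next() throws IOException {
        for (Token t = input.next(); t != null; t = input.next()) {
            if (!banned.contains(t.termText())) {
                return t;
            }
        }
        return null; // end of stream
    }
}
```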
Hi,
Please suggest what should be the query string for a phrase search.
Did you take a look at:
http://lucene.apache.org/java/docs/queryparsersyntax.html ?
Patrick
: Query: a b c d
: Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
: Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75
:
: Doc A would score higher. I guess that might be a valid solution.
: There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0.5
: So a perfect match with one insertion
One point: if you use stemming, or some other modification of the terms before
indexing, you'll need to make sure the terms you create to match against are
also stemmed.
Greg
- Original Message
From: Greg Gershman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, Mar
Hello,
Please suggest what should be the query string for a phrase search.
Thanks and Regards,
Ruchi
Thanks for your answers. Your input is really appreciated :-)
@Paul Elschot:
Thanks for the hint. I guess I could use coord() to penalize missing
terms like this:
Query: a b c d
Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75
Doc would score
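The arithmetic above can be checked with a tiny stand-alone sketch, assuming (as DefaultSimilarity does by default) that coord() is simply overlap / maxOverlap:

```java
public class CoordDemo {
    // DefaultSimilarity-style coordination factor: the fraction of
    // query terms that matched in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println(coord(4, 4)); // Doc A: all four of "a b c d" match
        System.out.println(coord(3, 4)); // Doc B: only "a b c" match
    }
}
```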
Hi,
On Tue, 2007-03-06 at 15:34 -0500, Andy Liu wrote:
> Is there a working solution out there that would let me use ParallelReader
> to search over a large, immutable index and a smaller, auxiliary index that
> is updated frequently? Currently, from my understanding, the ParallelReader
> fails
From my understanding, MultiSearcher is used to combine two indexes that
have the same fields but different documents. ParallelReader is used to
combine two indexes that have same documents but different fields. I'm
trying to do the latter. Is my understanding correct? For example, what
I'm t
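That understanding matches the API. A sketch of the same-documents/different-fields case, assuming Lucene 2.x and two hypothetical Directory instances (bigIndexDir holding the immutable fields, auxIndexDir the frequently updated ones; both indexes must contain the same documents in the same order):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class CombinedSearch {
    // Searches one logical index whose fields come from two physical ones.
    static IndexSearcher open(Directory bigIndexDir, Directory auxIndexDir)
            throws IOException {
        ParallelReader pr = new ParallelReader();
        pr.add(IndexReader.open(bigIndexDir)); // large, immutable fields
        pr.add(IndexReader.open(auxIndexDir)); // small, frequently updated fields
        return new IndexSearcher(pr);
    }
}
```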
Ah. Sorry. Last post was a ProfanitySelector rather than ProfanityFilter! -
this fixes it anyway
naughty1 naughty2 xxx
- Original Message
From: mark
Sounds like the sort of filter that could be usefully cached.
You can do all this in Java code or the XML query parser (in contrib) might be
a quick and simple way to externalize the profanity settings in a stylesheet
which is actually used at query time e.g.
<
Thanks! I was not aware of that class, for some reason.
I tried creating a NegativeQueryFilter, it works just fine. Can you think of
any reason why one approach would be better than the other? If there's
interest, I'm happy to post the NegativeQueryFilter.
Greg
- Original Message
Fr
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html
You can use that Query in front of a NOT query clause.
Greg Gershman wrote:
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] A
You should not be returning 50 thousand documents. Since you are
implementing paging, you should only return enough to cover your page
size. If a user is viewing page 1 with documents 1-10, you send back
information for 10 of the docs. On page 2, 10-20, you send back
information for 10 of the d
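The page-window arithmetic described above is simple enough to sketch stand-alone (pages numbered from 1, page size 10 in the example):

```java
public class PageWindow {
    // Returns {startInclusive, endExclusive} hit indices for one page.
    static int[] window(int page, int pageSize, int totalHits) {
        int start = (page - 1) * pageSize;
        int end = Math.min(start + pageSize, totalHits);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] w = window(2, 10, 50000);
        System.out.println(w[0] + ".." + w[1]); // second page of results
    }
}
```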
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run into is that, as a pure negative query is not supported (a query for
(-term) DOES NOT return the inverse of a query for (term)), I b
You may not be able to store all the documents, but what about
just storing the document IDs in a list?
And remember that a Hits object re-queries the index every 100
documents or so when you iterate through it, so if you're really
using a Hits object, you're re-executing the query anyway.
You m
Yes, I am very concerned about this, because we have a big project with many
users and I am responsible for it. The thing that preoccupies my mind is
application performance, because there are more than 500 thousand records
(documents).
A single search may return about 50 thousand documents and i
To address your hits question: I wouldn't keep hits around, but would
re-search instead. It is often more of a headache than a time savings to
keep around all of the Hits objects and to have to manage them. I made
my own Hits object that does no caching because of this. Pagination is
often best
You only want a single IndexSearcher shared by EVERY user searching an index.
IndexAccessor will manage this for you. The only reason you might have
more than one IndexSearcher is if you have more than one index to search
or some additional MultiSearchers. You always want ONE IndexSearcher,
ONE Index