Re: efficient way to filter out unwanted results

2007-06-14 Thread Sawan Sharma
Hello Jay, I am not sure how well I understood your problem, but based on my reading you can try the HitCollector class and its collect method. There you get the doc ID for each hit and can drop unwanted ones while searching. Hope it will be useful. Sawan (Chambal.com inc. NJ USA) On 6/15/07,
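A minimal sketch of that suggestion against the Lucene 2.x HitCollector API; the index path and the unwanted-ID set are invented for illustration:

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class FilteringCollectorSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical index
            // Internal doc IDs to skip; how they are obtained is up to the application.
            final Set unwanted = new HashSet();
            unwanted.add(new Integer(42));

            searcher.search(new TermQuery(new Term("content", "lucene")), new HitCollector() {
                public void collect(int doc, float score) {
                    if (unwanted.contains(new Integer(doc))) {
                        return; // drop this hit while searching
                    }
                    System.out.println("kept doc " + doc + " score " + score);
                }
            });
            searcher.close();
        }
    }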

efficient way to filter out unwanted results

2007-06-14 Thread yu
Hi everyone, I am trying to remove several docs from the search results each time I run a query. The docs can be identified by external ids which are saved/indexed. I could use a Query or QueryFilter to achieve this, but I am not sure if it's the most efficient way to do that. Anyone has any experienc
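One way to phrase this with the classes mentioned, as a sketch against the Lucene 2.x API (the index path, field name and ids are invented):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class ExcludeByExternalIdSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical

            // A query matching the external ids to exclude.
            BooleanQuery excluded = new BooleanQuery();
            excluded.add(new TermQuery(new Term("extId", "A17")), BooleanClause.Occur.SHOULD);
            excluded.add(new TermQuery(new Term("extId", "B42")), BooleanClause.Occur.SHOULD);

            // Attach it as a MUST_NOT clause on the main query.
            BooleanQuery main = new BooleanQuery();
            main.add(new TermQuery(new Term("content", "lucene")), BooleanClause.Occur.MUST);
            main.add(excluded, BooleanClause.Occur.MUST_NOT);

            Hits hits = searcher.search(main);
            System.out.println(hits.length() + " hits after exclusion");
            searcher.close();
        }
    }

If the exclusion set is reused across many queries, wrapping a query for the documents to keep in a QueryFilter lets Lucene cache the bit set instead of re-evaluating the clause each time.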

Re: negative queries

2007-06-14 Thread Daniel Noll
On Friday 15 June 2007 11:07:25 Antony Sequeira wrote: > Hi > I am aware that with Lucene I cannot do negative-only queries such as > -foo:bar The mailing list has already answered this question dozens of times. I've been wondering lately, does this list have a FAQ? If so, is this question o

negative queries

2007-06-14 Thread Antony Sequeira
Hi, I am aware that with Lucene I cannot do negative-only queries such as -foo:bar. But today I ran into an issue where I realized that even queries such as +foo:bar +(-goobly:doo) never return any results. Basically I get the impression that I cannot have a clause like +(-x:y) anywhere in my
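The usual workaround (a sketch, not taken from the thread itself) is to pair the purely negative clause with a MatchAllDocsQuery so the enclosing clause has something positive to match against:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class NegativeClauseSketch {
        public static Query build() {
            // Inner clause: "everything except goobly:doo" instead of the bare -goobly:doo.
            BooleanQuery notDoo = new BooleanQuery();
            notDoo.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
            notDoo.add(new TermQuery(new Term("goobly", "doo")), BooleanClause.Occur.MUST_NOT);

            // Outer query: +foo:bar +(match-all -goobly:doo)
            BooleanQuery query = new BooleanQuery();
            query.add(new TermQuery(new Term("foo", "bar")), BooleanClause.Occur.MUST);
            query.add(notDoo, BooleanClause.Occur.MUST);
            return query;
        }
    }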

RE: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Renaud Waldura
Thank you for this crystal-clear explanation, Mark! > Are you sure you need a PhraseQuery and not a Boolean > query of Should clauses? Excellent question. What's the requirement, hey? Well, the requirement is to find documents referring to "annanicole smith", "anna smith" and "annaliese smith" et
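For that "anna* smith" requirement, one programmatic reading of the Boolean-query suggestion is sketched below (the field name is invented; note that, unlike a phrase, it does not enforce adjacency of the two terms):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class AnnaSmithSketch {
        public static Query build() {
            BooleanQuery query = new BooleanQuery();
            // Matches anna, annanicole, annaliese, ...
            query.add(new PrefixQuery(new Term("name", "anna")), BooleanClause.Occur.MUST);
            // ...in the same document as smith (not necessarily adjacent).
            query.add(new TermQuery(new Term("name", "smith")), BooleanClause.Occur.MUST);
            return query;
        }
    }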

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mark Miller
It all depends on what you are looking for. I'll try to give a hint as to what is going on now: when the QueryParser parses <> it will shove that whole piece to the analyzer. Your analyzer returns two tokens: smith and ann. When the QueryParser sees that more than one token is returned from a piece

RE: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Renaud Waldura
Thanks guys, I like it! I'm already applying some regexps before query parsing anyway, so it's just another pass. Now, I'm not sure how to do that without breaking another QP feature that I kind of like: the query <> is parsed to PhraseQuery("smith ann"). And that seems right, from a user standpoi

Lucene Search result (scoring )

2007-06-14 Thread Yatin Soni
Hi, We are using Lucene as our search engine and I have a question regarding the scoring of search results. I have given an example for it. Example: suppose we have four items which we have indexed, ///
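The example is cut off above, but for scoring questions of this kind the usual way to see how a document's score was computed is IndexSearcher.explain; a minimal sketch (index path, field and query are invented):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class ExplainScoreSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical
            Query query = new TermQuery(new Term("title", "item"));
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Explanation exp = searcher.explain(query, hits.id(i));
                System.out.println(hits.doc(i).get("title") + " -> " + hits.score(i));
                System.out.println(exp.toString()); // term frequency, idf, field norms, ...
            }
            searcher.close();
        }
    }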

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mark Miller
Got to agree with Erick here... the best idea is just to preprocess the query before sending it to the QueryParser. My first thought is always to get out the sledgehammer... - Mark Erick Erickson wrote: Well, perhaps the simplest thing would be to pre-process the query and make the comma into a whi

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mathieu Lecarme
If you don't use the same tokenizer for indexing and searching, you will have trouble like this. Mixing exact match (with ") and wildcard (*) is a strange idea. Typographical rules say that you have a space after a comma, no? Is your field tokenized? M. Renaud Waldura wrote: > My very simple
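The point about using the same tokenizer on both sides, as a minimal sketch (the analyzer choice, index path and field name are just examples):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class SameAnalyzerSketch {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Indexing side.
            IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true); // hypothetical path
            // ... add documents here ...
            writer.close();

            // Search side: hand the *same* analyzer to the QueryParser.
            QueryParser parser = new QueryParser("content", analyzer);
            Query query = parser.parse("smith, ann*");
            System.out.println(query);
        }
    }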

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Erick Erickson
Well, perhaps the simplest thing would be to pre-process the query and make the comma into a whitespace before sending anything to the query parser. I don't know how generalizable that sort of solution is in your problem space, though. Best, Erick On 6/13/07, Renaud Waldura <[EMAIL PROTECTED]>
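A sketch of that pre-processing step (the field name and analyzer are illustrative, and the replacement could equally be a regular expression):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class CommaPreprocessSketch {
        public static void main(String[] args) throws Exception {
            String raw = "smith,ann*";
            // Turn commas into whitespace before the QueryParser ever sees them.
            String cleaned = raw.replaceAll(",", " ");

            QueryParser parser = new QueryParser("name", new StandardAnalyzer());
            Query query = parser.parse(cleaned);
            System.out.println(query);
        }
    }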

Re: In which position of a document a word was found?

2007-06-14 Thread Grant Ingersoll
Have a look at the SpanQuery (starting at page 161 in LIA or in the javadocs). I also have some info in my ApacheCon talk at http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf and http://www.cnlp.org/apachecon2005 Incidentally, the SpanQuery functionality does not require Term
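To make the suggestion concrete, a minimal sketch of reading match positions from a SpanTermQuery with the Lucene 2.x getSpans API (index path, field and term are invented):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.search.spans.Spans;

    public class SpanPositionsSketch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical
            SpanTermQuery query = new SpanTermQuery(new Term("content", "lucene"));
            Spans spans = query.getSpans(reader);
            while (spans.next()) {
                // start()/end() are token positions within the matching document.
                System.out.println("doc " + spans.doc()
                    + " positions [" + spans.start() + ", " + spans.end() + ")");
            }
            reader.close();
        }
    }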

Re: Investigating Lucene's Applicability to [Unusual?] Use Case

2007-06-14 Thread eks dev
Sounds easy (I said sounds :), e.g. your Statement becomes a Document in Lucene lingo. You make it with 3-4 Lucene fields: CONTENT (tokenized, not stored), OFFSET (not indexed, stored) - offset in the file of the first byte of your statement, DOC_LENGTH (not indexed, stored) - if you have no END-OF-Statem
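A sketch of that document layout using Lucene 2.x field flags (the field names mirror the message; the values and index path are invented):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class StatementDocumentSketch {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true); // hypothetical

            Document doc = new Document();
            // CONTENT: tokenized, not stored.
            doc.add(new Field("CONTENT", "the text of one statement ...",
                    Field.Store.NO, Field.Index.TOKENIZED));
            // OFFSET: stored, not indexed - byte offset of the statement in the source file.
            doc.add(new Field("OFFSET", "123456", Field.Store.YES, Field.Index.NO));
            // DOC_LENGTH: stored, not indexed - length of the statement in bytes.
            doc.add(new Field("DOC_LENGTH", "87", Field.Store.YES, Field.Index.NO));

            writer.addDocument(doc);
            writer.close();
        }
    }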