Help needed ordering search results

2009-09-30 Thread mitu2009
Hi, I have 3 records in a Lucene index. Record 1 contains healthcare in the title field. Record 2 contains healthcare and insurance in the description field, but not together. Record 3 contains healthcare insurance in the company name field. When a user searches for healthcare insurance, I want to show records i

Webinar: Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever

2009-09-30 Thread Erik Hatcher
Excuse the cross-posting and gratuitous marketing :) Erik My company, Lucid Imagination, is sponsoring a free and in-depth technical webinar with Erik Hatcher, one of our co-founders at Lucid Imagination, as well as co-author of Lucene in Action, and Lucene/Solr PMC member and com

Re: Highlighting phrases in 2.9

2009-09-30 Thread Mark Miller
Scott Smith wrote: > I've been looking at the changes I have to make in my code to go from > 2.4.1 to 2.9. One of the features I have is to highlight query hits in > documents which meet the search criteria. If the query has a phrase, > then I need to highlight the phrase, but not isolated words

Re: Implement SpanScorer on 2.9 lucene lib!

2009-09-30 Thread Mark Miller
Felipe Lobo wrote: > Hi, I updated my Lucene lib to 2.9.0 and I'm trying to instantiate the > SpanScorer but the constructor is protected. > I looked in the javadoc of Lucene and saw 2 subclasses of it > (PayloadNearQuery.PayloadNearSpanScorer, > PayloadTermQuery.PayloadTermWeight.PayloadTermSpanSc

Implement SpanScorer on 2.9 lucene lib!

2009-09-30 Thread Felipe Lobo
Hi, I updated my Lucene lib to 2.9.0 and I'm trying to instantiate the SpanScorer but the constructor is protected. I looked in the javadoc of Lucene and saw 2 subclasses of it (PayloadNearQuery.PayloadNearSpanScorer, PayloadTermQuery.PayloadTermWeight.PayloadTermSpanScorer). Using these classes is

Highlighting phrases in 2.9

2009-09-30 Thread Scott Smith
I've been looking at the changes I have to make in my code to go from 2.4.1 to 2.9. One of the features I have is to highlight query hits in documents which meet the search criteria. If the query has a phrase, then I need to highlight the phrase, but not isolated words from the phrase which also

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
About clear(Object sentinel) - is it still a question no, it is not. Makes no sense with mutable elements :) - Original Message > From: Shai Erera > To: java-user@lucene.apache.org > Sent: Wednesday, 30 September, 2009 21:02:19 > Subject: Re: TSDC, TopFieldCollector & co > > I was h

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread Shai Erera
I was half way through answering the second part when I noticed your second update :). I don't know about adding reset() to Collector. It makes sense "for completeness" in case other Collectors can be reset() as well. But reset() is a delicate method. It needs to be used cautiously. E.g., if you a

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
forget the question about initialize(), reading javadoc before asking already answered questions helps a lot, sorry for the noise. ...NOTE in getSentinelObject() javadoc... - Original Message > From: eks dev > To: java-user@lucene.apache.org > Sent: Wednesday, 30 September, 2009 20:

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
> BTW eks, you asked about reusing TSDC. Yeah, it is normally not a big deal to allocate everything again, but these arrays are not necessarily small, so I guess it would make sense to open this possibility. Where do you think it would be better to add reset(): to TSDC or to Collector? I would even s

Re: TopDocCollector limits

2009-09-30 Thread Mark Miller
Way the heck better - Hits is horrible for that. It caches like 100 hits and then keeps searching when you exhaust the cache (been a while since I've looked at the exact numbers). It's horribly inefficient for checking every hit. Hits will end up using a Collector anyway - and then throw a speed tr

Re: TopDocCollector limits

2009-09-30 Thread Max Lynch
Thanks Mark that's exactly what I need. How does the performance of processing each document in the collect method of HitCollector compare to looping through the Hits in the deprecated Hits class? On Tue, Sep 29, 2009 at 7:40 PM, Mark Miller wrote: > Max Lynch wrote: > > Hi, > > I am developing

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread Shai Erera
BTW eks, you asked about reusing TSDC. PQ has a clear() method, so it can be reused. Only currently it's final and nullifies the array. We'll need to un-final it, and then override in HitQueue to just reset the ScoreDoc instances to be sentinels again. And of course add a reset() method to TSDC. O
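The reuse idea discussed in this thread (a queue pre-filled with sentinel entries that is reset in place rather than reallocated) can be sketched roughly as follows. This is a language-neutral illustration in Python, not Lucene code: the class name TopScoreCollector, its methods, and the (-inf, -1) sentinel are assumptions made for the example, and num_hits is assumed to be at least 1.

```python
import math


class TopScoreCollector:
    """Hypothetical sketch: keeps the num_hits best (score, doc) pairs
    in a pre-allocated min-heap of mutable entries."""

    def __init__(self, num_hits):
        # Pre-fill with sentinels so collect() never has to ask
        # "is the queue full yet?". Assumes num_hits >= 1.
        self._heap = [[-math.inf, -1] for _ in range(num_hits)]
        self.total_hits = 0

    def collect(self, doc, score):
        self.total_hits += 1
        top = self._heap[0]              # current weakest entry
        if score <= top[0]:
            return                       # not competitive, skip
        top[0], top[1] = score, doc      # overwrite in place, no allocation
        self._sift_down(0)

    def _sift_down(self, i):
        heap, n = self._heap, len(self._heap)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and heap[left][0] < heap[smallest][0]:
                smallest = left
            if right < n and heap[right][0] < heap[smallest][0]:
                smallest = right
            if smallest == i:
                return
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest

    def top_docs(self):
        # Drop leftover sentinels and return the best hits first.
        hits = [(doc, score) for score, doc in self._heap if doc != -1]
        return sorted(hits, key=lambda hit: -hit[1])

    def reset(self):
        # Turn every entry back into a sentinel instead of nulling the
        # array, so the same objects serve the next search.
        for entry in self._heap:
            entry[0], entry[1] = -math.inf, -1
        self.total_hits = 0


collector = TopScoreCollector(2)
for doc, score in [(1, 0.5), (2, 0.9), (3, 0.1), (4, 0.7)]:
    collector.collect(doc, score)
first_search = collector.top_docs()
collector.reset()  # ready for another search, nothing reallocated
```

Because all sentinels compare equal, a freshly reset array is already a valid heap, which is why reset() only has to overwrite the entries.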

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
Thanks Mark, Shai, I was getting confused by so many possibilities to do the "almost the same thing" ;) But have figured it out by peeking into BooleanQuery code that decides if "out of order" should be used..., BQ will pick the right TSDC ... I like it, option 1 it is minimum user code. Cheers

Results of setting LogMergePolicy "calibrateSizeByDeletes=true"

2009-09-30 Thread Jibo John
Hello, I am in the process of trying out the Lucene patch LUCENE-1634, however I'm not getting the expected behavior. I see that the segments are not getting merged even after all the documents are deleted from them. Because of this, the index size really grows to a huge number. The expec

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread Shai Erera
I agree. If you need sort-by-score, it's better to use the "fast" search methods. IndexSearcher will create the appropriate TSDC instance for you, based on the Query that was passed. If you need to create multiple Collectors and pass a kind of Multi-Collector to IndexSearcher, then you should crea

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread Mark Miller
If you want relevance sorting (Sort.Score not Sort.Relevance right?), I'd think you want to use TopScoreDocCollector, not TopFieldCollector. The only reason to use relevance with TopFieldCollector is if you are doing an nth sort with a field sort as well. You don't really need to worry about th

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
and another question, is it somehow possible to reuse TopScoreDocCollector instance? Javadoc in create(...) warns about allocating full array. NOTE: The instances returned by this method pre-allocate a full array of length numHits, and fill the array with sentinel objects.

How does the term infos file (.tis) work?

2009-09-30 Thread iron light
I tried to traverse all the term text in one .tis file, and it failed. The code is below. Am I misunderstanding something? The source code (especially the index namespace) is very complicated for me. Is there any more documentation about the design, or anything else that can help me understand the source? Thanks.

TSDC, TopFieldCollector & co

2009-09-30 Thread eks dev
Hi All, What is the best way to achieve the following and what are the differences, if I say "I do not normalize scores, so I do not need max score tracking, I do not care if hits are returned in doc id order, or any other order. I need only to get maxDocs *best scoring* documents": OPTION 1:

Re: Whitespace/Standard Analyzer and punctuation

2009-09-30 Thread Karl Wettin
You could look into modifying the standard tokenizer lexer code to handle punctuation (there is a patch in the issue tracker for the old javacc grammar to handle punctuation) and there is also the Gate NLP project which has a fairly nice sentence splitter you might find useful. Add a whol

Re: Problem searching non analyzed fields

2009-09-30 Thread Paul Taylor
Robert Muir wrote: try checking out PerFieldAnalyzerWrapper, so you can specify how each field is handled, i.e. some fields with KeywordAnalyzer, other fields with StandardAnalyzer, etc. Thanks, yes actually I realize these fields do need some analysis because I want the search to be case ins
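The per-field idea suggested here (one analyzer per field, with a default for the rest) can be illustrated with a small sketch. It is written in Python and only mimics the concept; it is not the Lucene PerFieldAnalyzerWrapper API. The names PerFieldAnalyzer, keyword_analyzer, lowercase_keyword_analyzer and simple_standard_analyzer are hypothetical, and the "standard" analyzer below is a deliberately simplified stand-in that lowercases and splits on non-word characters.

```python
import re


def keyword_analyzer(text):
    # Whole field value as a single token, untouched.
    return [text]


def lowercase_keyword_analyzer(text):
    # Hypothetical variant: still one token, but case-insensitive.
    return [text.lower()]


def simple_standard_analyzer(text):
    # Simplified stand-in: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


class PerFieldAnalyzer:
    """Hypothetical sketch: dispatches to a per-field analyzer,
    falling back to a default for fields that were not registered."""

    def __init__(self, default, per_field=None):
        self._default = default
        self._per_field = dict(per_field or {})

    def add(self, field, analyzer):
        self._per_field[field] = analyzer

    def tokens(self, field, text):
        return self._per_field.get(field, self._default)(text)


analyzer = PerFieldAnalyzer(simple_standard_analyzer, {"id": keyword_analyzer})
analyzer.add("country", lowercase_keyword_analyzer)

id_tokens = analyzer.tokens("id", "AB-12 x")
country_tokens = analyzer.tokens("country", "United Kingdom")
title_tokens = analyzer.tokens("title", "Health-care Insurance")
```

The same wrapper has to be used at both index time and query time; otherwise a field indexed as a single lowercased token would be searched with differently analyzed terms and would not match.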