Re: Phrase Highlighting

2009-04-29 Thread Max Lynch
> You should switch to the SpanScorer (in o.a.l.search.highlighter). > That fragment scorer should only match true phrase matches. > > Mike > Thanks Mike. I gave it a try and it didn't work the way I expected. I am using PyLucene right now so I can ask them if the implementation is different. I'm

RE: kamikaze

2009-04-29 Thread molz
Hi Michael, 2 questions. 1. What version of Kamikaze are you running with? 2. Can you try the snippet below and let me know if it fails? I ran it 20 times and it did not fail. Maybe there is some difference in the utility methods you have? I am still trying to track down if anything is of

RE: kamikaze

2009-04-29 Thread molz
Hi Michael, Ah! I think we may have hit a regression bug here. We have identified the problem, the fix is rather simple, and we were already in the process of getting a performance enhancement out in a day or two. Would it be useful to you if we push in the bug fix as a part of that release? Alterna

RE: Search result ordering

2009-04-29 Thread Bill.Chesky
Unfortunately we do periodically add Documents to our index. However, I wasn't aware of the Lucene-assigned doc ID or Sort.INDEXORDER. This is good information to know. Who knows, we might be able to refactor things to use this method. Regarding performance, yes I have actually seen some d

RE: kamikaze

2009-04-29 Thread Michael Mastroianni
Hi Anmol-- Sorry, there was a typo in the main function of my unit test: here is a correct version (the utility functions remain the same). public void testMultipleIntersections() { ArrayList obs = new ArrayList(); ArrayList docs = new ArrayList();

RE: kamikaze

2009-04-29 Thread Michael Mastroianni
Hi Anmol-- I think I may have found a problem in AndDocIdSet. I got it to pass some simple tests, and was in the process of integration, when some of my tests started to fail right after I had replaced a bunch of OpenBitSet intersections with creating a list of P4DocIdSets and then creating an And

Re: Search result ordering

2009-04-29 Thread Erick Erickson
I really doubt boosting at index time will help. All that expresses is that "this document's title (say) is more important *when calculating score* than other documents with a smaller title boost". But since you're not searching on your key (I assume), boosting at index time would be irrelevant to

RE: Search result ordering

2009-04-29 Thread Bill.Chesky
Thanks Erick, Basically, the ideal ordering is an alphabetical one based on a String value that is known at index creation. I was just wondering if there was anything I could do at index creation time that might help me enforce that ordering at query time (without using a Sort). To be honest,

Re: Search result ordering

2009-04-29 Thread Erick Erickson
People (including me) use Lucene to page through results all the time, so I'm pretty sure you're OK. So here are my answers... (1) yes. (2) Well, the default sort is by score so if you want some other ordering you have to sort. (3) You can boost things at index time, but I don't think that's at

Re: Read past EOF

2009-04-29 Thread Michael McCandless
I've opened https://issues.apache.org/jira/browse/LUCENE-1623 for this. Mike On Tue, Apr 28, 2009 at 10:15 AM, Michael McCandless wrote: > Ugh, indeed FieldInfos fails to properly read 2.3.x indices if the > field name contains non-ASCII characters. I'll open an issue, make a > test case and wo

Search result ordering

2009-04-29 Thread Bill.Chesky
Hello, I have a few questions about the ordering of search results: 1) Given a query, are the Documents contained in the Hits object that is returned by IndexSearcher.search(Query query) guaranteed to be in the same order from one call to the next (assuming the index has not been updated in the m

Re: How to get the score in percentage

2009-04-29 Thread Erick Erickson
Would a TopDocCollector work for you? You can get a TopDocs object from that collector, from which you can get the max score. That, along with the score provided for each doc, should give you a percentage. Best Erick On Wed, Apr 29, 2009 at 5:30 AM, joseph.christopher wrote: > > Hi Experts, > > W
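A minimal sketch of the normalization Erick describes: divide each hit's score by the maximum score and scale to 100. It assumes the raw scores have already been copied out of the search results into a float array; `ScorePercent` and `toPercent` are hypothetical names for illustration, not part of Lucene.

```java
public class ScorePercent {
    // Hypothetical helper (not a Lucene API): expresses each raw score as a
    // percentage of the largest score in the array.
    static float[] toPercent(float[] scores) {
        float max = 0.0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] percent = new float[scores.length];
        if (max <= 0.0f) {
            return percent; // no positive scores: leave every entry at 0
        }
        for (int i = 0; i < scores.length; i++) {
            percent[i] = 100.0f * scores[i] / max;
        }
        return percent;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f}; // made-up scores for illustration
        for (float p : toPercent(raw)) {
            System.out.println(p + "%");
        }
    }
}
```

A percentage computed this way is relative to the best hit of that one query, so the top hit always shows 100% and the values are not comparable across different queries.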

RE: Getting search score in percentage

2009-04-29 Thread Steven A Rowe
Hi Joseph, On 4/29/2009 at 5:34 AM, joseph.christopher wrote: > We are in a process of changing our existing fuzzy search engine to > Lucene, but we are facing a roadblock here, i.e., in our existing system > we are showing the search score in percentage but Lucene provides the > search score in num

RE: kamikaze

2009-04-29 Thread Michael Mastroianni
Thanks for the response (and the library, of course :)). I figured out the order thing by looking at your tests (I should have done that first). It might be a good idea to have a ctor that takes a sorted array of ints, since it looks like in situations where you are, for instance, loading a docset

Re: no segments* file found: files: Error on opening index

2009-04-29 Thread Paul Taylor
Michael McCandless wrote: Are you sure you can't make the reader reopen block on a reindex? Or skip reopen if reindex is in process? (Because that's the simplest solution). That's what I'm suggesting in principle, I just need to work out the best way to do it because the reader reopen has no

Re: no segments* file found: files: Error on opening index

2009-04-29 Thread Michael McCandless
Are you sure you can't make the reader reopen block on a reindex? Or skip reopen if reindex is in process? (Because that's the simplest solution). If not, I think the next best solution is likely to allow multiple commit points in the index. You'll need a custom deletion policy that always keep

Re: no segments* file found: files: Error on opening index

2009-04-29 Thread Paul Taylor
Michael McCandless wrote: Lucene doesn't have anything built in to handle this. It's probably best to put synchronization into your code in such a case? It's presumably also not great if your IndexReader opens an empty index since searches will find no results. I.e., you should probably only reop

Getting search score in percentage

2009-04-29 Thread joseph.christopher
Hi Experts, We are in a process of changing our existing fuzzy search engine to Lucene, but we are facing a roadblock here, i.e., in our existing system we are showing the search score in percentage but Lucene provides the search score in numbers which is derived from some internal logic. Can any

How to get the score in percentage

2009-04-29 Thread joseph.christopher
Hi Experts, We are in a process of changing our existing fuzzy search engine to Lucene, but we are facing a roadblock here, i.e., in our existing system we are showing the search score in percentage but Lucene provides the search score in numbers which is derived from some internal logic. Can any

Re: Phrase Highlighting

2009-04-29 Thread Michael McCandless
You should switch to the SpanScorer (in o.a.l.search.highlighter). That fragment scorer should only match true phrase matches. Mike On Tue, Apr 28, 2009 at 9:49 PM, Max Lynch wrote: > Hi, > I am trying to find out exactly when a word I'm looking for in a document is > found.  I've talked to a fe

lucene score and float precision

2009-04-29 Thread Jan Paetzold
Hi, in some cases we have the problem that for a document the ScoreDoc score differs at the last digit of the float from the score reported by the explanation functionality of Lucene. For example: ScoreDoc: 16.770466 -- Explanation: 16.770468 = (MATCH) sum of: ... ScoreDoc: 21.118656 -- Explanat
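If the two values have to be compared in code, one option is to treat floats that lie within a unit or two in the last place (ULP) of each other as equal. The sketch below assumes that tolerance is acceptable for the application; `FloatCompare` and `nearlyEqual` are hypothetical names, not Lucene API, and the snippet above does not say why the two code paths round differently.

```java
public class FloatCompare {
    // Hypothetical helper: true when a and b differ by at most maxUlps
    // units in the last place, measured at the larger of the two magnitudes.
    static boolean nearlyEqual(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false; // NaN never compares equal to anything
        }
        float diff = Math.abs(a - b);
        float ulp = Math.max(Math.ulp(a), Math.ulp(b));
        return diff <= maxUlps * ulp;
    }

    public static void main(String[] args) {
        float scoreDoc = 16.770466f;    // value quoted in the post
        float explanation = 16.770468f; // value quoted in the post
        System.out.println("exact match:  " + (scoreDoc == explanation));
        System.out.println("within 1 ULP: " + nearlyEqual(scoreDoc, explanation, 1));
    }
}
```

Around 16.77 neighbouring float values are spaced 2^-19 (about 0.0000019) apart, so a difference in the last printed digit is consistent with the two scores being neighbouring floats.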