> You should switch to the SpanScorer (in o.a.l.search.highlighter).
> That fragment scorer should only match true phrase matches.
>
> Mike
>
Thanks Mike. I gave it a try and it wasn't working how I expected. I am
using PyLucene right now so I can ask them if the implementation is
different. I'm
Hi Michael,
2 Questions.
1. What version of Kamikaze are you running with?
2. Can you try the snippet below and let me know if it fails? I ran it 20
times and it did not fail. Maybe there is some difference in the utility
methods you have? I am still trying to track down if anything is of
Hi Michael,
Ah! I think we may have hit a regression bug here. We have identified the
problem, the fix is rather simple and we were already in the process of
getting a performance enhancement out in a day or two. Would it be useful to
you if we push in the bug fix as a part of that release? Alterna
Unfortunately we do periodically add Documents to our index. However, I wasn't
aware of the Lucene-assigned doc ID or Sort.INDEXORDER. This is good
information to know. Who knows, we might be able to refactor things to use
this method.
Regarding performance, yes I have actually seen some d
Hi Anmol--
Sorry, there was a typo in the main function of my unit test: here is a
correct version (the utility functions remain the same).
public void testMultipleIntersections()
{
ArrayList obs = new ArrayList();
ArrayList docs = new ArrayList();
Hi Anmol--
I think I may have found a problem in AndDocIdSet. I got it to pass some
simple tests, and was in the process of integration, when some of my
tests started to fail right after I had replaced a bunch of OpenBitSet
intersections with creating a list of P4DocIdSets and then creating an
And
I really doubt boosting at index time will help. All that expresses is
that "this document's title (say) is more important *when calculating score*
than other documents with a smaller title boost".
But since you're not searching on your key (I assume), boosting
at index time would be irrelevant to
Thanks Erick,
Basically, the ideal ordering is an alphabetical one based on a String value
that is known at index creation. I was just wondering if there was anything I
could do at index creation time that might help me enforce that ordering at
query time (without using a Sort). To be honest,
People (including me) use Lucene to page through results all the time,
so I'm pretty sure you're OK.
So here are my answers...
(1) yes.
(2) Well, the default sort is by score so if you want some other
ordering you have to sort.
(3) You can boost things at index time, but I don't think that's at
I've opened https://issues.apache.org/jira/browse/LUCENE-1623 for this.
Mike
On Tue, Apr 28, 2009 at 10:15 AM, Michael McCandless
wrote:
> Ugh, indeed FieldInfos fails to properly read 2.3.x indices if the
> field name contains non-ascii characters. I'll open an issue, make a
> test case and wo
Hello,
I have a few questions about the ordering of search results:
1) Given a query, are the Documents contained in the Hits object that is
returned by IndexSearcher.search(Query query) guaranteed to be in the
same order from one call to the next (assuming the index has not been
updated in the m
Would a TopDocCollector work for you? You can get a TopDoc
object from that collector, from which you can get the max score.
That, along with the score provided for each doc should give you
a percentage.
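
As a rough illustration of the arithmetic only (not Lucene code): a minimal
sketch, assuming the raw scores and the max score have already been read from
the collector's results. The class name ScorePercent and the helper toPercent
are hypothetical, made up for this example. Note that a percentage computed
this way is relative to the best hit of the same query, so the top hit always
shows 100% and values are not comparable across queries.

```java
public class ScorePercent {

    // Hypothetical helper (not part of Lucene): expresses a raw score as a
    // percentage of the highest score in the same result set.
    static float toPercent(float score, float maxScore) {
        if (maxScore <= 0.0f) {
            return 0.0f; // no meaningful top score to normalize against
        }
        return 100.0f * score / maxScore;
    }

    public static void main(String[] args) {
        // Assumed sample data: raw scores sorted by relevance, best hit first.
        float[] scores = {2.0f, 1.0f, 0.5f};
        float maxScore = scores[0];
        for (float s : scores) {
            System.out.printf("%.4f -> %.1f%%%n", s, toPercent(s, maxScore));
        }
    }
}
```
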
Best
Erick
On Wed, Apr 29, 2009 at 5:30 AM, joseph.christopher wrote:
>
> Hi Experts,
>
> W
Hi Joseph,
On 4/29/2009 at 5:34 AM, joseph.christopher wrote:
> We are in the process of changing our existing fuzzy search engine to
> Lucene, but we are facing a roadblock here, i.e., in our existing system
> we are showing the search score as a percentage but Lucene provides the
> search score in num
Thanks for the response (and the library, of course :)). I figured out
the order thing by looking at your tests (I should have done that
first). It might be a good idea to have a ctor that takes a sorted array
of ints, since it looks like in situations where you are, for instance,
loading a docset
Michael McCandless wrote:
Are you sure you can't make the reader reopen block on a reindex? Or
skip reopen if reindex is in process?
(Because that's the simplest solution).
That's what I'm suggesting in principle, I just need to work out the best
way to do it because the reader reopen has no
Are you sure you can't make the reader reopen block on a reindex? Or
skip reopen if reindex is in process?
(Because that's the simplest solution).
If not, I think the next best solution is likely to allow multiple
commit points in the index. You'll need a custom deletion policy that
always keep
Michael McCandless wrote:
Lucene doesn't have anything builtin to handle this.
It's probably best to put synchronization into your code in such a
case? It's presumably also not great if your IndexReader opens an
empty index since searches will find no results. Ie, you should
probably only reop
Hi Experts,
We are in the process of changing our existing fuzzy search engine to Lucene,
but we are facing a roadblock
here, i.e., in our existing system we are showing the search score as a
percentage but Lucene provides the search score in numbers which is derived
from some internal logic. Can any
You should switch to the SpanScorer (in o.a.l.search.highlighter).
That fragment scorer should only match true phrase matches.
Mike
On Tue, Apr 28, 2009 at 9:49 PM, Max Lynch wrote:
> Hi,
> I am trying to find out exactly when a word I'm looking for in a document is
> found. I've talked to a fe
Hi,
in some cases we have the problem that for a document the ScoreDoc score
differs at the last digit of the float from the score reported by the
explanation functionality of lucene. For example:
ScoreDoc: 16.770466 -- Explanation: 16.770468 = (MATCH) sum of: ...
ScoreDoc: 21.118656 -- Explanat
21 matches