Same experience here as Tom. Disk I/O becomes the bottleneck with large
indexes (or multiple shards per server) when memory is limited. Frequent
updates to indexes can make the I/O bottleneck worse.
Peter
On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West wrote:
>
> Hi Chris,
>
> In our experience with larg
> We have a list of keywords with aliases (Example:
> keyword = "ms access"
> aliases = "microsoft access", "msaccess", "m.s.
> access" )
>
> We would like to intercept the aliases prior to them being
> indexed, and have
> the keyword indexed instead. We can do this with a
> CustomFilter for s
We have a list of keywords with aliases (Example: keyword = "ms access"
aliases = "microsoft access", "msaccess", "m.s. access" )
We would like to intercept the aliases prior to them being indexed, and have
the keyword indexed instead. We can do this with a CustomFilter for single
word aliases
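
The alias-to-keyword mapping described above could be sketched as a plain text pre-processing step that runs before the text reaches the analyzer. This is only an illustration under stated assumptions: `AliasNormalizer`, `addKeyword` and `normalize` are hypothetical names, it is not a Lucene TokenFilter, and it assumes case-insensitive matching on whole words is acceptable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): rewrites aliases to their
// canonical keyword in raw text before the text is handed to the indexer.
class AliasNormalizer {
    private final Map<String, String> aliasToKeyword = new LinkedHashMap<>();
    private Pattern pattern; // rebuilt lazily after the alias list changes

    void addKeyword(String keyword, String... aliases) {
        for (String alias : aliases) {
            aliasToKeyword.put(alias.toLowerCase(), keyword);
        }
        pattern = null;
    }

    private Pattern buildPattern() {
        // Longest aliases first, so a multi-word alias wins over a shorter one.
        List<String> aliases = new ArrayList<>(aliasToKeyword.keySet());
        aliases.sort((a, b) -> b.length() - a.length());
        StringBuilder alternation = new StringBuilder();
        for (String alias : aliases) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(alias));
        }
        // Only match when the alias is not embedded in a longer word or number.
        String regex = "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    String normalize(String text) {
        if (aliasToKeyword.isEmpty()) {
            return text;
        }
        if (pattern == null) {
            pattern = buildPattern();
        }
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String keyword = aliasToKeyword.get(m.group().toLowerCase());
            m.appendReplacement(out, Matcher.quoteReplacement(keyword));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        AliasNormalizer n = new AliasNormalizer();
        n.addKeyword("ms access", "microsoft access", "msaccess", "m.s. access");
        System.out.println(n.normalize("We use Microsoft Access and MSAccess."));
    }
}
```

Doing the substitution on the raw string sidesteps the question of how an analyzer splits a multi-word alias into tokens, at the cost of one extra pass over each document.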
Hi Chris,
In our experience with large indexes (about 200-300GB), we found most of
our bottlenecks involved disk I/O. We found that if our experimental
indexes were too small, much of the index could fit in cache, and so
our test results were not applicable to our larger indexes. On the
The 'explain' method in PayloadNearSpanScorer assumes the
AveragePayloadFunction was used. I don't see an easy way to override this
because 'payloadsSeen' and 'payloadScore' are private/protected. It seems
like the 'PayloadFunction' interface should have an 'explain' method that
the Scorer could ca
Hello All,
I am really sorry for not following the rules and
bringing it to the top. It is important at the moment.
Thanks.
On 11 February 2010 15:51, Smith G wrote:
> Hello All,
> I am writing some test cases for a custom-class which
> modifies incoming TermQuery and
This could be down to IDF, i.e. "Lucane" is ranked higher because it is rarer,
despite having a worse edit distance.
This is arguably a bug.
See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this. You
could try subclassing QueryParser and overriding newFuzzyQuery to return
FuzzyLikeThisQu
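
To illustrate the IDF effect mentioned in the reply above, here is a minimal sketch of the inverse document frequency formula used, as far as I know, by the classic DefaultSimilarity in Lucene of this era: idf = 1 + ln(numDocs / (docFreq + 1)). The class name `IdfSketch` and the document counts are hypothetical and are not taken from the thread.

```java
// Hypothetical stand-alone sketch; it does not call into Lucene.
class IdfSketch {
    // idf = 1 + ln(numDocs / (docFreq + 1)); a rarer term gets a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1000; // made-up index size, for illustration only
        System.out.println("term in 200 docs: " + idf(200, numDocs));
        System.out.println("term in 1 doc:    " + idf(1, numDocs));
    }
}
```

Because each expanded fuzzy term is weighted by its own IDF, a rare variant can receive a larger weight than a common one, which is the ranking effect the reply describes.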
Hello,
I'm using Lucene v3.
Please consider the following spellings:
Lucene
Lucéne
lucéne
Lucane
Lucen
When searching for "lucéne" among those words using a FuzzyQuery (with 0.5
edit distance), the results show:
1. Lucene 1.0259752
2. Lucane 1.0259752
3. Lucéne 0.95660806
4. lucéne 0.95660806
5.
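
For reference, the plain Levenshtein edit distance between the query and each spelling can be computed with the sketch below. It assumes the terms are lowercased before comparison (the post does not say which analyzer was used, so this is an assumption), and `EditDistanceSketch` is a hypothetical name; it is a textbook implementation, not Lucene's internal fuzzy-matching code.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch; it does not call into Lucene.
class EditDistanceSketch {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "luc\u00e9ne"; // "lucene" with e-acute, as in the post
        String[] spellings = {"Lucene", "Luc\u00e9ne", "luc\u00e9ne", "Lucane", "Lucen"};
        for (String s : spellings) {
            int d = distance(query.toLowerCase(Locale.ROOT), s.toLowerCase(Locale.ROOT));
            System.out.println(s + " -> " + d);
        }
    }
}
```

Under the lowercasing assumption, "Lucene" and "Lucane" are at the same distance from the query, as are "Lucéne" and "lucéne", so edit distance alone does not separate the terms within each tied pair.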
Re Mike's delegating custom query suggestion - see
https://issues.apache.org/jira/browse/LUCENE-1999
----- Original Message -----
From: Michael McCandless
To: java-user@lucene.apache.org
Sent: Mon, 15 February, 2010 10:03:30
Subject: Re: Further refinement of search results - distinguishing hi
I don't think Lucene makes this easy, today, out of the box. The
scoring process for a boolean query doesn't track which sub-clause had
matched.
Though, it does track the number of clauses that matched (coord). EG
you'd be able to tell that a given hit had both clauses match, vs only
1 (just not