date:20100215

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Peter Keegan

Same experience here as Tom. Disk I/O becomes the bottleneck with large indexes (or multiple shards per server) when less memory is available. Frequent updates to indexes can make the I/O bottleneck worse. Peter On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West wrote: > > Hi Chris, > > In our experience with larg

Re: Controlling what is indexed / normalizing our index

2010-02-15 Thread Ahmet Arslan

> We have a list of keywords with aliases (Example: > keyword = "ms access" > aliases = "microsoft access", "msaccess", "m.s. > access" ) > > We would like to intercept the aliases prior to them being > indexed, and have > the keyword indexed instead. We can do this with a > CustomFilter for s

Controlling what is indexed / normalizing our index

2010-02-15 Thread maxSchlein

We have a list of keywords with aliases (Example: keyword = "ms access" aliases = "microsoft access", "msaccess", "m.s. access" ) We would like to intercept the aliases prior to them being indexed, and have the keyword indexed instead. We can do this with a CustomFilter for single word aliases
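One way to picture the normalization asked about above is a plain-Java sketch that rewrites aliases to their keyword in the raw text before it reaches an analyzer. This is not a Lucene API and not necessarily what the thread settled on: the class `AliasNormalizer` and its methods are hypothetical names, and lowercasing the whole input is an assumption made to keep matching case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps aliases (including multi-word ones) to one keyword. */
public class AliasNormalizer {
    // Insertion-ordered so aliases are applied in the order they were added.
    private final Map<Pattern, String> aliasToKeyword = new LinkedHashMap<>();

    /** Registers aliases that should be replaced by the given keyword. */
    public void addAliases(String keyword, String... aliases) {
        for (String alias : aliases) {
            // Match the alias literally, but only when it is not embedded
            // inside a longer run of letters or digits.
            Pattern p = Pattern.compile(
                "(?<![\\p{L}\\p{N}])"
                + Pattern.quote(alias.toLowerCase(Locale.ROOT))
                + "(?![\\p{L}\\p{N}])");
            aliasToKeyword.put(p, keyword);
        }
    }

    /** Lowercases the text and replaces every registered alias with its keyword. */
    public String normalize(String text) {
        String result = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<Pattern, String> e : aliasToKeyword.entrySet()) {
            result = e.getKey().matcher(result)
                      .replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        AliasNormalizer n = new AliasNormalizer();
        n.addAliases("ms access", "microsoft access", "msaccess", "m.s. access");
        System.out.println(n.normalize("Export from Microsoft Access, then open in MSAccess."));
    }
}
```

Because the replacement happens on the whole string rather than token by token, multi-word aliases need no special handling. The trade-off is that replacements are applied one after another, so a keyword that itself contains another alias would be rewritten again.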

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Tom Burton-West

Hi Chris, In our experience with large indexes (about 200-300GB), we found most of our bottlenecks involved disk I/O. We found that if our experimental indexes were too small, much of the index could fit in cache, and so our test results were not applicable to our larger indexes. On the

PayloadNearSpanScorer explain method

2010-02-15 Thread Peter Keegan

The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. I don't see an easy way to override this because 'payloadsSeen' and 'payloadScore' are private/protected. It seems like the 'PayloadFunction' interface should have an 'explain' method that the Scorer could ca

Re: question regarding BooleanQuery:equals() method

2010-02-15 Thread Smith G

Hello All, I am really sorry for not following the rules and bringing it to the top. It is important at the moment. Thanks. On 11 February 2010 15:51, Smith G wrote: > Hello All, > I am writing some test cases for a custom-class which > modifies incoming TermQuery and

Re: Strange Fuzzyquery results scoring when using a low minimal distance

2010-02-15 Thread mark harwood

This could be down to IDF, i.e. "Lucane" is ranked higher because it is rarer despite having a worse edit distance. This is arguably a bug. See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this. You could try subclassing QueryParser and overriding newFuzzyQuery to return FuzzyLikeThisQu

Strange Fuzzyquery results scoring when using a low minimal distance

2010-02-15 Thread stefcl

Hello, I'm using Lucene v3. Please consider the following spellings: Lucene, Lucéne, lucéne, Lucane, Lucen. When searching for "lucéne" among those words using a FuzzyQuery (with 0.5 edit distance), results show: 1. Lucene 1.0259752 2. Lucane 1.0259752 3. Lucéne 0.95660806 4. lucéne 0.95660806 5.
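To see why this ordering looks strange, it helps to compute the edit-distance similarity of each candidate by hand. The sketch below is plain Java, not Lucene code. It assumes similarity is `1 - distance / min(length of query, length of candidate)`, which is my understanding of how Lucene 3.x fuzzy matching scales Levenshtein distance (ignoring any prefix length); treat that formula and the class name `FuzzySimilaritySketch` as assumptions.

```java
/** Hypothetical sketch: Levenshtein distance and a length-scaled similarity. */
public final class FuzzySimilaritySketch {

    /** Classic two-row dynamic-programming Levenshtein distance (case-sensitive). */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed formula: 1 - distance / length of the shorter string. */
    public static float similarity(String query, String candidate) {
        int d = editDistance(query, candidate);
        return 1.0f - (float) d / Math.min(query.length(), candidate.length());
    }

    public static void main(String[] args) {
        String query = "luc\u00e9ne"; // "lucéne", escaped to avoid source-encoding issues
        String[] candidates = {"Lucene", "Luc\u00e9ne", "luc\u00e9ne", "Lucane", "Lucen"};
        for (String c : candidates) {
            System.out.println(c + " distance=" + editDistance(query, c)
                + " similarity=" + similarity(query, c));
        }
    }
}
```

Under these assumptions the exact match "lucéne" has distance 0 and "Lucéne" has distance 1, while "Lucene" and "Lucane" both have distance 2. Edit distance alone would therefore rank the exact match first, so the reported ordering has to come from some other scoring factor.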

Re: Further refinement of search results - distinguishing hits with exact phrase match from the rest

2010-02-15 Thread mark harwood

Re Mike's delegating custom query suggestion - see https://issues.apache.org/jira/browse/LUCENE-1999 - Original Message From: Michael McCandless To: java-user@lucene.apache.org Sent: Mon, 15 February, 2010 10:03:30 Subject: Re: Further refinement of search results - distinguishing hi

Re: Further refinement of search results - distinguishing hits with exact phrase match from the rest

2010-02-15 Thread Michael McCandless

I don't think Lucene makes this easy, today, out of the box. The scoring process for a boolean query doesn't track which sub-clause matched. It does, though, track the number of clauses that matched (coord). E.g., you'd be able to tell that a given hit had both clauses match, vs only 1 (just not

10 matches
