Why is the old value still in the index

2011-12-16 Thread Paul Taylor
I'm adding documents to an index; at a later date I modify a document and update the index, close the writer and open a new IndexReader. My IndexReader iterates over terms for that field and docFreq() returns one as I would expect, however the iterator returns both the old value of the documen

Re: Why is the old value still in the index

2011-12-16 Thread Paul Taylor
rmDocsFreq1 test2 (but doesn't resolve the problem with my real code, which doesn't seem to have this mistake :() What I don't understand then is, in the incorrect example, why don't I get TermDocsFreq2 if I've actually created another document rather than updating one? -- Ian. On Fri, D

Re: Why is the old value still in the index

2011-12-16 Thread Paul Taylor
On 16/12/2011 17:43, Uwe Schindler wrote: Hi, I'm adding documents to an index, at a later date I modify a document and update the index, close the writer and open a new IndexReader. My indexreader iterates over terms for that field and docFreq() returns one as I would expect, however the iter

Re: Why is the old value still in the index

2011-12-16 Thread Paul Taylor
On 16/12/2011 20:54, Paul Taylor wrote: On 16/12/2011 17:43, Uwe Schindler wrote: Hi, I'm adding documents to an index, at a later date I modify a document and update the index, close the writer and open a new IndexReader. My indexreader iterates over terms for that field and do

Re: Why is the old value still in the index

2011-12-16 Thread Paul Taylor
On 16/12/2011 22:51, Rene Hackl-Sommer wrote: Maybe you could just use MatchAllDocsQuery? http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/search/MatchAllDocsQuery.html Rene Ah thanks Rene, that's what I wanted Paul
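For reference, a minimal sketch of that suggestion against the 3.x API (the directory variable is illustrative):

    // Return every live document in the index, regardless of field content.
    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs allDocs = searcher.search(new MatchAllDocsQuery(), Math.max(1, reader.maxDoc()));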

Query that returns all docs that contain a field

2011-12-19 Thread Paul Taylor
I was looking for a Query that returns all documents that contain a particular field; it doesn't matter what the value of the field is, just that the document contains the field. Paul

Re: Query that returns all docs that contain a field

2011-12-19 Thread Paul Taylor
On 19/12/2011 13:39, Uwe Schindler wrote: Hi, There is also a Query/Filter based on that FieldCache: o.a.l.search.FieldValueFilter, possibly wrapped with ConstantScoreQuery Uwe Okay, thanks for all the options. Paul
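A sketch of what that option might look like, assuming the single-argument FieldValueFilter(String) constructor from that era; the field name "catno" is only illustrative:

    // Matches every document that has any indexed value in the "catno" field;
    // deleted documents are excluded at search time, unlike a raw FieldCache lookup.
    Query hasCatNo = new ConstantScoreQuery(new FieldValueFilter("catno"));
    TopDocs hits = searcher.search(hasCatNo, 100);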

Re: Query that returns all docs that contain a field

2011-12-20 Thread Paul Taylor
On 19/12/2011 13:35, Michael McCandless wrote: You could also use FieldCache.getDocsWithField; it returns a bit set where the bit is set if that document had that field. And would this disregard documents that have been deleted but are still in the index? Paul

Re: Query that returns all docs that contain a field

2011-12-20 Thread Paul Taylor
On 20/12/2011 19:27, Uwe Schindler wrote: Hi, No. But the corresponding filter/query does. The bits are just for lookup, if you already have a valid document. The remaining bits are undefined (like the rest of Fieldcache). Uwe Um, I just looked for the query in Javadocs and couldn't find it,

Query to find documents which contain the same value for a field, i.e. duplicate fields

2011-12-20 Thread Paul Taylor
So I had this code, that would return all documents where there was more than one document that had the same value for fieldname. Trouble is I didn't realise this could return documents that had been deleted, so I'm wondering what an equivalent using queries would be. public List getDuplicates

Re: Query to find documents which contain the same value for a field, i.e. duplicate fields

2011-12-22 Thread Paul Taylor
On 20/12/2011 19:38, Paul Taylor wrote: So I had this code, that would return all documents where there was more than one document that had the same value for fieldname. Trouble is I didn't realise this could return documents that had been deleted, so I'm wondering what an equivalent

Using dismax features in Lucene

2012-01-06 Thread Paul Taylor
Just reading Apache Solr Enterprise Search Server and was interested in pages 152, 153: dismax and DisjunctionMaxQuery and automatic Phrase Boosting. I would like to incorporate this into a standard Lucene setup, non-Solr; what's the best way to do that? In fact I'm already doing something very
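One way to get dismax-style scoring outside Solr is to build the DisjunctionMaxQuery directly; a rough sketch (field names, terms and the 0.1 tie-breaker are illustrative):

    // Score each document by its best matching field plus 0.1 times the other matching
    // fields, instead of summing all field matches as a plain BooleanQuery would.
    DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f);
    dmq.add(new TermQuery(new Term("artist", "republica")));
    dmq.add(new TermQuery(new Term("recording", "republica")));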

Re: Using dismax features in Lucene

2012-01-09 Thread Paul Taylor
, Jan 6, 2012 at 10:52 PM, Paul Taylor wrote: Just reading Apache Solr Enterprise Search Server and was interested in pages 152, 153: dismax and DisjunctionMaxQuery and automatic Phrase Boosting. I would like to incorporate this into a standard Lucene setup, non-Solr; what's the best way to do that

Score exact matches higher than matches that match analysed text but not original text

2012-01-10 Thread Paul Taylor
My analyser strips out accents as often these are not entered correctly, so assume there are two documents in the database with the default field containing República Republica; a search for República or Republica will return both results, each with a score of 1. It's correct that they both get re

Re: Score exact matches higher than matches that match analysed text but not original text

2012-01-10 Thread Paul Taylor
On 10/01/2012 10:18, Ian Lea wrote: If a term has an accent, add both accented and unaccented versions at index and search time. So in your example your default field would contain República Republica and a search for "República" would expand to "República Republica" and match both and score h
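A sketch of Ian's suggestion as it might look at indexing time (3.x field API; the field name is illustrative); the same expansion would be applied to the query text:

    // Index both the original and the accent-stripped form in the same field, so an
    // accented query can still produce an exact term match on the original text.
    Document doc = new Document();
    doc.add(new Field("name", "República", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("name", "Republica", Field.Store.NO, Field.Index.ANALYZED));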

Re: Using dismax features in Lucene

2012-01-23 Thread Paul Taylor
On 10/01/2012 18:16, Chris Hostetter wrote: : The book said that dismax query was similar but different to : : DisjunctionMaxQuery the dismax *parser* in Solr is relatively simple, the majority of the code in it relates to parsing config options, reporting debugging, etc... if you wanted to do

Re: Using dismax features in Lucene

2012-01-26 Thread Paul Taylor
On 10/01/2012 18:16, Chris Hostetter wrote: : The book said that dismax query was similar but different to : : DisjunctionMaxQuery the dismax *parser* in Solr is relatively simple, the majority of the code in it relates to parsing config options, reporting debugging, etc... if you wanted to do

Re: Score exact matches higher than matches that match analysed text but not original text

2012-01-27 Thread Paul Taylor
On 10/01/2012 12:26, Paul Taylor wrote: On 10/01/2012 10:18, Ian Lea wrote: If a term has an accent, add both accented and unaccented versions at index and search time. So in your example your default field would contain República Republica and a search for "República" would

Does Fuzzy Search score the same as Exact Match

2012-01-28 Thread Paul Taylor
All things being equal, does a fuzzy match give the same score as an exact match? i.e. if I do a search for farmin and it matches two docs, one on term farmin, the other on term farming, will it score farming higher or score both the same? Paul --

Re: Does Fuzzy Search score the same as Exact Match

2012-01-28 Thread Paul Taylor
On 28/01/2012 09:36, Uwe Schindler wrote: Hi, -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Saturday, January 28, 2012 10:33 AM To: 'java-user@lucene.apache.org' Subject: Does Fuzzy Search scores the same as Exact Match All things being equal do

Re: Does Fuzzy Search score the same as Exact Match

2012-01-28 Thread Paul Taylor
On 28/01/2012 11:22, Uwe Schindler wrote: -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Saturday, January 28, 2012 10:33 AM To: 'java-user@lucene.apache.org' Subject: Does Fuzzy Search scores the same as Exact Match All things being equal does a f

Re: Does Fuzzy Search score the same as Exact Match

2012-02-01 Thread Paul Taylor
On 28/01/2012 11:22, Uwe Schindler wrote: -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Saturday, January 28, 2012 10:33 AM To: 'java-user@lucene.apache.org' Subject: Does Fuzzy Search scores the same as Exact Match All things being equal does a f

When does Query Parser do its analysis ?

2012-02-01 Thread Paul Taylor
So I subclass QueryParser and give it the query dug up; debugging shows it calls getFieldQuery(String field, String queryText, boolean quoted) twice, once with queryText=dug and once with queryText=up, but then when I run it with the query dúg up the first call is queryText=dúg even though the

Re: When does Query Parser do its analysis ?

2012-02-01 Thread Paul Taylor
On 01/02/2012 22:03, Robert Muir wrote: On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor wrote: So it seems like it just broke the text up at spaces, and does text analysis within getFieldQuery(), but how can it make the assumption that text should only be broken at whitespace ? you are right, see

Re: When does Query Parser do its analysis ?

2012-02-02 Thread Paul Taylor
On 02/02/2012 07:27, Doron Cohen wrote: In my particular case I add the album catalogno to my index as a keyword field, but of course if the catalog number contains a space as they often do (i.e. cad 6) there is a mismatch. I've now changed my indexing to index the value as 'cad6' r
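A sketch of the indexing change described above (the same normalisation has to be applied to the query term before it is used in a TermQuery; the field name is illustrative):

    // Strip whitespace so "cad 6" and "cad6" index and search as the same keyword term.
    String catno = "cad 6".replaceAll("\\s+", "");
    doc.add(new Field("catno", catno, Field.Store.YES, Field.Index.NOT_ANALYZED));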

Performance improvements for fuzzy queries ?

2012-02-03 Thread Paul Taylor
Using Lucene 3.5, I created a query parser based on the dismax parser, but in order to get matches on misspellings etc. I additionally do a fuzzy search and a wildcard search http://svn.musicbrainz.org/search_server/trunk/servlet/src/main/java/org/musicbrainz/search/servlet/DismaxQueryParse

Can I just add ShingleFilter to my analyzer used for indexing and searching

2012-02-21 Thread Paul Taylor
Trying out ShingleFilter, and the way it is documented implies that you can just add it to your analyzer and that's it, with no side-effects except a larger index, but I read others implying you have to modify the way you parse user queries; could anyone confirm/deny? Also is there an easy way

Re: Can I just add ShingleFilter to my analyzer used for indexing and searching

2012-02-21 Thread Paul Taylor
On 21/02/2012 14:37, Steven A Rowe wrote: Hi Paul, Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed. All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this. (There
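For context, a minimal 3.x-style analyzer with ShingleFilter appended (a sketch, not the MusicBrainz chain). The caveat above still applies: QueryParser pre-splits on whitespace, so multi-word shingles only form when the analyzer is handed more than one word at a time, e.g. inside a quoted phrase.

    class ShingleAnalyzer extends Analyzer {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream ts = new StandardTokenizer(Version.LUCENE_35, reader);
            ts = new LowerCaseFilter(Version.LUCENE_35, ts);
            return new ShingleFilter(ts, 2);   // unigrams plus shingles of up to 2 words
        }
    }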

Performance of MultiFieldQueryParser versus QueryParser

2012-03-01 Thread Paul Taylor
If I happen to subclass MultiFieldQueryParser unnecessarily (thought I needed more than one default search field but don't after all), would it have any impact on performance? thanks Paul

In Lucene 3.5 is it always better to not optimize indexes ?

2012-03-02 Thread Paul Taylor
I've updated the codebase from 3.4 to 3.5 and as part of that took the advice to no longer optimize my indexes. During testing everything seemed okay, but since releasing to Live I've noticed the load on the servers is about 50% higher. I made quite a few code changes in this release but it's not obvious

Re: In Lucene 3.5 is it always better to not optimize indexes ?

2012-03-02 Thread Paul Taylor
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Friday, March 02, 2012 6:08 PM To: java-user@lucene.apache.org Subject: In Lucene 3.5 is it always better to not opt

Re: SweetSpotSimilarity

2012-03-06 Thread Paul Taylor
On 05/03/2012 19:26, Chris Hostetter wrote: : very small to occasionally very large. It also might be the case that : cover letters and e-mails while short might not be really something to : heavily discount. The lower discount range can be ignored by setting : the min of any sweet spot to 1.

How disabling norms on a field affects other fields

2012-03-06 Thread Paul Taylor
I have a number of fields that either only ever have a term frequency of 1 or I don't want them to be disadvantaged if they do have a greater term frequency, and I never boost the field, so I disable norms for these fields with Field.Index.ANALYZED_NO_NORMS or Field.Index.NOT_ANALYZED_NO_NORMS. B
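For reference, what that flag looks like on a 3.x field (a sketch; field name and value are illustrative):

    // Indexed without norms: no field length normalisation or index-time boost is stored
    // for this field, so short and long values of it score the same. Other fields on the
    // same document keep their own norms.
    doc.add(new Field("catno", "WRATHCD-25", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));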

Re: How disabling norms on a field affects other fields

2012-03-06 Thread Paul Taylor
On 06/03/2012 21:44, Paul Taylor wrote: I have a number of fields that either only ever have a term frequency of 1 or I don't want them to be disadvantaged if they do have a greater term frequency, and I never boost the field, so I disable norms for these fields with Field.Index.ANALYZED_NO

Re: SweetSpotSimilarity

2012-03-06 Thread Paul Taylor
On 05/03/2012 23:24, Robert Muir wrote: On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote: I would definitely not suggest using SSS for fields like legal brief text or emails where there is huge variability in the length of the content -- i can't think of any context where a "short" email is de

Re: Performance improvements for fuzzy queries ?

2012-03-08 Thread Paul Taylor
On 03/02/2012 15:01, Paul Taylor wrote: Using Lucene 3.5, I created a query parser based on the dismax parser, but in order to get matches on misspellings etc. I additionally do a fuzzy search and a wildcard search http://svn.musicbrainz.org/search_server/trunk/servlet/src/main/java/org

There is a mismatch between the score for a wildcard match and an exact match

2012-03-09 Thread Paul Taylor
There is a mismatch between the score for a wildcard match and an exact match. I search for |recording:live OR recording:luve* | And here is the Explain output from the search |DocNo:0:1.4196585:-1cf0-4d1f-aca7-2a6f89e34b36 1.4196585 = (MATCH) max plus 0.1 times others of: 0.3763506

Re: There is a mismatch between the score for a wildcard match and an exact match

2012-03-09 Thread Paul Taylor
On 09/03/2012 10:42, Paul Taylor wrote: There is a mismatch between the score for a wildcard match and an exact match Just found the problem has been reported https://issues.apache.org/jira/browse/LUCENE-2557 not quite sure whether there is a solution available yet. Paul

Re: There is a mismatch between the score for a wildcard match and an exact match

2012-03-09 Thread Paul Taylor
On 09/03/2012 12:23, Paul Taylor wrote: On 09/03/2012 10:42, Paul Taylor wrote: There is a mismatch between the score for a wildcard match and an exact match Just found the problem has been reported https://issues.apache.org/jira/browse/LUCENE-2557 not quite whether there is a solution

Memory issues with Lucene deployment

2012-09-25 Thread Paul Taylor
Doing Lucene search within a jetty servlet container; the machine has 16GB of memory. Using a 64-bit JVM and Lucene 3.6 and files are memory mapped, so I just allocate a max of 512MB to jetty itself, understanding that the remaining memory can be used to memory map Lucene files. Monitoring total

Re: Memory issues with Lucene deployment

2012-09-27 Thread Paul Taylor
On 25/09/2012 20:09, Uwe Schindler wrote: Hi, Without a full output of "free -h" we cannot say anything. But the total Linux memory use should always used by 100% on a good server otherwise it's useless (because full memory includes cache usage, too). I think, -Xmx may be too less for your Jav

Is there anything in Lucene 4.0 that provides 'absolute' scoring so that I can compare the scoring results of different searches?

2012-10-25 Thread Paul Taylor
Is there anything in Lucene 4.0 that provides 'absolute' scoring so that I can compare the scoring results of different searches? To explain: if I do a search for two values fred OR jane and there is a document that contains both those words exactly, then that document will score 100; documents

Is there a problem with my Analyzer subclass ?

2013-01-22 Thread Paul Taylor
I've been investigating potential memory leaks in my Lucene based application that runs on jetty. I did a memory dump with jmap and one thing I've noticed is that for any subclass of Analyzer that I have created there are a lot of instances of the $SavedStream inner class. So for example I c

Re: Is there a problem with my Analyzer subclass ?

2013-01-22 Thread Paul Taylor
I've found a simpler subclass, that illustrates the same problem package org.musicbrainz.search.analysis; import org.apache.lucene.analysis.*; import java.io.IOException; import java.io.Reader; /** * For analyzing catalogno so can compare values containing spaces with values that do not *

What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor
What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor
tance sharing the same field name should only include their per-field boost and not the document level boost) as the boost for multi-valued field instances are multiplied together by Lucene." -- Ian. On Mon, Feb 18, 2013 at 12:17 PM, Paul Taylor wrote: What is equivalent to Document.setBo
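A sketch of the 4.x replacement being described (field name and boost value are illustrative): the document-level boost is gone, so the boost moves onto each boosted field instance, and, as the quote above notes, boosts on repeated instances of the same field name are multiplied together.

    Document doc = new Document();
    Field alias = new TextField("alias", "some alias text", Field.Store.YES);
    alias.setBoost(2.0f);          // was doc.setBoost(2.0f) in 3.6
    doc.add(alias);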

Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor
different to what the migration guide says so I don't see that as an improvement. Paul - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message----- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Monday, Februar

Re: What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1?

2013-02-18 Thread Paul Taylor
men http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Monday, February 18, 2013 5:08 PM To: Uwe Schindler Cc: java-user@lucene.apache.org Subject: Re: What is equivalent to Document.setBoost() from Lucene 3.6 inLucene

What replaces the computeNorm method in DefaultSimilarity in 4.1 now that the method is final

2013-02-19 Thread Paul Taylor
What replaces the computeNorm method in DefaultSimilarity in 4.1? I've always subclassed DefaultSimilarity to resolve an issue whereby when a document has multiple values in a field (because it has a one-to-many relationship) it scores worse than a document which just has a single value, but the computeNorm
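In 4.x the overridable hook has moved: TFIDFSimilarity.computeNorm(FieldInvertState) is final and delegates to lengthNorm(FieldInvertState), which DefaultSimilarity leaves open and which can see the field name. A sketch of that route (the field name and flat norm are illustrative, not a drop-in for the 3.6 code):

    public class MultiValueFriendlySimilarity extends DefaultSimilarity {
        @Override
        public float lengthNorm(FieldInvertState state) {
            if ("alias".equals(state.getName())) {
                return state.getBoost();      // flat norm: extra values don't hurt the score
            }
            return super.lengthNorm(state);   // standard 1/sqrt(numTerms) elsewhere
        }
    }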

Field seems to have become binary field on update to Lucene 4.1

2013-02-19 Thread Paul Taylor
Strange test failure after converting code from Lucene 3.6 to Lucene 4.1 public void testIndexPuid() throws Exception { addReleaseOne(); RAMDirectory ramDir = new RAMDirectory(); createIndex(ramDir); IndexReader ir = IndexReader.open(ramDir); Fields fiel

Re: Field seems to have become binary field on update to Lucene 4.1

2013-02-19 Thread Paul Taylor
On 19/02/2013 20:56, Paul Taylor wrote: Strange test failure after converting code from Lucene 3.6 to Lucene 4.1 public void testIndexPuid() throws Exception { addReleaseOne(); RAMDirectory ramDir = new RAMDirectory(); createIndex(ramDir); IndexReader ir

Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-20 Thread Paul Taylor
Just updating the codebase from Lucene 3.6 to Lucene 4.1 and it seems my tests that use NormalizeCharMap for replacing characters in the analyzers are not working. Below I've created a self-contained test case; this is the output when I run it --term=and-- --term=gold-- --term=platinum-

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-25 Thread Paul Taylor
On 20/02/2013 11:28, Paul Taylor wrote: Just updating the codebase from Lucene 3.6 to Lucene 4.1 and it seems my tests that use NormalizeCharMap for replacing characters in the analyzers are not working. bump, anybody? I thought a self-contained testcase would be enough to pique somebody's interest

Do you still have to override QueryParser to allow numeric range searches in Lucene 4.1

2013-02-25 Thread Paul Taylor
In my 3.6 code I was adding a numeric field to my index as follows: public void addNumericField(IndexField field, Integer value) { addField(field, NumericUtils.intToPrefixCoded(value)); } but I've changed it to (work in progress) public void addNumericField(IndexField field, Integer
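For comparison, a 4.1-style sketch of both halves (field name and values are illustrative): the value is indexed with IntField and queried with a NumericRangeQuery, which the stock QueryParser will not generate on its own, hence the usual trick of overriding the range/field query hooks in a QueryParser subclass.

    doc.add(new IntField("duration", 215, Field.Store.YES));                        // indexing side
    Query q = NumericRangeQuery.newIntRange("duration", 200, 300, true, true);      // search side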

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-25 Thread Paul Taylor
On 20/02/2013 11:28, Paul Taylor wrote: Just updating the codebase from Lucene 3.6 to Lucene 4.1 and it seems my tests that use NormalizeCharMap for replacing characters in the analyzers are not working. Below I've created a self-contained test case; this is the output when I run it --term

Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1

2013-02-26 Thread Paul Taylor
On 25/02/2013 11:24, Thomas Matthijs wrote: On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <li...@selckin.be> wrote: On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <li...@selckin.be> wrote: On Mon, Feb 25, 2013 at 11:24

Re: What replaces the computeNorm method in DefaultSimilarity in 4.1 now that the method is final

2013-02-26 Thread Paul Taylor
On 19/02/2013 11:42, Paul Taylor wrote: What replaces the computeNorm method in DefaultSimilarity in 4.1? I've always subclassed DefaultSimilarity to resolve an issue whereby when a document has multiple values in a field (because it has a one-to-many relationship) it scores worse than a document which

ArrayIndexOutOfBoundsException trying to use tokenizer in Lucene 4.1

2013-02-26 Thread Paul Taylor
This works in 3.6, but fails in 4.1; what's wrong with the code? public void testTokenization() throws IOException { StringBuffer sb = new StringBuffer(); for(char i=0;i<100;i++) { Character c = new Character(i); if(!Character.isWhitespace(c)) {

NullPointerException thrown on tokenizer in 4.1, worked okay in 3.6

2013-02-26 Thread Paul Taylor
This code worked in 3.6 but now throws a NullPointerException in 4.1. I'm not expecting there to be a token created, but surely it shouldn't throw NullPointerException. Tokenizer tokenizer = new org.apache.lucene.analysis.standard.StandardTokenizer(Version.LUCENE_41, new StringReader("!!!")); to

Re: ArrayIndexOutOfBoundsException trying to use tokenizer in Lucene 4.1

2013-02-26 Thread Paul Taylor
On 26/02/2013 13:29, Alan Woodward wrote: Hi Paul, You need to call tokenizer.reset() before you call incrementToken() Alan Woodward www.flax.co.uk Hi, thanks that fixes it
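For reference, the full consumer sequence the 4.x analysis API expects (a sketch; the field name and text are illustrative):

    TokenStream ts = analyzer.tokenStream("name", new StringReader("some text"));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                          // mandatory before the first incrementToken()
    while (ts.incrementToken()) {
        System.out.println(term.toString());
    }
    ts.end();                            // records end-of-stream state
    ts.close();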

Re: NullPointerException thrown on tokenizer in 4.1, worked okay in 3.6

2013-02-26 Thread Paul Taylor
On 26/02/2013 12:29, Paul Taylor wrote: This code worked in 3.6 but now throws a NullPointerException in 4.1. I'm not expecting there to be a token created, but surely it shouldn't throw NullPointerException. Tokenizer tokenizer = new org.apache.lucene.analysis.standard.StandardToke

In Lucene 4.1 FuzzyQuery constructor now takes parameter maxEdits instead of parameter minSimilarity

2013-02-26 Thread Paul Taylor
FuzzyQuery constructor now takes parameter maxEdits instead of parameter minSimilarity. But I'm unclear how to map from the old value to the new value or whether they are unrelated and can no longer be compared. I was previously using a minsimilarity of 0.5f thanks Paul --
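The two scales are not directly comparable: 4.x counts Levenshtein edits (capped at 2) rather than a similarity ratio. A sketch of the translation, assuming the FuzzyQuery.floatToEdits() helper shipped in 4.x for this conversion (field and term are illustrative):

    String text = "farmin";
    int maxEdits = FuzzyQuery.floatToEdits(0.5f, text.codePointCount(0, text.length()));
    Query fq = new FuzzyQuery(new Term("name", text), maxEdits);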

Unable to extend TopTermsRewrite in Lucene 4.1

2013-02-26 Thread Paul Taylor
In Lucene 3.6 I had code that replicated a Dismax Query, and the search used fuzzy queries in some cases to match values. But I was finding the score attributed to matches on fuzzy searches was completely different to the score attributed to matches on exact searches so the total score returned

Re: Unable to extend TopTermsRewrite in Lucene 4.1

2013-02-26 Thread Paul Taylor
l try out your recommendations Paul -----Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Tuesday, February 26, 2013 5:34 PM To: java-user@lucene.apache.org Subject: Unable to extend TopTermsRewrite in Lucene 4.1 In Lucene 3.6 I had code that replicated a Dismax Qu

Re: Unable to extend TopTermsRewrite in Lucene 4.1

2013-02-27 Thread Paul Taylor
On 26/02/2013 18:01, Paul Taylor wrote: On 26/02/2013 17:22, Uwe Schindler wrote: Hi, You cannot override rewrite() because you could easily break the logic behind TopTermsRewrite. If you want another behavior, subclass another base class and wrap the TopTermsRewrite instead of subclassing it

Using MappingCharFilter in analyzer breaking wildcard matches

2013-03-25 Thread Paul Taylor
I created this simple StripSpacesAndSeparatorsAnalyzer so that it ignores certain characters such as hyphens in the field, so that I can search for catno:WRATHCD25 or catno:WRATHCD-25 and get the same results, and that works (the original value of the field added to the index was WRATHCD-25). How

Re: Unable to extend TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQuery to another query type (generally a query that allows to

Re: Unable to extend TopTermsRewrite in Lucene 4.1

2013-04-04 Thread Paul Taylor
On 04/04/2013 10:59, Paul Taylor wrote: On 27/02/2013 10:28, Uwe Schindler wrote: Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simple as possible) a class that is responsible to "rewrite" a MultiTermQ

Re: Why does index boosting a field to 2.0f on a document have such a dramatic effect

2013-04-04 Thread Paul Taylor
On 04/04/2013 23:26, Chris Hostetter wrote: : At index time I boost the alias field of a small set of documents, setting the : boost to 2.0f, which I thought meant equivalent to doubling the score this doc : would get over another doc, everything else being equal. 1) you haven't shown us enough

Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Paul Taylor
Trying to convert some Lucene 3 code to Lucene 4, I want to use termEnums.docs(ir.getLiveDocs()) to only return docs that have not been deleted for a particular term. However getLiveDocs() is only available for AtomicReaders, and although I just have a single index it is file based and uses Di
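Two common routes in 4.x for this situation (a sketch; variable names are illustrative): take the merged live-docs view with MultiFields.getLiveDocs(), or wrap the DirectoryReader in SlowCompositeReaderWrapper to obtain a single AtomicReader.

    // null liveDocs means the index currently has no deletions.
    Bits liveDocs = MultiFields.getLiveDocs(indexReader);
    DocsEnum docs = termsEnum.docs(liveDocs, null);
    while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        // only undeleted documents containing this term are returned here
    }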

Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Hi, My multithreaded code was always creating a new IndexSearcher for every search, but I changed over to the recommendation of creating just one IndexSearcher and keeping it between searches. Now I find if I have multiple threads trying to search they block in the search() method; only one c

Re: Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Uwe Schindler wrote: Can you show us where it exactly blocks (e.g. use Ctrl-Break on windows to print a thread dump)? IndexSearchers methods are not synchronized and concurrent access is easy possible, all concurrent access is managed by the underlying IndexReader. Maybe you synchronize somewhere

Re: Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Uwe Schindler wrote: That lock contention is fine there as this is the central point where all IO is done. This does not mean that only one query is running in parallel, the queries are still running in parallel. But there is one place where all IO is waiting for one file descriptor. This is not

Re: Blocking on IndexSearcher search

2010-09-07 Thread Paul Taylor
Uwe Schindler wrote: I'm using Windows and I'll try NIO, good idea; my app is already memory hungry in other areas so I guess MMapped is a no go, does it use heap or perm memory? It uses address space for mapping the files into virtual memory (like a swap file) - this is why it o

Use of SimpleStringInterner in non Lucene App

2010-09-15 Thread Paul Taylor
I'm currently using String.intern() in my app purely to reduce the memory usage; this has worked well except the memory required for intern() goes into perm rather than heap, and setting perm on different platforms is non-trivial, so I'm looking for a solution that works on heap. (Now before you say tune

Re: Use of SimpleStringInterner in non Lucene App

2010-09-20 Thread Paul Taylor
Paul Taylor wrote: I'm currently using String.intern() in my app purely to reduce the memory usage; this has worked well except the memory required for intern() goes into perm rather than heap, and setting perm on different platforms is non-trivial, so I'm looking for a solution that works on heap

Closing IndexSearcher, making sure it is in use

2011-01-13 Thread Paul Taylor
As recommended, I use just one IndexSearcher on my multithreaded GUI app using a singleton pattern. If data is modified in the index I then close the reader and searcher, and they will be recreated on the next call to getInstance(), but I've hit a problem whereby one thread was closing a searcher, anot

Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-20 Thread Paul Taylor
Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in NormalizeCharMap (currently the singleMatch just has to be found in the token; I want it to match the whole token). Can this be done? It sounds simple enough but I c

Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-24 Thread Paul Taylor
On 22/01/2011 15:43, Koji Sekiguchi wrote: (11/01/20 22:19), Paul Taylor wrote: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in NormalizeCharMap (currently the singleMatch just has to be found in the token I

How do you know when index.optimize has finished ?

2011-01-28 Thread Paul Taylor
I'm building six different indexes in series, at the end of building an index I call optimize() and then close() the writer, then move onto the next one. I build them in series because they are extracting the data from a database and I don't want to overload the database. However the optimizatio

Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-29 Thread Paul Taylor
On 29/01/2011 01:45, Koji Sekiguchi wrote: (11/01/25 2:14), Paul Taylor wrote: On 22/01/2011 15:43, Koji Sekiguchi wrote: (11/01/20 22:19), Paul Taylor wrote: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in

Lucene Merge failing on Open Files

2011-04-04 Thread Paul Taylor
Problem trying to merge indexes in the background whilst building some others; works okay on my humble laptop but fails on another machine, although it seems to allow 700,000 file handles. Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.F

Re: Lucene Merge failing on Open Files

2011-04-04 Thread Paul Taylor
On 04/04/2011 20:13, Michael McCandless wrote: How are you merging these indices? (IW.addIndexes?). Are you changing any of IW's defaults, eg mergeFactor? Mike Hi Mike I have indexWriter.setMaxBufferedDocs(1); indexWriter.setMergeFactor(3000); these are a hangover from earlier code, I

Re: Lucene Merge failing on Open Files

2011-04-06 Thread Paul Taylor
On 04/04/2011 21:06, Simon Willnauer wrote: On Mon, Apr 4, 2011 at 9:59 PM, Paul Taylor wrote: On 04/04/2011 20:13, Michael McCandless wrote: How are you merging these indices? (IW.addIndexes?). Are you changing any of IW's defaults, eg mergeFactor? Mike Hi Mike I

Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor
Is there a debug build of Lucene 3.0.3 so I can profile it properly to find what part of the search is taking the time? Note: I've already profiled my application and determined that it is the Lucene search that is taking the time; I also had another attempt using Luke but find it incred

Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor
On 29/04/2011 16:03, Steven A Rowe wrote: Hi Paul, What did you find about Luke that's buggy? Bug reports are very useful; please contribute in this way. Please see previous post, in summary mistake on my part. The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline a

Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor
On 29/04/2011 21:14, Paul Taylor wrote: Hmm maybe that is enough, I'm not sure. I'm profiling with YourKit Profiler and it doesn't show anything within the Lucene classes so I assumed this meant they didn't contain the necessary debugging info, but I would have thought that -g is all I

Lucene spending a lot of time in BooleanScorer2

2011-05-02 Thread Paul Taylor
Hi, Nearing completion on a new version of a Lucene search component for the http://www.musicbrainz.org music database and having a problem with performance. There are a number of indexes, each built from data in a database; there is one index for albums, another for artists, and another for tr

Re: Lucene spending a lot of time in BooleanScorer2

2011-05-03 Thread Paul Taylor
On 02/05/2011 23:36, Paul Taylor wrote: Hi Nearing completion on a new version of a lucene search component for the http://www.musicbrainz.org music database and having a problem with performance. There are a number of indexes each built from data in a database, there is one index for albums

Any way to not bother scoring less good matches?

2011-05-03 Thread Paul Taylor
I'm receiving a number of searches with many ORs so that the total number of matches is huge (> 1 million) although only the first 20 results are required. Analysis shows most time is spent scoring the results. Now it seems to me if you send a query with 10 OR components, documents that matc

Problem modifying Similarity class to work with Lucene 3.1.0

2011-05-03 Thread Paul Taylor
How can I convert this Similarity method to use 3.1 (currently using 3.0.3)? I understand I have to replace lengthNorm() with computeNorm(), but fieldName is not a provided parameter in computeNorm() and FieldInvertState does not contain the field name either. I need the field because I onl

Re: Problem modifying Similarity class to work with Lucene 3.1.0

2011-05-03 Thread Paul Taylor
On 03/05/2011 15:06, Robert Muir wrote: On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote: How can I convert this Similarity method to use 3.1 (currently using 3.0.3)? I understand I have to replace lengthNorm() with computeNorm(), but fieldName is not a provided parameter in computeNorm

Why has PerFieldAnalyzerWrapper been made final in Lucene 3.1 ?

2011-05-03 Thread Paul Taylor
We subclassed PerFieldAnalyzerWrapper as follows: public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper { public PerFieldEntityAnalyzer(Class indexFieldClass) { super(new StandardUnaccentAnalyzer()); for(Object o : EnumSet.allOf(indexFieldClass)) {
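Since the class is final in 3.1, the usual workaround is composition rather than inheritance: build the per-field map up front and hand it to the public constructor (a sketch reusing the names above; the map-taking constructor and the illustrative "catno" entry are assumptions):

    Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
    perField.put("catno", new KeywordAnalyzer());    // same wiring the subclass loop did
    Analyzer analyzer = new PerFieldAnalyzerWrapper(new StandardUnaccentAnalyzer(), perField);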

Re: Why has PerFieldAnalyzerWrapper been made final in Lucene 3.1 ?

2011-05-04 Thread Paul Taylor
On 04/05/2011 07:56, Israel Tsadok wrote: On Tue, May 3, 2011 at 7:03 PM, Paul Taylor <paul_t...@fastmail.fm> wrote: We subclassed PerFieldAnalyzerWrapper as follows: public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper { public PerFieldEntit

Re: Any way to not bother scoring less good matches?

2011-05-04 Thread Paul Taylor
On 04/05/2011 12:39, Ahmet Arslan wrote: I'm receiving a number of searches with many ORs so that the total number of matches is huge (> 1 million) although only the first 20 results are required. Analysis shows most time is spent scoring the results. Now it seems to me if you send a query

Re: Any way to not bother scoring less good matches?

2011-05-04 Thread Paul Taylor
On 04/05/2011 12:51, Paul Taylor wrote: On 04/05/2011 12:39, Ahmet Arslan wrote: I'm receiving a number of searches with many ORs so that the total number of matches is huge (> 1 million) although only the first 20 results are required. Analysis shows most time is spent scoring the resu

Re: Any way to not bother scoring less good matches?

2011-05-04 Thread Paul Taylor
On 04/05/2011 15:02, Ahmet Arslan wrote: Thanks for the hint, so this could be done by overriding getBooleanQuery() in QueryParser ? I think something like this should do the trick. Without overriding anything. Query query= QueryParser.parse("User Entered String"); if (query instanceof B
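A sketch of where that snippet is heading (the threshold is illustrative): after parsing, require some minimum number of the optional OR clauses to match, so documents matching only one or two terms are never collected or scored.

    Query query = queryParser.parse(userInput);
    if (query instanceof BooleanQuery) {
        ((BooleanQuery) query).setMinimumNumberShouldMatch(3);
    }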

Re: Any way to not bother scoring less good matches?

2011-05-05 Thread Paul Taylor
On 05/05/2011 00:24, Chris Hostetter wrote: : Well I did extend QueryParser, and the method is being called but rather : disappointingly it had no noticeable effect on how long queries took. I : really thought by reducing the number of matches the corresponding scoring : phase would be quicker.

Re: Any way to not bother scoring less good matches?

2011-05-05 Thread Paul Taylor
On 05/05/2011 00:24, Ahmet Arslan wrote: Thanks again, now done that but still not having much effect on total time. So your main concern is enhancing the running time, not to decrease the number of returned results. Additionally http://wiki.apache.org/lucene-java/ImproveSearchingSpeed Yes c

Re: Any way to not bother scoring less good matches?

2011-05-05 Thread Paul Taylor
On 05/05/2011 11:13, Ahmet Arslan wrote: Yes correct, but I have looked at the list of optimizations before. What was clear from profiling was that it wasn't the searching part that was slow (a query run on the same index with only a few matching docs ran super fast); the slowness only occurs when
