Re: lucene nicking my memory ?

2008-12-03 Thread Magnus Rundberget
Well... after various tests I downgraded to Lucene 1.9.1 to see if that had any effect... doesn't seem that way. I have set up a JMeter test with 5 concurrent users doing a search (a silly search for a two-letter word) every 3 seconds (with a random offset of +/- 500 ms). - With 512 MB Xms/Xmx

Re: NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Mark Miller
Sounds familiar. This may actually be in JIRA already. - Mark On Dec 3, 2008, at 6:25 PM, "Teruhiko Kurosaka" <[EMAIL PROTECTED]> wrote: Mike, You are right. There was an error on my part. I think I was, in effect, making a SpanNearQuery object of: new SpanNearQuery(new SpanQuery[0], 0,

RE: NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Teruhiko Kurosaka
Mike, You are right. There was an error on my part. I think I was, in effect, making a SpanNearQuery object of: new SpanNearQuery(new SpanQuery[0], 0, true); > -Original Message- > From: Michael McCandless [mailto:[EMAIL PROTECTED] > Sent: Wednesday, December 03, 2008 10:47 AM > To:
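The NPE in this thread traces back to constructing a SpanNearQuery from an empty clause array, which leaves the query's field null and blows up later inside SegmentReader.getNorms(). A minimal guard might look like this (a sketch against the Lucene 2.x span API; the helper class and method names are hypothetical):

```java
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

public final class SpanQueryUtil {
    private SpanQueryUtil() {}

    // Hypothetical helper: refuse to build a SpanNearQuery from an empty
    // clause array. Such a query has no field, so field-dependent code
    // (e.g. norms lookup at search time) hits a NullPointerException.
    public static SpanNearQuery safeSpanNear(SpanQuery[] clauses,
                                             int slop, boolean inOrder) {
        if (clauses == null || clauses.length == 0) {
            throw new IllegalArgumentException(
                "SpanNearQuery requires at least one clause");
        }
        return new SpanNearQuery(clauses, slop, inOrder);
    }
}
```

Failing fast at construction time makes the error point at the real culprit instead of a Hashtable deep inside Lucene.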

Re: Indexing Names in Lucene -- Thomas = Tom, etc

2008-12-03 Thread Khawaja Shams
Hi, Yes, it is pretty obvious that I would have to index Tom, but I think you missed the point. I don't have a list of names with their nicknames, and this is pretty common: Mike being Michael, Richard being Rich or Dick, William could be Bill or Will, etc. I thought I would check if there was

Re: NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Michael McCandless
Actually I think something "outside" Lucene is probably setting that field. How did you create the Query that you are searching on? Mike Teruhiko Kurosaka wrote: Hello again, A debugging session shows that SpanWeight.query.field is null when SpanWeight.scorer() is being executed. API do

RE: NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Teruhiko Kurosaka
Hello again, A debugging session shows that SpanWeight.query.field is null when SpanWeight.scorer() is being executed. The API doc says getField() "Returns the name of the field matched by this query." Am I right to assume that this field is set by a search mechanism within Lucene, not by my code

NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Teruhiko Kurosaka
My application died throwing NPE inside SegmentReader.getNorms(). Exception in thread "main" java.lang.NullPointerException at java.util.Hashtable.get(Hashtable.java:336) at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:438) at org.apache.lucene.index.S

Re: Termfreq

2008-12-03 Thread Gustavo Corral
Yes, of course it makes sense. I was just confused about the documentation for the Similarity function. On Wed, Dec 3, 2008 at 9:52 AM, Erick Erickson <[EMAIL PROTECTED]>wrote: > I'm not much of an expert on term frequencies and scoring, > but would you really want the score calculated for a docu

Re: Termfreq

2008-12-03 Thread Erick Erickson
I'm not much of an expert on term frequencies and scoring, but would you really want the score calculated for a document to be affected by the occurrence of terms in a field you did NOT search on? I sure wouldn't. Best, Erick On Wed, Dec 3, 2008 at 10:44 AM, Gustavo Corral <[EMAIL PROTECTED]>wr

Termfreq

2008-12-03 Thread Gustavo Corral
Hi list, I hope this is not a silly question, but I should ask. I developed an IR system for XML documents with Lucene and I was checking the explain() output for some queries, but I don't understand this part: 0.121383816 = fieldWeight(title:efecto in 1), product of: 1.0 = tf(termFreq(title:efec
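For reference, the numbers in that explain() fragment follow Lucene's DefaultSimilarity: tf is the square root of the raw term frequency (so a single occurrence gives tf = 1.0), and fieldWeight is the product tf × idf × fieldNorm. A self-contained sketch of the arithmetic (the idf and field-length values below are illustrative, not taken from the poster's index):

```java
public class ExplainMath {
    // DefaultSimilarity-style components (Lucene 2.x):
    // tf is the square root of the raw term frequency in the field.
    static double tf(int termFreq) { return Math.sqrt(termFreq); }

    // lengthNorm is roughly 1/sqrt(number of terms in the field);
    // it is what explain() reports as fieldNorm (after encoding loss).
    static double fieldNorm(int termsInField) {
        return 1.0 / Math.sqrt(termsInField);
    }

    public static void main(String[] args) {
        double tf = tf(1);           // 1.0 for a single occurrence
        double idf = 1.0;            // illustrative; real idf depends on docFreq
        double norm = fieldNorm(4);  // 0.5 for an illustrative 4-term title
        double fieldWeight = tf * idf * norm;
        System.out.println(fieldWeight); // 0.5 with these illustrative inputs
    }
}
```

So a fieldWeight like 0.121383816 with tf = 1.0 is just idf × fieldNorm for that one-occurrence term.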

Re: lucene nicking my memory ?

2008-12-03 Thread Mark Miller
Careful here. Not only do you need to pass -server, but you need the ability to use it :) It will silently not work if it's not there, I believe. Oddly, the JRE doesn't seem to come with the server HotSpot implementation. The JDK always does appear to. Probably varies by OS to some degree. Some

Re: lucene nicking my memory ?

2008-12-03 Thread Eric Bowman
Are you not passing -server on the command line? You need to do that. In my experience with Sun JVM 1.6.x, the default gc strategy is really amazingly good, as long as you pass -server. If passing -server doesn't fix it, I would recommend enabling the various verbose GC logs and watching what hap
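A command line along the lines Eric suggests (the flags are standard HotSpot options for this era of JVM; the jar name is a placeholder):

```shell
# -server selects the server HotSpot compiler (note the caveat elsewhere
# in the thread: it is silently ignored if the JRE doesn't ship it).
# The GC flags make collection behaviour visible, so you can tell whether
# memory is leaking or just hasn't been collected yet.
java -server -Xms512m -Xmx512m \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar your-app.jar
```

Watching the verbose GC log under the JMeter load shows whether full collections actually reclaim the heap.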

Re: lucene nicking my memory ?

2008-12-03 Thread Michael McCandless
Are you actually hitting OOME? Or, you're watching heap usage and it bothers you that the GC is taking a long time (allowing too much garbage to use up heap space) before sweeping? One thing to try (only for testing) might be a lower and lower -Xmx until you do hit OOME; then you'll know

Re: lucene nicking my memory ?

2008-12-03 Thread Magnus Rundberget
Sure, Tried with the following. Java version: build 1.5.0_16-b06-284 (dev), 1.5.0_12 (production). OS: Mac OS X Leopard (dev) and Windows XP (dev), Windows 2003 (production). Container: Jetty 6.1 and Tomcat 5.5 (latter is used both in dev and production). Current JVM options: -Xms512m -Xmx1024M

Re: lucene nicking all my memory

2008-12-03 Thread Magnus Rundberget
Cheers, In my scenario, I've made sure that the index does not get modified (so reopen shouldn't be necessary?). I've tried the scenario both with and without caching the IndexSearcher (and thereby the IndexReader it creates in its constructor). When not caching, I've made sure to close the indexs

Re: lucene nicking all my memory

2008-12-03 Thread Ganesh
You are opening and closing an IndexSearcher for every search. Try caching the IndexSearcher and reopening the IndexReader when the index gets modified. In your code below, how did you create the IndexSearcher? If it was created from an IndexReader, you need to close that too. This might be the cause of the mem
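Ganesh's advice — cache one IndexSearcher and reopen the underlying IndexReader only when the index changes — might look roughly like this (a sketch against the Lucene 2.4 API; real code would also need to reference-count readers still in use by in-flight searches):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherCache {
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherCache(String indexDir) throws IOException {
        reader = IndexReader.open(indexDir);
        searcher = new IndexSearcher(reader);
    }

    // Call before searching: swap in a fresh reader only if the index
    // actually changed. reopen() shares unchanged segments with the old
    // reader, so this is far cheaper than open() from scratch.
    public synchronized IndexSearcher getSearcher() throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            searcher.close();
            reader.close();   // the searcher does not close a reader it was given
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }

    public synchronized void close() throws IOException {
        searcher.close();
        reader.close();
    }
}
```

If the index truly never changes during the test (as Magnus says), reopen() will always return the same reader and this reduces to a plain cached searcher.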

Re: lucene nicking my memory ?

2008-12-03 Thread Glen Newton
Hi Magnus, Could you post the OS, version, RAM size, swapsize, Java VM version, hardware, #cores, VM command line parameters, etc? This can be very relevant. Have you tried other garbage collectors and/or tuning as described in http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html?

Re: Problem with special characters

2008-12-03 Thread prabin meitei
I don't think there is a way to do that if you are using Lucene's StandardAnalyzer, because StandardAnalyzer is meant to tokenize by some standard token characters. For custom analyzing it is good to use your own analyzer. You can probably use SimpleAnalyzer. Prabin toostep.com On Wed, Dec 3, 2008

lucene nicking all my memory

2008-12-03 Thread Magnus Rundberget
Hi, We have an application using Tomcat, Spring etc and Lucene 2.4.0. Our index is about 100MB (in test) and has about 20 indexed fields. Performance is pretty good, but we are experiencing a very high usage of memory when searching. Looking at JConsole during a somewhat silly scenario (but

Re: Problem with special characters

2008-12-03 Thread Ravichandra
It worked out well. Thanks. Is there any way that we can use StandardAnalyzer and tell it not to generate tokens out of this? Thanks Ravichandra prabin meitei wrote: > > use your own analyzer. Write a class extending lucene analyzer. you can > override the tokenStream method to include whateve

lucene nicking my memory ?

2008-12-03 Thread Magnus Rundberget
Hi, We have an application using Tomcat, Spring etc and Lucene 2.4.0. Our index is about 100MB (in test) and has about 20 indexed fields. Performance is pretty good, but we are experiencing a very high usage of memory when searching. Looking at JConsole during a somewhat silly scenario (but

Re: Query time document group boosting

2008-12-03 Thread Toke Eskildsen
On Tue, 2008-12-02 at 23:42 +0100, Chris Hostetter wrote: > : A cosmetic remark, I would personally choose a single field for the boosts > and > : then one token per source. (groupboost:A^10 groupboost:B^1 > groupboost:C^0.1). > > that's a key improvement, as it helps keep the number of unique f
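Toke's single-field scheme (groupboost:A^10 groupboost:B^1 groupboost:C^0.1) can be attached to a user query as optional boost-only clauses, for example (a sketch against the Lucene 2.x API; the field and group names come from the thread):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class GroupBoost {
    // Wraps the user's query with SHOULD clauses that boost documents
    // by their source group without changing which documents match:
    // every doc has exactly one groupboost token, so exactly one of
    // the optional clauses fires per matching document.
    public static Query withGroupBoosts(Query userQuery) {
        BooleanQuery q = new BooleanQuery();
        q.add(userQuery, BooleanClause.Occur.MUST);
        q.add(boosted("A", 10f), BooleanClause.Occur.SHOULD);
        q.add(boosted("B", 1f), BooleanClause.Occur.SHOULD);
        q.add(boosted("C", 0.1f), BooleanClause.Occur.SHOULD);
        return q;
    }

    private static Query boosted(String group, float boost) {
        TermQuery tq = new TermQuery(new Term("groupboost", group));
        tq.setBoost(boost);
        return tq;
    }
}
```

Keeping all boosts in one field (rather than one field per group) limits the number of unique field names, which is the improvement Hoss points out.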

Re: # of fields, performance

2008-12-03 Thread Michael McCandless
Also, if you do some testing of this, please post back the results if you can. As you've noticed, this (how Lucene performs with a great many / variable fields per doc) isn't a well explored area yet... Mike Mark Miller wrote: There is not much impact as long as you turn off Norms for t

Re: Problem with special characters

2008-12-03 Thread prabin meitei
Use your own analyzer: write a class extending Lucene's Analyzer. You can override the tokenStream method to include whatever you want and exclude what you don't want. E.g. a tokenStream method which may work for you: public TokenStream tokenStream(String fieldName, Reader reader) { Tok
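Prabin's snippet is cut off by the archive; a completed analyzer in that spirit (assuming, as the thread suggests, that the intent was whitespace tokenization so that tokens like "ABC+S" survive intact — Lucene 2.x API, class name illustrative) might be:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class SpecialCharAnalyzer extends Analyzer {
    // Split on whitespace only, so "+" and other punctuation stay part
    // of the token, then lowercase for case-insensitive matching.
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader);
        stream = new LowerCaseFilter(stream);
        return stream;
    }
}
```

The same analyzer must be used at both index and query time, otherwise the query-side tokens won't match what was indexed.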

Re: Problem with special characters

2008-12-03 Thread Ravichandra
Hi I tried that approach, I did use escaping with "\", and the query has the special character, but I got no results that time. What I found out was when I use StandardAnalyzer on "ABC+S", the terms generated are "ABC" and "S" and '+' is getting lost. When I used WhitespaceAnalyzer or

Re: Indexing Names in Lucene -- Thomas = Tom, etc

2008-12-03 Thread Ganesh
If you want to query for Tom, then you need to index the value Tom. Create one more field for aliases, or add the alias as part of the name field. Regards Ganesh - Original Message - From: "Khawaja Shams" <[EMAIL PROTECTED]> To: Sent: Wednesday, December 03, 2008 11:46 AM Subject: Indexing
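Ganesh's suggestion — index the nickname alongside the formal name — is just an extra field (or extra values) on the document, e.g. (Lucene 2.4 API; class and field names illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class NameDoc {
    // Index both the formal name and any known aliases so that a
    // query for "tom" can match a document for "Thomas".
    public static Document forPerson(String name, String... aliases) {
        Document doc = new Document();
        doc.add(new Field("name", name,
                Field.Store.YES, Field.Index.ANALYZED));
        for (String alias : aliases) {
            doc.add(new Field("alias", alias,
                    Field.Store.NO, Field.Index.ANALYZED));
        }
        return doc;
    }
}
```

The remaining problem, as Khawaja notes, is obtaining the nickname list in the first place; a synonym dictionary (as Ian suggests) fills that gap.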

Re: Problem with special characters

2008-12-03 Thread prabin meitei
Try manually escaping the search string, adding "\" in front of the special characters (you can do this easily using string replace). This will make sure that your query contains the special characters. Prabin toostep.com On Wed, Dec 3, 2008 at 12:03 PM, Ravichandra < [EMAIL PROTECTED]> wrote:
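The manual escaping Prabin describes — prefixing each QueryParser special character with a backslash — can be done with plain string manipulation; Lucene also ships QueryParser.escape() for this, but a hand-rolled sketch is:

```java
public class QueryEscaper {
    // Characters QueryParser treats as syntax (Lucene 2.x; the two-char
    // operators && and || would need separate handling).
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?";

    // Prefix every special character with a backslash so it is searched
    // literally instead of being parsed as query syntax.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Note the caveat from later in the thread: escaping only gets the character past the parser; if the analyzer (e.g. StandardAnalyzer) strips '+' during tokenization, the term still won't match, so a whitespace or custom analyzer is needed at both index and query time.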

Re: Indexing Names in Lucene -- Thomas = Tom, etc

2008-12-03 Thread Ian Lea
Hi To get from Thomas to Tom you'll need to use synonyms. For Thom you would have been able to use prefixes or wild cards. If you google for lucene synonyms you'll find loads of stuff. Also, I believe that Solr has built in support for synonyms. -- Ian. On Wed, Dec 3, 2008 at 6:16 AM, Khaw