Taking too much time in optimization

2009-08-10 Thread Laxmilal Menariya
Hello everyone, I have created a sample application that indexes file properties, and I have indexed approximately 107K files. I get an OutOfMemoryError after 100K files while indexing; I traced the cause to maxBufferedDocs=100K. After that I call the optimize() method, which takes far too long, approximately 12 hours,

Re: sumOfSquaredWeights for lengthNorm

2009-08-10 Thread Grant Ingersoll
You can override the Similarity class and set it on both the IndexWriter and the IndexReader. Is that your question? On Aug 10, 2009, at 3:55 AM, Claudio Gennaro wrote: Dear all, I read a very old message on the list about the use of sumOfSquaredWeights in lengthNorm (Mon, 06 Mar 2006

Re: sumOfSquaredWeights for lengthNorm

2009-08-10 Thread Claudio Gennaro
Yes, this is probably the right way, but the problem is how. The method lengthNorm(String fieldName, int numTerms) of the Similarity class takes as its input parameter only the number of terms of the document currently being indexed. Instead, I need to evaluate the sum of the squares of the distinct term fre
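To make the quantity under discussion concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the truncated message refers to per-document term frequencies, and it assumes a norm of the form 1/sqrt(sum of tf^2); the class and method names are hypothetical, and how such a value would be fed into Lucene's norms is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not part of Lucene, only an illustration of the arithmetic.
class LengthNormSketch {

    // Count how often each distinct term occurs in one document's token list.
    static Map<String, Integer> termFrequencies(String[] tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Sum over the distinct terms of (term frequency squared).
    static long sumOfSquaredFrequencies(Map<String, Integer> tf) {
        long sum = 0;
        for (int f : tf.values()) {
            sum += (long) f * f;
        }
        return sum;
    }

    // Assumed norm: 1 / sqrt(sum of squared frequencies); 0 for an empty document.
    static float norm(String[] tokens) {
        long sum = sumOfSquaredFrequencies(termFrequencies(tokens));
        return sum == 0 ? 0f : (float) (1.0 / Math.sqrt(sum));
    }

    public static void main(String[] args) {
        String[] doc = {"a", "b", "a", "c", "a"};
        // Frequencies are a=3, b=1, c=1, so the sum of squares is 9 + 1 + 1 = 11.
        System.out.println(sumOfSquaredFrequencies(termFrequencies(doc)));
        System.out.println(norm(doc));
    }
}
```

The point of the sketch is that this value needs the full term-frequency map of the document, which is exactly the information that a method receiving only numTerms does not have.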

RE: Language Detection for Analysis?

2009-08-10 Thread Teruhiko Kurosaka
A shameless self-promotion: http://basistech.com/language-identification/ No, it's not free. Sorry. We have Lucene-compatible Tokenizers for those languages too: http://basistech.com/lucene/How-to-build-a-multilingual-search-engine.pdf Contact me if you have questions. -kuro > -Original Me

Re: Taking too much time in optimization

2009-08-10 Thread Otis Gospodnetic
Hi, That mergeFactor is too high. I suggest going back to default (10). maxBufferedDocs is an old and not very accurate setting (imagine what happens with the JVM heap if your indexer hits a SUPER LARGE document). Use setRAMBufferSizeMB instead. Otis -- Sematext is hiring -- http://sematext.c
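The reasoning about heap usage can be illustrated with a toy model in plain Java. This is not Lucene's implementation; the class, the method names, and the document sizes are all made up for illustration. It compares the peak amount of buffered data when flushing after a fixed number of documents with the peak when flushing as soon as a size limit is reached.

```java
// Hypothetical toy model of two flush policies; sizes are in arbitrary units (say, MB).
class FlushPolicySketch {

    // Peak buffered size when the buffer is flushed after every maxDocs documents.
    static long peakByDocCount(long[] docSizes, int maxDocs) {
        long buffered = 0;
        long peak = 0;
        int count = 0;
        for (long size : docSizes) {
            buffered += size;
            count++;
            peak = Math.max(peak, buffered);
            if (count >= maxDocs) {
                buffered = 0; // flush
                count = 0;
            }
        }
        return peak;
    }

    // Peak buffered size when the buffer is flushed as soon as it reaches ramLimit.
    static long peakByRamLimit(long[] docSizes, long ramLimit) {
        long buffered = 0;
        long peak = 0;
        for (long size : docSizes) {
            buffered += size;
            peak = Math.max(peak, buffered);
            if (buffered >= ramLimit) {
                buffered = 0; // flush
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Mostly small documents with two very large ones mixed in.
        long[] sizes = {1, 1, 1, 40, 1, 1, 40, 1, 1, 1};
        System.out.println(peakByDocCount(sizes, 10)); // 88: everything is held at once
        System.out.println(peakByRamLimit(sizes, 16)); // 43: limit is overshot by one document at most
    }
}
```

In this model the count-based policy lets the buffer grow to the sum of all document sizes in a batch, while the size-based policy keeps the peak below the limit plus the largest single document.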

Re: score from spans

2009-08-10 Thread Grant Ingersoll
On Aug 9, 2009, at 5:10 AM, Eran Sevi wrote: Thanks for the answer. I tried to further understand the weight and score mechanism when running a span query search. I noticed that indeed the SpanScorer and SpanWeight are being called and some score is returned but it seems to me that these

Re: score from spans

2009-08-10 Thread Mark Miller
Hey Eran, I've started work on this in the past - you are right, it gets complicated quickly! It's also likely to bring with it a sizable performance cost. We already have an issue in JIRA for this that is quite old: https://issues.apache.org/jira/browse/LUCENE-533 If you get any work going,

How to solve this problem

2009-08-10 Thread 石川
Hi, I am a newbie in Lucene and am trying the 'indexing and searching' demo of Lucene 1.4.3 using kaffe 1.0.6. After inputting the query, the following error occurs: Query: string Searching for: string java.lang.NoSuchMethodError: org/apache/lucene/search/Searcher.search(Lorg/apache/luc

Re: Taking too much time in optimization

2009-08-10 Thread Laxmilal Menariya
Thanks, I will try. On Tue, Aug 11, 2009 at 6:08 AM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hi, > > That mergeFactor is too high. I suggest going back to default (10). > maxBufferedDocs is an old and not very accurate setting (imagine what > happens with the JVM heap if your ind

Re: How to solve this problem

2009-08-10 Thread Alexander Aristov
I suspect that you might be using incompatible versions of Lucene and kaffe, though I have never worked with kaffe before and so might be wrong. Best Regards Alexander Aristov 2009/8/11 石川 > Hi, > I am a newbie in lucene and am trying the 'indexing and searching' > demo of lucene 1.4.3 using k