RE: Lucene slow performance

2013-03-15 Thread Uwe Schindler
Please call forceMerge only once (just to clean up your index), not every time! If you are doing a reindex already, just fix your close logic as discussed before. Scott Smith wrote: >Unfortunately, this is a production system which I can't touch (though >I was able to get a full reindex sch

Re: Migrating SnowballAnalyzer to 4.1

2013-03-15 Thread Steve Rowe
Hi Robert, On Mar 15, 2013, at 11:29 AM, Robert Muir wrote: > 2013/2/28 Steve Rowe : >> EnglishAnalyzer has used PorterStemmer instead of the English Snowball >> stemmer since it was created in 2010 as part of LUCENE-2055[2]. I think >> this is an oversight: EnglishAnalyzer should incorporate

Re: potential query performance issue

2013-03-15 Thread Lin Ma
Hi lukai, thanks for the reply. Do you mean WAND is a way to resolve this issue? For "native support", do you mean there is no built-in (existing ready to use externally open source) module in Lucene to implement WAND? If so, the performance will really be bad. regards, Lin On Sat, Mar 16, 2013 a

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
Unfortunately, this is a production system which I can't touch (though I was able to get a full reindex scheduled for tomorrow morning). Are you suggesting that I do: writer.forceMerge(1); writer.close(); instead of just doing the close()? -Original Message- From: Simon Willnauer [ma

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
To answer your first question: "good guess" :-). Yes, this is running on Windows. Sorry, I should have mentioned this. Your second point was very interesting. My assumption was that the IndexReader would get closed when the garbage collector realized that these objects were no longer being us

RE: Lucene slow performance

2013-03-15 Thread Uwe Schindler
OK, your configuration seems fine. I have the following ideas: - Are you using Windows? If yes, then IndexWriter cannot remove unused files when they are still in use (e.g. held by an open IndexReader) - When you get a new IndexReader after changes to the index, do you close the old ones? If

Re: Lucene slow performance

2013-03-15 Thread Simon Willnauer
On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith wrote: > " Do you always close IndexWriter after adding few documents and when > closing, disable "wait for merge"? In that case, all merges are interrupted > and the merge policy never has a chance to merge at all (because you are > opening and clo

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
" Do you always close IndexWriter after adding few documents and when closing, disable "wait for merge"? In that case, all merges are interrupted and the merge policy never has a chance to merge at all (because you are opening and closing IndexWriter all the time with cancelling all merges)?" F

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
Here's the code for the writer: IndexWriterConfig iwc = new IndexWriterConfig(Constants.LUCENE_VERSION, _analyzer); LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy(); lbsm.setUseCompoundFile(true); iwc.setMergePolicy(lbsm); Directory fsDir = FSDire

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
A little more data: of the 3330 files in the index, 2173 are CFS files and average 120k. Another 1116 files are .del's and average about 4kB. The remaining .prx, .frq, etc. consist of 41 files and total only 101MB. The largest files are 3 .prx files which total less than 60MB and 2 .frq of a
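A quick tally of the figures quoted in the message above, as a minimal sketch. It assumes "120k" means 120 KB and that KB and MB are binary units (1024-based); the class and method names are hypothetical and only serve the illustration.

```java
// Rough tally of the index statistics quoted in the message.
// Assumption: "120k" is 120 KB, and KB/MB are 1024-based.
class IndexSizeEstimate {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    // Approximate total size of the index directory in bytes.
    static long totalBytes() {
        long cfs = 2173L * 120 * KB;  // 2173 CFS files averaging 120 KB
        long del = 1116L * 4 * KB;    // 1116 .del files averaging about 4 KB
        long other = 101L * MB;       // remaining 41 files, 101 MB in total
        return cfs + del + other;
    }

    // The three groups should account for every file in the directory.
    static int totalFiles() {
        return 2173 + 1116 + 41;
    }

    public static void main(String[] args) {
        System.out.println("files: " + totalFiles());
        System.out.println("approx size in MB: " + totalBytes() / MB);
    }
}
```

Under these assumptions the three groups add up to the 3330 files reported, and the whole index comes to roughly 360 MB. That is a small index spread over a very large number of files, which fits the concern raised elsewhere in the thread about the number of segments.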

RE: Lucene slow performance

2013-03-15 Thread Uwe Schindler
Hi, with standard configuration, this cannot happen. What merge policy do you use? This looks to me like a misconfigured merge policy or using the NoMergePolicy. With 3,000 segments, it will be slow, the question is, why do you get those? Another thing could be: Do you always close IndexWriter

Re: Lucene slow performance

2013-03-15 Thread Simon Willnauer
Can you tell us a little more about how you use Lucene? How do you index? Do you use NRT or do you open an IndexReader for every request? Do you maybe use a custom merge policy or something like this, or any special IndexWriter settings? On Fri, Mar 15, 2013 at 11:15 PM, Scott Smith wrote: > We have a

Re: Luke?

2013-03-15 Thread Wouter Heijke
Great to read this, there is hope! And Luke definitely deserves to be a Lucene module. Wouter > If anyone is able to donate some effort, a nice future scenario could be > that Luke comes fully up to date with every Lucene release: > https://issues.apache.org/jira/browse/LUCENE-2562 > > - Mark > >

Lucene slow performance

2013-03-15 Thread Scott Smith
We have a system that is using Lucene and the searches are very slow. The number of documents is fairly small (less than 30,000) and each document is typically only 2 to 10 kilo-characters. Yet, searches are taking 15-16 seconds. One of the things I noticed was that the index directory has sev

Re: Luke?

2013-03-15 Thread Mark Miller
If anyone is able to donate some effort, a nice future scenario could be that Luke comes fully up to date with every Lucene release: https://issues.apache.org/jira/browse/LUCENE-2562 - Mark On Mar 15, 2013, at 5:58 AM, Eric Charles wrote: > For the record, I happily use Luke (with Lucene 4.1)

Re: potential query performance issue

2013-03-15 Thread lukai
I have implemented WAND with Solr/Lucene. So far there is no performance issue. There is no native support for this functionality; you need to implement it yourself. On Fri, Mar 15, 2013 at 10:09 AM, Lin Ma wrote: > Hello guys, > > Supposing I have one million documents, and each document ha

Re: Using an AnalyzerWrapper with ASCIIFoldingFilter

2013-03-15 Thread Steven Schlansker
On Mar 15, 2013, at 11:25 AM, "Uwe Schindler" wrote: > Hi, > > The API did not really change. The API definitely did change, as before you would override the now-final tokenStream method. But you are correct that this was not the root of the problem. > The bug is in your test: > If you wou

RE: Using an AnalyzerWrapper with ASCIIFoldingFilter

2013-03-15 Thread Uwe Schindler
Hi, The API did not really change. The bug is in your test: If you would carefully read the javadocs of the TokenStream interface, you would notice that your consumer does not follow the correct workflow: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/analysis/TokenStream.html In sh

Using an AnalyzerWrapper with ASCIIFoldingFilter

2013-03-15 Thread Steven Schlansker
Hi everyone, I am trying to port forward to 4.2 some Lucene 3.2-era code that uses the ASCIIFoldingFilter. The token stream handling has changed significantly since then, and I cannot figure out what I am doing wrong. It seems that I should extend AnalyzerWrapper so that I can intercept the To

potential query performance issue

2013-03-15 Thread Lin Ma
Hello guys, Supposing I have one million documents, and each document has hundreds of features. For a given query, it also has hundreds of features. I want to fetch most relevant top 1000 documents by dot product related features of query and documents (query/document features are in the same feat
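The retrieval problem described in this message can be sketched as a brute-force baseline: score every document by its dot product with the query and keep the best k in a bounded min-heap. This is plain Java with no Lucene API, it is not the WAND technique mentioned in the replies, and the class name, method names and dense `double[]` feature layout are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Brute-force top-k by dot product (hypothetical helper, not part of Lucene).
class DotProductTopK {

    // Dot product of two dense vectors in the same feature space.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the indices of the k highest-scoring documents, best first.
    static List<Integer> topK(double[][] docs, double[] query, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap of {score, docId}: the weakest kept candidate sits on top.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x[0], y[0]));
        for (int id = 0; id < docs.length; id++) {
            double score = dot(docs[id], query);
            if (heap.size() < k) {
                heap.add(new double[] {score, id});
            } else if (score > heap.peek()[0]) {
                heap.poll();  // evict the current weakest candidate
                heap.add(new double[] {score, id});
            }
        }
        // The heap drains weakest first, so reverse to get best first.
        while (!heap.isEmpty()) {
            result.add((int) heap.poll()[1]);
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0}, {0, 1}, {2, 2}, {0.5, 0.5}};
        double[] query = {1, 2};
        System.out.println(topK(docs, query, 2));  // prints [2, 1]
    }
}
```

This scans every document, so its cost grows with the number of documents times the number of features. It is useful mainly as a correctness reference to compare against a pruning approach such as WAND, which tries to avoid fully scoring documents that cannot reach the top k.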

RE: re-indexing a field

2013-03-15 Thread Uwe Schindler
You have to reindex. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ash nix [mailto:nixd...@gmail.com] > Sent: Friday, March 15, 2013 4:57 PM > To: java-user@lucene.apache.org > Subject: re-indexing a f

re-indexing a field

2013-03-15 Thread ash nix
Hi, I have a timestamp field which I should have indexed as DoubleField for numeric range queries/filters to work. I got it indexed as DoubleDocValuesField. Is it possible to reindex this field? Don't want to create a new index as it will take a lot of time. Pointer to some document or blog on reindexi

Re: Migrating SnowballAnalyzer to 4.1

2013-03-15 Thread Robert Muir
2013/2/28 Steve Rowe : > EnglishAnalyzer has used PorterStemmer instead of the English Snowball > stemmer since it was created in 2010 as part of LUCENE-2055[2]. I think this > is an oversight: EnglishAnalyzer should incorporate the best English stemmer > we've got, and Martin Porter says the

Re: Getting documents from suggestions

2013-03-15 Thread Bratislav Stojanovic
Awesome Steve, I'll try that and let you know. Thank you all for answers. On Fri, Mar 15, 2013 at 12:24 AM, Steve Rowe wrote: > Hi Bratislav, > > LUCENE-4517 sounds like what you want: < > https://issues.apache.org/jira/browse/LUCENE-4517>: "Suggesters: allow to > pass a user-defined predicate/f

Re: Luke?

2013-03-15 Thread Eric Charles
For the record, I happily use Luke (with Lucene 4.1) compiled from https://github.com/sonarme/luke. It is also mavenized (shipped with a pom.xml). Thx, Eric On 14/03/2013 09:10, dizh wrote: OK, tomorrow I will put it on somewhere such as GitHub or googlecode. But, I really don't look into