Re: memory cost in forceMerge(1)

2015-08-10 Thread Erick Erickson
It is generally unnecessary to use forceMerge; that's a legacy of older versions of Lucene/Solr. Especially if the index is constantly changing, forceMerge is generally both expensive and not very useful. These indexes must be huge, though, if any of them are taking 8 hours. What's the background
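One reason forceMerge(1) is expensive: it rewrites every live document into a single new segment, and the old segments can only be deleted afterwards. A rough, stdlib-only sketch of the transient disk budget this implies (the 2x factor is a hedged rule of thumb, not a Lucene guarantee; it can be higher if a compound file is also built):

```java
// Back-of-envelope for the transient disk cost of forceMerge(1).
// Assumption (illustrative, not measured): the merge writes a full new
// copy of the index before old segments are dropped, so budget roughly
// 2x the index size in free disk.
public class MergeDiskBudget {
    public static void main(String[] args) {
        double indexGb = 15.0;        // index size from the original post
        double transientFactor = 2.0; // assumed worst-case copy factor
        System.out.printf("Budget ~%.0f GB free disk%n",
                indexGb * transientFactor);
    }
}
```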

memory cost in forceMerge(1)

2015-08-10 Thread 丁儒
Greetings. Now I'm using Lucene, version 4.10.3. For some reason I called forceMerge(1) at the end, and the final size of the index library is 15 GB. But I found that forceMerge(1) costs a lot of time, and the time differs on different machines. Is this caused by the different size o
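As a hedged sketch of why the time differs between machines: forceMerge(1) rewrites every live document, so wall time scales roughly with index size divided by the machine's sustained merge throughput. The throughput figures below are illustrative assumptions only:

```java
// Illustrative model (not a measurement): merge time ~ size / throughput,
// so the same 15 GB index merges much faster on faster disks.
public class MergeTimeEstimate {
    static double hours(double indexGb, double mbPerSec) {
        return indexGb * 1024 / mbPerSec / 3600;
    }
    public static void main(String[] args) {
        System.out.printf("fast disk: %.3f h%n", hours(15, 200));
        System.out.printf("slow disk: %.3f h%n", hours(15, 20));
    }
}
```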

Re: PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-10 Thread Bauer, Herbert S. (Scott)
I found the problem here. I had changed some method parameters and was inadvertently creating the fields I was having issues with as StringFields, against which the analyzer fails silently. From: Scott Bauer <mailto:bauer.sc...@mayo.edu> Date: Friday, August 7, 2015 at 1:56 PM To: "java-user@luce
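The pitfall described above can be shown without any Lucene types. A StringField indexes the raw value as one single token, while a TextField runs the analyzer (simulated here as lowercase plus whitespace split, a hypothetical stand-in for a real analyzer), so an analyzed query term silently misses the StringField:

```java
import java.util.*;

// Conceptual sketch (no Lucene types) of why a StringField "fails
// silently" when analysis was expected: the raw value becomes ONE token,
// so a lowercased query term never matches it.
public class StringFieldPitfall {
    static List<String> asStringField(String v) { return List.of(v); }
    static List<String> asTextField(String v) {
        return Arrays.asList(v.toLowerCase(Locale.ROOT).split("\\s+"));
    }
    public static void main(String[] args) {
        String value = "Scott Bauer";
        String queryTerm = "scott"; // what the query analyzer produces
        System.out.println(asStringField(value).contains(queryTerm)); // false
        System.out.println(asTextField(value).contains(queryTerm));   // true
    }
}
```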

Re: GROUP BY in Lucene

2015-08-10 Thread Rob Audenaerde
You can write a custom (facet) collector to do this. I have done something similar; I'll describe my approach: for all the values that need grouping or aggregating, I have added a FacetField (an AssociationFacetField, so I can store the value alongside the ordinal). The main search stays the same
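Stripped of Lucene types, the per-hit work of such a collector boils down to reading a group key and an associated value for each matching doc and folding them into running aggregates. A minimal stdlib-only sketch (names are illustrative):

```java
import java.util.*;

// The aggregation step of a custom grouping collector, without Lucene
// types: each hit carries a group key (the facet label) and an
// associated value; we accumulate count and sum per key.
public class GroupByAggregator {
    static Map<String, double[]> groupBy(List<Map.Entry<String, Double>> hits) {
        Map<String, double[]> agg = new LinkedHashMap<>(); // key -> {count, sum}
        for (var hit : hits) {
            double[] a = agg.computeIfAbsent(hit.getKey(), k -> new double[2]);
            a[0] += 1;
            a[1] += hit.getValue();
        }
        return agg;
    }
    public static void main(String[] args) {
        var hits = List.of(Map.entry("red", 2.0), Map.entry("blue", 3.0),
                           Map.entry("red", 5.0));
        System.out.println(Arrays.toString(groupBy(hits).get("red"))); // [2.0, 7.0]
    }
}
```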

Re: GROUP BY in Lucene

2015-08-10 Thread Michael McCandless
Lucene has a grouping module that has several approaches for grouping search hits, though it's only by a single field I believe. Mike McCandless http://blog.mikemccandless.com On Sun, Aug 9, 2015 at 2:55 PM, Gimantha Bandara wrote: > Hi all, > > Is there a way to achieve $subject? For example,
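In miniature, what single-field grouping of search hits does is keep the top-scoring hit per distinct value of the group field. A stdlib-only sketch of that idea (not the grouping module's actual API):

```java
import java.util.*;

// Single-field grouping in miniature: given scored hits, retain the
// top-scoring hit for each value of one group field.
public class TopHitPerGroup {
    record Hit(String group, int doc, float score) {}
    static Map<String, Hit> topPerGroup(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits)
            best.merge(h.group(), h, (a, b) -> a.score() >= b.score() ? a : b);
        return best;
    }
    public static void main(String[] args) {
        var hits = List.of(new Hit("a", 1, 0.5f), new Hit("a", 2, 0.9f),
                           new Hit("b", 3, 0.4f));
        System.out.println(topPerGroup(hits).get("a").doc()); // 2
    }
}
```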

Re: Lucene TermsFilter lookup slow

2015-08-10 Thread Michael McCandless
OK, indeed, that version has the changes I was thinking of, specifically optimizing the case when only a single doc contains a term by inlining that docID into the terms dict. You should be able to improve on TermsFilter a bit because you know only 1 doc matches each ID, so after the first segment
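The optimization suggested above can be sketched without Lucene types: when every ID is known to occur in at most one document, a multi-segment lookup can return as soon as the first segment yields a hit instead of probing the remaining segments. Names below are illustrative:

```java
import java.util.*;

// Sketch of the "only 1 doc matches each ID" shortcut: probe segments
// in order and stop at the first hit, since a unique ID cannot appear
// in any later segment.
public class UniqueIdLookup {
    static Integer find(List<Map<String, Integer>> segments, String id) {
        for (Map<String, Integer> seg : segments) {
            Integer doc = seg.get(id); // per-segment terms-dict probe
            if (doc != null) return doc; // unique ID: skip remaining segments
        }
        return null;
    }
    public static void main(String[] args) {
        var segments = List.of(Map.of("id1", 0), Map.of("id2", 3));
        System.out.println(find(segments, "id2")); // 3
    }
}
```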