Thanks for your input, Uchida. I will try that out. I wonder what the magic sauce is in Luke's set of calls that lets it produce, say, the top 100 terms even from an index with 100 million docs (small docs, in my case). It looks like it goes through every term, puts each one in a priority queue, and takes the top N.
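To make sure I understand the priority queue idea, here is a minimal sketch of it in plain Java. This is not Luke's or HighFreqTerms' actual code. The names TopTerms, TermCount and topN are hypothetical, and the Map of term to count is only a stand-in for walking the index's term dictionary. If this is roughly what happens, the cost depends on the number of distinct terms rather than the number of documents, and the queue never holds more than N entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: bounded min-heap selection of the N most frequent terms.
final class TopTerms {

    /** Hypothetical holder for a term and its frequency. */
    static final class TermCount {
        final String term;
        final long count;

        TermCount(String term, long count) {
            this.term = term;
            this.count = count;
        }
    }

    /**
     * Returns up to n entries with the highest counts, most frequent first.
     * The map stands in for iterating over every term in an index.
     */
    static List<TermCount> topN(Map<String, Long> counts, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on count: the head is always the weakest of the current top n.
        PriorityQueue<TermCount> heap =
                new PriorityQueue<>(n, (a, b) -> Long.compare(a.count, b.count));

        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (heap.size() < n) {
                heap.add(new TermCount(e.getKey(), e.getValue()));
            } else if (e.getValue() > heap.peek().count) {
                // The new term beats the weakest kept term, so replace it.
                heap.poll();
                heap.add(new TermCount(e.getKey(), e.getValue()));
            }
        }

        // Drain the heap into a list sorted by descending count.
        List<TermCount> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.count, a.count));
        return result;
    }
}
```

Calling TopTerms.topN(counts, 100) on a map of term counts would give the 100 most frequent terms. For my case the counts would have to come only from the documents matched by the search, which is the part I still need to work out.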
regards.

On Thu, Feb 19, 2015 at 2:10 AM, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:

> Hi,
>
> I'm afraid there are no easy or straight way for your requirement.
> I would try create an temporary tiny index from search results on the fly
> in memory, and get top N terms from it by HighFreqTerms.
>
> http://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/misc/HighFreqTerms.html
> (The logic is almost same to Luke's top N terms feature)
>
> I have not tried ant not sure about this is practical approach in
> performance, just an idea...
>
> Hope for it's help
> Tomoko
>
> 2015-02-16 1:58 GMT+09:00 Shouvik Bardhan <sbard...@gisfederal.com>:
>
> > Apologies if I have missed it in discussions prior but I looked all over. I
> > looked at the Luke code and it does find high frequency terms on the entire
> > index. I am trying to get the top N high frequency terms in the documents
> > returned from a search result. I came across something called
> > FilterIndexReader but I don't think it is part of 4.X codebase. Any pointer
> > is appreciated.