It seems there was already a very similar discussion on this topic that I had simply missed. A number of approaches are described there:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201502.mbox/%3CCAON7oqQh4aXoKfWyn=7odzwc48h_vvjjaabpfadmqehstzz...@mail.gmail.com%3E
> Looks like it goes thru every term and puts them in a priority queue and takes the top N.

Yes, Luke's top N terms feature (with Lucene's PriorityQueue under the hood) is great, and its implementation is a very good reference.

Regards,
Tomoko

2015-02-19 22:44 GMT+09:00 Shouvik Bardhan <sbard...@gisfederal.com>:

> Thanks for your input Uchida. I will try that out. I wonder what is the
> magic sauce in Luke's set of calls which allows it to create say top 100
> terms even from an index with 100 million docs (small docs though for me).
> Looks like it goes thru every term and puts them in a priority queue and
> takes the top N.
>
> regards.
>
> On Thu, Feb 19, 2015 at 2:10 AM, Tomoko Uchida <tomoko.uchida.1...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm afraid there is no easy or straightforward way to meet your requirement.
> > I would try creating a temporary tiny index from the search results on the fly
> > in memory, and get the top N terms from it with HighFreqTerms.
> >
> > http://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/misc/HighFreqTerms.html
> > (The logic is almost the same as Luke's top N terms feature)
> >
> > I have not tried it and am not sure whether this approach is practical in
> > terms of performance; it's just an idea...
> >
> > Hope this helps,
> > Tomoko
> >
> > 2015-02-16 1:58 GMT+09:00 Shouvik Bardhan <sbard...@gisfederal.com>:
> >
> > > Apologies if I have missed it in prior discussions, but I looked all over. I
> > > looked at the Luke code and it does find high frequency terms on the entire
> > > index. I am trying to get the top N high frequency terms in the documents
> > > returned from a search result. I came across something called
> > > FilterIndexReader but I don't think it is part of the 4.X codebase. Any pointer
> > > is appreciated.
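
For anyone reading this thread later: the "go through every term, put them in a priority queue, take the top N" idea can be sketched in plain Java as below. This is only an illustration under stated assumptions, not Luke's or HighFreqTerms' actual code. It uses java.util.PriorityQueue rather than Lucene's own PriorityQueue, it assumes the matched documents have already been tokenized into lists of terms, and it ranks by total occurrences across those documents. The class name TopTermsSketch and the method topTerms are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: not part of Lucene or Luke.
final class TopTermsSketch {

    // Returns the n most frequent terms across the given (already tokenized)
    // documents, highest count first; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topTerms(List<List<String>> matchedDocs, int n) {
        // Step 1: count every term occurrence in the matched documents.
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> doc : matchedDocs) {
            for (String term : doc) {
                freq.merge(term, 1, Integer::sum);
            }
        }

        // Orders the weakest candidate first: lowest count, then the
        // alphabetically later term when counts are equal.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int c = Integer.compare(a.getValue(), b.getValue());
            return c != 0 ? c : b.getKey().compareTo(a.getKey());
        };

        // Step 2: keep at most n entries in a min-heap. Whenever the heap
        // grows past n, the weakest entry is evicted, so memory stays O(n)
        // no matter how many distinct terms are scanned.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> entry : freq.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll();
            }
        }

        // Step 3: drain the survivors and sort them strongest first.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(weakestFirst.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("lucene", "index", "term"),
            List.of("lucene", "term"),
            List.of("lucene"));
        System.out.println(topTerms(docs, 2));
    }
}
```

The bounded heap is why a single pass stays cheap even with a very large number of distinct terms: each term costs at most O(log N) work and only N entries are held at any time. With a real index, the counting step would be replaced by iterating the index's terms, as HighFreqTerms does.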