Faster way for faceting?

Ivan Vasilev Mon, 24 Aug 2009 09:14:37 -0700

Hi All,

We use faceting in our app, but it is very slow on the indexes that our clients use.

First, let me say what I mean by faceting: for each term of a given field, we obtain (1) the number of docs that contain it and (2) the total number of occurrences of the term in the index.
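To make the two numbers concrete, here is a minimal sketch in plain Java, without Lucene. The class name FacetCountsSketch, the method facetCounts and the toy documents are made up for illustration only; they are not part of our app or of Lucene.

```java
import java.util.*;

public class FacetCountsSketch {
    /**
     * For each term, returns a two-element array:
     * [0] = number of docs that contain the term,
     * [1] = total number of occurrences of the term in all docs.
     */
    static Map<String, int[]> facetCounts(List<List<String>> docs) {
        Map<String, int[]> counts = new TreeMap<>();
        for (List<String> doc : docs) {
            // Terms already seen in the current doc, so each doc is counted once per term.
            Set<String> seenInDoc = new HashSet<>();
            for (String term : doc) {
                int[] c = counts.computeIfAbsent(term, k -> new int[2]);
                if (seenInDoc.add(term)) {
                    c[0]++; // first time this term shows up in this doc
                }
                c[1]++;     // every occurrence counts toward the total
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical toy corpus: each inner list is one tokenized document.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("red", "red", "blue"),
                Arrays.asList("red", "green"),
                Arrays.asList("blue", "blue", "blue"));
        for (Map.Entry<String, int[]> e : facetCounts(docs).entrySet()) {
            System.out.println(e.getKey() + " docs=" + e.getValue()[0]
                    + " occurrences=" + e.getValue()[1]);
        }
    }
}
```

In this toy corpus "red" appears in 2 docs with 3 occurrences in total, which are exactly the two values we need per term.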

Here is what we currently use to obtain this information:


      ...
      some code for obtained terms on which we will make faceting
      ...

       Term[] retTerms = new Term[terms.size()];
       int[] retFreqs = new int[retTerms.length];
       int[] retDocs = new int[retTerms.length];
       // One TermPositions instance is reused for all terms and closed once at the end.
       TermPositions tp = mSearcher.getIndexReader().termPositions();
       try {
           int i = 0;
           for (Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
               retTerms[i] = iter.next();
               tp.seek(retTerms[i]);
               // Walk every doc that contains the term.
               while (tp.next()) {
                   retFreqs[i] += tp.freq(); // occurrences of the term in this doc
                   retDocs[i]++;             // one more doc that contains the term
               }
           }
       } finally {
           tp.close();
       }

I have now discovered a much faster way to obtain the number of docs that contain each term:


       ...
      the same code for obtained terms on which we will make faceting
      ...

       Term[] retTerms = new Term[terms.size()];
       int[] retFreqs = new int[retTerms.length];
       int i = 0;
       long t1 = System.currentTimeMillis(); // start time for the measurement
       for (Term currTerm : terms) {
           retTerms[i] = currTerm;
           // docFreq() returns the number of docs that contain the term
           retFreqs[i] = mSearcher.docFreq(currTerm);
           i++;
       }

I tested the two versions of the code on 1 237 390 term facets. The second version was 10 times faster. I know that this is because the Lucene index keeps, for each term, the number of docs that contain it.
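As I understand it, the difference can be pictured with a toy model like the one below. This is only a sketch in plain Java: ToyTermDictionary, TermEntry and Posting are invented names, and the layout is not Lucene's real index format. It just shows that the doc count is a stored number that can be read directly, while the total number of occurrences has to be summed over all postings of the term.

```java
import java.util.*;

public class ToyTermDictionary {
    /** One posting: a doc id and how often the term occurs in that doc. */
    static final class Posting {
        final int docId;
        final int freq;
        Posting(int docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /** Per-term entry: the doc count is stored next to the postings. */
    static final class TermEntry {
        int docFreq;
        final List<Posting> postings = new ArrayList<>();
    }

    private final Map<String, TermEntry> dictionary = new HashMap<>();

    /** Assumes each (term, docId) pair is added only once. */
    void addPosting(String term, int docId, int freq) {
        TermEntry entry = dictionary.computeIfAbsent(term, k -> new TermEntry());
        entry.postings.add(new Posting(docId, freq));
        entry.docFreq++; // maintained while indexing, so reading it later is cheap
    }

    /** Fast path: a single lookup, no postings are touched. */
    int docFreq(String term) {
        TermEntry entry = dictionary.get(term);
        return entry == null ? 0 : entry.docFreq;
    }

    /** Slow path: must visit every posting of the term and add up the freqs. */
    long totalOccurrences(String term) {
        TermEntry entry = dictionary.get(term);
        if (entry == null) {
            return 0;
        }
        long total = 0;
        for (Posting p : entry.postings) {
            total += p.freq;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyTermDictionary dict = new ToyTermDictionary();
        dict.addPosting("lucene", 1, 3);
        dict.addPosting("lucene", 7, 2);
        dict.addPosting("facet", 7, 1);
        System.out.println("docFreq=" + dict.docFreq("lucene")
                + " totalOccurrences=" + dict.totalOccurrences("lucene"));
    }
}
```

In this model the cost of docFreq does not depend on how many docs contain the term, while totalOccurrences grows with the length of the postings list, which matches the timing difference I measured.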

My question: is there a similarly fast way to obtain the total number of occurrences of a term in the index?


Best Regards,
Ivan
