Hi All,

We use faceting in our app but it is very slow for the indexes that use our clients. First I will say what I understand under faceting - this is for each term for certain field to obtain 1. number of docs that contain it, 2. the total number of occurrences of the term in the index.
Now what we use to obtain the information:

      ...
      some code for obtained terms on which we will make faceting
      ...

       Term[] retTerms = new Term[terms.size()];
       int[] retFreqs = new int[retTerms.length];
       int[] retDocs = new int[retTerms.length];
       TermPositions tp = mSearcher.getIndexReader().termPositions();
       int i = 0;
       for(Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
           try {
               retTerms[i] = iter.next();
               tp.seek(retTerms[i]);
               while(tp.next()) {
   //                tp.read(new int[]{}, new int[]{});
//                    tp.doc();
                   retFreqs[i] += tp.freq();
                   retDocs[i]++;
               }
           } finally {
               if(tp != null) {
                   tp.close();
               }
           }
       }

Now what I discovered that is extremely faster for obtaining number of docs that contain each term.

       ...
      the same code for obtained terms on which we will make faceting
      ...

       Term[] retTerms = new Term[terms.size()];
       int[] retFreqs = new int[retTerms.length];
       int i = 0;
       long t1 = System.currentTimeMillis();
       for (Term currTerm : terms) {
           retTerms[i] = currTerm;
           retFreqs[i] = mSearcher.docFreq(currTerm);
           i++;
       }

I tested two code versions for obtaining 1 237 390 term facets. The difference in time was 10 times (second version wins). I know that this is because Lucene index keeps for each term the number of docs that contain it.

My question - is there some way to obtain the total number of occurrences of the term in the index in some similar fast way?

Best Regards,
Ivan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to