Hi All,
We use faceting in our app, but it is very slow on the indexes our
clients use.
First, let me explain what I mean by faceting: for each term of a
certain field, obtain (1) the number of docs that contain it and (2) the
total number of occurrences of the term in the index.
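To make the two numbers concrete, here is a minimal sketch that computes them from a hypothetical in-memory postings map for one term (docId -> occurrences of the term in that doc). The map and the `FacetDefinition`/`FacetStats` names are illustrative assumptions, not Lucene's API.

```java
import java.util.Map;

// Hypothetical in-memory model, NOT Lucene's API: it only illustrates
// the two per-term numbers a facet needs.
final class FacetDefinition {

    // (1) docs containing the term, (2) total occurrences in the index.
    record FacetStats(String term, int docCount, long totalOccurrences) {}

    // postings: docId -> number of occurrences of the term in that doc
    static FacetStats statsFor(String term, Map<Integer, Integer> postings) {
        int docCount = postings.size();            // (1) one entry per doc
        long total = 0;
        for (int freqInDoc : postings.values()) {  // (2) sum per-doc counts
            total += freqInDoc;
        }
        return new FacetStats(term, docCount, total);
    }

    public static void main(String[] args) {
        // "lucene" occurs 3 times in doc 0 and once in doc 7.
        System.out.println(statsFor("lucene", Map.of(0, 3, 7, 1)));
    }
}
```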
Here is what we currently use to obtain this information:
...
some code that obtains the terms we will facet on
...
Term[] retTerms = new Term[terms.size()];
int[] retFreqs = new int[retTerms.length]; // total occurrences of each term
int[] retDocs = new int[retTerms.length];  // number of docs containing each term
// One enumerator is reused for all terms and closed only after the loop.
TermPositions tp = mSearcher.getIndexReader().termPositions();
try {
    int i = 0;
    for (Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
        retTerms[i] = iter.next();
        tp.seek(retTerms[i]);           // position on this term's postings
        while (tp.next()) {             // visit every doc containing the term
            retFreqs[i] += tp.freq();   // occurrences in the current doc
            retDocs[i]++;
        }
    }
} finally {
    tp.close();
}
Here is what I found to be much faster for obtaining the number of
docs that contain each term:
...
the same code that obtains the terms we will facet on
...
Term[] retTerms = new Term[terms.size()];
int[] retDocs = new int[retTerms.length]; // number of docs containing each term
int i = 0;
long t1 = System.currentTimeMillis();     // start of the timing measurement
for (Term currTerm : terms) {
    retTerms[i] = currTerm;
    retDocs[i] = mSearcher.docFreq(currTerm); // stored value, no postings walk
    i++;
}
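One caveat when comparing the two versions, as far as I know for the Lucene 2.x/3.x API: `IndexReader.docFreq()` does not take into account deleted documents that have not yet been merged away, while iterating `TermDocs`/`TermPositions` skips deleted docs. So the two doc counts can differ on an index with pending deletions. The sketch below models this with a plain Java doc-id array and a hypothetical deleted-docs set; `DeletedDocsCaveat` and its methods are illustrative names, not Lucene code.

```java
import java.util.Set;

// Hypothetical model, NOT Lucene code: a stored statistic versus an
// enumeration that skips deleted documents.
final class DeletedDocsCaveat {

    // Like docFreq(): fixed when the segment was written, so deleted
    // docs are still counted until a merge removes them.
    static int storedDocFreq(int[] docIdsWithTerm) {
        return docIdsWithTerm.length;
    }

    // Like walking TermDocs: deleted docs are skipped.
    static int liveDocCount(int[] docIdsWithTerm, Set<Integer> deleted) {
        int count = 0;
        for (int doc : docIdsWithTerm) {
            if (!deleted.contains(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] docsWithTerm = {2, 5, 9, 14};
        Set<Integer> deleted = Set.of(5);
        System.out.println(storedDocFreq(docsWithTerm) + " vs "
                + liveDocCount(docsWithTerm, deleted));
    }
}
```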
I tested both versions on 1,237,390 term facets. The second version was
about 10 times faster. I understand that this is because the Lucene
index stores, for each term, the number of docs that contain it, so
docFreq() reads a stored value instead of iterating over the term's
postings.
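The difference can be sketched with a simplified, hypothetical term dictionary in which each entry keeps its doc frequency next to its postings, computed once at index time (in Lucene 2.x/3.x the term dictionary entry likewise records docFreq alongside pointers into the postings files, but the class below is NOT Lucene's format). Looking up the stored value costs one dictionary lookup; counting by walking costs one step per posting, which the `postingsVisited` counter makes visible.

```java
import java.util.TreeMap;

// Hypothetical, simplified term dictionary; not Lucene's on-disk format.
final class TermDictionarySketch {

    // docFreq is stored with the postings when the term is added.
    record TermEntry(int docFreq, int[] postingsDocIds) {}

    private final TreeMap<String, TermEntry> dict = new TreeMap<>();
    long postingsVisited = 0; // instrumentation: postings entries touched

    void add(String term, int[] docIds) {
        dict.put(term, new TermEntry(docIds.length, docIds));
    }

    // Fast path: one lookup, independent of how many docs hold the term.
    int docFreq(String term) {
        TermEntry e = dict.get(term);
        return e == null ? 0 : e.docFreq();
    }

    // Slow path: cost grows with the number of docs containing the term.
    int countByWalking(String term) {
        TermEntry e = dict.get(term);
        if (e == null) {
            return 0;
        }
        int n = 0;
        for (int ignored : e.postingsDocIds()) {
            postingsVisited++;
            n++;
        }
        return n;
    }
}
```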
My question: is there a similarly fast way to obtain the total number
of occurrences of a term in the index?
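To my knowledge the Lucene 2.x/3.x term dictionary keeps no per-term total comparable to docFreq (later versions, from 4.0 on, expose `TermsEnum.totalTermFreq()`). One possible workaround, assuming the index does not change between facet requests, is to pay for the postings walk once per term and cache the result. The `TotalFreqCache` class below is a hypothetical sketch of that idea, not part of Lucene; the `slowTotal` function stands in for a per-term walk such as the `tp.seek(...)` / `tp.freq()` loop above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical cache, not a Lucene class. Assumes the underlying index
// does not change while cached totals are in use.
final class TotalFreqCache<T> {
    private final Map<T, Long> totals = new HashMap<>();
    private final ToLongFunction<T> slowTotal; // e.g. a postings walk
    int misses = 0; // instrumentation: how often the slow path ran

    TotalFreqCache(ToLongFunction<T> slowTotal) {
        this.slowTotal = slowTotal;
    }

    long totalOccurrences(T term) {
        Long cached = totals.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        long total = slowTotal.applyAsLong(term);
        totals.put(term, total);
        return total;
    }

    // Call after reopening the reader, since cached totals may be stale.
    void invalidate() {
        totals.clear();
    }
}
```

The design trades memory (one long per distinct term) for avoiding repeated postings walks, so it only pays off when the same terms are faceted on many times against an unchanged reader.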
Best Regards,
Ivan
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org