Hi Simon,
10x for your answer.
Unfortunately the code that you suggest is compatible in speed with the
code that we use in our app (it was even a bit slower).
10x,
Ivan
Simon Willnauer wrote:
Hi there,
I'm not sure if the performance is considerable for you but you could try:
TermDocs termDocs = this.reader.termDocs(term);
int count = 0;
while(termDocs.next()){
count += termDocs.freq();
}
simon
On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev<ivasi...@sirma.bg> wrote:
Hi All,
We use faceting in our app but it is very slow for the indexes that use our
clients.
First I will say what I understand under faceting - this is for each term
for certain field to obtain 1. number of docs that contain it, 2. the total
number of occurrences of the term in the index.
Now what we use to obtain the information:
...
some code for obtained terms on which we will make faceting
...
Term[] retTerms = new Term[terms.size()];
int[] retFreqs = new int[retTerms.length];
int[] retDocs = new int[retTerms.length];
TermPositions tp = mSearcher.getIndexReader().termPositions();
int i = 0;
for(Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
try {
retTerms[i] = iter.next();
tp.seek(retTerms[i]);
while(tp.next()) {
// tp.read(new int[]{}, new int[]{});
// tp.doc();
retFreqs[i] += tp.freq();
retDocs[i]++;
}
} finally {
if(tp != null) {
tp.close();
}
}
}
Now what I discovered that is extremely faster for obtaining number of docs
that contain each term.
...
the same code for obtained terms on which we will make faceting
...
Term[] retTerms = new Term[terms.size()];
int[] retFreqs = new int[retTerms.length];
int i = 0;
long t1 = System.currentTimeMillis();
for (Term currTerm : terms) {
retTerms[i] = currTerm;
retFreqs[i] = mSearcher.docFreq(currTerm);
i++;
}
I tested two code versions for obtaining 1 237 390 term facets. The
difference in time was 10 times (second version wins). I know that this is
because Lucene index keeps for each term the number of docs that contain it.
My question - is there some way to obtain the total number of occurrences of
the term in the index in some similar fast way?
Best Regards,
Ivan
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org