Hello, I am trying to count the total number of posting entries for terms having a given prefix in an index, and also the number of such terms in the index.
The following is the code I am using for that. The problem is that the result is not as expected. Can you point out what I am doing wrong?

ASSUMPTION: The index has had no deletions.

INPUT:
prefix: the prefix that terms should match.

VARIABLES:
set: a set of unique terms found in the index having the given prefix
wordCount: the number of unique terms in the index having the given prefix
termFreqCount: the final result, which will be returned

CODE:

    public long countTotalPositingEntriesInIndex(String prefix) {
        int wordCount = 0;
        int documentId = -1;
        long termFreqCount = 0;
        HashSet<String> set = new HashSet<String>();

        for (int i = 0; i < index.length; i++) {
            while (documentId < index[i].getIndexReader().maxDoc() - 1) {
                documentId++;
                try {
                    TermFreqVector tfv[] = index[i].getIndexReader()
                            .getTermFreqVectors(documentId);
                    if (tfv == null)
                        continue;
                    for (int fieldCount = 0; fieldCount < tfv.length; fieldCount++) {
                        String terms[] = tfv[fieldCount].getTerms();
                        int termFreq[] = tfv[fieldCount].getTermFrequencies();
                        for (int termCount = 0; termCount < terms.length; termCount++) {
                            if (terms[termCount].toLowerCase().startsWith(
                                    prefix.toLowerCase())) {
                                if (!set.contains(terms[termCount])) {
                                    wordCount++;
                                    set.add(terms[termCount].toLowerCase());
                                }
                                termFreqCount += termFreq[termCount];
                            }
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return termFreqCount;
    }
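
For comparison, here is a minimal self-contained sketch of the counting logic I am aiming for, with plain Java collections standing in for the Lucene classes. Everything in it is hypothetical and for illustration only: the class PrefixPostingCounter, the Counts holder, and the representation of each reader as a list of per-document term-to-frequency maps (null meaning "no term vector") are not part of Lucene. It differs from the code above in two ways that may or may not explain the unexpected result: the document counter restarts at 0 for every reader, and the set is checked with the same lowercased form that is stored in it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Lucene-based method; not a Lucene API.
final class PrefixPostingCounter {

    /** Holds both results: distinct matching terms and the sum of their frequencies. */
    static final class Counts {
        final int uniqueTerms;
        final long totalFreq;

        Counts(int uniqueTerms, long totalFreq) {
            this.uniqueTerms = uniqueTerms;
            this.totalFreq = totalFreq;
        }
    }

    /**
     * @param readers one list per index reader; each element is one document,
     *                modeled as term -> frequency (null if it has no term vector)
     * @param prefix  prefix to match, compared case-insensitively
     */
    static Counts count(List<List<Map<String, Integer>>> readers, String prefix) {
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        Set<String> seen = new HashSet<>();
        long totalFreq = 0;

        for (List<Map<String, Integer>> reader : readers) {
            // The document id starts again at 0 for each reader.
            for (int docId = 0; docId < reader.size(); docId++) {
                Map<String, Integer> doc = reader.get(docId);
                if (doc == null) {
                    continue; // document without a term vector
                }
                for (Map.Entry<String, Integer> entry : doc.entrySet()) {
                    // Lowercase once and use the same form for matching and for the set.
                    String term = entry.getKey().toLowerCase(Locale.ROOT);
                    if (term.startsWith(lowerPrefix)) {
                        seen.add(term); // the set ignores duplicates
                        totalFreq += entry.getValue();
                    }
                }
            }
        }
        return new Counts(seen.size(), totalFreq);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("apple", 2);
        doc.put("banana", 5);
        List<Map<String, Integer>> reader = new ArrayList<>();
        reader.add(doc);
        List<List<Map<String, Integer>>> readers = new ArrayList<>();
        readers.add(reader);

        Counts c = count(readers, "app");
        System.out.println(c.uniqueTerms + " unique terms, total frequency " + c.totalFreq);
    }
}
```

Note that this sums within-document term frequencies, as the code above does. If "posting entries" should instead mean one entry per (term, document) pair, the sketch would add 1 per matching term per document rather than the frequency.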