Hi
I got it working before I saw your latest mail; the only problem is
that it doesn't look very efficient. Below is my duplicates method. The
problem is that I have to enumerate through *every* term. This was worse
before: I was only interested in terms from a particular field (column),
but had to enumerate through every term, whatever field it belonged to.
So I recreated my index so that each document contains only a row-number
field and a second field holding the value of the column. However, this
means I am going to end up with a number of different indexes, each
solving a particular problem.
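For comparison: as I understand the old Lucene API, IndexReader.terms(new
Term(field, "")) returns a TermEnum positioned at the first term of that
field. Terms are ordered by field, then text, so a scan can stop as soon as
the field changes. Note that this enumeration is already on its first term,
so it needs a do/while loop rather than while (terms.next()). The sketch
below is not Lucene code; it models the sorted term dictionary with a plain
TreeMap so it runs without the jar. The class, the Term record and the
add/duplicatesIn methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldScopedScan {

    // Hypothetical stand-in for a Lucene term: ordered by field, then text,
    // which is the order a sorted term dictionary keeps them in.
    record Term(String field, String text) implements Comparable<Term> {
        public int compareTo(Term o) {
            int c = field.compareTo(o.field);
            return c != 0 ? c : text.compareTo(o.text);
        }
    }

    // In-memory term dictionary: term -> row numbers of documents containing it
    private final TreeMap<Term, List<Integer>> postings = new TreeMap<>();

    public void add(int row, String field, String value) {
        postings.computeIfAbsent(new Term(field, value), t -> new ArrayList<>()).add(row);
    }

    // Seek to the first term of `field` and stop at the first term of any
    // other field, instead of walking every term in the dictionary.
    public List<Integer> duplicatesIn(String field) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Term, List<Integer>> e
                : postings.tailMap(new Term(field, ""), true).entrySet()) {
            if (!e.getKey().field().equals(field)) {
                break; // past the last term of this field
            }
            if (e.getValue().size() > 1) { // the equivalent of docFreq() > 1
                matches.addAll(e.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        FieldScopedScan idx = new FieldScopedScan();
        idx.add(1, "name", "smith");
        idx.add(2, "name", "jones");
        idx.add(3, "name", "smith");
        idx.add(1, "city", "leeds");
        idx.add(2, "city", "leeds");
        System.out.println(idx.duplicatesIn("name")); // [1, 3]
        System.out.println(idx.duplicatesIn("city")); // [1, 2]
    }
}
```

With a field-scoped seek, one index with several fields could answer the
per-column question without touching terms from other columns.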
paul
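    public List<Integer> getDuplicates()
    {
        List<Integer> matches = new ArrayList<Integer>();
        IndexReader ir = null;
        try
        {
            // directory and ROW_NUMBER are members of the enclosing class
            ir = IndexReader.open(directory);
            TermEnum terms = ir.terms();
            try
            {
                // Walk every term in the index, across all fields
                while (terms.next())
                {
                    // A term shared by more than one document is a duplicate value
                    if (terms.docFreq() > 1)
                    {
                        TermDocs termDocs = ir.termDocs(terms.term());
                        try
                        {
                            while (termDocs.next())
                            {
                                Document d = ir.document(termDocs.doc());
                                matches.add(Integer.valueOf(
                                        d.getField(ROW_NUMBER).stringValue()));
                            }
                        }
                        finally
                        {
                            termDocs.close();
                        }
                    }
                }
            }
            finally
            {
                terms.close();
            }
        }
        catch (IOException ioe)
        {
            ioe.printStackTrace();
        }
        finally
        {
            // Release the reader's file handles even if the scan failed
            if (ir != null)
            {
                try { ir.close(); } catch (IOException ignored) { }
            }
        }
        return matches;
    }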
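If the table fits in memory, another option that avoids building a
dedicated index per question is a single pass with a HashMap keyed on the
column value. This is a different technique from the Lucene term scan
above, shown as a plain-Java sketch. The row layout (a Map from row number
to String[] of column values) and the method names are hypothetical. It
also keeps the grouping of which rows share a value, which a flat list like
the one getDuplicates() returns loses, and it reports each row only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class ColumnDuplicates {

    // Groups row numbers by the value in one column and keeps only values
    // that occur in more than one row.
    public static Map<String, List<Integer>> duplicateGroups(
            Map<Integer, String[]> rows, int column) {
        Map<String, List<Integer>> byValue = new LinkedHashMap<>();
        for (Map.Entry<Integer, String[]> row : rows.entrySet()) {
            String value = row.getValue()[column];
            byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(row.getKey());
        }
        // Drop values seen only once (the equivalent of docFreq() == 1)
        byValue.values().removeIf(group -> group.size() < 2);
        return byValue;
    }

    // Flat, sorted list with each duplicated row reported once
    public static List<Integer> duplicateRows(Map<Integer, String[]> rows, int column) {
        TreeSet<Integer> all = new TreeSet<>();
        for (List<Integer> group : duplicateGroups(rows, column).values()) {
            all.addAll(group);
        }
        return new ArrayList<>(all);
    }

    public static void main(String[] args) {
        Map<Integer, String[]> rows = new LinkedHashMap<>();
        rows.put(1, new String[] {"smith", "leeds"});
        rows.put(2, new String[] {"jones", "leeds"});
        rows.put(3, new String[] {"smith", "york"});
        System.out.println(duplicateGroups(rows, 0)); // {smith=[1, 3]}
        System.out.println(duplicateRows(rows, 1));   // [1, 2]
    }
}
```

The trade-off is memory: this holds every value of the column at once,
whereas the index keeps the data on disk.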