Hi

I got it working before I saw your latest mail; the only problem is that it doesn't look very efficient. Below is my duplicate method. The problem is that I have to enumerate through *every* term. This was worse before: I was only interested in terms from one particular field (column), but I had to enumerate every term regardless of which field it belonged to. So I recreated my index so that each document contains only a row-number field and a second field holding the column's value. However, this means I am going to end up with several different indexes, each solving one particular problem.
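The cost described here comes from walking the whole term dictionary. Lucene keeps terms sorted by field name and then by text, so the terms of a single field form one contiguous run; in principle you only need to seek to the first term of that field and stop as soon as the field changes. If I remember the 2.x API correctly, `IndexReader.terms(Term)` returns a `TermEnum` positioned at the first term at or after the given one, which would allow this against the original single index. The sketch below does not use Lucene at all: it models a sorted term dictionary with a `TreeMap`, and `FieldDuplicates`, `add` and `duplicateRows` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a term dictionary sorted by (field, text).
 * This is NOT the Lucene API; it only illustrates the seek-and-stop idea.
 */
final class FieldDuplicates {
    // '\u0000' sorts before every other char, so "field\u0000text" keys
    // keep all terms of one field contiguous in the sorted map.
    private static final char SEP = '\u0000';

    // Sorted "term dictionary": "field\u0000text" -> rows containing that term.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(String field, String text, int row) {
        postings.computeIfAbsent(field + SEP + text, k -> new ArrayList<>()).add(row);
    }

    /** Rows whose term in {@code field} also occurs in at least one other row. */
    List<Integer> duplicateRows(String field) {
        List<Integer> matches = new ArrayList<>();
        // Visit only this field's terms: from "field\u0000" (inclusive)
        // up to "field\u0001" (exclusive), instead of every term.
        SortedMap<String, List<Integer>> range =
                postings.subMap(field + SEP, field + (char) (SEP + 1));
        for (Map.Entry<String, List<Integer>> e : range.entrySet()) {
            if (e.getValue().size() > 1) { // analogous to docFreq() > 1
                matches.addAll(e.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        FieldDuplicates dict = new FieldDuplicates();
        dict.add("name", "smith", 1);
        dict.add("name", "jones", 2);
        dict.add("name", "smith", 3);
        dict.add("city", "leeds", 1);
        dict.add("city", "leeds", 2);
        System.out.println(dict.duplicateRows("name")); // [1, 3]
        System.out.println(dict.duplicateRows("city")); // [1, 2]
    }
}
```

Because the field restriction is just a range over sorted keys, one dictionary can serve every column, which is the property that would avoid building a separate index per problem.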

paul

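import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

// Returns the row numbers of every document that shares a term
// (in any field) with at least one other document.
public List<Integer> getDuplicates()
{
    List<Integer> matches = new ArrayList<Integer>();
    IndexReader ir = null;
    try
    {
        ir = IndexReader.open(directory);
        // Enumerates every term in the index, across all fields
        TermEnum terms = ir.terms();
        while (terms.next())
        {
            // docFreq() > 1: the term occurs in more than one document
            if (terms.docFreq() > 1)
            {
                TermDocs termDocs = ir.termDocs(terms.term());
                while (termDocs.next())
                {
                    Document d = ir.document(termDocs.doc());
                    matches.add(Integer.valueOf(d.get(ROW_NUMBER)));
                }
                termDocs.close();
            }
        }
        terms.close();
    }
    catch (IOException ioe)
    {
        ioe.printStackTrace();
    }
    finally
    {
        // Release the reader's files even if enumeration failed
        if (ir != null)
        {
            try { ir.close(); } catch (IOException ignored) { }
        }
    }
    return matches;
}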
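If the column values can be read row by row anyway, the same duplicates can also be found without any index by grouping row numbers by value in a hash map, in a single pass over the data. This is a different technique from the index-based method above, offered only as a point of comparison; `DuplicateRows` and `findDuplicates` are hypothetical names, and the input is assumed to be a map from row number to column value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds rows whose column value also appears in another row. */
final class DuplicateRows {
    static List<Integer> findDuplicates(Map<Integer, String> valueByRow) {
        // Group row numbers by value, keeping first-seen order of values.
        Map<String, List<Integer>> rowsByValue = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : valueByRow.entrySet()) {
            rowsByValue.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<Integer> matches = new ArrayList<>();
        for (List<Integer> rows : rowsByValue.values()) {
            if (rows.size() > 1) { // same test as docFreq() > 1
                matches.addAll(rows);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<Integer, String> column = new LinkedHashMap<>();
        column.put(1, "smith");
        column.put(2, "jones");
        column.put(3, "smith");
        column.put(4, "brown");
        column.put(5, "jones");
        System.out.println(findDuplicates(column)); // [1, 3, 2, 5]
    }
}
```

The trade-off is memory: every distinct value of the column is held in the map at once, whereas an index keeps its term dictionary on disk.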
