Re: Can I use Lucene to retrieve a list of duplicates

Paul Taylor Mon, 26 Feb 2007 08:52:09 -0800

Hi

I got it working before I saw your latest mail; the only problem is that it doesn't look very efficient. This is my duplicate method (below), and the problem is that I have to enumerate through *every* term. This was worse before, because I was only interested in terms that matched a particular field (column) but had to enumerate through every term, whatever field it was part of. So I recreated my index so that each document contained only a row number field and a second field for the value of the column. However, this means I am going to end up with a number of different indexes, each solving a particular problem.


paul

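public List<Integer> getDuplicates()
   {
       List<Integer> matches = new ArrayList<Integer>();
       try
       {
           IndexReader ir = IndexReader.open(directory);
           try
           {
               // Walk every term in the index, whatever field it belongs to.
               TermEnum terms = ir.terms();
               while (terms.next())
               {
                   // A term found in more than one document is a duplicate value.
                   if (terms.docFreq() > 1)
                   {
                       TermDocs termDocs = ir.termDocs(terms.term());
                       while (termDocs.next())
                       {
                           Document d = ir.document(termDocs.doc());
                           matches.add(Integer.valueOf(d.getField(ROW_NUMBER).stringValue()));
                       }
                       termDocs.close();
                   }
               }
               terms.close();
           }
           finally
           {
               // Release the reader even if enumeration fails part way.
               ir.close();
           }
       }
       catch (IOException ioe)
       {
           ioe.printStackTrace();
       }
       return matches;
   }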
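
The full scan above may be avoidable without building one index per column. Lucene keeps its terms sorted by field name first and term text second, so all terms of one field sit next to each other, and IndexReader.terms(new Term(field, "")) should position the enumeration at the first term of that field. The sketch below is not Lucene code: it models the sorted term dictionary with a plain TreeMap to show the seek-then-stop idea. The class name FieldDuplicates, the add() helper and the "\u0000" key separator are all made up for illustration, and it assumes field names never contain that character and that each row is added at most once per term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a term dictionary (hypothetical, not a Lucene class).
// Keys are "field\u0000text" in sorted order; values are the row numbers
// of the documents that contain that term.
class FieldDuplicates {
    private static final char SEP = '\u0000';

    private final TreeMap<String, List<Integer>> terms =
            new TreeMap<String, List<Integer>>();

    // Record that the given row holds the value `text` in column `field`.
    void add(String field, String text, int row) {
        String key = field + SEP + text;
        List<Integer> rows = terms.get(key);
        if (rows == null) {
            rows = new ArrayList<Integer>();
            terms.put(key, rows);
        }
        rows.add(row);
    }

    // Seek to the first term of `field`, then stop as soon as the field changes,
    // so terms belonging to other fields are never visited.
    List<Integer> getDuplicates(String field) {
        List<Integer> matches = new ArrayList<Integer>();
        String prefix = field + SEP;
        for (Map.Entry<String, List<Integer>> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last term of this field
            }
            if (e.getValue().size() > 1) { // plays the role of docFreq() > 1
                matches.addAll(e.getValue());
            }
        }
        return matches;
    }
}
```

In a real index the loop would use a do/while rather than while (terms.next()), because the enumeration returned by terms(Term) already points at the first matching term, and the break condition would compare terms.term().field() with the wanted field name (also checking for a null term at the end of the index).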
