Here's an excerpt from something I wrote to enumerate all the terms for a
field. I hacked out some of my tracing, so it may not even compile <G>.

Basically, change the line "if (td.next())" to "while (td.next())". Each
pass through that inner loop is one document containing the current term, so
any term that keeps you in the loop for more than one pass is a duplicate
value for that field. Call td.doc() inside the loop to get the Lucene doc id.

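 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermDocs;
 import org.apache.lucene.index.TermEnum;

 private void enumField(String field) throws Exception
   {
       IndexReader indexReader = this.reader.getIndexReader();
       // terms(Term) starts the enum on the first term >= (field, "")
       TermEnum termEnum = indexReader.terms(new Term(field, ""));

       this.writer.println("");
       this.writer.println("");
       this.writer.println("");
       this.writer.println("Values for term " + field);

       TermDocs td = indexReader.termDocs();
       Term term = termEnum.term();
       int idx = 0;   // terms seen in this field
       int jdx = 0;   // terms found in at least one document

       try {
           while ((term != null) && term.field().equals(field)) {

               // Seek before advancing, or the first term gets skipped
               td.seek(termEnum);

               if (td.next()) {
                   ++jdx;
               }

               termEnum.next();
               term = termEnum.term();
               ++idx;
           }
       } finally {
           td.close();
           termEnum.close();
       }
   }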
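
For what it's worth, here is a rough sketch of what the "while (td.next())"
variation amounts to once you collect the doc ids. It is plain Java rather
than Lucene code: a TreeMap stands in for the index so it runs on its own,
and the class and method names (DuplicateTermSketch, indexColumn,
duplicateDocIds) are made up for illustration. The outer loop plays the role
of TermEnum, the size check plays the role of docFreq, and copying the ids
plays the role of the td.next() loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for one field of an index: each term maps to the ids of the
// documents that contain it. All names here are hypothetical, not Lucene API.
class DuplicateTermSketch {

    // Builds term -> doc ids from the column values. The position of a value
    // in the list plays the role of the Lucene doc id.
    static TreeMap<String, List<Integer>> indexColumn(List<String> columnValues) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < columnValues.size(); docId++) {
            postings.computeIfAbsent(columnValues.get(docId), k -> new ArrayList<>())
                    .add(docId);
        }
        return postings;
    }

    // Walks the terms in sorted order and keeps the doc ids of every term
    // that is used by more than one document.
    static List<Integer> duplicateDocIds(TreeMap<String, List<Integer>> postings) {
        List<Integer> duplicates = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            List<Integer> docIds = entry.getValue();
            if (docIds.size() > 1) {        // same idea as docFreq > 1
                duplicates.addAll(docIds);  // same idea as looping on td.next()
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("red", "blue", "red", "green", "blue");
        // "blue" is in rows 1 and 4, "red" is in rows 0 and 2
        System.out.println(duplicateDocIds(indexColumn(column))); // prints [1, 4, 0, 2]
    }
}
```

The ids come back grouped by term in sorted term order, which is also the
order a term enumeration walks a field, so rows sharing a value end up next
to each other.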


Erick

On 2/26/07, Paul Taylor <[EMAIL PROTECTED]> wrote:

Hi,

Sorry, I don't see how I get access to TermEnums. So far I've created a
document per row: the first field holds the row id, then I have one
field per column. I've checked that the index was created OK with some
search queries.
I now want to pass in a column to check and receive a list of all the
documents that contain a term in that column which is used by at
least one other document for that column (a duplicate term).

thanks paul

Chris Hostetter wrote:
> : Thanks this might do it, but do I need to know the terms beforehand, I
> : just want to return any terms with frequency more than one?
>
> no, TermEnum will let you iterate over all the terms ... you don't even
> need TermDocs if you just want the docFreq for each term (which would be
> 1 if there are no duplicates)
>
> : Erick Erickson wrote:
> : > Sure, you can use the TermDocs/TermEnum classes. Basically, for a
> : > term (probably a column value in your app) these let you quickly
> : > answer the question "which (and how many) documents does this term
> : > appear in". What you get is the Lucene doc id, which lets you fetch
> : > all the information about the documents you want.
> : >
> : > Erick
> : >
> : > On 2/23/07, *Paul Taylor* <[EMAIL PROTECTED]
> : > <mailto:[EMAIL PROTECTED]>> wrote:
> : >
> : >     Hi, I have a Java Swing application with a table, and I was
> : >     considering using Lucene to index the data in the table. One task
> : >     I'd like to do is for the user to select 'Find Duplicate records
> : >     for Column X', then I would filter the table to show only records
> : >     where there is more than one with the same value, i.e. a duplicate
> : >     for that column. Is there a way to return all the duplicates from
> : >     a Lucene index?
> : >
> : >     thanks paul Taylor
> : >
> : >
> : >     ---------------------------------------------------------------------
> : >     To unsubscribe, e-mail: [EMAIL PROTECTED]
> : >     <mailto:[EMAIL PROTECTED]>
> : >     For additional commands, e-mail: [EMAIL PROTECTED]
> : >     <mailto:[EMAIL PROTECTED]>
> : >
> : >
> : >
> : >
> :
> :
> :
>
>
>
> -Hoss
>
>
>
>



