MBER field
values if you use a FieldCache instead of fetching each document.
nt.
: Date: Mon, 26 Feb 2007 16:25:11 +
: From: Paul Taylor <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
: To: Erick Erickson <[EMAIL PROTECTED]>
: Cc: java-user@lucene.apache.org
: Subject: Re: Can I use Lucene to retrieve a list of duplicates
:
:
Hi
I got it working before I saw your latest mail; the only problem is
that it doesn't look very efficient. This is my duplicate method; the
problem is that I have to enumerate through *every* term. This was worse
before because I was only interested
in terms that matched a particular field (
Here's an excerpt from something I wrote to enumerate all the terms for a
field. I hacked out some of my tracing, so it may not even compile.
Basically, change the line "if (td.next())" to "while (td.next())" and every
time you stay in that loop for more than one cycle, you'll have duplicate
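
To make the "more than one cycle" idea concrete, here is a minimal plain-Java sketch. It is not Lucene code and not the excerpt referred to above: RowCursor is a hypothetical stand-in for a postings iterator with next() and doc(), and the index is an in-memory map built from a single column, where the row id is simply the list position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the enumerate-terms-then-docs loop; no Lucene classes are used.
class PostingsLoopSketch {

    // Hypothetical cursor over the row ids recorded for one value.
    static final class RowCursor {
        private final List<Integer> rows;
        private int pos = -1;

        RowCursor(List<Integer> rows) { this.rows = rows; }

        // Advances to the next row id; returns false when exhausted.
        boolean next() { return ++pos < rows.size(); }

        // Row id at the current position.
        int doc() { return rows.get(pos); }
    }

    // Builds value -> row ids for one column (assumption: row id = list index).
    static Map<String, List<Integer>> index(List<String> columnValues) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int row = 0; row < columnValues.size(); row++) {
            postings.computeIfAbsent(columnValues.get(row), k -> new ArrayList<>()).add(row);
        }
        return postings;
    }

    // Collects the row ids of every value whose loop runs for more than one cycle.
    static List<Integer> duplicateRows(List<String> columnValues) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> rows : index(columnValues).values()) {
            RowCursor td = new RowCursor(rows);
            List<Integer> seen = new ArrayList<>();
            while (td.next()) {
                seen.add(td.doc());
            }
            if (seen.size() > 1) {
                result.addAll(seen); // more than one cycle means a duplicated value
            }
        }
        return result;
    }
}
```

Rows come back grouped by value in sorted value order, because the sketch uses a TreeMap; a real index would impose its own term ordering.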
Hi,
Sorry, I don't see how I get access to TermEnums. So far I've created a
document per row: the first field holds the row id, then I have one
field per column, and I've checked the index has been created OK with some
search queries.
I now want to pass a column to check, and receive a list of all
: Thanks, this might do it, but do I need to know the terms beforehand? I
: just want to return any terms with frequency more than one.
no, TermEnum will let you iterate over all the terms ... you don't even
need TermDocs if you just want the docFreq for each term (which would be 1
if there are no
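
As a rough illustration of the frequency-only approach, the sketch below counts how many rows contain each value and keeps the values whose count exceeds one. It is plain Java rather than Lucene code: the map plays the role of a term dictionary with a per-term document frequency, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for reading a per-term document frequency; no Lucene classes are used.
class DocFreqSketch {

    // Counts how many rows contain each value (the analogue of a per-term docFreq).
    static Map<String, Integer> docFreqs(List<String> columnValues) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String value : columnValues) {
            freqs.merge(value, 1, Integer::sum);
        }
        return freqs;
    }

    // Returns, in sorted order, the values that occur in more than one row.
    static List<String> duplicatedValues(List<String> columnValues) {
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs(columnValues).entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }
}
```

This only tells you which values are duplicated, not which rows hold them; for the row ids you still need the per-term document list.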
Yes, I've seen this before, thanks. It was an article that referred to this
that pointed me towards Lucene in the first place :)
Erik Hatcher wrote:
On Feb 23, 2007, at 10:16 AM, Paul Taylor wrote:
Hi, I have a Java Swing application with a table, and I was considering
using Lucene to index the data i
Thanks, this might do it, but do I need to know the terms beforehand? I
just want to return any terms with frequency more than one.
Erick Erickson wrote:
Sure, you can use the TermDocs/TermEnum classes. Basically, for a term
(probably column value in your app) these let you quickly answer the
q
On Feb 23, 2007, at 10:16 AM, Paul Taylor wrote:
Hi, I have a Java Swing application with a table, and I was considering
using Lucene to index the data in the table. One task I'd like to do
is for the user to select 'Find Duplicate records for Column X',
then I would filter the table to show only
Sure, you can use the TermDocs/TermEnum classes. Basically, for a term
(probably column value in your app) these let you quickly answer the
question "which (and how many) documents does this term appear in". What you
get is the Lucene doc id, which lets you fetch all the information about
the doc
Hi, I have a Java Swing application with a table, and I was considering using
Lucene to index the data in the table. One task I'd like to do is for the
user to select 'Find Duplicate records for Column X', then I would
filter the table to show only records where there is more than one with
the same val