I don't know why the termDocs option did not work for you. Perhaps you did
not (re)open the searcher after the index was populated? Anyhow, here is a
small code snippet that does just this, see if it works for you, then you
can compare it to your code...
void numberOfTermOcc() throws Exception
Perhaps another comment along the same lines: I think you would be able to get
more from your system by bounding the number of open searchers to 2:
- old, serving 'old' queries, would be soon closed;
- new, being opened and warmed up, and then serving 'new' queries;
Because... - if I understood ho
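A minimal sketch of the old/new hand-over described above, in plain Java without any Lucene types. The class `TwoSearchers`, its `Handle` type and the methods `acquire`, `release` and `swap` are hypothetical names for illustration; the searcher is modelled as any `AutoCloseable`, and warming is assumed to happen before `swap` is called.

```java
/**
 * Hypothetical holder for one "current" searcher. In-flight queries keep
 * the old searcher alive until they finish; new queries see the new one.
 */
final class TwoSearchers<S extends AutoCloseable> {

    /** A searcher plus the number of queries currently using it. */
    static final class Handle<S> {
        final S searcher;
        private int inFlight;      // queries that acquired but not yet released
        private boolean retired;   // true once a newer searcher replaced this one

        Handle(S searcher) {
            this.searcher = searcher;
        }
    }

    private Handle<S> current;

    TwoSearchers(S initial) {
        current = new Handle<>(initial);
    }

    /** Called at the start of a query: pins the current searcher. */
    synchronized Handle<S> acquire() {
        current.inFlight++;
        return current;
    }

    /** Called at the end of a query: closes a retired searcher once idle. */
    synchronized void release(Handle<S> handle) throws Exception {
        handle.inFlight--;
        if (handle.retired && handle.inFlight == 0) {
            handle.searcher.close();
        }
    }

    /** Publishes an already warmed-up searcher and retires the old one. */
    synchronized void swap(S warmed) throws Exception {
        Handle<S> old = current;
        current = new Handle<>(warmed);
        old.retired = true;
        if (old.inFlight == 0) {
            old.searcher.close();
        }
    }
}
```

Note that this sketch does not by itself stop a third searcher from existing if `swap` is called again before the old queries have drained; spacing out reopen calls (or refusing a swap while a retired searcher is still open) would be needed for a strict bound of 2.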
>> 4) Roughly how large is the index file in comparison to the size of the
>> input files?
>
> It depends on whether you store fields or just index them, plus
> there is also a compression (gzip -9 equivalent) option.
As an example - index size numbers I saw: when indexing 1M docs of ~20KB of
very
Beto Siless wrote:
Hi Andrej!
I'm taking a look at fuzzy signatures for near duplicate detection and
I have seen your TextProfileSignature. The question is: If I index
the documents with their text signature, is there a way to filter near
duplicates at search time without comparing each d
It doesn't make sense to eliminate near duplicates during search time. But
if you are trying to cluster duplicates together then probably you want to
look at Carrot.
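If the goal is to group near duplicates rather than filter them per query, one option is to bucket documents by their signature in a single pass, so that no document is compared with every other one. The sketch below is only an illustration in plain Java: `FuzzySignatureBuckets`, `signature` and `bucket` are hypothetical names, and the quantized term-frequency profile is a simplified stand-in, not the actual TextProfileSignature implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FuzzySignatureBuckets {

    /**
     * Builds a coarse profile (term -> frequency rounded down to a multiple
     * of quant, rare terms dropped) and returns its MD5 digest as hex.
     */
    static String signature(String text, int quant) {
        Map<String, Integer> freqs = new TreeMap<>(); // sorted for a stable profile
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (term.length() >= 2) {
                freqs.merge(term, 1, Integer::sum);
            }
        }
        StringBuilder profile = new StringBuilder();
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            int rounded = (e.getValue() / quant) * quant;
            if (rounded > 0) {
                profile.append(e.getKey()).append(' ').append(rounded).append('\n');
            }
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(profile.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Groups document positions by signature in one pass over the input. */
    static Map<String, List<Integer>> bucket(List<String> docs, int quant) {
        Map<String, List<Integer>> buckets = new LinkedHashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            buckets.computeIfAbsent(signature(docs.get(i), quant), k -> new ArrayList<>())
                   .add(i);
        }
        return buckets;
    }
}
```

Documents whose profiles differ only in rare terms land in the same bucket; how aggressive the grouping is depends on the assumed `quant` step.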
On 10/24/06, Beto Siless <[EMAIL PROTECTED]> wrote:
Hi Andrej!
I'm taking a look at fuzzy signatures for near duplicate detectio
Hi Andrej!
I'm taking a look at fuzzy signatures for near duplicate detection and
I have seen your TextProfileSignature. The question is: If I index
the documents with their text signature, is there a way to filter near
duplicates at search time without comparing each document with all oth
Hi Karl!
I'm interested in near duplicate detection based on termFreqVectors. Now
I'm comparing all documents with each other (calculating the angle)...
Is there a way to avoid that?
Thanks!
Beto
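For reference, the per-pair "angle" computation mentioned above can be sketched as the cosine between two term-frequency maps. This is plain Java with no Lucene dependency; `TermVectorAngle`, `termFrequencies` and `cosine` are hypothetical names, and whitespace tokenization is an assumption standing in for whatever analyzer produced the real vectors. It shows only the comparison itself, not a way around the all-pairs loop.

```java
import java.util.HashMap;
import java.util.Map;

final class TermVectorAngle {

    /** Counts lower-cased, whitespace-separated terms. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                freqs.merge(term, 1, Integer::sum);
            }
        }
        return freqs;
    }

    /** Cosine of the angle between two term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

A cosine close to 1 means the two documents use nearly the same terms in nearly the same proportions; documents with no terms in common score 0.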
karl wettin wrote:
On 17 Oct 2006, at 17:54, Find Me wrote:
How to eliminate near duplicates from
When you create a Document by adding Field(s)
(http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html)
consider the last constructor which allows you to specify if the field
will have its TermVector stored or not stored. Also, Luke has a column in
its document view wh
I don't know. How are these vectors stored?
Could you show me an example? (or documentation where I can find it)
2006/10/24, Samir Abdou <[EMAIL PROTECTED]>:
Hi,
You indexed without storing vectors! This is why the term vector is null.
Samir
-Original Message-
From: Paz Belmonte [mail
Could you specify why the score is not suitable? What is it you're trying to
do that isn't working correctly?
At a guess, I'd suspect that if you're using, say, StandardAnalyzer during
index time, the input stream is being tokenized differently than you expect.
And, depending upon what analyzer y
I use Lucene to index address information. Because the address
information is so short, I think Lucene's score computation is
not suitable.
Can anyone give me some advice on indexing short address information?
The format of an address is: name,address etc.
Hi,
You indexed without storing vectors! This is why the term vector is null.
Samir
-Original Message-
From: Paz Belmonte [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 24 October 2006 12:30
To: java-user
Subject: Re: number of term occurrences
Hi,
I have tried these options too and the Te
Hi,
I have tried these options too and the Term Vector returns null.
What do you think the problem is?
2006/10/24, beatriz ramos <[EMAIL PROTECTED]>:
-- Forwarded message --
From: beatriz ramos <[EMAIL PROTECTED]>
Date: 24-Oct-2006 11:24
Subject: Re: number of term o
Hi, thanks for all your answers, but they don't work.
I have tried the 3 options and with all of them we get termDoc = 0.
I have checked my index with the Luke software and termDoc is 1 there, so my
index is correct.
Is it possible I have a problem with the reader? (because my index is
all right)
Thank