date:20061024

Re: number of term occurrences

2006-10-24 Thread Doron Cohen

I don't know why the termDocs option did not work for you. Perhaps you did not (re)open the searcher after the index was populated? Anyhow, here is a small code snippet that does just this, see if it works for you, then you can compare it to your code... void numberOfTermOcc() throws Exception

Re: index architectures

2006-10-24 Thread Doron Cohen

Perhaps another comment on the same line - I think you would be able to get more from your system by bounding the number of open searchers to 2: - old, serving 'old' queries, would be soon closed; - new, being opened and warmed up, and then serving 'new' queries; Because... - if I understood ho
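
The two-searcher bound suggested above can be sketched with the standard library alone. This is a minimal sketch, not Lucene code: `TwoSearcherHolder`, `swapIn`, and the warm-up callback are hypothetical names, and any `AutoCloseable` stands in for a searcher. It assumes the old searcher can be closed as soon as the new one takes over; a real system would first wait for in-flight 'old' queries to finish.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical holder: at most two searchers are open at any moment,
// the one currently serving queries and the one being warmed up.
class TwoSearcherHolder<S extends AutoCloseable> {
    private final AtomicReference<S> current = new AtomicReference<>();

    // Searcher that new queries should use (null until the first swap).
    S current() {
        return current.get();
    }

    // Warm the fresh searcher while the old one keeps serving, then swap.
    // Assumption: closing the old searcher immediately is acceptable here;
    // a production version would wait until in-flight queries complete.
    synchronized void swapIn(S fresh, Consumer<S> warmUp) throws Exception {
        warmUp.accept(fresh);               // old searcher still serves queries
        S old = current.getAndSet(fresh);   // new queries now see 'fresh'
        if (old != null) {
            old.close();                    // back to a single open searcher
        }
    }

    public static void main(String[] args) throws Exception {
        TwoSearcherHolder<AutoCloseable> holder = new TwoSearcherHolder<>();
        AutoCloseable v1 = () -> System.out.println("closing searcher v1");
        AutoCloseable v2 = () -> System.out.println("closing searcher v2");
        holder.swapIn(v1, s -> System.out.println("warming v1"));
        holder.swapIn(v2, s -> System.out.println("warming v2"));
        holder.current().close();
    }
}
```

Making `swapIn` synchronized serializes reopen requests, so a third searcher is never opened while a warm-up is still in progress.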

Re: Scalability Questions

2006-10-24 Thread Doron Cohen

>> 4) Roughly how large is the index file in comparison to the size of the
>> input files?
>
> It depends on whether you store fields or just index them, plus
> there is also a compression (gzip -9 equivalent) option.

As an example - index size numbers I saw: when indexing 1M docs of ~20KB of very

Re: near duplicates

2006-10-24 Thread Andrzej Bialecki

Beto Siless wrote: Hi Andrej! I'm taking a look at fuzzy signatures for near duplicate detection and I have seen your TextProfileSignature. The question is: If I index the documents with their text signature, is there a way to filter near duplicates at search time without comparing each d

Re: near duplicates

2006-10-24 Thread Find Me

It doesn't make sense to eliminate near duplicates during search time. But if you are trying to cluster duplicates together then probably you want to look at Carrot. On 10/24/06, Beto Siless <[EMAIL PROTECTED]> wrote: Hi Andrej! I'm taking a look at fuzzy signatures for near duplicate detectio

Re: near duplicates

2006-10-24 Thread Beto Siless

Hi Andrej! I'm taking a look at fuzzy signatures for near duplicate detection and I have seen your TextProfileSignature. The question is: If I index the documents with their text signature, is there a way to filter near duplicates at search time without comparing each document with all oth

Re: near duplicates

2006-10-24 Thread Beto Siless

Hi Karl! I'm interested in near duplicate detection based on termFreqVectors. Right now I'm comparing all documents with each other (calculating the angle)... Is there a way to avoid that? Thanks! Beto karl wettin wrote: On 17 Oct 2006 at 17:54, Find Me wrote: How to eliminate near duplicates from
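
The pairwise angle comparison mentioned above can be illustrated without Lucene. This is a minimal sketch under stated assumptions: `TermVectorAngle`, `termFreqs`, and `cosine` are hypothetical names, plain maps stand in for term frequency vectors, and text is tokenized naively on whitespace. It shows only the comparison of two documents; it does not address how to avoid comparing every pair.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: cosine of the angle between two term-frequency vectors.
class TermVectorAngle {

    // Naive whitespace tokenizer standing in for a real analyzer (assumption).
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Returns a value in [0, 1]; values near 1 mean the term distributions
    // are nearly identical, 0 means the documents share no terms.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) fa * fb;
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as similar to nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFreqs("near duplicate detection with term vectors");
        Map<String, Integer> d2 = termFreqs("term vectors for near duplicate detection");
        System.out.println("cosine = " + cosine(d1, d2));
    }
}
```

Comparing every document with every other this way costs a number of comparisons that grows quadratically with the collection size, which is the cost the question is trying to avoid.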

Re: number of term occurrences

2006-10-24 Thread Tricia Williams

When you create a Document by adding Field(s) (http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html) consider the last constructor which allows you to specify if the field will have its TermVector stored or not stored. Also, Luke has a column in its document view wh

Re: number of term occurrences

2006-10-24 Thread Paz Belmonte

I don't know. How are these vectors stored? Could you show me an example? (or documentation where I can find it) 2006/10/24, Samir Abdou <[EMAIL PROTECTED]>: Hi, You indexed without storing vectors! This is why the term vector is null. Samir -Original Message- From: Paz Belmonte [mail

Re: index short text

2006-10-24 Thread Erick Erickson

Could you specify why the score is not suitable? What is it you're trying to do that isn't working correctly? At a guess, I'd suspect that if you're using, say, StandardAnalyzer during index time, the input stream is being tokenized differently than you expect. And, depending upon what analyzer y

index short text

2006-10-24 Thread zhongyi yuan

I use Lucene to index address information. Because the addresses are so short, I think Lucene's score computation is not suitable. Can anyone give me some advice on indexing short address information? The format of an address is: name, address, etc.

RE: number of term occurrences

2006-10-24 Thread Samir Abdou

Hi, You indexed without storing vectors! This is why the term vector is null. Samir -Original Message- From: Paz Belmonte [mailto:[EMAIL PROTECTED] Sent: Tuesday, 24 October 2006 12:30 To: java-user Subject: Re: number of term occurrences Hi, I have tried these options too and the Te

Re: number of term occurrences

2006-10-24 Thread Paz Belmonte

Hi, I have tried these options too and the term vector returns null. What do you think the problem is? 2006/10/24, beatriz ramos <[EMAIL PROTECTED]>: -- Forwarded message -- From: beatriz ramos <[EMAIL PROTECTED]> Date: 24-Oct-2006 11:24 Subject: Re: number of term o

Re: number of term occurrences

2006-10-24 Thread beatriz ramos

Hi, thanks for all your answers, but they don't work. I have tried the 3 options and with all of them we get termDoc = 0. I have checked my index with Luke and termDoc is 1 there, so my index is correct. Is it possible I have a problem with the reader (because my index is all right)? Thank

14 matches
