Just to make sure that I understand this correctly,
the docs say: 

"By default, no more than 10,000 terms will be
indexed for a field."

Given your note, the docs do not mean that no more
than 10,000 distinct terms will be indexed, but that
some smaller number of terms will be indexed and only
the first 10,000 occurrences will be tallied.

Is that correct?
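
To make the question concrete, here is a minimal sketch of
the behaviour I think is being described. It does not use
Lucene at all: the class FieldLengthSketch and the methods
tally and sum are hypothetical names of my own, and the
assumption that the loop stops only once the counter exceeds
the limit (so limit + 1 occurrences get tallied) is a guess
based on the 10001 total reported in the quoted message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene code.
final class FieldLengthSketch {

    // Tallies term occurrences in order and stops once more than
    // maxFieldLength tokens have been seen (assumed behaviour).
    static Map<String, Integer> tally(String[] tokens, int maxFieldLength) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        int length = 0;
        for (String token : tokens) {
            freqs.merge(token, 1, Integer::sum);
            if (++length > maxFieldLength) {
                break; // every later token in the field is ignored
            }
        }
        return freqs;
    }

    // Sum of all term frequencies in the vector.
    static int sum(Map<String, Integer> freqs) {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        // A "large document": 30,000 tokens drawn from 500 distinct terms.
        String[] tokens = new String[30000];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = "term" + (i % 500);
        }
        Map<String, Integer> freqs = tally(tokens, 10000);
        System.out.println("distinct terms: " + freqs.size());
        System.out.println("sum of frequencies: " + sum(freqs));
    }
}
```

Under these assumptions the limit caps the number of
occurrences counted, not the number of distinct terms, which
is the reading I am asking you to confirm.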

Thanks
-MG

------ Original Message ------
Received: Mon, 21 Nov 2005 03:04:42 AM EST
From: Paul Elschot <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Subject: Re: TermFrequencies vector limits?

> On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote:
> > Hi.  I was wondering if anyone else has seen this
> > before.  I'm using  lucene 1.4.3 and have indexed
> > about 3000 text documents using the statement:
> > 
> > doc.add(Field.Text("contents", new FileReader(f), true));
> > 
> > When I go and retrieve the term frequency vectors,
> > for any document under about 90k, everything looks
> > as expected.  However for larger documents (I
> > haven't found the exact point, but I know that those
> > over 128k qualify) the sum of the term frequencies
> > in the vector seems to max out at 10001.
> ..
> 
> That's correct, have a look here for IndexWriter.maxFieldLength :
> http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
> 
> Regards,
> Paul Elschot
> 
> 


        
                
