Just to make sure that I understand this correctly, the docs say: "By default, no more than 10,000 terms will be indexed for a field."
Given your note, the docs do not mean that no more than 10,000 distinct terms will be indexed, but that some smaller number of distinct terms will be indexed and only the first 10,000 occurrences will be tallied. Is that correct?

Thanks
-MG

------ Original Message ------
Received: Mon, 21 Nov 2005 03:04:42 AM EST
From: Paul Elschot <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Subject: Re: TermFrequencies vector limits?

> On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote:
> > Hi. I was wondering if anyone else has seen this
> > before. I'm using lucene 1.4.3 and have indexed
> > about 3000 text documents using the statement:
> >
> > doc.add(Field.Text("contents", new FileReader(f),
> > true));
> >
> > When I go and retrieve the term frequency vectors, for
> > any document under about 90k, everything looks as
> > expected. However for larger documents (I haven't
> > found the exact point, but I know that those over 128k
> > qualify) the sum of the term frequencies in the vector
> > seems to max out at 10001.
> ..
>
> That's correct, have a look here for IndexWriter.maxFieldLength :
> http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
>
> Regards,
> Paul Elschot
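To illustrate the distinction being asked about, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the per-field limit counts token occurrences (positions) rather than distinct terms, which is consistent with the sum of the term frequencies hitting a ceiling. The class MaxFieldLengthDemo and its methods are hypothetical: they only simulate the counting and are not Lucene's indexing code. With a 50,000-token document built from five distinct words, the number of distinct terms stays at 5 while the summed frequency stops at the limit. The sketch caps at exactly the limit and does not try to reproduce the 10001 reported in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical simulation (NOT Lucene code) of a per-field token limit
 * like IndexWriter.maxFieldLength, assuming it counts token occurrences.
 */
public class MaxFieldLengthDemo {

    /** Builds a term -> frequency map, stopping after maxTokens occurrences. */
    static Map<String, Integer> termFreqs(String text, int maxTokens) {
        Map<String, Integer> freqs = new HashMap<>();
        int position = 0; // number of token occurrences consumed so far
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if text starts with a non-word char
            }
            if (position >= maxTokens) {
                break; // the limit applies to occurrences, not to distinct terms
            }
            freqs.merge(token, 1, Integer::sum);
            position++;
        }
        return freqs;
    }

    /** Sum of all frequencies, i.e. the number of occurrences actually tallied. */
    static int sum(Map<String, Integer> freqs) {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        // A "large document": 50,000 tokens drawn from only 5 distinct words.
        String[] words = {"lucene", "index", "term", "vector", "field"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append(words[i % words.length]).append(' ');
        }
        Map<String, Integer> freqs = termFreqs(sb.toString(), 10_000);
        System.out.println("distinct terms: " + freqs.size());    // prints "distinct terms: 5"
        System.out.println("sum of frequencies: " + sum(freqs)); // prints "sum of frequencies: 10000"
    }
}
```

Under this assumption, the number of distinct terms indexed can be anywhere from 1 up to the limit, while the total number of occurrences tallied never exceeds it. In Lucene itself the knob is IndexWriter.maxFieldLength, per the FAQ link in the quoted reply; it would have to be raised on the writer before the large documents are added.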