Peter,

CharTokenizer may be the cause of the problem.
It is the parent Tokenizer of WhitespaceTokenizer
which is used by WhitespaceAnalyzer and it
has 255 bytes buffer.

How about using KeywordAnalyzer instead of WhitespaceAnalyzer?

Thanks,

Koji

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 01, 2006 6:08 AM
> To: java-user@lucene.apache.org
> Subject: maximum string length in index field
> 
> 
> I have some really long chemical names that I am storing in an index and
> it looks like they are being split into two terms.  Is there a way to
> increase the max term length?
> 
> Here is an example:
> 
> DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing&nbsp
> ;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tryptophan-7
> -(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L<h0>-proli
> namide)-10-deglycinamide-luteinizing&nbsp;hormone-releasing&nbsp;factor&
> nbsp;(pig)
> length of name: 298
> Number of docs in index: 1
> 
> sort_name:DTryptophanmethylLleucineethylLhprolinamidedeglycinamidelutein
> izing&nbsp;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tr
> yptophan-7-(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L
> <h0>-prolinamide)-10-deglycinamide-luteinizing&nb   freq: 1
> 
> sort_name:sp;hormone-releasing&nbsp;factor&nbsp;(pig)   freq: 1
> 
> Total terms: 2   Total Occuracnes:2
> 
> I only put one name in the index using whitespace analyzer and making
> sure there are no whitespaces.  However there are two terms in the
> index.
> 
> Thanks,
> Peter
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to