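Peter,

CharTokenizer may be the cause of the problem. It is the parent class of WhitespaceTokenizer, which WhitespaceAnalyzer uses, and it has a 255-character buffer. A token longer than that is cut at 255 characters and the remainder is emitted as a separate term, which matches your output: 298 - 255 = 43 characters in the second term.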
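To illustrate the effect, here is a minimal stand-alone sketch. It is not Lucene's code, only a simulation that assumes a tokenizer which breaks on whitespace and caps each token at 255 characters; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a whitespace tokenizer with a fixed-size
// token buffer. It does not use or reproduce any Lucene class.
class MaxWordLenSketch {
    // Assumed cap, taken from the 255-character buffer mentioned above.
    static final int MAX_WORD_LEN = 255;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                current.append(c);
                // Buffer full: emit the token even though no whitespace was seen.
                if (current.length() == MAX_WORD_LEN) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else if (current.length() > 0) {
                // Whitespace ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A 298-character string with no whitespace, like the name in question.
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < 298; i++) {
            name.append('x');
        }
        for (String token : tokenize(name.toString())) {
            System.out.println(token.length()); // prints 255, then 43
        }
    }
}
```

Under these assumptions a single 298-character value comes out as two tokens, which is the same pattern as the two sort_name terms in your index.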
How about using KeywordAnalyzer instead of WhitespaceAnalyzer?

Thanks,
Koji

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 01, 2006 6:08 AM
> To: java-user@lucene.apache.org
> Subject: maximum string length in index field
>
> I have some really long chemical names that I am storing in an index and
> it looks like they are being split into two terms. Is there a way to
> increase the max term length?
>
> Here is an example:
>
> DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing&nbsp;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tryptophan-7-(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L<h0>-prolinamide)-10-deglycinamide-luteinizing&nbsp;hormone-releasing&nbsp;factor&nbsp;(pig)
> length of name: 298
> Number of docs in index: 1
>
> sort_name:DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing&nbsp;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tryptophan-7-(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L<h0>-prolinamide)-10-deglycinamide-luteinizing&nb freq: 1
>
> sort_name:sp;hormone-releasing&nbsp;factor&nbsp;(pig) freq: 1
>
> Total terms: 2 Total Occuracnes:2
>
> I only put one name in the index using whitespace analyzer and making
> sure there are no whitespaces. However there are two terms in the
> index.
>
> Thanks,
> Peter