Thanks for pointing this out, Marvin. I wish Sun (or someone) would
document and register this particular character set encoding with
IANA, so that it could be used outside of Java. As it stands now,
it's essentially a bastard encoding, good for nothing, and one of the
warts of Java.
Lucene prob
I've delved into the matter of Lucene and UTF-8 a little further,
and I am discouraged by what I believe I've uncovered.
Lucene should not be advertising that it uses "standard UTF-8" -- or
even UTF-8 at all, since "Modified UTF-8" is _illegal_ UTF-8.
Unfortunately this is how Sun documents t
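(To make the difference concrete, here is a small sketch -- using the newer
java.nio charset API, not anything from this thread's era -- that contrasts
the bytes DataOutput.writeUTF produces with standard UTF-8, and shows a
strict decoder rejecting the modified form outright:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws Exception {
        // U+0000 plus a supplementary character (U+1D11E, musical G clef)
        String s = "\u0000" + new String(Character.toChars(0x1D11E));

        // Standard UTF-8: U+0000 is one byte, U+1D11E a 4-byte sequence
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);

        // Modified UTF-8, as written by DataOutput.writeUTF:
        // U+0000 becomes 0xC0 0x80, and U+1D11E becomes two 3-byte
        // sequences, one per UTF-16 surrogate
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        byte[] modified = bos.toByteArray(); // first 2 bytes = length prefix

        System.out.println("standard UTF-8: " + standard.length + " bytes");
        System.out.println("modified UTF-8: " + (modified.length - 2) + " bytes");

        // A strict UTF-8 decoder refuses the modified form
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(modified, 2, modified.length - 2));
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected it: " + e);
        }
    }
}

The 0xC0 0x80 null encoding and the surrogate-pair encoding of
supplementary characters are exactly the two points where "Modified
UTF-8" departs from the standard.)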
Hi,
It seems this problem only happens when the index files get really large.
Could it be because Java has trouble handling very large files on a Windows
machine? (I guess there is a max file size on Windows.)
In Lucene, I think there is a maxDoc kind of parameter that you can use to
specify when th
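If segment size is the culprit, one knob worth a look is IndexWriter's
maxMergeDocs, which caps how many documents a merged segment may contain
and so indirectly bounds how large any one index file can grow. A sketch
against the Lucene 1.4-era API (where maxMergeDocs was still a public
field; later versions use setMaxMergeDocs), with an illustrative path
and cap:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CapSegmentSize {
    public static void main(String[] args) throws Exception {
        // true = create a new index at this (hypothetical) path
        IndexWriter writer =
            new IndexWriter("C:\\index", new StandardAnalyzer(), true);

        // Never merge segments beyond one million documents, so no
        // single segment file can grow without bound. The figure is
        // illustrative; tune it to your documents and disk limits.
        writer.maxMergeDocs = 1000000;

        // ... addDocument() calls, then optimize()/close() as usual ...
        writer.close();
    }
}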
Hi,
I had lots of "docs out of order" issues when the index was optimized. I made the
changes based on the suggestion in this link:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23650
That issue seems to be solved, but some indexes get "read past EOF" when I run an
optimization. The index is over 2
On Aug 26, 2005, at 10:14 PM, jian chen wrote:
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8
to store terms. Maybe it is just a legacy issue that the modified UTF-8
is used?
It has been suggested that this discussion should move to the
developer's list, s
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8 to
store terms. Maybe it is just a legacy issue that the modified UTF-8 is
used?
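For what it's worth, a minimal sketch of what true-UTF-8 term storage
could look like -- writeTerm here is purely illustrative, not Lucene's
actual storage code -- would be to encode with the standard charset and
length-prefix the bytes, rather than go through DataOutput.writeUTF:

import java.io.DataOutput;
import java.io.IOException;

public class TrueUtf8 {
    // Hypothetical helper: write one term as standard UTF-8 with a
    // byte-length prefix, instead of writeUTF's modified encoding.
    static void writeTerm(DataOutput out, String term) throws IOException {
        byte[] utf8 = term.getBytes("UTF-8"); // standard UTF-8
        out.writeInt(utf8.length);            // length in bytes, not chars
        out.write(utf8, 0, utf8.length);
    }
}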
Cheers,
Jian
On 8/26/05, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> Greets,
>
> [crossposted to java-user@lucene.apache.org and [E