Thanks for pointing this out, Marvin. I wish Sun (or someone) would
document and register this particular character set encoding with
IANA, so that it could be used outside of Java. As it stands now,
it's essentially a bastard encoding, good for nothing, and one of the
warts of Java.
Lucene prob
I've delved into the matter of Lucene and UTF-8 a little further,
and I am discouraged by what I believe I've uncovered.
Lucene should not be advertising that it uses "standard UTF-8" -- or
even UTF-8 at all, since "Modified UTF-8" is _illegal_ UTF-8.
Unfortunately this is how Sun documents t
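(To make the difference concrete, here is a small sketch -- using the newer
java.nio charset API, not anything from this thread's era -- that contrasts
the bytes DataOutput.writeUTF produces with standard UTF-8, and shows a
strict decoder rejecting the modified form outright:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws Exception {
        // U+0000 plus a supplementary character (U+1D11E, musical G clef)
        String s = "\u0000" + new String(Character.toChars(0x1D11E));

        // Standard UTF-8: U+0000 is one byte, U+1D11E a 4-byte sequence
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);

        // Modified UTF-8, as written by DataOutput.writeUTF:
        // U+0000 becomes 0xC0 0x80, and U+1D11E becomes two 3-byte
        // sequences, one per UTF-16 surrogate
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        byte[] modified = bos.toByteArray(); // first 2 bytes = length prefix

        System.out.println("standard UTF-8: " + standard.length + " bytes");
        System.out.println("modified UTF-8: " + (modified.length - 2) + " bytes");

        // A strict UTF-8 decoder refuses the modified form
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(modified, 2, modified.length - 2));
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected it: " + e);
        }
    }
}

The 0xC0 0x80 null encoding and the surrogate-pair encoding of
supplementary characters are exactly the two points where "Modified
UTF-8" departs from the standard.)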
Hi,
It seems this problem only happens when the index files get really large.
Could it be because Java has trouble handling very large files on a Windows
machine? (I guess there is a max file size on Windows.)
In Lucene, I think there is a maxDoc kind of parameter that you can use to
specify when th
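If segment size is the culprit, one knob worth a look is IndexWriter's
maxMergeDocs, which caps how many documents a merged segment may contain
and so indirectly bounds how large any one index file can grow. A sketch
against the Lucene 1.4-era API (where maxMergeDocs was still a public
field; later versions use setMaxMergeDocs), with an illustrative path
and cap:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CapSegmentSize {
    public static void main(String[] args) throws Exception {
        // true = create a new index at this (hypothetical) path
        IndexWriter writer =
            new IndexWriter("C:\\index", new StandardAnalyzer(), true);

        // Never merge segments beyond one million documents, so no
        // single segment file can grow without bound. The figure is
        // illustrative; tune it to your documents and disk limits.
        writer.maxMergeDocs = 1000000;

        // ... addDocument() calls, then optimize()/close() as usual ...
        writer.close();
    }
}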
Hi,
I had lots of "docs out of order" issues when the index was optimized. I made the
changes based on the suggestion in this link:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23650
That issue seems to be solved, but some indexes get "read past EOF" when I run an
optimization. The index is over 2
On Aug 26, 2005, at 10:14 PM, jian chen wrote:
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8
to store terms. Maybe it is just a legacy issue that the modified UTF-8
is used?
It has been suggested that this discussion should move to the
developer's list, s
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8 to
store terms. Maybe it is just a legacy issue that the modified UTF-8 is
used?
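For what it's worth, a minimal sketch of what true-UTF-8 term storage
could look like -- writeTerm here is purely illustrative, not Lucene's
actual storage code -- would be to encode with the standard charset and
length-prefix the bytes, rather than go through DataOutput.writeUTF:

import java.io.DataOutput;
import java.io.IOException;

public class TrueUtf8 {
    // Hypothetical helper: write one term as standard UTF-8 with a
    // byte-length prefix, instead of writeUTF's modified encoding.
    static void writeTerm(DataOutput out, String term) throws IOException {
        byte[] utf8 = term.getBytes("UTF-8"); // standard UTF-8
        out.writeInt(utf8.length);            // length in bytes, not chars
        out.write(utf8, 0, utf8.length);
    }
}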
Cheers,
Jian
On 8/26/05, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> Greets,
>
> [crossposted to java-user@lucene.apache.org and [E