Thomas David Rivers wrote:
> > Personally, I vote for u_int16_t... Unicode 16 bit, vs. ISO-10646
> > code page zero (other code pages aren't defined at all anyway, and
> > it matches Windows, in case you want to use an ELF library from a
> > Windows box, if you can figure out how).
> 
>  I noticed before that you mentioned you didn't want the
>  wchar_t to be int-sized (i.e. 32 bits.)  I was just wondering
>  why.
> 
>  If we "shrink" the size at this point, would that have some
>  impact on existing programs.  (Currently, the typedef
>  for `wchar_t' works down to an `int', if I'm not mistaken.)
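
For concreteness, the choice on the table is roughly the following
(a minimal sketch; WCHAR_IS_16_BIT is an invented switch for
illustration, and the real typedef lives in the system headers):

        #ifdef WCHAR_IS_16_BIT
        typedef u_int16_t wchar_t;      /* proposed: 16-bit raw Unicode */
        #else
        typedef int       wchar_t;      /* status quo: int-sized, 32 bits */
        #endif

Shrinking it changes sizeof(wchar_t), so anything that wrote wide
strings to disk at the old size would need recompiling, and its
stored data converting; that much of the compatibility concern is
real.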

My ulterior motives are:

o       Sloppily written code, ported from other platforms

o       Compatibility with Windows (e.g. NTFS, VFAT32FS)

o       Complete disdain for ISO-10646 being 32 bits, when 16
        of them are never anything but 0, and were put there just
        so that people could grep -v other people's languages out
        of documents

o       I'll believe Hieroglyphics and Linear B when I see the
        fonts and the programs that use them.  Dead languages
        pretty much justify purpose-built linguistics software
        anyway.

o       A desire for raw storage of Unicode, rather than UTF-8 or
        UTF-7 encoding.  My reasons for this last one:

        o       UTF encoding is mostly so people using US-ASCII
                don't have to change their data (and to hell with
                the rest of the world).  ASCII centrism is why we're
                having to invent a new type today.

        o       UTF encoding breaks fixed field storage, which has
                always been a measure of the number of characters
                you can put in a field.

        o       UTF encoding breaks the historical (and really nice)
                "size_of_file/sizeof(struct) == number_of_records"
                invariant (see the sketch after this list).

        o       Not knowing whether a character will take 1 byte or 5
                bytes means that fixed-length input fields in
                browsers have to be limited to 1/5th as many
                characters as there are bytes to store the input
                result.

        o       People might accept doubling data size for the benefit
                of internationalization.  They aren't going to accept
                a random multiplier between 1 and 5.

        o       Storage encoding and processing encoding should be
                the same thing, and not require conversion (yeah, I
                know, I was there for the comp.std.internat arguments
                with Ohta-san about hating Unicode because it didn't
                use EUC encoding, used Chinese dictionary ordering,
                and wasn't "JIS-208 + extensions"; frankly, I think
                most Japanese don't care, as long as it works, which
                is why Windows hasn't suffered sales losses).

        I really, really hate doing field length conversions in code;
        I rather suspect it will lead to as many bugs as the buffer
        overflows that NUL-terminated strings, "strcpy()", and
        "sprintf()" have given us.
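
To make the record arithmetic concrete, here is a minimal sketch
(struct record, NAMELEN, and utf8_len() are invented for
illustration; the byte lengths follow the original FSS-UTF/UTF-8
spec, which runs to 6 bytes for 31-bit values):

        #include <stdio.h>
        #include <sys/types.h>  /* u_int16_t (BSD) */

        #define NAMELEN 32      /* field length in *characters* */

        struct record {
                /* Raw 16-bit Unicode: always exactly 64 bytes. */
                u_int16_t name[NAMELEN];
        };

        /* Bytes UTF-8 needs to store one character value. */
        static int
        utf8_len(unsigned long c)
        {
                if (c < 0x80)       return (1);
                if (c < 0x800)      return (2);
                if (c < 0x10000)    return (3);
                if (c < 0x200000)   return (4);
                if (c < 0x4000000)  return (5);
                return (6);
        }

        int
        main(void)
        {
                unsigned long filesize = 6400;  /* e.g. from stat() */

                /* Fixed-width storage keeps the old invariant: */
                printf("records: %lu\n",
                    (unsigned long)(filesize / sizeof(struct record)));

                /* UTF-8 storage does not: the same 32-character
                 * field needs anywhere from 32 to 160 bytes, so the
                 * division above no longer means anything.
                 */
                printf("'A' takes %d byte(s); U+3042 takes %d\n",
                    utf8_len(0x41), utf8_len(0x3042));
                return (0);
        }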

More justification than I intended, but I think the GCC default on
most platforms was chosen to *intentionally* be incompatible with
Windows.  The decision should be made on technical merits, rather
than blind hatred.

-- Terry
