Tatsuo Ishii wrote:
Tom Lane wrote:
If I understood what I was reading, this would take several things:
* Remove the "special UTF-8 check" in pg_verifymbstr;
* Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
* Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.
Are
> Tom Lane wrote:
>
> > If I understood what I was reading, this would take several things:
> > * Remove the "special UTF-8 check" in pg_verifymbstr;
> > * Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
> > * Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.
> >
Oliver Jowett <[EMAIL PROTECTED]> writes:
> Does this change what client_encoding = UNICODE might produce? The JDBC
> driver will need some tweaking to handle this -- Java uses UTF-16
> internally and I think some supplementary character (?) scheme for
> values above 0x as of JDK 1.5.
You'r
Tom Lane wrote:
If I understood what I was reading, this would take several things:
* Remove the "special UTF-8 check" in pg_verifymbstr;
* Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
* Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.
Are there any other place
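The second item in the list above (teaching the length function about the 4-byte case) can be sketched roughly as follows. This is only an illustration under assumptions, not the PostgreSQL source: the name sketch_utf_mblen is hypothetical, and returning 1 for an unrecognized lead byte is a choice made for this sketch so that a caller can step past bad input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a pg_utf_mblen-style function (not the
 * PostgreSQL source). Returns the length in bytes of the UTF-8 sequence
 * introduced by the lead byte *s, covering the 1- to 4-byte forms.
 */
static int
sketch_utf_mblen(const unsigned char *s)
{
    assert(s != NULL);

    if ((*s & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((*s & 0xe0) == 0xc0)
        return 2;               /* 110xxxxx */
    if ((*s & 0xf0) == 0xe0)
        return 3;               /* 1110xxxx */
    if ((*s & 0xf8) == 0xf0)
        return 4;               /* 11110xxx: the 4-byte case */

    /* Continuation byte or unsupported lead byte: advance one byte. */
    return 1;
}
```

With a length function of this shape, the maximum length recorded for the encoding has to be 4 as well, which is the third item in the list.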
> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Sunday, August 08, 2004 2:43 AM
> To: Dennis Bjorklund
> Cc: Tatsuo Ishii; John Hansen; [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Subject: Re: [PATCHES] [HACKERS] UNICODE characters above
Dennis Bjorklund <[EMAIL PROTECTED]> writes:
> On Sat, 7 Aug 2004, Tatsuo Ishii wrote:
>> Anyway my point is if current specification of Unicode only allows
>> 24-bit range, why we need to allow usage against the specification?
> Is there a specific reason you want to restrict it to 24 bits?
I se
> -Original Message-
> From: Dennis Bjorklund [mailto:[EMAIL PROTECTED]
> Sent: Saturday, August 07, 2004 11:23 PM
> To: John Hansen
> Cc: Takehiko Abe; [EMAIL PROTECTED]
> Subject: RE: [PATCHES] [HACKERS] UNICODE characters above 0x1
>
> On Sat, 7 Aug
On Sat, 7 Aug 2004, John Hansen wrote:
> Now, is it really 24 bits tho?
> Afaict, it's really 21 (0 - 10 or 0 - xxx1 )
Yes, up to 0x10 should be enough.
The 24 is not really important, this is all about what utf-8 strings to
accept as input. The strings are stored
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Dennis Bjorklund
> Sent: Saturday, August 07, 2004 10:48 PM
> To: Takehiko Abe
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x1
On Sat, 7 Aug 2004, Takehiko Abe wrote:
It looked like you sent the last mail only to me and not the list. I
assume it was a mistake, so I am sending the reply to both.
> > Is there a specific reason you want to restrict it to 24 bits?
>
> ISO 10646 is said to have removed its private use codepoints
: [PATCHES] [HACKERS] UNICODE characters above 0x1
On Sat, 7 Aug 2004, John Hansen wrote:
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.
There
On Sat, 7 Aug 2004, Tatsuo Ishii wrote:
> More seriously, Unicode is filled with tons of confusion and
> inconsistency IMO. Remember that once Unicode advocates said that the
> merit of Unicode was it only requires 16-bit width. Now they say they
> need surrogate pairs and 32-bit width chars...
>
s are allowed?
Regards,
John Hansen
-Original Message-
From: Tatsuo Ishii [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 07, 2004 8:46 PM
To: John Hansen
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [PATCHES] [HACKERS] UNICODE characters
On Sat, 7 Aug 2004, John Hansen wrote:
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.
There are areas reserved for private character sets.
--
/Dennis Björk
> Yes, but the specification allows for 6byte sequences, or 32bit
> characters.
UTF-8 is just an encoding specification, not character set
specification. Unicode only has 17 256x256 planes in its
specification.
> As dennis pointed out, just because they're not used, doesn't mean we
> should not a
] [HACKERS] UNICODE characters above 0x1
> Dennis Bjorklund <[EMAIL PROTECTED]> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 byt
> Dennis Bjorklund <[EMAIL PROTECTED]> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsu
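The two limits argued over in this thread can be put side by side in a short sketch: the lead-byte rule (a start byte beginning with 7 or 8 one bits is illegal, so the longest sequence is 6 bytes) and the 17-plane limit of Unicode. The function names below are hypothetical and nothing here is taken from the PostgreSQL source; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading 1 bits of a byte (result is 0..8). */
static int
leading_ones(unsigned char b)
{
    int n = 0;

    while (n < 8 && (b & (0x80 >> n)))
        n++;
    return n;
}

/*
 * Sequence length implied by a lead byte under the original 1- to 6-byte
 * UTF-8 scheme, or -1 if the byte cannot start a sequence.
 */
static int
sketch_lead_byte_len(unsigned char b)
{
    int ones = leading_ones(b);

    assert(ones >= 0 && ones <= 8);

    if (ones == 0)
        return 1;               /* plain ASCII byte */
    if (ones == 1)
        return -1;              /* 10xxxxxx is a continuation byte */
    if (ones <= 6)
        return ones;            /* 2 to 6 ones: that many bytes in total */
    return -1;                  /* 7 or 8 ones (0xFE, 0xFF): always illegal */
}

/*
 * Unicode defines 17 planes of 256x256 code points each, so valid code
 * points lie below 17 * 0x10000.
 */
static int
sketch_in_unicode_range(uint32_t cp)
{
    return cp < (uint32_t) 17 * 0x10000;
}
```

A validator that accepts the longer 5- and 6-byte forms would rely on the first check alone; one that restricts input to what Unicode defines would also apply the range check to the decoded value.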