Hye-Shik Chang added the comment:
Sorry. I just found that the fix breaks a few other unit tests.
I'll check.
--
___
Python tracker
<http://bugs.python.org/i
Hye-Shik Chang added the comment:
Right.
I'm uploading a patch that fixes the reported problem in cjkcodecs.
Please test whether the patch corrects the behavior.
--
keywords: +patch
Added file: http://bugs.python.org/file13572/cjkcodecs-fix-statefulenc.diff
Hye-Shik Chang added the comment:
When I asked Taiwanese developers how often they use these character
sets, it turned out that they are almost useless in the usual computing
environment in Taiwan. This will serve only historical
compatibility and literal standards compliance. I'm
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
pitrou, that's because Python source code can't be correctly tokenized
when it's encoded in a few odd encodings such as iso-2022 or shift-jis, which
use \, (, ) and " as the second byte of a two-byte character sequence.
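To illustrate the point, here is a minimal sketch, assuming a Python 3 interpreter with the standard shift_jis and iso2022_jp codecs. The helper name significant_bytes is hypothetical and only for illustration; it lists the syntax-significant ASCII bytes that show up in the encoded form of a string.

```python
# Bytes that matter to a tokenizer scanning source byte by byte.
SIGNIFICANT = set(b'\\()"')


def significant_bytes(text, encoding):
    """Return the syntax-significant ASCII bytes found in the encoded text.

    Hypothetical helper, for illustration only.
    """
    encoded = text.encode(encoding)
    return sorted({bytes([b]) for b in encoded if b in SIGNIFICANT})


# KATAKANA LETTER SO is 0x83 0x5C in Shift_JIS: its second byte is a backslash.
print(significant_bytes('\u30bd', 'shift_jis'))

# HIRAGANA LETTER A is 0x24 0x22 in JIS X 0208, so its second byte is a
# double quote; the ISO-2022-JP escape sequences contribute ASCII bytes too.
print(significant_bytes('\u3042', 'iso2022_jp'))

# UTF-8 never reuses ASCII byte values inside a multi-byte sequence.
print(significant_bytes('\u30bd\u3042', 'utf-8'))
```

A tokenizer that works on raw bytes would see those bytes as a backslash or a quote, which is why such encodings can't be handled safely without decoding first.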
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
This problem is due to a bug in OpenBSD's libc.
It was fixed 3 days ago. (http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/wcschr.c#rev1.4)
We can work around it by replacing the use of wcschr(ws, L'\0') with ws +
wc
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
Committed patch "cjkmactemporary.diff" as r65988 in the py3k branch.
I'll open another issue for cjkcodecs implementation of Mac codecs.
--
resolution: -> fixed
status: open -> closed
Changes by Hye-Shik Chang <[EMAIL PROTECTED]>:
Added file: http://bugs.python.org/file11170/cjkmactemporary.diff
Changes by Hye-Shik Chang <[EMAIL PROTECTED]>:
Added file: http://bugs.python.org/file10749/maccjkcodecs-1-py3k.diff
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
Added a patch that implements codecs for CJK Macintosh encodings.
I tried to implement them just like the other existing CJK codecs,
but that required many inefficient mapping tables due to their odd
mappings (like this: u'ABCDE
Hye-Shik Chang added the comment:
I'll take this.
--
assignee: lemburg -> hyeshik.chang
nosy: +hyeshik.chang
Hye-Shik Chang added the comment:
I couldn't find an appropriate method to implement an in situ
compressed mapping table. AFAIK, Python has the smallest
mapping table footprint for each charset among major open
source transcoding programs. I have thought about the
compression many times
Hye-Shik Chang added the comment:
I have generated compressed mapping tables in several ways.
I extracted mapping data into individual files and reorganized
them by translating into Python source code or archiving into a zip file.
The following table shows the result: (in kilobytes)
(also
Hye-Shik Chang added the comment:
I've generated the mapping table from ICU's CNS11643-1992 mapping.
I see that CNS11643 is quite rarely used on the Internet, but it's the
only national standard character set in Taiwan. Asking Taiwanese
Python users, even they didn't think
Changes by Hye-Shik Chang:
--
title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs -> Adding
new CNS11643, a *huge* charset, support in cjkcodecs
New submission from Hye-Shik Chang:
This patch adds CNS11643 support to the Python Unicode codecs.
CNS11643 is a huge character set that is used in EUC-TW and ISO-2022-CN.
CJKCodecs has had CNS11643 support for at least 4 years,
but I dropped it because of its huge size in integrating into
New submission from Hye-Shik Chang:
An illegal identifier makes Python crash on UTF-8 source code/interpreters.
Python 3.0x (py3k:57555M, Aug 27 2007, 21:23:47)
[GCC 3.4.6 [FreeBSD] 20060305] on freebsd6
>>> compile(b'#coding:utf-8\n\xfc', '', 'exec'