New submission from Hye-Shik Chang:
An illegal identifier makes Python crash with UTF-8 source code and interpreters.
Python 3.0x (py3k:57555M, Aug 27 2007, 21:23:47)
[GCC 3.4.6 [FreeBSD] 20060305] on freebsd6
>>> compile(b'#coding:utf-8\n\xfc', '', 'exec')
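On an interpreter where this is fixed, the same call should fail cleanly instead of crashing. The helper below is a hypothetical sketch (the name `compile_error` is ours, not from the report) that compiles a byte string and returns the exception raised, if any. On current CPython releases, undecodable UTF-8 source is reported as a `SyntaxError`.

```python
def compile_error(source: bytes):
    """Compile `source` and return the exception raised, or None.

    Hypothetical helper, not from the report: a fixed interpreter
    should raise an exception here instead of crashing.
    """
    try:
        compile(source, '', 'exec')
    except (SyntaxError, ValueError) as exc:
        return exc
    return None


# 0xFC never occurs in well-formed UTF-8, so decoding the body fails.
err = compile_error(b'#coding:utf-8\n\xfc')
print(type(err).__name__, err)
```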
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
Added a patch that implements codecs for CJK Macintosh encodings.
I tried to implement it just like the other existing CJK codecs,
but it required many inefficient mapping tables due to their odd
mappings (like this: u'ABCDE
Changes by Hye-Shik Chang <[EMAIL PROTECTED]>:
Added file: http://bugs.python.org/file10749/maccjkcodecs-1-py3k.diff
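The comment is cut off before its example, so the sketch below only illustrates the general shape of the problem it describes: a code that decodes to several code points at once, which a table holding one code point per entry cannot express. The table entries are made up and are not real Mac encoding mappings; `decode_multi` and `encode_multi` are hypothetical greedy longest-match converters.

```python
# Made-up table: encoded bytes -> decoded text. Some entries decode to
# more than one code point. These are NOT real Mac CJK mappings.
TABLE = {
    b'A': 'A',
    b'\x85\x40': 'A\u0301',  # one 2-byte code -> 2 code points
    b'\x85\x41': 'ABC',      # one 2-byte code -> 3 code points
}
REVERSE = {text: code for code, text in TABLE.items()}
MAX_CODE = max(len(k) for k in TABLE)
MAX_TEXT = max(len(v) for v in REVERSE)


def decode_multi(data: bytes) -> str:
    """Greedy longest-match decoding over TABLE."""
    out, i = [], 0
    while i < len(data):
        for size in range(min(MAX_CODE, len(data) - i), 0, -1):
            if data[i:i + size] in TABLE:
                out.append(TABLE[data[i:i + size]])
                i += size
                break
        else:
            raise UnicodeDecodeError('hypothetical', data, i, i + 1,
                                     'unmapped byte')
    return ''.join(out)


def encode_multi(text: str) -> bytes:
    """Greedy longest-match encoding: multi-code-point runs win."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_TEXT, len(text) - i), 0, -1):
            if text[i:i + size] in REVERSE:
                out.append(REVERSE[text[i:i + size]])
                i += size
                break
        else:
            raise UnicodeEncodeError('hypothetical', text, i, i + 1,
                                     'unmappable character')
    return b''.join(out)


print(decode_multi(b'A\x85\x41'))
print(encode_multi('AABC'))
```

The encoder has to search for multi-character runs before single characters, which is the kind of extra lookup structure a plain one-to-one table does not need.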
___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.pytho
Changes by Hye-Shik Chang <[EMAIL PROTECTED]>:
Added file: http://bugs.python.org/file11170/cjkmactemporary.diff
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
Committed patch "cjkmactemporary.diff" as r65988 in the py3k branch.
I'll open another issue for cjkcodecs implementation of Mac codecs.
--
resolution: -> fixed
status: open -> closed
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
This problem is due to a bug in OpenBSD's libc.
It was fixed 3 days ago. (http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/wcschr.c#rev1.4)
We can work around it by replacing uses of wcschr(ws, L'\0') with ws +
wc
Hye-Shik Chang <[EMAIL PROTECTED]> added the comment:
pitrou, that's because Python source code can't be tokenized correctly
when it's encoded in a few odd encodings, like iso-2022 or shift-jis, which
use \, (, ) and " as the second byte of a two-byte character sequence.
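One way to see the problem is to encode a few characters and inspect their trailing bytes. The helper below (a hypothetical name, not part of Python's tokenizer) flags characters whose encoded form contains one of those ASCII bytes after the first byte. In Shift_JIS, for instance, `表` encodes as 0x95 0x5C, and 0x5C is a backslash. In ISO-2022-JP, double-byte characters are written entirely with 7-bit bytes, so they can contain `"` or parentheses; the `ESC ( B` escape back to ASCII is flagged as well.

```python
# ASCII bytes that a byte-oriented tokenizer gives special meaning.
SPECIAL_BYTES = set(b'\\()"')


def trailing_byte_hazards(text: str, encoding: str):
    """Return (char, encoded) pairs for characters whose encoding has a
    special ASCII byte somewhere after its first byte."""
    hazards = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and SPECIAL_BYTES & set(encoded[1:]):
            hazards.append((ch, encoded))
    return hazards


print(trailing_byte_hazards('表ソ', 'shift_jis'))
# [('表', b'\x95\\'), ('ソ', b'\x83\\')]
print(trailing_byte_hazards('あ', 'iso2022_jp'))
```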
New submission from Hye-Shik Chang:
This patch adds CNS11643 support into Python unicode codecs.
CNS11643 is a huge character set which is used in EUC-TW and ISO-2022-CN.
CJKCodecs has had CNS11643 support for at least 4 years,
but I dropped it because of its huge size when integrating it into
Changes by Hye-Shik Chang:
--
title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs -> Adding
new CNS11643, a *huge* charset, support in cjkcodecs
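Whether a given Python build can handle EUC-TW or ISO-2022-CN at all can be checked through the codec registry. The helper below is a hypothetical convenience wrapper around `codecs.lookup`, which raises `LookupError` for unknown names; the codec names passed to it are guesses at how such codecs might be registered, not names this patch is known to use.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if `name` resolves to a registered codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# 'euc_tw' and 'iso2022_cn' are assumed names, for illustration only.
for name in ('big5', 'euc_tw', 'iso2022_cn'):
    print(name, has_codec(name))
```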
Hye-Shik Chang added the comment:
I've generated the mapping table from ICU's CNS11643-1992 mapping.
I see that CNS11643 is quite rarely used on the internet, but it's the
only national standard character set in Taiwan. When I asked Taiwanese
Python users, even they didn't think
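The comment does not say which form of ICU's mapping data was used. Assuming an ICU `.ucm` charmap file, where each mapping line between `CHARMAP` and `END CHARMAP` has the shape `<Uxxxx> \xNN\xNN |0` (Unicode code point, encoded bytes, and a precision flag where `|0` marks a round-trip mapping), a decode table could be extracted with a short parser like the hypothetical `parse_ucm` below. The sample lines use placeholder byte values, not actual CNS 11643 assignments.

```python
import re

# One mapping line of an ICU .ucm file (assumed format):
#   <U4E00> \x44\x21 |0
UCM_LINE = re.compile(
    r'^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-4])'
)


def parse_ucm(lines):
    """Build a bytes -> str decode table from the round-trip (|0)
    mappings found between CHARMAP and END CHARMAP."""
    table = {}
    in_charmap = False
    for line in lines:
        line = line.strip()
        if line == 'CHARMAP':
            in_charmap = True
            continue
        if line == 'END CHARMAP':
            break
        if not in_charmap:
            continue
        m = UCM_LINE.match(line)
        # Skip comments and non-|0 entries (one-way, fallback, substitution).
        if m is None or m.group(3) != '0':
            continue
        code_point = int(m.group(1), 16)
        encoded = bytes(int(h, 16) for h in m.group(2).split('\\x')[1:])
        table[encoded] = chr(code_point)
    return table


SAMPLE = r"""CHARMAP
# placeholder bytes, not real CNS 11643 codes
<U4E00> \x44\x21 |0
<U4E59> \x44\x23 |0
<U00A0> \x21\x21 |1
END CHARMAP
"""

print(parse_ucm(SAMPLE.splitlines()))
```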
Hye-Shik Chang added the comment:
I have generated compressed mapping tables in several ways.
I extracted the mapping data into individual files and reorganized
them by translating them into Python source code or archiving them into a zip file.
The following table shows the result: (in kilobytes)
(also
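The table with the measured sizes is cut off here, so no figures are reproduced. As a rough way to repeat this kind of comparison, the sketch below packs a mapping (a synthetic stand-in, not the real CNS 11643 data) into a flat array of 16-bit code units, the way a C table might store it, and compares its raw size with `zlib` and zip-archive sizes. The packing layout and the 94 x 94 plane are assumptions for illustration.

```python
import io
import struct
import zipfile
import zlib

# Synthetic stand-in for one plane of a double-byte charset:
# 94 x 94 cells, each holding a BMP code point (0 = unmapped).
cells = [0x4E00 + (row * 94 + col) % 0x5000 if (row + col) % 7 else 0
         for row in range(94) for col in range(94)]

raw = struct.pack('<%dH' % len(cells), *cells)  # little-endian uint16 array
deflated = zlib.compress(raw, 9)

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('plane1.bin', raw)
zipped = buf.getvalue()

for label, blob in (('raw', raw), ('zlib', deflated), ('zip', zipped)):
    print('%-5s %7.1f KiB' % (label, len(blob) / 1024))
```

Real mapping data is far less regular than this synthetic table, so its compression ratio will differ.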
Hye-Shik Chang added the comment:
I couldn't find an appropriate method to implement an in-situ
compressed mapping table. AFAIK, Python has the smallest
mapping table footprint for each charset among major open
source transcoding programs. I have thought about the
compression many times
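For context, as we understand `Modules/cjkcodecs` in CPython, each double-byte decode table is a per-lead-byte index whose entries hold a pointer to a code-unit array plus the lowest and highest valid trail byte (`bottom`, `top`), so unused trail-byte ranges cost nothing. The Python sketch below models that layout with made-up data; `DbcsIndex` is a hypothetical name, and this is an illustration of the idea, not a port of the C code.

```python
from typing import Dict, List, Optional, Tuple

UNMAPPED = 0xFFFE  # arbitrary "no character" sentinel for this sketch


class DbcsIndex:
    """Per-lead-byte decode table storing only trail bytes bottom..top."""

    def __init__(self, pairs: Dict[Tuple[int, int], int]):
        # index[lead] = (bottom, top, code points for trail bottom..top)
        self.index: List[Optional[Tuple[int, int, List[int]]]] = [None] * 256
        by_lead: Dict[int, Dict[int, int]] = {}
        for (lead, trail), cp in pairs.items():
            by_lead.setdefault(lead, {})[trail] = cp
        for lead, trails in by_lead.items():
            bottom, top = min(trails), max(trails)
            row = [trails.get(t, UNMAPPED) for t in range(bottom, top + 1)]
            self.index[lead] = (bottom, top, row)

    def decode_pair(self, lead: int, trail: int) -> Optional[int]:
        """Return the code point for (lead, trail), or None if unmapped."""
        entry = self.index[lead]
        if entry is None:
            return None
        bottom, top, row = entry
        if not bottom <= trail <= top:
            return None
        cp = row[trail - bottom]
        return None if cp == UNMAPPED else cp

    def stored_cells(self) -> int:
        """Number of array cells actually stored across all rows."""
        return sum(len(e[2]) for e in self.index if e is not None)


# Made-up entries: (lead byte, trail byte) -> code point.
table = DbcsIndex({(0xA4, 0xA1): 0x4E00, (0xA4, 0xA3): 0x4E59,
                   (0xB0, 0x40): 0x5B57})
print(hex(table.decode_pair(0xA4, 0xA3)))  # 0x4e59
print(table.decode_pair(0xA4, 0xA2))       # None: gap inside the range
print(table.stored_cells())                # 4, versus 512 for two full rows
```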
Hye-Shik Chang added the comment:
I'll take this.
--
assignee: lemburg -> hyeshik.chang
nosy: +hyeshik.chang
Hye-Shik Chang added the comment:
When I asked Taiwanese developers how often they use these character
sets, it turned out that they are almost useless in the usual computing
environment in Taiwan. This would only serve historical
compatibility and literal standards compliance. I'm
Hye-Shik Chang added the comment:
Right.
Here is a patch that fixes the reported problem in cjkcodecs.
Please test whether the patch corrects the behavior.
--
keywords: +patch
Added file: http://bugs.python.org/file13572/cjkcodecs-fix-statefulenc.diff
Hye-Shik Chang added the comment:
Sorry, I just found that the fix breaks a few other unit tests.
I'll check.