[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-01 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Sorry. I just found that the fix breaks a few other unit tests. I'll check.

[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-01 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Right. I have uploaded a patch that fixes the reported problem in cjkcodecs. Please test whether the patch corrects the behavior. -- keywords: +patch Added file: http://bugs.python.org/file13572/cjkcodecs-fix-statefulenc.diff

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2009-03-17 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: When I asked Taiwanese developers how often they use these character sets, it appeared that they are almost useless in the usual computing environment in Taiwan. This will only serve historical compatibility and literal standards compliance. I'm

[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: pitrou, that's because Python source code can't be correctly tokenized when it's encoded in a few odd encodings like iso-2022 or shift-jis, which use \, (, ) and " as the second byte of a two-byte character sequence.
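The byte collision described in this comment can be seen from Python itself. The following is only a small illustration, not part of the original thread: the two sample characters are assumptions picked for the sketch, and the codec names are the ones shipped in the standard library.

```python
# Illustration only: the sample characters below are assumptions chosen
# for this sketch; they do not come from the tracker discussion.
sjis_bytes = "表".encode("shift_jis")    # one two-byte Shift_JIS character
iso_bytes = "あ".encode("iso2022_jp")    # escape sequences plus a two-byte character

# Show the raw bytes a byte-oriented tokenizer would see.
print(sjis_bytes)
print(iso_bytes)

# Check for ASCII punctuation values inside the encoded data.
has_backslash = 0x5C in sjis_bytes   # "\" as a trail byte
has_quote = 0x22 in iso_bytes        # '"' inside the 7-bit encoded data
has_paren = 0x28 in iso_bytes        # "(" appears in the escape sequence

print(has_backslash, has_quote, has_paren)
```

A tokenizer that scans such bytes before decoding them could mistake these values for string delimiters or escapes, which is the problem the comment points at.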

[issue3685] Crash while compiling Python 3000 in OpenBSD 4.4

2008-08-26 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: This problem is due to a bug in OpenBSD's libc. It was fixed 3 days ago. (http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/wcschr.c#rev1.4) We can work around it by replacing the use of wcschr(ws, L'\0') with ws + wc

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-08-23 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Committed patch "cjkmactemporary.diff" as r65988 in the py3k branch. I'll open another issue for cjkcodecs implementation of Mac codecs. -- resolution: -> fixed status: open -> closed
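A quick way to probe whether a given interpreter recognizes an encoding name such as the one in this issue title is to ask the codec registry. This is a hedged sketch: `resolve_encoding` is a hypothetical helper written for this illustration, and whether X-MAC-JAPANESE resolves depends on the Python version in use.

```python
import codecs

def resolve_encoding(name):
    """Return the canonical codec name, or None if the name is unknown.

    Hypothetical helper for this sketch; it only wraps codecs.lookup().
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # This is the error reported in the issue title.
        return None

# The last name is a deliberately bogus placeholder.
for candidate in ("X-MAC-JAPANESE", "shift_jis", "no-such-encoding-xyz"):
    print(candidate, "->", resolve_encoding(candidate))
```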

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-08-19 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: Added file: http://bugs.python.org/file11170/cjkmactemporary.diff

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-06-26 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: Added file: http://bugs.python.org/file10749/maccjkcodecs-1-py3k.diff

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-06-26 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: Added a patch that implements codecs for CJK Macintosh encodings. I tried to implement them just like the other existing CJK codecs, but that required many inefficient mapping tables due to their odd mappings (like this: u'ABCDE&#x

[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2008-02-24 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I'll take this. -- assignee: lemburg -> hyeshik.chang nosy: +hyeshik.chang

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I couldn't find an appropriate method to implement an in-situ compressed mapping table. AFAIK, Python has the smallest mapping table footprint for each charset among the major open source transcoding programs. I have thought about the compression many times

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I have generated compressed mapping tables in several ways. I extracted mapping data into individual files and reorganized them by translating into Python source code or archiving into a zip file. The following table shows the result: (in kilobytes) (also
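The kind of size comparison described in this comment can be approximated with the standard library. This is only a rough sketch under stated assumptions: it uses the existing euc_kr codec as a stand-in (CNS11643 is not assumed to be available), `build_decode_table` is a hypothetical helper, and zlib stands in for whatever compression the author actually measured. It does not reproduce the figures from the thread.

```python
import struct
import zlib

def build_decode_table(codec="euc_kr"):
    """Collect one code point per two-byte sequence (0 where undefined).

    Hypothetical helper: probes the codec over a 94x94 grid of byte pairs.
    """
    code_points = []
    for lead in range(0xA1, 0xFF):
        for trail in range(0xA1, 0xFF):
            try:
                text = bytes([lead, trail]).decode(codec)
            except UnicodeDecodeError:
                code_points.append(0)  # unmapped slot
                continue
            # Keep only single BMP characters so each entry fits in 16 bits.
            ok = len(text) == 1 and ord(text) <= 0xFFFF
            code_points.append(ord(text) if ok else 0)
    return code_points

code_points = build_decode_table()

# Flat table: one little-endian 16-bit value per slot.
raw = struct.pack("<%dH" % len(code_points), *code_points)
# Compressed form of the same table.
packed = zlib.compress(raw, 9)

print("mapped slots:", sum(1 for cp in code_points if cp))
print("raw bytes:", len(raw))
print("zlib bytes:", len(packed))
```

A compressed table has to be inflated before lookups, so the saving on disk is traded against load time and memory, which is the trade-off discussed in this thread.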

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I've generated the mapping table from ICU's CNS11643-1992 mapping. I see that CNS11643 is quite rarely used on the internet, but it's the only national standard character set in Taiwan. When I asked Taiwanese Python users, even they didn't think

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: -- title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs -> Adding new CNS11643, a *huge* charset, support in cjkcodecs

[issue2066] Adding new CNS11643 support, a *huge* charset, in cjkcodecs

2008-02-11 Thread Hye-Shik Chang
New submission from Hye-Shik Chang: This patch adds CNS11643 support to the Python unicode codecs. CNS11643 is a huge character set that is used in EUC-TW and ISO-2022-CN. CJKCodecs has had CNS11643 support for at least 4 years, but I dropped it because of its huge size in integrating into

[issue1037] Ill-coded identifier crashes python when coding spec is utf-8

2007-08-27 Thread Hye-Shik Chang
New submission from Hye-Shik Chang: An illegal identifier makes Python crash on UTF-8 source code/interpreters. Python 3.0x (py3k:57555M, Aug 27 2007, 21:23:47) [GCC 3.4.6 [FreeBSD] 20060305] on freebsd6 >>> compile(b'#coding:utf-8\n\xfc', '', 'exec'
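As a hedged sketch of how the same kind of input might be probed on an interpreter where the crash no longer occurs: the file name "<sketch>" is a placeholder chosen for this illustration, and the exact exception type is treated as an assumption, so the code catches the plausible ones rather than asserting a single result.

```python
# Source bytes: a UTF-8 coding declaration followed by a byte (0xFC)
# that is not valid UTF-8.
source = b"#coding:utf-8\n\xfc"

try:
    compile(source, "<sketch>", "exec")
    outcome = "compiled"
except (SyntaxError, ValueError) as exc:
    # UnicodeDecodeError is a ValueError subclass, so it is covered too.
    outcome = type(exc).__name__

# A well-behaved interpreter reports an error here instead of crashing.
print(outcome)
```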