[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2013-07-24 Thread Jakub Wilk
Changes by Jakub Wilk: -- nosy: +jwilk

2010-08-12 Thread STINNER Victor
STINNER Victor added the comment: Hyeshik Chang, who opened this issue, wrote (msg83672) "When I asked Taiwanese developers how often they use these character sets, it appeared that they are almost useless in the usual computing environment in Taiwan. This will only serve for a historical comp…

2010-08-08 Thread Terry J. Reedy
Terry J. Reedy added the comment: It seems to me that the last few messages suggest that this should be closed. -- nosy: +terry.reedy versions: +Python 3.2 -Python 2.7, Python 3.1

2009-03-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 2009-03-17 13:30, Hye-Shik Chang wrote:
> Hye-Shik Chang added the comment:
>
> When I asked Taiwanese developers how often they use these character
> sets, it appeared that they are almost useless in the usual computing
> environment in Taiwan. This w…

2009-03-17 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: When I asked Taiwanese developers how often they use these character sets, it appeared that they are almost useless in the usual computing environment in Taiwan. This will only serve for a historical compatibility and literal standard compliance. I'm quite neu…

2009-03-17 Thread Antoine Pitrou
Antoine Pitrou added the comment:
On Tuesday, 17 March 2009 at 10:56 +, Marc-Andre Lemburg wrote:
> +1
>
> As mentioned several times on the ticket: static C data is not really
> something to worry about these days.
Well, I suggest that someone familiar with the codec-building machinery do t…

2009-03-17 Thread STINNER Victor
Changes by STINNER Victor: -- nosy: +haypo

2009-03-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 2009-03-14 02:32, Antoine Pitrou wrote:
> Antoine Pitrou added the comment:
>
> Based on the feedback above, it seems this should be committed,
> shouldn't it?
+1

As mentioned several times on the ticket: static C data is not really something to worry…

2009-03-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: Based on the feedback above, it seems this should be committed, shouldn't it? -- nosy: +pitrou stage: -> commit review type: -> feature request versions: +Python 2.7, Python 3.1 -Python 2.6, Python 3.0

2008-02-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Whether or not to keep placing all builtin modules into the Windows Python DLL is not really a question to be discussed on the tracker. Given the size of the Python DLL (around 2 MB) and the extra 350 kB that the support for CNS11643 would cost, I think such a…

2008-02-16 Thread Giovanni Bajo
Giovanni Bajo added the comment: Making the standard Windows Python DLL larger is not only a problem of disk size: it will make all packages produced by PyInstaller or py2exe larger, and that means lots of wasted bandwidth. I see that MvL is still -1 on simply splitting CJK codecs out, and vetoes…

2008-02-14 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: In that case, I'm +1 on adding it. The OS won't load those tables unless really needed, so it's more a question of disk space than anything else.

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I couldn't find an appropriate method to implement an in situ compressed mapping table. AFAIK, Python has the smallest mapping table footprint for each charset among major open source transcoding programs. I have thought about the compression many times, but every…

2008-02-14 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I think Martin was looking for other optimizations that still leave the data in a static C const (in order to be shared between processes and only loaded on demand), but do compress the data representation, e.g. using some form of Huffman coding. While I don…

2008-02-14 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I have generated compressed mapping tables in several ways. I extracted mapping data into individual files and reorganized them by translating them into Python source code or archiving them into a zip file. The following table shows the result: (in kilobytes) (also avail…

2008-02-11 Thread Hye-Shik Chang
Hye-Shik Chang added the comment: I've generated the mapping table from ICU's CNS11643-1992 mapping. I see that CNS11643 is quite rarely used on the internet, but it's the only national standard character set in Taiwan. When I asked Taiwanese Python users, even they didn't think that it's necessary to…

2008-02-11 Thread Kuang-che Wu
Kuang-che Wu added the comment: FYI, according to the new spec of cns11643-2004 (you can search the preview from http://www.cnsonline.com.tw/, at http://www.cnsonline.com.tw/preview/preview.jsp?general_no=1164300&language=C&pagecount=524). From page 499, it mentions a URL http://www.cnscode…

2008-02-11 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Some background information: http://www.cns11643.gov.tw/eng/word.jsp The most recent version appears to be: "CNS11643-2004", sometimes also called "CNS11643 version 3" or "CNS11643-3" (http://docs.hp.com/en/5991-7974/5991-7974.pdf). Here's the table for ver…

2008-02-11 Thread Martin v. Löwis
Martin v. Löwis added the comment: BTW, which version of CNS11643 does that implement? AFAICT, there are CNS 11643-1986 and CNS 11643-1992. Where did you get the Unicode mapping from?

2008-02-11 Thread Martin v. Löwis
Martin v. Löwis added the comment: I would like to see whether a compression mechanism of the tables could be found. If all else fails, compressing with raw zlib might improve things, but before that, I think other compression techniques should be studied. I'm still -1 on ad-hoc exclusion of ext…
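Martin's fallback idea of compressing a mapping table with zlib can be illustrated with a minimal Python sketch. It is not how cjkcodecs works (the thread describes its tables as static C data); the 94x94 plane layout, the fake code-point values, and the helper names `build_toy_table`, `compress_table`, `load_table` and `decode` are all hypothetical.

```python
import array
import functools
import zlib

# Hypothetical layout: one 94x94 double-byte plane decoded to 16-bit
# Unicode code points. The values are fake, NOT real CNS11643 data.
ROWS = COLS = 94


def build_toy_table():
    """Return a dense decode table filled with made-up, sequential values."""
    table = array.array("H")
    for row in range(ROWS):
        for col in range(COLS):
            table.append(0x4E00 + row * COLS + col)
    return table


def compress_table(table):
    """Serialize the table to bytes and compress it with zlib (level 9)."""
    return zlib.compress(table.tobytes(), 9)


@functools.lru_cache(maxsize=None)
def load_table(compressed):
    """Decompress once per distinct blob and cache the expanded table."""
    table = array.array("H")
    table.frombytes(zlib.decompress(compressed))
    return table


def decode(compressed, row, col):
    """Look up one (row, col) cell, both 0-based, and return the character."""
    return chr(load_table(compressed)[row * COLS + col])


if __name__ == "__main__":
    blob = compress_table(build_toy_table())
    print(len(blob), repr(decode(blob, 0, 0)))
```

The trade-off is the one Marc-Andre raises in this thread: the decompressed copy lives in each process's private heap, so it gives up the sharing and on-demand loading that static C const data gets for free.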

2008-02-11 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: In this case let's put the cjkcodecs modules in their own DLL(s) on win32. -- nosy: +amaury.forgeotdarc

2008-02-11 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: How often would this character set be needed? In any case, using a (pre)compiler switch is not a good idea. Please add a configure switch to enable/disable the support. -- nosy: +lemburg

2008-02-11 Thread Hye-Shik Chang
Changes by Hye-Shik Chang: -- title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs -> Adding new CNS11643, a *huge* charset, support in cjkcodecs