Walter Dörwald <wal...@livinglogic.de> added the comment:

Martin v. Löwis wrote:
> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
> I think the patch is incorrect: the default value for the script
> property ought to be Unknown, not Common (despite UCD.html saying the
> contrary; see UTR#24 and Scripts.txt).

Fixed.

> I'm puzzled why you use a hard-coded list of script names. The set of
> scripts will certainly change across Unicode versions, and I think it
> would be better to learn the script names from Scripts.txt.

I hardcoded the list because I saw no easy way to keep the indexes
consistent across both versions of the database.

> Out of curiosity: how does the addition of the script property affect
> the number of distinct database records, and the total size of the database?

I'm not exactly sure how to measure this, but the length of
_PyUnicode_Database_Records goes from 229 entries to 690 entries. If it
helps, I can post the output of makeunicodedata.py.

> I think a common application would be lower-cases script names, for more
> efficient comparison; UCD has also changed the spelling of the script
> names over time (from being all-capital before). So I propose that
> a) two functions are provided: one with the original script names, and
> one with the lower-case script names

Is this really necessary if we only have one version of the database?

> b) keep cached versions of interned script name strings in separate
> arrays, to avoid PyString_FromString every time.

Implemented.

> I'm doubtful that script names need to be provided for old database
> versions, so I would be happy to not record the script for old versions,
> and raise an exception if somebody tries to get the script for an old
> database version - surely applications of the old database records won't
> be accessing the script property, anyway.

OK, I've removed the script_changes info for the old database. (With
this change, the list of script names is no longer hardcoded.)

Here's a new version of the patch (unicode-script-2.diff).
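To illustrate the points above about learning the script names from
Scripts.txt, defaulting to Unknown, and keeping interned names in a
separate array, here is a minimal Python sketch. It is not taken from
unicode-script-2.diff or makeunicodedata.py: the names parse_scripts,
script_of and SAMPLE are hypothetical, and SAMPLE is a few hand-picked
lines that only imitate the Scripts.txt layout.

```python
import sys

# Hypothetical, abbreviated stand-in for the contents of Scripts.txt.
SAMPLE = """\
# @missing: 0000..10FFFF; Unknown
0000..001F    ; Common # Cc  [32] <control>..<control>
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0370..0373    ; Greek
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
""".splitlines()


def parse_scripts(lines):
    """Return (names, ranges) read from Scripts.txt-style lines.

    names[0] is always "Unknown", the default for unlisted code points.
    ranges holds (first, last, index into names) tuples.
    """
    names = ["Unknown"]
    index = {"Unknown": 0}
    ranges = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        span, script = (field.strip() for field in line.split(";"))
        first, _, last = span.partition("..")
        first = int(first, 16)
        last = int(last, 16) if last else first
        script = sys.intern(script)  # one shared string object per name
        if script not in index:
            index[script] = len(names)
            names.append(script)
        ranges.append((first, last, index[script]))
    return names, ranges


def script_of(code_point, names, ranges):
    """Look up the script name for a code point (linear scan, for clarity)."""
    for first, last, idx in ranges:
        if first <= code_point <= last:
            return names[idx]
    return names[0]  # not listed anywhere: Unknown


names, ranges = parse_scripts(SAMPLE)
lower_names = [name.lower() for name in names]  # cached lower-case variants
print(script_of(0x41, names, ranges))
print(script_of(0x0378, names, ranges))
```

Because the names are collected while reading the file, the indexes
follow whatever the input contains, and a lower-case array can be built
once next to the original one instead of lowering a name on every call.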
----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6331>
_______________________________________