Mark Mc Mahon <mtnbikingm...@gmail.com> added the comment: How about the following patch and tests...
Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx

"""The Identifier data type is a text string. Identifiers may contain the ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""

So the spec would say that colons are NOT allowed.

Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running the validation confirms that. All the following were flagged as errors:

'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'

I also did some speed testing (just in case non/regex might be slow):

Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>

----------
keywords: +patch
nosy: +markm
Added file: http://bugs.python.org/file21408/make_id_fix_and_test.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2694>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
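
For reference, the Identifier rule quoted in the comment above could be applied along the following lines. This is only a sketch and not the contents of the attached patch: the name make_identifier and the choice to prepend an underscore when the first character is a digit or period are assumptions made for illustration. It uses the non-regex join approach from the timing test.

```python
import string

# Characters the MSI Identifier data type permits, per the MSDN text quoted above.
IDENTIFIER_CHARS = string.ascii_letters + string.digits + "._"


def make_identifier(name):
    """Hypothetical helper: map an arbitrary string to a valid MSI identifier.

    Assumption: disallowed characters become "_", and a leading digit or
    period is handled by prepending "_". The real patch may differ.
    """
    # Replace every character outside the permitted set with an underscore.
    cleaned = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # An identifier must begin with a letter or an underscore.
    if not cleaned or cleaned[0] in string.digits + ".":
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    # Some of the names Orca flagged, plus one that starts with a digit.
    for raw in ("chmFile-", "pdfFile(", "hgbook]", "Hg.C\u00e4md", "7zip.exe"):
        print(repr(raw), "->", repr(make_identifier(raw)))
```

Names that are already valid, such as 'somefilename.txt', pass through unchanged. Note that distinct inputs can collapse to the same identifier (for example 'hgbook]' and 'hgbook['), so a caller would still need its own uniqueness check.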