Mark Mc Mahon <mtnbikingm...@gmail.com> added the comment:

How about the following patch and tests...

Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx
"""The Identifier data type is a text string. Identifiers may contain the
ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, 
every identifier must begin with either a letter or an underscore."""

So the spec would say that colons are NOT allowed. Editing some entries in the 
File table of an MSI (using Orca from the MSI SDK) and running the validation 
confirms that.

All the following were flagged as errors:
'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 
'Hg.Cämd'
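
For reference, the quoted rule can be written as a small check. This is only a sketch of my reading of the MSDN text, not what Orca's validation actually runs, and the name is_valid_identifier is made up for illustration:

```python
import re

# Assumed reading of the quoted rule: first character is an ASCII letter
# or underscore; the rest are ASCII letters, digits, underscores or periods.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_identifier(name):
    """Return True if name satisfies the quoted Identifier rule (hypothetical helper)."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


# The entries that the validation flagged as errors above.
flagged = ['KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]',
           'TortoisePlinkEXE]', 'Hg.Cämd']
for entry in flagged:
    print(repr(entry), is_valid_identifier(entry))
```

Under this reading every flagged entry fails the check, and so does anything containing a colon.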

I also did some speed testing (just in case either the regex or the non-regex approach turned out to be slow):
Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>
>>>
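
One caveat on the timing above: the regex character class has no 0-9 in it, so it would also replace digits and is not quite doing the same work as the join version. Here is a minimal sketch of both variants with the same character set. The function names are hypothetical and this is not necessarily what the attached patch does; the leading-character handling in particular is my assumption based on the quoted rule:

```python
import re
import string

# Characters the quoted rule allows anywhere in an identifier.
IDENTIFIER_CHARS = frozenset(string.ascii_letters + string.digits + "._")
# Characters the quoted rule allows in the first position.
LEADING_CHARS = frozenset(string.ascii_letters + "_")


def sanitize_join(name):
    """Replace every disallowed character with '_' using a join (hypothetical helper)."""
    cleaned = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # Assumption: prefix an underscore when the first character is not
    # a letter or underscore, so the result can start an identifier.
    if not cleaned or cleaned[0] not in LEADING_CHARS:
        cleaned = "_" + cleaned
    return cleaned


def sanitize_regex(name):
    """Same replacement done with re.sub, with digits included in the class."""
    cleaned = re.sub(r"[^A-Za-z0-9_.]", "_", name)
    if not cleaned or cleaned[0] not in LEADING_CHARS:
        cleaned = "_" + cleaned
    return cleaned
```

Both should give the same result for any input, so the choice between them comes down to speed and readability.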

----------
keywords: +patch
nosy: +markm
Added file: http://bugs.python.org/file21408/make_id_fix_and_test.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2694>
_______________________________________