John Machin wrote: > [expletives deleted] and it was wrong anyway (according to your > requirements); > using \w would keep '_' which is *NOT* alphanumeric.
Actually the perl is correct, the explanation was the faulty part. When in doubt, trust the code. Plus I explicitly allowed _ further down, so the mistake should have been fairly obvious. > >>> alphabet = 'qwertyuiopasdfghjklzxcvbnm' # Look, Ma, no thought > required!! Monkey see, monkey type. I won't dignify that with a response. The code that is, I could give a toss about the comments. If you enjoy using such verbose, error-prone representations in your code, god help anyone maintaining it. Including you six months later. Quick, find the difference between these sets at a glance: 'qwertyuiopasdfghjklzxcvbnm' 'abcdefghijklmnopqrstuvwxyz' 'abcdefghijklmnopprstuvwxyz' 'abcdefghijk1mnopqrstuvwxyz' 'qwertyuopasdfghjklzxcvbnm' # no fair peeking And I won't even bring up locales. > >>> keepchars = set(alphabet + alphabet.upper() + '1234567890-.') > >>> fixer = lambda x: ''.join(c for c in x if c in keepchars) Those darn monkeys, always think they're so clever! ;) if "you can" == "you should": do(it) else: do(not) >> Sadly I can find no such beast. Anyone have any insight? As of now, >> regexes look like the best solution. > > I'll leave it to somebody else to dredge up the standard riposte to your > last sentence :-) If the monstrosity above is the best you've got, regexes are clearly the better solution. Readable trumps inscrutable any day. > One point on your requirements: replacing unwanted characters instead of > deleting them may be better -- theoretically possible problems with > deleting are: (1) duplicates (foo and foo_ become the same) (2) '_' > becomes '' which is not a valid filename. Which is why I perform checks for emptiness and uniqueness after the strip. I decided long ago that stripping is preferable to replacement here. > And a legibility problem: if > you hate '_' and ' ' so much, why not change them to '-'? _ is allowed. And I do prefer -, but not for legibility. It doesn't require me to hit Shift. > Oh and just in case the fix was accidentally applied to a path: > > keepchars.update(os.sep) > if os.altsep: keepchars.update(os.altsep) Nope, like I said this is strictly a filename. Stripping out path components is the first thing I do. But thanks for pointing out these common pitfalls for members of our studio audience. Tell him what he's won, Johnny! ;) -- http://mail.python.org/mailman/listinfo/python-list