On Mon, 18 Oct 2010 01:04:09 +0100, Rhodri James wrote:

> On Sun, 17 Oct 2010 20:59:22 +0100, Dun Peal <dunpea...@gmail.com>
> wrote:
>
>> `all_ascii(L)` is a function that accepts a list of strings L, and
>> returns True if all of those strings contain only ASCII chars, False
>> otherwise.
>>
>> What's the fastest way to implement `all_ascii(L)`?
>>
>> My ideas so far are:
>>
>> 1. Match against a regexp with a character range: `[ -~]`
>> 2. Use s.decode('ascii')
>> 3. `return all(31< ord(c) < 127 for s in L for c in s)`
>
> Don't call it "all_ascii" when you don't mean that; all_printable would
> be more accurate,
Neither is accurate. all_ascii would be:

    all(ord(c) <= 127 for s in L for c in s)

all_printable would be considerably harder. As far as I can tell, there's
no simple way to tell if a character is printable. You can look at the
Unicode category, given by unicodedata.category(c), and then decide
whether or not it is printable. (Note, though, that printable characters
will not necessarily print, since actually printing them relies on there
being a glyph available. Not all fonts include glyphs for all printable
characters.)

It might be easier to just ignore control characters, and assume
everything else is printable:

    import unicodedata
    all(unicodedata.category(c) != 'Cc' for s in L for c in s)

If you limit yourself to bytes instead of strings, it's easier:

    import string
    all(c in string.printable for s in L for c in s)

As for what is faster, that's what timeit and the profiler are for:
timeit to find out which is faster, and the profiler to find out whether
it's worth spending the time to find out which is faster.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
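The two printability checks discussed in the reply (ignore Unicode control characters, or compare bytes against string.printable) might look like this in Python 3. This is a minimal sketch, not code from the thread: the function names all_not_control and all_printable_bytes are hypothetical, and it assumes Python 3, where iterating over a bytes object yields integers. For that reason the bytes variant compares byte values against string.printable encoded to ASCII. Note that the two checks disagree on whitespace: string.printable includes tab and newline, whereas both belong to the Unicode 'Cc' category and so fail the control-character test.

```python
import string
import unicodedata

def all_not_control(L):
    # Hypothetical name. Heuristic from the reply: anything that is not a
    # Unicode control character (category 'Cc') counts as printable.
    # Tab and newline are 'Cc', so they are rejected here.
    return all(unicodedata.category(c) != 'Cc' for s in L for c in s)

# Byte values of string.printable: ASCII digits, letters, punctuation
# and whitespace (space, \t, \n, \r, \x0b, \x0c).
_PRINTABLE_BYTES = frozenset(string.printable.encode('ascii'))

def all_printable_bytes(L):
    # Hypothetical name. In Python 3, iterating over a bytes object yields
    # ints, so each byte is looked up by value in _PRINTABLE_BYTES.
    return all(b in _PRINTABLE_BYTES for s in L for b in s)

print(all_not_control(['caf\u00e9', 'tab\there']))  # False (tab is 'Cc')
print(all_printable_bytes([b'tab\there']))          # True
```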
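Following the advice to measure with timeit rather than guess, here is a minimal sketch that times several all_ascii candidates against each other. It assumes Python 3.7 or later (for str.isascii()); the function names and the sample data are hypothetical. The regex and encode variants are adapted from ideas 1 and 2 in the original question so that they test the full ASCII range 0-127 rather than printable ASCII only, which makes every candidate compute the same result. In Python 3 a str is checked with s.encode('ascii') rather than s.decode('ascii'). No timings are quoted because they depend on the machine, the interpreter version and the data.

```python
import re
import timeit

# Hypothetical candidate implementations; all test for code points 0-127.
_NON_ASCII = re.compile(r'[^\x00-\x7f]')

def all_ascii_ord(L):
    # Check every character of every string, as in the reply.
    return all(ord(c) <= 127 for s in L for c in s)

def all_ascii_regex(L):
    # Idea 1, adapted: look for any character outside \x00-\x7f.
    return not any(_NON_ASCII.search(s) for s in L)

def all_ascii_encode(L):
    # Idea 2, adapted to Python 3: encoding to ASCII raises
    # UnicodeEncodeError on the first non-ASCII character.
    try:
        for s in L:
            s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

def all_ascii_isascii(L):
    # str.isascii() is available in Python 3.7 and later.
    return all(s.isascii() for s in L)

CANDIDATES = [all_ascii_ord, all_ascii_regex, all_ascii_encode, all_ascii_isascii]

def benchmark(data, number=100):
    """Return {function name: total seconds for `number` calls on data}."""
    results = {}
    for f in CANDIDATES:
        # timeit accepts a callable; the lambda runs before f changes.
        results[f.__name__] = timeit.timeit(lambda: f(data), number=number)
    return results

if __name__ == '__main__':
    # Hypothetical all-ASCII sample: the worst case, since no candidate
    # can stop early.
    sample = ['The quick brown fox jumps over the lazy dog'] * 1000
    for name, secs in sorted(benchmark(sample).items(), key=lambda kv: kv[1]):
        print(f'{name:20s} {secs:.4f} s')
```

Once timeit has identified the fastest candidate, the profiler (for example cProfile) shows whether all_ascii is a significant cost in the real program at all.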