On Wednesday 17 June 2009, Lie Ryan wrote:
> Wolfgang Rohdewald wrote:
> > On Wednesday, 17. June 2009, Steven D'Aprano wrote:
> >> while text:
> >>     for c in text:
> >>         if c not in printable: return False
> >
> > that is one loop per character.
>
> unless printable is a set

that would still execute the line "if c not in..." once for every
single character, against just one regex call per block. With bigger
block sizes, the advantage of regex should increase.

> > wouldn't it be faster to apply a regex to text?
> > something like
> >
> > while text:
> >     if re.search(r'\W',text): return False
> >
>
> regex? Don't even start...

Here comes a cProfile test. Note that the first variant of Steven
would always have stopped after the first char. After fixing that and
making it look like variant 2 with block size 1, I now have 3 variants:

Variant 1   Blocksize 1
Variant 2   Blocksize 65536
Variant 3   Regex on Blocksize 65536

Testing with a file of 400k bytes shows regex as a clear winner.
Doing the same for an 8k file: variant 2 takes 3ms, regex takes 5ms.
Variants 2 and 3 take about the same time for a file of 20k.

python ascii.py | grep CPU
         398202 function calls in 1.597 CPU seconds
         13 function calls in 0.104 CPU seconds
         1181 function calls in 0.012 CPU seconds

import re
import cProfile

from string import printable

def ascii_file1(name):
    # Variant 1: read and test one byte at a time
    with open(name, 'rb') as f:
        c = f.read(1)
        while c:
            if c not in printable: return False
            c = f.read(1)
        return True

def ascii_file2(name):
    # Variant 2: read 64k blocks, test each character in Python
    bs = 65536
    with open(name, 'rb') as f:
        text = f.read(bs)
        while text:
            for c in text:
                if c not in printable: return False
            text = f.read(bs)
    return True

def ascii_file3(name):
    # Variant 3: read 64k blocks, one regex search per block
    bs = 65536
    search = r'[^%s]' % re.escape(printable)
    reco = re.compile(search)
    with open(name, 'rb') as f:
        text = f.read(bs)
        while text:
            if reco.search(text): return False
            text = f.read(bs)
    return True

def test(fun):
    if fun('/tmp/x'):
        print 'is ascii'
    else:
        print 'is not ascii'

cProfile.run("test(ascii_file1)")
cProfile.run("test(ascii_file2)")
cProfile.run("test(ascii_file3)")

-- 
Wolfgang
-- 
http://mail.python.org/mailman/listinfo/python-list
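
A minimal Python 3 sketch of variants 2 and 3 above, for anyone rerunning the comparison today. It is not the code that produced the timings in this post. Assumptions: the file is read as bytes, so the printable set and the regex are built from string.printable encoded to ASCII; the names ascii_file_set and ascii_file_regex and the bs parameter are illustrative only. The frozenset picks up the "unless printable is a set" suggestion, but the loop body still runs once per byte.

```python
import re
import string

# Assumption: "ASCII text" means every byte is in string.printable,
# as in the post. All names here are illustrative.
PRINTABLE = string.printable.encode('ascii')
PRINTABLE_SET = frozenset(PRINTABLE)  # byte values as ints
NON_PRINTABLE = re.compile(b'[^' + re.escape(PRINTABLE) + b']')

def ascii_file_set(name, bs=65536):
    """Variant 2 with a set: one membership test per byte."""
    with open(name, 'rb') as f:
        while True:
            block = f.read(bs)
            if not block:
                return True
            for byte in block:  # iterating bytes yields ints
                if byte not in PRINTABLE_SET:
                    return False

def ascii_file_regex(name, bs=65536):
    """Variant 3: one regex search per block."""
    with open(name, 'rb') as f:
        while True:
            block = f.read(bs)
            if not block:
                return True
            if NON_PRINTABLE.search(block):
                return False
```

Both functions take a file name and return True or False, so they can be passed to the same kind of cProfile or timeit harness as the variants above; no timings are claimed for this sketch.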