You presume entirely too much. I have a preprocessor that normalizes documents while performing other, more complex operations. There's nothing buggy about what I'm doing.
On May 8, 1:46 pm, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> On Sat, 08 May 2010 10:19:16 -0700, dasacc22 wrote:
>
> > Hi
>
> > This is a simple question. I'm looking for the fastest way to calculate
> > the leading whitespace (as a string, ie ' ').
>
> Is calculating the amount of leading whitespace really the bottleneck in
> your application? If not, then trying to shave off microseconds from
> something which is a trivial part of your app is almost certainly a waste
> of your time.
>
> [...]
>
> > a = ' some content\n'
> > b = a.strip()
> > c = ' '*(len(a)-len(b))
>
> I take it that you haven't actually tested this code for correctness,
> because it's buggy. Let's test it:
>
> >>> leading_whitespace = " "*2 + "\t"*2
> >>> a = leading_whitespace + "some non-whitespace text\n"
> >>> b = a.strip()
> >>> c = " "*(len(a)-len(b))
> >>> assert c == leading_whitespace
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> Not only doesn't it get the whitespace right, but it doesn't even get the
> *amount* of whitespace right:
>
> >>> assert len(c) == len(leading_whitespace)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> It doesn't even work correctly if you limit "whitespace" to mean spaces
> and nothing else! It's simply wrong in every possible way.
>
> This is why people say that premature optimization is the root of all
> (programming) evil. Instead of wasting time and energy trying to optimise
> code, you should make it correct first.
>
> Your solutions 2 and 3 are also buggy. And solution 3 can be easily
> re-written to be more straightforward.
> Instead of the complicated:
>
> > def get_leading_whitespace(s):
> >     def _get():
> >         for x in s:
> >             if x != ' ':
> >                 break
> >             yield x
> >     return ''.join(_get())
>
> try this version:
>
> def get_leading_whitespace(s):
>     accumulator = []
>     for c in s:
>         if c in ' \t\v\f\r\n':
>             accumulator.append(c)
>         else:
>             break
>     return ''.join(accumulator)
>
> Once you're sure this is correct, then you can optimise it:
>
> def get_leading_whitespace(s):
>     t = s.lstrip()
>     return s[:len(s)-len(t)]
>
> >>> c = get_leading_whitespace(a)
> >>> assert c == leading_whitespace
>
> Unless your strings are very large, this is likely to be faster than any
> other pure-Python solution you can come up with.
>
> --
> Steven
--
http://mail.python.org/mailman/listinfo/python-list
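For reference, here is a minimal sketch (assuming Python 3 `str` input) that puts the two correct approaches from the quoted reply side by side. The function names `leading_whitespace_loop` and `leading_whitespace` and the optional `chars` parameter are hypothetical, introduced only for illustration. Passing `chars=' '` gives a spaces-only variant, which is what the original question's snippet seemed to be after. One difference worth knowing: with no argument, `str.lstrip()` removes every character for which `str.isspace()` is true, including non-ASCII whitespace such as U+00A0. That is a wider set than the explicit `' \t\v\f\r\n'` used in the loop, so the two versions agree only on ASCII-only indentation.

```python
def leading_whitespace_loop(s):
    """Explicit loop from the quoted reply: collects ASCII whitespace only."""
    accumulator = []
    for c in s:
        if c in ' \t\v\f\r\n':
            accumulator.append(c)
        else:
            break
    return ''.join(accumulator)


def leading_whitespace(s, chars=None):
    """Return the prefix of s that s.lstrip(chars) would remove.

    chars=None strips any character for which str.isspace() is true;
    chars=' ' (a hypothetical usage choice) stops at the first non-space.
    """
    stripped = s.lstrip(chars)
    return s[:len(s) - len(stripped)]


if __name__ == '__main__':
    samples = ['  \t\tsome non-whitespace text\n', 'no indent', '', '   \n']
    for sample in samples:
        # On ASCII-only input both versions return the same prefix.
        assert leading_whitespace(sample) == leading_whitespace_loop(sample)
    # The spaces-only variant stops at the first tab.
    print(repr(leading_whitespace('  \t\tx', ' ')))  # prints '  '
```

Slicing on the length difference avoids building a list character by character, which is why the `lstrip()` form is usually the quicker of the two in pure Python. If speed really matters for the preprocessor, measuring both on real input (for example with the standard `timeit` module) is safer than assuming.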