On May 8, 1:16 pm, dasacc22 <dasac...@gmail.com> wrote:
> On May 8, 12:59 pm, Patrick Maupin <pmau...@gmail.com> wrote:
>
> > On May 8, 12:19 pm, dasacc22 <dasac...@gmail.com> wrote:
>
> > > Hi
>
> > > This is a simple question. I'm looking for the fastest way to
> > > calculate the leading whitespace (as a string, ie ' ').
>
> > > Here are some different methods I have tried so far
> > > --- solution 1
>
> > > a = ' some content\n'
> > > b = a.strip()
> > > c = ' '*(len(a)-len(b))
>
> > > --- solution 2
>
> > > a = ' some content\n'
> > > b = a.strip()
> > > c = a.partition(b[0])[0]
>
> > > --- solution 3
>
> > > def get_leading_whitespace(s):
> > >     def _get():
> > >         for x in s:
> > >             if x != ' ':
> > >                 break
> > >             yield x
> > >     return ''.join(_get())
>
> > > ---
>
> > > Solution 1 seems to be about as fast as solution 2 except in certain
> > > circumstances where the value of b has already been determined for
> > > other purposes. Solution 3 is slower due to the function overhead.
>
> > > Curious to see what other types of solutions people might have.
>
> > > Thanks,
> > > Daniel
>
> > Well, you could try a solution using re, but that's probably only
> > likely to be faster if you can use it on multiple concatenated lines.
> > I usually use something like your solution #1. One thing to be aware
> > of, though, is that strip() with no parameters will strip *any*
> > whitespace, not just spaces, so the implicit assumption in your code
> > that what you have stripped is spaces may not be justified (depending
> > on the source data). OTOH, depending on how you use that whitespace
> > information, it may not really matter. But if it does matter, you can
> > use strip(' ')
>
> > If speed is really an issue for you, you could also investigate
> > mxtexttools, but, like re, it might perform better if the source
> > consists of several batched lines.
>
> > Regards,
> > Pat
>
> Hi,
>
> thanks for the info.
> Using .strip() to remove all whitespace in
> solution 1 is a must. If you only stripped ' ' spaces then line
> endings would get counted in the len() call and when multiplied
> against ' ', would produce an inaccurate result. Regex is
> significantly slower for my purposes but I've never heard of
> mxtexttools. Even if it proves slow, it's spurred my curiosity as to
> what functionality it provides (on an unrelated note)
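
One way to check the length arithmetic discussed above is to run solution 1 next to an lstrip()-based variant on a line that ends in a newline. This is only a sketch: the function names and the four-space sample line are made up for illustration and do not come from the original code.

```python
def leading_ws_by_length(line):
    # Solution 1 from the thread: strip() removes whitespace from BOTH
    # ends, so a trailing newline is included in the length difference.
    return ' ' * (len(line) - len(line.strip()))


def leading_ws_by_lstrip(line):
    # Hypothetical variant: lstrip() only touches the left side, and
    # slicing keeps the original characters (tabs stay tabs).
    return line[:len(line) - len(line.lstrip())]


sample = '    some content\n'   # assumed input: 4 leading spaces
by_length = leading_ws_by_length(sample)
by_lstrip = leading_ws_by_lstrip(sample)
print(repr(by_length), repr(by_lstrip))
```

On this sample the strip()-based version comes out one character longer than the actual indentation, because the trailing newline is stripped from b as well. The lstrip() variant returns exactly the leading run. It may be worth timing both with timeit on real data before choosing.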
Could you reorganize your code to do multiple lines at a time? That
might make regex competitive.

Regards,
Pat
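
As a rough sketch of the "multiple lines at a time" idea, a single compiled pattern with re.MULTILINE can pull the indentation of every line out of one batched string. The names LEADING_WS, leading_whitespace_per_line and the sample block are assumptions for this example. Note that this particular pattern skips blank and whitespace-only lines, which may or may not be what you want.

```python
import re

# ^ with re.MULTILINE anchors at the start of every line. [ \t]* matches
# only spaces and tabs, and the trailing \S requires some visible
# character, so blank lines produce no match at all.
LEADING_WS = re.compile(r'^([ \t]*)\S', re.MULTILINE)


def leading_whitespace_per_line(text):
    # findall() returns the captured group (the indentation) once per
    # non-blank line, in order.
    return LEADING_WS.findall(text)


block = '    first\n\tsecond\nthird\n'
indents = leading_whitespace_per_line(block)
print(indents)
```

Whether this beats the per-line string methods depends on the line count and line lengths, so it is worth measuring with timeit on representative input.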