On May 8, 5:18 pm, Patrick Maupin <pmau...@gmail.com> wrote:
> On May 8, 1:16 pm, dasacc22 <dasac...@gmail.com> wrote:
> > On May 8, 12:59 pm, Patrick Maupin <pmau...@gmail.com> wrote:
> > > On May 8, 12:19 pm, dasacc22 <dasac...@gmail.com> wrote:
> > > > Hi
> > > >
> > > > This is a simple question. I'm looking for the fastest way to
> > > > calculate the leading whitespace (as a string, ie ' ').
> > > >
> > > > Here are some different methods I have tried so far
> > > >
> > > > --- solution 1
> > > >
> > > > a = ' some content\n'
> > > > b = a.strip()
> > > > c = ' '*(len(a)-len(b))
> > > >
> > > > --- solution 2
> > > >
> > > > a = ' some content\n'
> > > > b = a.strip()
> > > > c = a.partition(b[0])[0]
> > > >
> > > > --- solution 3
> > > >
> > > > def get_leading_whitespace(s):
> > > >     def _get():
> > > >         for x in s:
> > > >             if x != ' ':
> > > >                 break
> > > >             yield x
> > > >     return ''.join(_get())
> > > >
> > > > ---
> > > >
> > > > Solution 1 seems to be about as fast as solution 2 except in certain
> > > > circumstances where the value of b has already been determined for
> > > > other purposes. Solution 3 is slower due to the function overhead.
> > > >
> > > > Curious to see what other types of solutions people might have.
> > > >
> > > > Thanks,
> > > > Daniel
> > >
> > > Well, you could try a solution using re, but that's probably only
> > > likely to be faster if you can use it on multiple concatenated lines.
> > > I usually use something like your solution #1. One thing to be aware
> > > of, though, is that strip() with no parameters will strip *any*
> > > whitespace, not just spaces, so the implicit assumption in your code
> > > that what you have stripped is spaces may not be justified (depending
> > > on the source data). OTOH, depending on how you use that whitespace
> > > information, it may not really matter. But if it does matter, you can
> > > use strip(' ')
> > >
> > > If speed is really an issue for you, you could also investigate
> > > mxtexttools, but, like re, it might perform better if the source
> > > consists of several batched lines.
> > >
> > > Regards,
> > > Pat
> >
> > Hi,
> >
> > thanks for the info. Using .strip() to remove all whitespace in
> > solution 1 is a must. If you only stripped ' ' spaces then line
> > endings would get counted in the len() call and when multiplied
> > against ' ', would produce an inaccurate result. Regex is
> > significantly slower for my purposes but ive never heard of
> > mxtexttools. Even if it proves slow its spurred my curiousity as to
> > what functionality it provides (on an unrelated note)
>
> Could you reorganize your code to do multiple lines at a time? That
> might make regex competitive.
>
> Regards,
> Pat

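For reference, here is a minimal sketch of one more variant, assuming only the left-hand run of whitespace is wanted. It strips the left side only and slices the original string, so a trailing newline never enters the length arithmetic. (With the sample line from solution 1, len(a) - len(a.strip()) also counts the stripped '\n'.) The function names are hypothetical and were not part of the thread.

```python
def leading_whitespace(line):
    """Return the run of whitespace (spaces, tabs, ...) at the start of line."""
    # lstrip() only removes characters from the left, so the line ending
    # does not affect the length difference.
    return line[:len(line) - len(line.lstrip())]


def leading_spaces(line):
    """Return only the run of ' ' characters at the start of line."""
    # Passing ' ' restricts stripping to spaces; a leading tab stops the run.
    return line[:len(line) - len(line.lstrip(' '))]
```

Slicing the original string also preserves the actual characters (a tab stays a tab) instead of rebuilding the prefix out of spaces.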
I have tried this already; the problem is that it is not a trivial matter. Iterating over each line is unavoidable, and I found that using Python built-ins for the string operations in each iteration (the wonderful str.partition method, for example) runs three times faster than applying regular expressions to the entire document for my various needs. Another issue is keeping a line count: iterating over regex matches and counting lines doesn't scale nearly as well as a straight Python solution that uses built-ins to process the information.

At heart, determining the leading whitespace is a trivial matter, and I have much more complex problems to deal with. I was more interested in seeing what kinds of solutions people would come up with for such a problem, and perhaps in uncovering something new in Python that I could apply to a more complex problem. What spurred the thought was this piece by Guido on "what's the best way to convert a list of integers into a string". It's a simple question that introduces concepts which can lead to solving more complex problems.

http://www.python.org/doc/essays/list2str.html
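To make the comparison concrete, here is a rough sketch of the two approaches, not my actual code. It assumes indentation means spaces and tabs only, and the function names are made up for illustration. Both yield (line number, leading whitespace) pairs; the first walks the lines with string methods, the second runs one multiline regex over the whole document and counts matches.

```python
import re


def indents_builtin(text):
    """Yield (line_number, indent) pairs using only string methods."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        body = line.lstrip(' \t')
        # Slice the original line so tabs and spaces are kept as written.
        yield lineno, line[:len(line) - len(body)]


# With re.MULTILINE, '^' matches at the start of every line.
_INDENT = re.compile(r'^[ \t]*', re.MULTILINE)


def indents_regex(text):
    """Yield (line_number, indent) pairs from one pass over the whole text."""
    # One match per line start, so counting matches counts lines.
    for lineno, match in enumerate(_INDENT.finditer(text), start=1):
        yield lineno, match.group()
```

The two can disagree on edge cases such as text that ends with a newline, so treat this as a starting point for timing with timeit on real data rather than as a drop-in replacement.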