On May 9, 8:28 am, John Machin <sjmac...@lexicon.net> wrote:
> dasacc22 <dasacc22 <at> gmail.com> writes:
>
> > U presume entirely to much. I have a preprocessor that normalizes
> > documents while performing other more complex operations. Theres
> > nothing buggy about what im doing
>
> Are you sure?
>
> Your "solution" calculates (the number of leading whitespace characters) +
> (the number of TRAILING whitespace characters).
>
> Problem 1: including TRAILING whitespace.
> Example: "content" + 3 * " " + "\n" has 4 leading spaces according to your
> reckoning; should be 0.
> Fix: use lstrip() instead of strip()
>
> Problem 2: assuming all whitespace characters have *effective* width the same
> as " ".
> Examples: TAB has width 4 or 8 or whatever you want it to be. There are quite
> a number of whitespace characters, even when you stick to ASCII. When you look
> at Unicode, there are heaps more. Here's a list of BMP characters such that
> character.isspace() is True, showing the Unicode codepoint, the Python repr(),
> and the name of the character (other than for control characters):
>
> U+0009 u'\t' ?
> U+000A u'\n' ?
> U+000B u'\x0b' ?
> U+000C u'\x0c' ?
> U+000D u'\r' ?
> U+001C u'\x1c' ?
> U+001D u'\x1d' ?
> U+001E u'\x1e' ?
> U+001F u'\x1f' ?
> U+0020 u' ' SPACE
> U+0085 u'\x85' ?
> U+00A0 u'\xa0' NO-BREAK SPACE
> U+1680 u'\u1680' OGHAM SPACE MARK
> U+2000 u'\u2000' EN QUAD
> U+2001 u'\u2001' EM QUAD
> U+2002 u'\u2002' EN SPACE
> U+2003 u'\u2003' EM SPACE
> U+2004 u'\u2004' THREE-PER-EM SPACE
> U+2005 u'\u2005' FOUR-PER-EM SPACE
> U+2006 u'\u2006' SIX-PER-EM SPACE
> U+2007 u'\u2007' FIGURE SPACE
> U+2008 u'\u2008' PUNCTUATION SPACE
> U+2009 u'\u2009' THIN SPACE
> U+200A u'\u200a' HAIR SPACE
> U+200B u'\u200b' ZERO WIDTH SPACE
> U+2028 u'\u2028' LINE SEPARATOR
> U+2029 u'\u2029' PARAGRAPH SEPARATOR
> U+202F u'\u202f' NARROW NO-BREAK SPACE
> U+205F u'\u205f' MEDIUM MATHEMATICAL SPACE
> U+3000 u'\u3000' IDEOGRAPHIC SPACE
>
> Hmmm, looks like all kinds of widths, from zero upwards.
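
For reference, the two problems quoted above can be sketched roughly as follows. This is only an illustration, not the code from my preprocessor: the names leading_whitespace_count, leading_width and tabsize are made up for the example, and the width calculation assumes every whitespace character other than TAB is exactly one column wide, which the list above shows is not generally true.

```python
def leading_whitespace_count(line):
    """Count leading whitespace characters only; trailing ones are ignored."""
    return len(line) - len(line.lstrip())


def leading_width(line, tabsize=8):
    """Approximate column width of the leading whitespace, expanding tabs.

    Assumption: every whitespace character other than TAB occupies one
    column. Characters such as U+200B or U+3000 would break this.
    """
    prefix = line[:len(line) - len(line.lstrip())]
    return len(prefix.expandtabs(tabsize))


# The example from the quoted message: no leading whitespace at all.
sample = "content" + 3 * " " + "\n"

# strip() removes whitespace from both ends, so trailing characters are counted.
buggy = len(sample) - len(sample.strip())

# lstrip() removes whitespace from the left end only.
fixed = leading_whitespace_count(sample)

print(buggy, fixed)
```

Note that a line consisting only of whitespace is counted in full by both functions, line terminator included, so such lines may need separate handling.
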
Unfortunately, I paired the solution with a string that would never reach it in the state I typed it in, trailing whitespace included. This is my fault.

--
http://mail.python.org/mailman/listinfo/python-list