On 2015-05-17 21:39, Johannes Bauer wrote:
> Hey there,
>
> so that textwrap.wrap() breaks non-breaking spaces, is this a bug or
> intended behavior? For example:
>
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
>
> >>> import textwrap
> >>> for line in textwrap.wrap("foo dont\xa0break " * 20):
> ...     print(line)
> ...
> foo dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break foo dont break foo dont break
> foo dont break foo dont break foo dont break foo dont break foo
> dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break
>
> Apparently it does recognize that \xa0 is a kind of space, but it
> thinks it can break any space. The point of \xa0 being exactly to
> avoid this kind of thing.
>
> Any remedy or ideas?

Since it uses a TextWrapper class, you can subclass that and then
assert that the spaces found for splitting aren't non-breaking spaces.
Note that, to use the "\u00a0" notation, the particular string has to
be a non-raw string. You can compare the two regular expressions with
those in the original source file in your $STDLIB/textwrap.py.

  import textwrap
  import re

  class MyWrapper(textwrap.TextWrapper):
      wordsep_re = re.compile(
          '((?!\u00a0)\\s+|'                        # any whitespace
          r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|'   # hyphenated words
          r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

      # This less funky little regex just split on recognized spaces. E.g.
      #   "Hello there -- you goof-ball, use the -b option!"
      # splits into
      #   Hello/ /there/ /--/ /you/ /goof-ball,/ /use/ /the/ /-b/ /option!/
      wordsep_simple_re = re.compile('((?!\u00a0)\\s+)')

  s = 'foo dont\u00a0break ' * 20
  wrapper = MyWrapper()
  for line in wrapper.wrap(s):
      print(line)

Based on my tests, it gives the results you were looking for.

-tkc

-- 
https://mail.python.org/mailman/listinfo/python-list
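
One caveat on the subclass above: the lookahead in '(?!\u00a0)\\s+' only
inspects the first character of a whitespace run, so a run that starts
with an ordinary space and continues with \u00a0 would, as far as I can
tell, still be matched as a whole. A minimal sketch of a stricter
variant follows. The names NbspAwareWrapper and wrap_keeping_nbsp are
made up for illustration, and it assumes that CPython's textwrap.py
uses wordsep_simple_re when break_on_hyphens is False. It is a sketch
under those assumptions, not a drop-in replacement.

```python
import re
import textwrap

NBSP = "\u00a0"


class NbspAwareWrapper(textwrap.TextWrapper):
    """Hypothetical subclass: never treat U+00A0 as a split point."""

    # [^\S\u00a0] reads "whitespace, but not a non-breaking space".
    # The pattern is a non-raw string so that \u00a0 becomes the real
    # character, which is why \S needs a doubled backslash.
    wordsep_simple_re = re.compile("([^\\S\u00a0]+)")


def wrap_keeping_nbsp(text, width=70):
    # Assumption: with break_on_hyphens=False, TextWrapper splits with
    # wordsep_simple_re. Hyphenated words are then no longer split.
    wrapper = NbspAwareWrapper(width=width, break_on_hyphens=False)
    return wrapper.wrap(text)


lines = wrap_keeping_nbsp(("foo dont" + NBSP + "break ") * 20)
for line in lines:
    print(line)
```

The trade-off is that hyphen handling is switched off. If you still
want breaks after hyphens, the same character class could presumably
replace the first alternative of wordsep_re instead.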