[issue22687] horrible performance of textwrap.wrap() with a long word

2015-03-24 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___ _

[issue22687] horrible performance of textwrap.wrap() with a long word

2015-03-24 Thread Roundup Robot
Roundup Robot added the comment: New changeset 7bd87a219813 by Serhiy Storchaka in branch 'default': Issue #22687: Fixed some corner cases in breaking words in tetxtwrap. https://hg.python.org/cpython/rev/7bd87a219813 -- nosy: +python-dev ___ Python t

[issue22687] horrible performance of textwrap.wrap() with a long word

2015-02-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: wordsplit_3.patch is wordsplit_2.patch with few added comments in tests. Is it enough? -- Added file: http://bugs.python.org/file38149/wordsplit_3.patch ___ Python tracker _

[issue22687] horrible performance of textwrap.wrap() with a long word

2015-01-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Ping. What can I do to move this issue forward? -- ___ Python tracker ___ ___ Python-bugs-list mai

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: So what the patch (with mitigated tests) is more preferable? -- ___ Python tracker ___ ___ Python-

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: > What I usually do in cases like this is to add the tests but mark > them with comments saying that the tests test current behavior but > are not testing parts of the (currently defined) API. That way > you know if a change changes behavior and then can decide

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > To clarify, I would be fine with the previous patch if it didn't add the > tests. The absent of tests could cause introducing new non-detected bugs and reappearing old bugs. > Obvious according to which rules? If you think a word should be splitted befor

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread R. David Murray
R. David Murray added the comment: What I usually do in cases like this is to add the tests but mark them with comments saying that the tests test current behavior but are not testing parts of the (currently defined) API. That way you know if a change changes behavior and then can decide if t

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Aren't ['this-', 'is-a', '-useful-', 'feature'] and > ['what-d', "'you-", 'call-', 'it.'] obvious bugs? Obvious according to which rules? If we want to improve the behaviour of textwrap, IMHO it should be in a separate issue. And someone would have to study

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: To clarify, I would be fine with the previous patch if it didn't add the tests. -- ___ Python tracker ___ __

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : Added file: http://bugs.python.org/file37190/wordsplit_2.patch ___ Python tracker ___ ___ Python-bugs-list maili

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you David. If splitting single letter surrounded with hyphens is desirable, here is more complicated patch which does this. It deviates from original code more, but it doesn't look break any reasonable example. > The textwrap behaviour is underspecifie

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread R. David Murray
R. David Murray added the comment: https://owl.english.purdue.edu/owl/resource/576/01/ Rule 8. So, no, in the middle of the word single letters aren't a problem, only at the beginning or the end of the word. -- nosy: +r.david.murray ___ Python trac

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: > This is old rule. \w{2,}-(?=\w{2,} -- single letter shouldn't be separated. I don't agree. This was an implementation detail. There was no test, and it wasn't specified anywhere. If you think single letter shouldn't be separated, there should be some grammat

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This is old rule. \w{2,}-(?=\w{2,} -- single letter shouldn't be separated. But there was a bug in such simple regex, it splits a word after non-word character (in particular apostrophe or hyphen) if it followed by word characters and hyphen. There were atte

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- versions: -Python 2.7, Python 3.4 ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't understand: +expect = ("this-|is-a-useful-|feature-|for-|" + "reformatting-|posts-|from-|tim-|peters'ly").split('|') +self.check_wrap(text, 1, expect, break_long_words=False) +self.check_split(text, expect) Why w

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Georg Brandl
Georg Brandl added the comment: LGTM. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.or

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a patch which is closer to current code but solves complexity issue and also fixes some bugs in current code. $ ./python -c "import textwrap; print(textwrap.wrap('this-is-a-useful-feature', width=1, break_long_words=False))" ['this-', 'is-a', '-usefu

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Tests are not perfect. But this is intentional design. The part of initial regex: r'\w{2,}-(?=\w{2,})|' # hyphenated words Now it is more complicated. Note '(?=\w{2,})'. -- ___ Python tracker

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I suppose this is common rule in many languages. I frankly don't know about this rule. And the tests don't check for it, so for me it's not broken. -- ___ Python tracker __

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Why not? I guess it depends on English's rules for word splitting, which I > don't know. I suppose this is common rule in many languages. And current code supports it (there is a special code in the regex to ensure this rule). > In any case, this issue is

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: > But this is not totally correct, one-letter word should not be > separated. Why not? I guess it depends on English's rules for word splitting, which I don't know. In any case, this issue is not about improving correctness, only performance. -- _

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Current regex produces insane result. $ ./python -c "import textwrap; print(textwrap.wrap('this-is-a-useful-feature', width=1, break_long_words=False))" ['this-', 'is-a', '-useful-', 'feature'] Antoine's regex produces more correct result for this case: ['th

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Possessive quantifiers (issue433030) is not a panacea. They allow to speed up regular expressions, but the complexity is still quadratic. Antoine's patch makes the complexity linear. -- ___ Python tracker

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Yes... but in both cases the result is nonsensical, and untested. -- ___ Python tracker ___ ___ Pyth

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Oh, sorry, I tested your first patch. Your second patch is faster than current code to me. But it changes behavior. >>> textwrap.wrap('"1a-2b', width=5) ['"1a-', '2b'] With the patch the result is ['"1a-2', 'b']. --

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Are you sure? I get the reverse results here (second patch): Unpatched: $ ./python -m timeit -s 'import textwrap; s = "abcde " * 10**4' -- 'textwrap.wrap(s)' 10 loops, best of 3: 27 msec per loop Patched: $ ./python -m timeit -s 'import textwrap; s = "abcde "

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Unfortunately there are two disadvantages: 1. wordsep_re and wordsep_simple_re are public attributes and user code can depend on this. Changing their is a way to customize TextWrapper. 2. This is slowdown common case (no abnormally long words): $ ./python -

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Actually, it is enough to change the regexp while still using re.split(). Updated patch attached. -- Added file: http://bugs.python.org/file37180/wordsplit_complexity2.patch ___ Python tracker

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is a patch which solves the algorithmic complexity issue by using a different scheme: instead of splitting, match words incrementally. -- keywords: +patch nosy: +pitrou stage: needs patch -> patch review versions: -Python 2.7, Python 3.4 Added fil

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-11-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: May be atomic grouping or possessive quantifiers (issue433030) will help with this issue. -- ___ Python tracker ___ _

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-10-21 Thread Ben Roberts
Changes by Ben Roberts : -- nosy: +roippi ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.o

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-10-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This particular case is related to the behavior of the wordsep_re regular expression in worst case. When text contains long sequence of words characters which is not ended by a hypen, or long sequence of non-word and non-space characters (and in some other c

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-10-21 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : -- nosy: +georg.brandl, serhiy.storchaka versions: +Python 3.5 ___ Python tracker ___ ___ Python-bugs-li

[issue22687] horrible performance of textwrap.wrap() with a long word

2014-10-21 Thread Piotr Engelking
New submission from Piotr Engelking: Wrapping a paragraph containing a long word takes a lot of time: $ time python3 -c 'import textwrap; textwrap.wrap("a" * 2 ** 16)' real3m14.923s user3m14.792s sys 0m0.016s $ A straightforward replacement is 5000 times faster: $ time python3 -c