[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: Do we expect this to work on 64-bit Linux and Python 2.6.5? I've compiled and run some of my code through this, and there seem to be issues with non-greedy quantifier matching (at least relative to the old re module): $ cat test.py import re, regex

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: Here's another inconsistency (same setup as before, running issue2636-20101029.zip code): $ cat test.py import re, regex text = "\n S" regexp = '[^a]{2}[A-Z]' print re.findall(regexp, text) print regex.findall(regexp, text) $ pyth
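
The test case above can be reproduced with a small side-by-side script. This is only a sketch, written for Python 3 rather than the 2.6.5 used in the thread, and it treats the third-party `regex` module as optional (an assumption, since it may not be installed):

```python
import re

try:
    import regex  # third-party module under test in this thread; optional here
except ImportError:
    regex = None

text = "\n S"
pattern = r"[^a]{2}[A-Z]"

# Stock engine: the newline and the space are both "not a", then "S" follows.
stock = re.findall(pattern, text)
print("re:   ", repr(stock))

# Only compare when the alternative engine is available.
if regex is not None:
    print("regex:", repr(regex.findall(pattern, text)))
```

With stock `re`, the negated class accepts the newline, so the whole three-character string is a single match.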

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: And another (with issue2636-20101030.zip): $ cat test.py import re, regex text = "XYABCYPPQ\nQ DEF" regexp = 'X(Y[^Y]+?){1,2}(\ |Q)+DEF' print re.findall(regexp, text) print regex.findall(regexp, text) $ pytho
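
A sketch of the same comparison for this pattern, again in Python 3 and with `regex` treated as optional (both assumptions). The pattern is written as a raw string so that the escaped space reaches the engine unchanged:

```python
import re

try:
    import regex  # optional; the engine being tested in this thread
except ImportError:
    regex = None

text = "XYABCYPPQ\nQ DEF"
pattern = r"X(Y[^Y]+?){1,2}(\ |Q)+DEF"

# findall returns one tuple per match, holding the last capture of each group.
stock = re.findall(pattern, text)
print("re:   ", repr(stock))

if regex is not None:
    print("regex:", repr(regex.findall(pattern, text)))
```

The lazy `[^Y]+?` inside a repeated group has to grow across the newline before the tail of the pattern can match, which makes this a useful probe of backtracking order.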

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Jacques Grove
Jacques Grove added the comment: Here's one that really falls in the category of "don't do that"; but I found this because I was limiting the system recursion level to somewhat less than the standard 1000 (for other reasons), and I had some shorter duplicate patterns i

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Jacques Grove
Jacques Grove added the comment: And another, a bit less pathological, testcase. Sorry for the ugly testcase; it was much worse before I boiled it down :-) $ cat test.py import re, regex text = "\nTest\nxyz\nxyz\nEnd" regexp = '(\nTest(\n+.+?){0,2}?)?\n+End' print re.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: OK, I think this might be the last one I will find for the moment: $ cat test.py import re, regex text = "test?" regexp = "test\?" sub_value = "result\?" print repr(re.sub(regexp, sub_value, text)) print repr(regex.sub(regexp

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: Spoke too soon, although this might be a valid divergence in behavior: $ cat test.py import re, regex text = "test: 2" print regex.sub('(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?', '\\2 \\1, \\4 \\3', text) print re.sub('(test)\
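
For this input the optional trailing group does not participate, so groups 3 and 4 are unset when the replacement template refers to them. How unset groups are expanded is one place where engines and versions can differ. The sketch below only shows the stock `re` module on Python 3.5 or later (an assumption; the thread used 2.6.5), where the documented behaviour is to substitute an empty string:

```python
import re

pattern = r"(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?"
replacement = r"\2 \1, \4 \3"
text = "test: 2"

# Groups 3 and 4 sit inside the optional group, which does not match here.
match = re.search(pattern, text)
print(match.groups())

# On Python 3.5+ the unset groups expand to empty strings in the template.
result = re.sub(pattern, replacement, text)
print(repr(result))
```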

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: Another, with backreferences: import re, regex text = "TEST, BEST; LEST ; Lest 123 Test, Best" regexp = "(?i)(.{1,40}?),(.{1,40}?)(?:;)+(.{1,80}).{1,40}?\\3(\ |;)+(.{1,80}?)\\1" print re.findall(regexp, text) print regex.findall(reg

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Jacques Grove
Jacques Grove added the comment: Testing issue2636-20101224.zip: Nested modifiers seem to hang the regex compilation when used in a non-capturing group, e.g.: re.compile("(?:(?i)foo)") or re.compile("(?:(?u)foo)") No problem on stock Python 2.6.5 regex engine. The

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Jacques Grove
Jacques Grove added the comment: Another re.compile performance issue (I've seen a couple of others, but I'm still trying to simplify the test-cases): re.compile("(?ui)(a\s?b\s?c\s?d\s?e\s?f\s?g\s?h\s?i\s?j\s?k\s?l\s?m\s?n\s?o\s?p\s?q\s?r\s?s\s?t\s?u\s?v\s?w\s?y\s?z\

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Jacques Grove
Jacques Grove added the comment: Thanks, issue2636-20101228a.zip also resolves the compilation speed issues I had on other (very) complex regexes. Found this one: re.search("(X.*?Y\s*){3}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:") produces a search hit with stock python
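
The search hit reported for the stock engine can be checked with a short script. This is a sketch for Python 3 with `regex` treated as optional (both assumptions):

```python
import re

try:
    import regex  # optional; the engine being tested in this thread
except ImportError:
    regex = None

pattern = r"(X.*?Y\s*){3}(X\s*)+AB:"
text = "XY\nX Y\nX Y\nXY\nXX AB:"

# No match is possible from the first "X", so the engine must advance
# to the second line before three X...Y repetitions line up with "XX AB:".
stock = re.search(pattern, text)
print("re:   ", stock.span() if stock else None)

if regex is not None:
    other = regex.search(pattern, text)
    print("regex:", other.span() if other else None)
```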

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Jacques Grove
Jacques Grove added the comment: Here is a somewhat crazy pattern (slimmed down from something much larger and more complex, which didn't finish compiling even after several minutes): re.compile("(?:(?:[23][0-9]|3[79]|0?[1-9])(?:[Aa][Aa]|[Aa][Aa]|[Aa][Aa])??(?:[Aa]{3}(?:[Aa]{4

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: More an observation than a bug: I understand that we're trading memory for performance, but I've noticed that the peak memory usage is rather high, e.g.: $ cat test.py import os import regex as re def resident(): for line in open('

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: Yeah, issue2636-20101230.zip DOES reduce memory usage significantly (30-50%) in my use cases; however, it also tanks performance overall by 35% for me, so I'll prefer to stick with issue2636-20101229.zip (or some variant of it). Maybe a regex compile

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: re.search('\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") matches on stock 2.6.5 regex module, but not on issue2636-20101230.zip or issue2636-20101229.zip (which I've fallen back to for now)

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: Another one that diverges between stock regex and issue2636-20101229.zip: re.search('A\s*?.*?(\n+.*?\s*?){0,2}\(X', 'A\n1\nS\n1 (X')
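
A sketch of how this divergence could be checked, in Python 3 and with `regex` treated as optional (both assumptions). It prints whether each engine finds a match and what it matched:

```python
import re

try:
    import regex  # optional; the engine being tested in this thread
except ImportError:
    regex = None

pattern = r"A\s*?.*?(\n+.*?\s*?){0,2}\(X"
text = "A\n1\nS\n1 (X"

# "." does not cross newlines, so the first newline has to be taken by
# the lazy \s*? and the remaining two by the repeated group.
stock = re.search(pattern, text)
print("re:   ", repr(stock.group(0)) if stock else None)

if regex is not None:
    other = regex.search(pattern, text)
    print("regex:", repr(other.group(0)) if other else None)
```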

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Jacques Grove
Jacques Grove added the comment: Thanks for putting up the hg repo, makes it much easier to follow. Getting back to the performance regression I reported in msg124904: I've verified that if I take the hg commit 7abd9f9bb1, and I back out the guards changes manually, while leavin

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Jacques Grove
Jacques Grove added the comment: You're correct, after the change: regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") doesn't match (i.e. as before commit 7abd9f9bb1). I was, however, just trying to narrow down which part of the code change killed th

[issue5102] urllib2.py timeouts do not propagate across redirects for 2.6.1 (and 3.x?)

2009-01-29 Thread Jacques Grove
New submission from Jacques Grove: When doing a urllib2 fetch of a url that results in a redirect, the connection to the redirect does not pass along the timeout of the original url opener. The result is that the redirected url fetch (which is a new request) will get the default socket timeout
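
Since the redirected request falls back to the default socket timeout, one possible workaround is to set that default for the duration of the fetch. This is not taken from the report; it is a sketch in Python 3, and `default_socket_timeout` is a hypothetical helper name:

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default timeout for new sockets."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore whatever default was in effect before.
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
with default_socket_timeout(10.0):
    # A fetch placed here creates its sockets, including any socket opened
    # for a redirect target without an explicit timeout, under this default.
    inside = socket.getdefaulttimeout()
after = socket.getdefaulttimeout()
print(before, inside, after)
```

The default is process-wide, so it also affects sockets created by other threads while the block is active; that trade-off is the main reason to prefer a fix that propagates the timeout itself.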

[issue5103] ssl.SSLSocket timeout not working correctly when remote end is hanging

2009-01-29 Thread Jacques Grove
New submission from Jacques Grove: In ssl.py of Python 2.6.1 we have this code in SSLSocket.__init__(): if do_handshake_on_connect: timeout = self.gettimeout() try: self.settimeout(None) self.do_handshake