Jacques Grove added the comment:
Do we expect this to work on 64 bit Linux and python 2.6.5? I've compiled and
run some of my code through this, and there seem to be issues with non-greedy
quantifier matching (at least relative to the old re module):
$ cat test.py
import re, regex
Jacques Grove added the comment:
Here's another inconsistency (same setup as before, running
issue2636-20101029.zip code):
$ cat test.py
import re, regex
text = "\n S"
regexp = '[^a]{2}[A-Z]'
print re.findall(regexp, text)
print regex.findall(regexp, text)
$ pyth
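The test scripts in these comments share one shape: run the same pattern through both engines and compare the output. A minimal Python 3 sketch of that comparison follows. The helper name `compare_findall` is hypothetical, and the third-party `regex` module is treated as optional, so the script still runs with only the stock `re` module installed.

```python
import re

try:
    import regex  # third-party module; optional in this sketch
except ImportError:
    regex = None


def compare_findall(pattern, text, engines):
    """Run findall() for each named engine and collect the results.

    `engines` maps a label to any module-like object exposing findall().
    Returns (results, agree); agree is True when every engine returned
    the same value.
    """
    results = {}
    for name, engine in engines.items():
        try:
            results[name] = engine.findall(pattern, text)
        except Exception as exc:  # record a failure instead of aborting
            results[name] = exc
    values = list(results.values())
    agree = all(v == values[0] for v in values[1:])
    return results, agree


engines = {"re": re}
if regex is not None:
    engines["regex"] = regex

results, agree = compare_findall(r"[^a]{2}[A-Z]", "\n S", engines)
for name, value in results.items():
    print(name, repr(value))
print("engines agree:", agree)
```

Recording exceptions rather than letting them propagate keeps a crash in one engine visible as a divergence instead of ending the run.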
Jacques Grove added the comment:
And another (with issue2636-20101030.zip):
$ cat test.py
import re, regex
text = "XYABCYPPQ\nQ DEF"
regexp = 'X(Y[^Y]+?){1,2}(\ |Q)+DEF'
print re.findall(regexp, text)
print regex.findall(regexp, text)
$ pytho
Jacques Grove added the comment:
Here's one that really falls in the category of "don't do that"; but I found
this because I was limiting the system recursion level to somewhat less than
the standard 1000 (for other reasons), and I had some shorter duplicate
patterns i
Jacques Grove added the comment:
And another, a bit less pathological, testcase. Sorry for the ugly testcase; it
was much worse before I boiled it down :-)
$ cat test.py
import re, regex
text = "\nTest\nxyz\nxyz\nEnd"
regexp = '(\nTest(\n+.+?){0,2}?)?\n+End'
print re.
Jacques Grove added the comment:
OK, I think this might be the last one I will find for the moment:
$ cat test.py
import re, regex
text = "test?"
regexp = "test\?"
sub_value = "result\?"
print repr(re.sub(regexp, sub_value, text))
print repr(regex.sub(regexp
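This case turns on how a backslash in the replacement string is interpreted. One way to sidestep the question, sketched here in Python 3 with the stock `re` module only, is to pass a function as the replacement (its return value is inserted without escape processing) or to double the backslash when it should survive into the output. The variable names are illustrative.

```python
import re

text = "test?"
pattern = r"test\?"  # raw string: matches a literal question mark

# A callable replacement is inserted verbatim; no escape processing happens.
literal = re.sub(pattern, lambda m: "result?", text)
print(repr(literal))

# To keep a backslash in the output, double it in the replacement template.
with_backslash = re.sub(pattern, r"result\\?", text)
print(repr(with_backslash))
```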
Jacques Grove added the comment:
Spoke too soon, although this might be a valid divergence in behavior:
$ cat test.py
import re, regex
text = "test: 2"
print regex.sub('(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?', '\\2 \\1, \\4 \\3', text)
print re.sub('(test)\
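The open question here is what a reference such as `\4` should expand to when its group took no part in the match. A replacement function makes that choice explicit instead of leaving it to the engine. The sketch below is Python 3 with the stock `re` module; `swap` is a hypothetical helper and the second input string is made up for illustration.

```python
import re

pattern = r"(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?"


def swap(match):
    # group() returns None for a group that did not participate in the match
    first = "%s %s" % (match.group(2), match.group(1))
    if match.group(3) is None:
        return first  # optional tail absent: emit only the first pair
    return "%s, %s %s" % (first, match.group(4), match.group(3))


print(re.sub(pattern, swap, "test: 2"))
print(re.sub(pattern, swap, "test: 2, TEST: 5"))
```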
Jacques Grove added the comment:
Another, with backreferences:
import re, regex
text = "TEST, BEST; LEST ; Lest 123 Test, Best"
regexp = "(?i)(.{1,40}?),(.{1,40}?)(?:;)+(.{1,80}).{1,40}?\\3(\ |;)+(.{1,80}?)\\1"
print re.findall(regexp, text)
print regex.findall(reg
Jacques Grove added the comment:
Testing issue2636-20101224.zip:
Nested modifiers seem to hang the regex compilation when used in a
non-capturing group e.g.:
re.compile("(?:(?i)foo)")
or
re.compile("(?:(?u)foo)")
No problem on stock Python 2.6.5 regex engine.
The
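For comparison, the stock `re` module in Python 3.6 and later offers a scoped form, `(?i:...)`, that limits a flag to one group. It is a sketch of an alternative spelling, not the construct tested above, and whether a given build of the new engine accepts it is an assumption to check.

```python
import re

# Scoped form: the flag applies only inside this group.
scoped = re.compile(r"(?:(?i:foo))bar")

print(bool(scoped.match("FOObar")))  # "foo" is matched case-insensitively
print(bool(scoped.match("FOOBAR")))  # "bar" stays case-sensitive
```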
Jacques Grove added the comment:
Another re.compile performance issue (I've seen a couple of others, but I'm
still trying to simplify the test-cases):
re.compile("(?ui)(a\s?b\s?c\s?d\s?e\s?f\s?g\s?h\s?i\s?j\s?k\s?l\s?m\s?n\s?o\s?p\s?q\s?r\s?s\s?t\s?u\s?v\s?w\s?y\s?z\
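Compile-time regressions like this one are easier to compare across builds with a small timing helper. The Python 3 sketch below uses the stock `re` module; `time_compile` is a hypothetical helper, and the pattern is a short stand-in with the same `\s?` structure, not the truncated pattern from the report.

```python
import re
import time


def time_compile(pattern, repeat=5):
    """Return the best wall-clock time (seconds) to compile `pattern`."""
    best = float("inf")
    for _ in range(repeat):
        re.purge()  # clear the compiled-pattern cache so each run recompiles
        start = time.perf_counter()
        re.compile(pattern)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in pattern, NOT the truncated one from the report.
stand_in = "(?i)(" + r"\s?".join("abcdefghijklmnopqrstuvwxyz") + ")"
print("compile time: %.6f s" % time_compile(stand_in))
```

Taking the minimum over several runs reduces noise from other processes. Without the `re.purge()` call, every run after the first would only measure a cache lookup.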
Jacques Grove added the comment:
Thanks, issue2636-20101228a.zip also resolves the compilation speed issues I had
on other (very) complex regexes.
Found this one:
re.search("(X.*?Y\s*){3}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:")
produces a search hit with stock python
Jacques Grove added the comment:
Here is a somewhat crazy pattern (slimmed down from something much larger and
more complex, which didn't finish compiling even after several minutes):
re.compile("(?:(?:[23][0-9]|3[79]|0?[1-9])(?:[Aa][Aa]|[Aa][Aa]|[Aa][Aa])??(?:[Aa]{3}(?:[Aa]{4
Jacques Grove added the comment:
More an observation than a bug:
I understand that we're trading memory for performance, but I've noticed that
the peak memory usage is rather high, e.g.:
$ cat test.py
import os
import regex as re
def resident():
    for line in open('
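Peak memory can also be read without parsing a file, using the standard `resource` module on Unix. The Python 3 sketch below shows that alternative. It is not the `resident()` function from the script above, `peak_rss` is a hypothetical name, and the patterns compiled are arbitrary stand-ins using the stock `re` module.

```python
import re
import resource


def peak_rss():
    """Peak resident set size of this process, as reported by getrusage().

    Units are platform-dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss()
# Compile a batch of throwaway patterns to exercise the engine.
compiled = [re.compile("x{%d}y" % n) for n in range(1, 200)]
after = peak_rss()
print("peak RSS before/after:", before, after)
```

Because `ru_maxrss` is a high-water mark, it never decreases, so it suits "peak usage" comparisons but cannot show memory being released.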
Jacques Grove added the comment:
Yeah, issue2636-20101230.zip DOES reduce memory usage significantly (30-50%) in
my use cases; however, it also tanks performance overall by 35% for me, so
I'll prefer to stick with issue2636-20101229.zip (or some variant of it).
Maybe a regex compile
Jacques Grove added the comment:
re.search('\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX")
matches on stock 2.6.5 regex module, but not on issue2636-20101230.zip or
issue2636-20101229.zip (which I've fallen back to for now)
Jacques Grove added the comment:
Another one that diverges between stock regex and issue2636-20101229.zip:
re.search('A\s*?.*?(\n+.*?\s*?){0,2}\(X', 'A\n1\nS\n1 (X')
Jacques Grove added the comment:
Thanks for putting up the hg repo, makes it much easier to follow.
Getting back to the performance regression I reported in msg124904:
I've verified that if I take the hg commit 7abd9f9bb1, and I back out the
guards changes manually, while leavin
Jacques Grove added the comment:
You're correct, after the change:
regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX")
doesn't match (i.e. as before commit 7abd9f9bb1).
I was, however, just trying to narrow down which part of the code change killed
th
New submission from Jacques Grove:
When doing a urllib2 fetch of a url that results in a redirect, the
connection to the redirect does not pass along the timeout of the
original url opener. The result is that the redirected url fetch (which
is a new request) will get the default socket timeout
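Since the report says the redirected request picks up the default socket timeout, one possible workaround is to set that default temporarily around the fetch. The sketch below is Python 3 and uses only the `socket` module. `default_socket_timeout` is a hypothetical helper, the setting is process-wide (so it also affects other threads), and no actual fetch is performed here.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the earlier default even if the body raised.
        socket.setdefaulttimeout(previous)


with default_socket_timeout(10.0):
    # A socket created here without an explicit timeout inherits 10 s;
    # per the report, that includes the connection made for a redirect.
    inside = socket.getdefaulttimeout()

print(inside, socket.getdefaulttimeout())
```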
New submission from Jacques Grove:
In ssl.py of Python 2.6.1 we have this code in SSLSocket.__init__():
if do_handshake_on_connect:
    timeout = self.gettimeout()
    try:
        self.settimeout(None)
        self.do_handshake
20 matches