[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-08-27 Thread Steven D'Aprano
Steven D'Aprano added the comment: I'm not sure if this belongs here, or on the Google code project page, so I'll add it in both places :) Feature request: please change the NEW flag to something else. In five or six years (give or take), the re module will be long forgotten, compatibility wi

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Eric Snow
Changes by Eric Snow : -- nosy: +ericsnow ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Collin Winter
Changes by Collin Winter : -- nosy: -collinwinter

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Alec Koumjian
Alec Koumjian added the comment: Thanks, Matthew. I did not realize I could access either of those. I should be able to build a helper function now to do what I want.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Brian Curtin
Changes by Brian Curtin : -- nosy: -brian.curtin

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Matthew Barnett
Matthew Barnett added the comment: The new regex implementation is hosted here: https://code.google.com/p/mrab-regex-hg/ The span of m['a_thing'] is m.span('a_thing'), if that helps. The named groups are listed on the pattern object, which can be accessed via m.re: >>> m.re <_regex.Pattern o
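The accessors mentioned in this comment can be sketched roughly as follows. This is only an illustration: it uses the standard-library re module as a stand-in for the regex module discussed in the thread, and the pattern and input string are hypothetical. The calls shown (Match.span, Match.re, Pattern.groupindex, Pattern.groups) exist in the stdlib; whether regex exposes exactly the same attributes is an assumption based on the comment above.

```python
import re

# Hypothetical pattern mixing a named and an unnamed group;
# the name "a_thing" simply echoes the one used in the thread.
pattern = re.compile(r"(?P<a_thing>\d+)-(\w+)")
m = pattern.search("id 123-abc")

# Span of a named group, as in m.span('a_thing').
named_span = m.span("a_thing")

# The pattern object is reachable from the match via m.re;
# groupindex maps each group name to its group number.
group_names = dict(m.re.groupindex)

# Spans of every group, named or not, looked up by number.
all_spans = [m.span(i) for i in range(1, m.re.groups + 1)]

print(named_span, group_names, all_spans)
```

A helper of the kind Alec describes could combine groupindex with the numbered spans to label each captured span by name where one exists.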

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-10 Thread Alec Koumjian
Alec Koumjian added the comment: I apologize if this is the wrong place for this message. I did not see the link to a separate list. First let me explain what I am trying to accomplish. I would like to be able to take an unknown regular expression that contains both named and unnamed groups

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-05-10 Thread Brian Curtin
Brian Curtin added the comment: Issues with Regexp should probably be handled on the Regexp tracker. -- nosy: +brian.curtin

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-05-10 Thread Jonathan Halcrow
Jonathan Halcrow added the comment: It seems that _regex_unicode.c is missing from setup.py; adding it to ext_modules fixes my previous issue.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-05-10 Thread Jonathan Halcrow
Jonathan Halcrow added the comment: I'm having a problem using the current version (0.1.20110504) with python 2.5 on OSX 10.5. When I try to import regex I get the following import error: dlopen(/python2.5/site-packages/_regex.so, 2): Symbol not found: _re_is_same_char_ign Referenced from:

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-15 Thread Matthew Barnett
Matthew Barnett added the comment: I've fixed the problem with iterators for both Python 3 and Python 2. They can now be shared safely across threads. I've updated the release on PyPI.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-14 Thread Gregory P. Smith
Gregory P. Smith added the comment: Okay. Can you push your setup.py and README and such as well? Your pypi release tarballs should match the hg repo and ideally include a mention of what hg revision they are generated from. :) -gps On Mon, Mar 14, 2011 at 5:25 PM, Matthew Barnett wrote: > >

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-14 Thread Matthew Barnett
Matthew Barnett added the comment: @Gregory: I've added you to the project. I'm currently trying to fix a problem with iterators shared across threads. As a temporary measure, the current release on PyPI doesn't enable multithreading for them. The mrab-regex-hg project doesn't have those sou

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-14 Thread Gregory P. Smith
Gregory P. Smith added the comment: Could you add me as a member or admin on the mrab-regex-hg project? I've got a few things I want to fix in the code as I start looking into the state of this module. gpsmith at gmail dot com is my google account. There are some fixes in the upstream pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-11 Thread Alex
Changes by Alex : -- nosy: +alex

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-08 Thread Davide Rizzo
Changes by Davide Rizzo : -- nosy: +davide.rizzo

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-25 Thread Matthew Barnett
Matthew Barnett added the comment: I've reduced the size of some internal tables.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-16 Thread Matthew Barnett
Matthew Barnett added the comment: That line crept in somehow. As it's been there since the 2010-12-24 release and you're the first one to have a problem with it (and you've already fixed it), it looks like a new upload isn't urgently needed (I don't have any other changes to make at present

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-14 Thread ronnix
ronnix added the comment: The regex 0.1.20110106 package fails to install with Python 2.6, due to the use of 2.7 string formatting syntax in setup.py: print("Copying {} to {}".format(unicodedata_db_h, SRC_DIR)) This line should be changed to: print("Copying {0} to {1}".format(unicode
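The difference ronnix points out can be illustrated with a small sketch. Auto-numbered replacement fields ("{}") were added in Python 2.7 and 3.1, while explicitly numbered fields ("{0}", "{1}") also work on 2.6. The variable values below are hypothetical placeholders, not the ones used in the package's setup.py.

```python
# Hypothetical stand-ins for the values used in setup.py.
unicodedata_db_h = "unicodedata_db.h"
SRC_DIR = "src"

# Auto-numbered fields: accepted by Python 2.7+ and 3.1+ only.
auto_numbered = "Copying {} to {}".format(unicodedata_db_h, SRC_DIR)

# Explicitly numbered fields: also accepted by Python 2.6.
explicit = "Copying {0} to {1}".format(unicodedata_db_h, SRC_DIR)

print(auto_numbered)
print(explicit)
```

On interpreters that accept both forms the two strings are identical, so the explicit form is the safer choice for a setup script meant to run on 2.6.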

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-03 Thread Matthew Barnett
Matthew Barnett added the comment: I've just done a bug fix. The issue is at: https://code.google.com/p/mrab-regex-hg/ BTW, Jacques, I trust that your regression tests don't test how long a regex takes to fail to match, because a bug could cause such a non-match to occur too quickly, bef

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Jacques Grove
Jacques Grove added the comment: You're correct, after the change: regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") doesn't match (i.e. as before commit 7abd9f9bb1). I was, however, just trying to narrow down which part of the code change killed the performance on my regression tests

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Matthew Barnett
Matthew Barnett added the comment: Just to check, does this still work with your changes of msg124959? regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") For me it fails to match!

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Matthew Barnett
Matthew Barnett added the comment: Why not? :-)

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Jacques Grove
Jacques Grove added the comment: Thanks for putting up the hg repo, makes it much easier to follow. Getting back to the performance regression I reported in msg124904: I've verified that if I take the hg commit 7abd9f9bb1 , and I back out the guards changes manually, while leaving the FAST_IN

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: Even after much uninstalling and reinstalling (and reboots) I never got TortoiseSVN to work properly, so I switched to TortoiseHg. The sources are now at: https://code.google.com/p/mrab-regex-hg/

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: msg124904: It would, of course, be slower on first use, but I'm surprised that it's (that much) slower afterwards. msg124905, msg124906: I have those matching now. msg124931: The sources are in TortoiseBzr, but I couldn't upload, so I exported to TortoiseSV

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Robert Xiao
Robert Xiao added the comment: Do you have it in any kind of repository at all? Even a private SVN repo or something like that?

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: The project is now at: https://code.google.com/p/mrab-regex/ Unfortunately it doesn't have the revision history. I don't know why not.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Georg Brandl
Georg Brandl added the comment: Hearty +1. I have the hope of putting this in 3.3, and for that I'd like to see how the code matures, which is much easier when in version control.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Gregory P. Smith
Gregory P. Smith added the comment: As belopolsky said... *please* move this development into version control. Put it up in an Hg repo on code.google.com, or put it on GitHub. *anything* other than repeatedly posting entire zip file source code drops to a bugtracker.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: Another one that diverges between stock regex and issue2636-20101229.zip: re.search('A\s*?.*?(\n+.*?\s*?){0,2}\(X', 'A\n1\nS\n1 (X')

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: re.search('\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") matches on stock 2.6.5 regex module, but not on issue2636-20101230.zip or issue2636-20101229.zip (which I've fallen back to for now)

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: Yeah, issue2636-20101230.zip DOES reduce memory usage significantly (30-50%) in my use cases; however, it also tanks performance overall by 35% for me, so I'll prefer to stick with issue2636-20101229.zip (or some variant of it). Maybe a regex compile-time opt

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101230.zip is a new version of the regex module. I've delayed the building of the tables for fast searching until their first use, which, hopefully, will mean that fewer will be actually built. -- Added file: http://bugs.python.org/file20

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Jacques Grove
Jacques Grove added the comment: More an observation than a bug: I understand that we're trading memory for performance, but I've noticed that the peak memory usage is rather high, e.g.: $ cat test.py import os import regex as re def resident(): for line in open('/proc/%d/status' % os.ge

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101229.zip is a new version of the regex module. It now compiles the pattern quickly. -- Added file: http://bugs.python.org/file20185/issue2636-20101229.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Jacques Grove
Jacques Grove added the comment: Here is a somewhat crazy pattern (slimmed down from something much larger and more complex, which didn't finish compiling even after several minutes): re.compile("(?:(?:[23][0-9]|3[79]|0?[1-9])(?:[Aa][Aa]|[Aa][Aa]|[Aa][Aa])??(?:[Aa]{3}(?:[Aa]{4})?|[Aa]{3}(?:[A

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Jacques Grove
Jacques Grove added the comment: Thanks, issue2636-20101228a.zip also resolves my compilation speed issues I had on other (very) complex regexes. Found this one: re.search("(X.*?Y\s*){3}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:") produces a search hit with stock python 2.6.5 regex library, but

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101228a.zip is a new version of the regex module. It now compiles the pattern quickly. -- Added file: http://bugs.python.org/file20182/issue2636-20101228a.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Jacques Grove
Jacques Grove added the comment: Another re.compile performance issue (I've seen a couple of others, but I'm still trying to simplify the test-cases): re.compile("(?ui)(a\s?b\s?c\s?d\s?e\s?f\s?g\s?h\s?i\s?j\s?k\s?l\s?m\s?n\s?o\s?p\s?q\s?r\s?s\s?t\s?u\s?v\s?w\s?y\s?z\s?a\s?b\s?c\s?d)") complet

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101228.zip is a new version of the regex module. Sorry for the delay, the fix took me a bit longer than I expected. :-) -- Added file: http://bugs.python.org/file20176/issue2636-20101228.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Jacques Grove
Jacques Grove added the comment: Testing issue2636-20101224.zip: Nested modifiers seem to hang the regex compilation when used in a non-capturing group, e.g.: re.compile("(?:(?i)foo)") or re.compile("(?:(?u)foo)") No problem on stock Python 2.6.5 regex engine. The unnested version of the

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-24 Thread Matthew Barnett
Matthew Barnett added the comment: It does have an SSH key. It's probably something simple that I'm missing. I think that the only change I'm likely to make is to a support script I use; it currently uses hard-coded paths, etc, to do its magic. :-)

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-24 Thread R. David Murray
R. David Murray added the comment: I suspect it would help if there are more changes, though. I believe that to push to launchpad you have to upload an ssh key. Not sure why you'd get "no such account", though. Barry would probably know :)

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-24 Thread Matthew Barnett
Matthew Barnett added the comment: I've been trying to push the history to Launchpad, completely without success; it just won't authenticate (no such account, even though I can log in!). I doubt that the history would be much use to you anyway.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-23 Thread Jeffrey C. Jacobs
Jeffrey C. Jacobs added the comment: +1 on VC

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I would like to start reviewing this code, but dated zip files on a tracker make a very inefficient VC setup. Would you consider exporting your development history to some public VC system? -- nosy: +belopolsky

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-23 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101224.zip is a new version of the regex module. Case-insensitive matching is now faster. The matching functions and methods now accept a keyword argument to release the GIL during matching to enable other Python threads to run concurrently:

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-13 Thread Éric Araujo
Changes by Éric Araujo : -- stage: -> patch review type: compile error -> feature request versions: +Python 3.3 -Python 2.6

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-10 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101210.zip is a new version of the regex module. I've extended the additional checks of the previous version. It has been tested with Python 2.5 to Python 3.2b1. -- Added file: http://bugs.python.org/file20001/issue2636-20101210.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-06 Thread Zach Dwiel
Zach Dwiel added the comment: Here is the terminal log of what happens when I try to install and then import regex. Any ideas what is going on? $ python setup.py install running install running build running build_py creating build creating build/lib.linux-i686-2.6 copying Python2/regex.py ->

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-06 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101207.zip is a new version of the regex module. It includes additional checks against pathological regexes. -- Added file: http://bugs.python.org/file19965/issue2636-20101207.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-29 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101130.zip is a new version of the regex module. Added 'special_only' keyword parameter (default False) to regex.escape. When True, regex.escape escapes only 'special' characters, such as '?'. -- Added file: http://bugs.python.org/file198

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-23 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101123.zip is a new version of the regex module. Oops, sorry, the weird behaviour of msg11 was a bug. :-( -- Added file: http://bugs.python.org/file19786/issue2636-20101123.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-23 Thread R. David Murray
R. David Murray added the comment: Please don't change the type, this issue is about the feature request of adding this regex engine to the stdlib. I'm sure Matthew will get back to you about your question. -- type: behavior -> feature request

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-23 Thread Steve Moran
Steve Moran added the comment: Forgive me if this is just a stupid oversight. I'm a linguist and use UTF-8 for "special" characters for linguistics data. This often includes multi-byte Unicode character sequences that are composed as one grapheme. For example the í̵ (if it's displaying corre

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-20 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101121.zip is a new version of the regex module. The captures didn't work properly with lookarounds or atomic groups. -- Added file: http://bugs.python.org/file19723/issue2636-20101121.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-19 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101120.zip is a new version of the regex module. The match object now supports additional methods which return information on all the successful matches of a repeated capture group. The API was inspired by that of .Net: matchobject.captures(

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-13 Thread Vlastimil Brom
Vlastimil Brom added the comment: Thank you very much! a quick test with my custom unicodedata with 6.0 on py 2.7 seems ok. I hope, there won't be problems with "cooperation" of the more recent internal data with the original 5.2 database in python 2.x releases. vbr

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-13 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101113.zip is a new version of the regex module. It now supports Unicode 6.0.0. -- Added file: http://bugs.python.org/file19597/issue2636-20101113.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-13 Thread Vlastimil Brom
Vlastimil Brom added the comment: I'd have liked to suggest updating the underlying unicode data to the latest standard 6.0, but it turns out, it might be problematic with the cross-version compatibility; according to the clarification in http://bugs.python.org/issue10400 the 3... versions ar

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-11 Thread Alex Willmer
Alex Willmer added the comment: On Thu, Nov 11, 2010 at 10:20 PM, Vlastimil Brom wrote: > Maybe I am missing something, but the result in regex seem ok to me: > \A is treated like A in a character set; I think it's me who missed something. I'd assumed that all backslash patterns (including \A

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-11 Thread Matthew Barnett
Matthew Barnett added the comment: It looks like a similar problem to msg116252 and msg116276.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-11 Thread Vlastimil Brom
Vlastimil Brom added the comment: Maybe I am missing something, but the result in regex seem ok to me: \A is treated like A in a character set; when the test string is changed to "A b c" or in the case insensitive search the A is matched. [\A\s]\w doesn't match the starting "a", as it is not f

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-11 Thread Alex Willmer
Alex Willmer added the comment: The re module throws an exception for re.compile(r'[\A\w]'). latest regex doesn't, but I don't think the pattern is matching correctly. Shouldn't findall(r'[\A]\w', 'a b c') return ['a'] and findall(r'[\A\s]\w', 'a b c') return ['a', ' b', ' c'] ? Python 2.6.6 (r

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-05 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101106.zip is a new version of the regex module. Fix for issue 10328, which regex also shared. -- Added file: http://bugs.python.org/file19514/issue2636-20101106.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-02 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101102a.zip is a new version of the regex module. msg120204 relates to issue #1519638 "Unmatched group in replacement". In 'regex' an unmatched group is treated as an empty string in a replacement template. This behaviour is more in keeping with
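The behaviour described here, an unmatched group expanding to an empty string in a replacement template, can be sketched as below. Note the assumptions: the sketch uses the standard-library re module rather than regex, and it relies on Python 3.5 or later, where re.sub adopted the same treatment of unmatched groups (older versions of re raise an error instead). The pattern and input come from the test case reported elsewhere in this thread.

```python
import re

# Groups 3 and 4 sit inside an optional non-capturing group,
# so they stay unmatched for this input.
pattern = r"(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?"
template = r"\2 \1, \4 \3"

# On Python 3.5+ the unmatched groups \4 and \3 expand to "".
result = re.sub(pattern, template, "test: 2")

print(repr(result))
```

The trailing spaces in the result are the literal separators from the template; only the group references themselves disappear.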

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-02 Thread Vlastimil Brom
Vlastimil Brom added the comment: Sorry for the noise, please forget my previous msg120215; I somehow managed to keep an older version of _regex_core.py along with the new regex.py in the Lib directory, which are obviously incompatible. After updating the files correctly, the mentioned example

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-02 Thread Vlastimil Brom
Vlastimil Brom added the comment: There seems to be a bug in the handling of numbered backreferences in sub() in issue2636-20101102.zip I believe, it would be a fairly new regression, as it would be noticed rather soon. (tested on Python 2.7; winXP) >>> re.sub("([xy])", "-\\1-", "abxc") 'ab-x-

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: Another, with backreferences: import re, regex text = "TEST, BEST; LEST ; Lest 123 Test, Best" regexp = "(?i)(.{1,40}?),(.{1,40}?)(?:;)+(.{1,80}).{1,40}?\\3(\ |;)+(.{1,80}?)\\1" print re.findall(regexp, text) print regex.findall(regexp, text) $ python test.py

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: Spoke too soon, although this might be a valid divergence in behavior: $ cat test.py import re, regex text = "test: 2" print regex.sub('(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?', '\\2 \\1, \\4 \\3', text) print re.sub('(test)\W+(\d+)(?:\W+(TEST)\W+(\d))?', '\\2 \\

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101102.zip is a new version of the regex module. -- Added file: http://bugs.python.org/file19460/issue2636-20101102.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Jacques Grove
Jacques Grove added the comment: OK, I think this might be the last one I will find for the moment: $ cat test.py import re, regex text = "test?" regexp = "test\?" sub_value = "result\?" print repr(re.sub(regexp, sub_value, text)) print repr(regex.sub(regexp, sub_value, text)) $ python test.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101101.zip is a new version of the regex module. I hope it's finally fixed this time! :-) -- Added file: http://bugs.python.org/file19456/issue2636-20101101.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Jacques Grove
Jacques Grove added the comment: And another, bit less pathological, testcase. Sorry for the ugly testcase; it was much worse before I boiled it down :-) $ cat test.py import re, regex text = "\nTest\nxyz\nxyz\nEnd" regexp = '(\nTest(\n+.+?){0,2}?)?\n+End' print re.findall(regexp, text) p

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Jacques Grove
Jacques Grove added the comment: Here's one that really falls in the category of "don't do that"; but I found this because I was limiting the system recursion level to somewhat less than the standard 1000 (for other reasons), and I had some shorter duplicate patterns in a big regex. Here is

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101030a.zip is a new version of the regex module. This bug was a bit more difficult to fix, but I think it's OK now! -- Added file: http://bugs.python.org/file19435/issue2636-20101030a.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: And another (with issue2636-20101030.zip): $ cat test.py import re, regex text = "XYABCYPPQ\nQ DEF" regexp = 'X(Y[^Y]+?){1,2}(\ |Q)+DEF' print re.findall(regexp, text) print regex.findall(regexp, text) $ python test.py [('YPPQ\n', ' ')] []

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101030.zip is a new version of the regex module. I've also added yet more to the unit tests. -- Added file: http://bugs.python.org/file19422/issue2636-20101030.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: Here's another inconsistency (same setup as before, running issue2636-20101029.zip code): $ cat test.py import re, regex text = "\n S" regexp = '[^a]{2}[A-Z]' print re.findall(regexp, text) print regex.findall(regexp, text) $ python test.py [' S'] [] I m

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101029.zip is a new version of the regex module. I've also added to the unit tests. -- Added file: http://bugs.python.org/file19419/issue2636-20101029.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett
Matthew Barnett added the comment: That's a bug. I'll fix it as soon as I've reinstalled the SDK.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Jacques Grove
Jacques Grove added the comment: Do we expect this to work on 64 bit Linux and python 2.6.5? I've compiled and run some of my code through this, and there seems to be issues with non-greedy quantifier matching (at least relative to the old re module): $ cat test.py import re, regex text = "

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Vlastimil Brom
Vlastimil Brom added the comment: Sorry for the noise, it seems, I can go back to the 32-bit python for now then... vbr

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Matthew Barnett
Matthew Barnett added the comment: I am not able to build or test a 64-bit version. The update was to the source files to ensure that if it is compiled for 64 bits then the string positions will also be 64-bit. This change was prompted by a poster who tried to use the re module of a 64-bit P

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Vlastimil Brom
Vlastimil Brom added the comment: Well, it seemed to me too, I happened to read the last post from Matthew, msg118243, in the sense that he made some updates which need testing on a 64 bit system (I am unsure, whether hardware architecture, OS type, python build or something else was meant); b

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Martin v . Löwis
Martin v. Löwis added the comment: Vlastimil, what makes you think that issue2636-20101009.zip is a 64-bit version? I can only find 32-bit DLLs in it.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Vlastimil Brom
Vlastimil Brom added the comment: I tried to give the 64-bit version a try, but I might have encountered more general difficulties. I tested this on Windows 7 Home Premium (Czech), the system is 64-bit (or I've hoped so so far :-), according to System info: x64-based PC I installed Python 2.7

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-08 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101009.zip is a new version of the regex module. It appears from a posting in python-list and a closer look at the docs that string positions in the 're' module are limited to 32 bits, even on 64-bit builds. I think it's because of things like:

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-21 Thread Vlastimil Brom
Vlastimil Brom added the comment: Well, of course, the surrogates probably shouldn't be handled separately in one module independently of the rest of the standard library. (I actually don't know such narrow implementation (although it is mentioned in those unicode guidelines http://unicode.o

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-21 Thread Matthew Barnett
Matthew Barnett added the comment: I use Python 3, where len("\U00010337") == 2 on a narrow build. Yes, wide Unicode on a narrow build is a problem: >>> regex.findall("\\U00010337", "a\U00010337bc") [] >>> regex.findall("(?i)\\U00010337", "a\U00010337bc") [] I'm not sure how (or whether!) to
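The reason len("\U00010337") was 2 on a narrow build is that such builds stored strings as UTF-16 code units, so a code point above U+FFFF occupied a surrogate pair. Narrow builds no longer exist from Python 3.3 onwards, so the sketch below only emulates the situation by encoding to UTF-16 explicitly; it illustrates the arithmetic and is not a reproduction of the old interpreter behaviour.

```python
import struct

ch = "\U00010337"

# Encode without a BOM and count 16-bit code units.
utf16 = ch.encode("utf-16-le")
utf16_units = len(utf16) // 2

# Unpack the two code units: a high and a low surrogate.
high, low = struct.unpack("<2H", utf16)

# On Python 3.3+ the string itself is a single code point.
code_points = len(ch)

print(utf16_units, hex(high), hex(low), code_points)
```

A matcher that walks code units one at a time sees two items here, which is consistent with the failed findall calls shown in the comment above.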

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-20 Thread Vlastimil Brom
Vlastimil Brom added the comment: I like the idea of the general "new" flag introducing the reasonable, backwards incompatible behaviour; one doesn't have to remember a list of non-standard flags to get this features. While I recognise, that the module probably can't work correctly with wide

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-17 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100918.zip is a new version of the regex module. I've added 'pos' and 'endpos' arguments to regex.sub and regex.subn and refactored a little. I can't think of any other features that need to be added or see any more speed improvements. Have I m

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100913.zip is a new version of the regex module. I've removed the ZEROWIDTH flag and added the NEW flag, which turns on the new behaviour such as splitting on zero-width matches and positional flags. If the NEW flag isn't turned on then the inlin

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Vlastimil Brom
Vlastimil Brom added the comment: Just another rather marginal findings; differences between regex and re: >>> regex.findall(r"[\B]", "aBc") ['B'] >>> re.findall(r"[\B]", "aBc") [] (Python 2.7 ... on win32; regex - issue2636-20100912.zip) I believe, regex is more correct here, as uppercase \B

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Brian Curtin
Changes by Brian Curtin : -- nosy: -brian.curtin

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett
Matthew Barnett added the comment: OK, so would it be OK if there was, say, a NEW (N) flag which made the inline flags (?flags) scoped and allowed splitting on zero-width matches?
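The two behaviours under discussion, scoped inline flags and splitting on zero-width matches, can be sketched with the standard-library re module, which gained comparable features later: the (?i:...) scoped-flag syntax in Python 3.6 and splitting on empty matches in Python 3.7. This is a stand-in, not the regex module's NEW flag itself, and the sample strings are hypothetical.

```python
import re

# Scoped flag: case-insensitivity applies to "a" only, not to "b".
scoped = re.findall(r"(?i:a)b", "Ab ab AB aB")

# Zero-width split: the lookahead matches an empty string
# before each uppercase letter (Python 3.7+).
pieces = re.split(r"(?=[A-Z])", "oneTwoThree")

print(scoped)
print(pieces)
```

Both behaviours change the result of existing patterns, which is why the thread ties them to an opt-in flag rather than enabling them by default.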

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Vlastimil Brom
Vlastimil Brom added the comment: Thank you both for the explanations; I somehow suspected, there would be some strong reasoning for the conservative approach with regard to the backward compatibility. Thanks for the block() and script() offer, Matthew, but I believe, this might clutter the i

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Georg Brandl
Georg Brandl added the comment: Matthew, I understand why you want to have these flags scoped, and if you designed a regex dialect from scratch, that would be the way to go. However, if we want to integrate this in Python 3.2 or 3.3, this is an absolute killer if it's not backwards compatibl

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett
Matthew Barnett added the comment: The tests for re include these regexes: a.b(?s) a.*(?s)b I understand what Georg said previously about some people preferring to put them at the end, but I personally wouldn't do that because some regex implementations support scoped inline flags, a
