[issue46410] TypeError when parsing regexp with unicode named character sequence escape
Matthew Barnett added the comment:

They're not supported in string literals either:

Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> "\N{KEYCAP NUMBER SIGN}"
  File "<stdin>", line 1
    "\N{KEYCAP NUMBER SIGN}"
    ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-21: unknown Unicode character name

--
___
Python tracker <https://bugs.python.org/issue46410>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
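For comparison, here is a minimal sketch of resolving the same name at run time instead of in a literal. It assumes Python 3.3 or later, where `unicodedata.lookup` is documented to accept named sequences; the exact code points it returns depend on the Unicode database bundled with the interpreter, so they are printed rather than hard-coded.

```python
import unicodedata

NAME = "KEYCAP NUMBER SIGN"

# unicodedata.lookup() accepts named sequences (Python 3.3+) and returns
# a multi-character string for them.
keycap = unicodedata.lookup(NAME)
print([f"U+{ord(ch):04X}" for ch in keycap])

# The \N{...} escape in a string literal is resolved at compile time and
# rejects the same name, as in the session above.
try:
    compile('"\\N{%s}"' % NAME, "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False
print(literal_ok)
```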
[issue46515] Benefits Of Phool Makhana
Change by Matthew Barnett:
--
stage:  -> resolved
status: open -> closed
Python tracker <https://bugs.python.org/issue46515>
[issue46627] Regex hangs indefinitely
Matthew Barnett added the comment:

That pattern has:

    (?P[^]]+)+

Is that intentional? It looks wrong to me.

--
Python tracker <https://bugs.python.org/issue46627>
[issue46825] slow matching on regular expression
Matthew Barnett added the comment:

The expression is a repeated alternative where the first alternative is a repeat. Repeated repeats can result in a lot of attempts and backtracking and should be avoided.

Try this instead:

    (0|1(01*0)*1)+

--
Python tracker <https://bugs.python.org/issue46825>
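A small sketch for exercising the suggested pattern with the standard `re` module. The original slow pattern is not quoted in the message, so nothing here reproduces the slowdown; the loop only checks an observation about the replacement, namely that it fully matches exactly the binary numerals divisible by three. The helper name `matches_whole` is made up for the example.

```python
import re

# Pattern suggested in the comment above.
SUGGESTED = re.compile(r"(0|1(01*0)*1)+")


def matches_whole(bits: str) -> bool:
    """Return True if the whole string matches the suggested pattern."""
    return SUGGESTED.fullmatch(bits) is not None


# Sanity check: the pattern accepts a binary numeral exactly when the
# number it denotes is divisible by three.
for n in range(64):
    assert matches_whole(format(n, "b")) == (n % 3 == 0)

# Each alternative can only be matched one way, so a long non-matching
# input is rejected without an explosion of backtracking.
long_result = matches_whole("1" * 201)
print(long_result)
```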
[issue13169] Regular expressions with 0 to 65536 repetitions raises OverflowError
Matthew Barnett added the comment:

The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".

For example:

>>> re.match(".*", "x" * 100000).span()
(0, 100000)
>>> re.match(".{0,65535}", "x" * 100000).span()
(0, 100000)

but:

>>> re.match(".{0,65534}", "x" * 100000).span()
(0, 65534)

--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue13169>
[issue13169] Regular expressions with 0 to 65536 repetitions raises OverflowError
Matthew Barnett added the comment:

The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.

There's an alternative regex implementation here:

http://pypi.python.org/pypi/regex

--
Python tracker <http://bugs.python.org/issue13169>
[issue13592] repr(regex) doesn't include actual regex
Matthew Barnett added the comment:

In reply to Ezio, the repr of a large string, list, tuple or dict is also long.

The repr of a compiled regex should probably also show the flags, but should it just be the numeric value?

--
Python tracker <http://bugs.python.org/issue13592>
[issue13592] repr(regex) doesn't include actual regex
Matthew Barnett added the comment:

Actually, one possibility that occurs to me is to provide the flags within the pattern. The .pattern attribute gives the original pattern, but repr could give the flags in-line at the start of the pattern:

>>> # Assuming Python 3.
>>> r = re.compile("a", re.I)
>>> r.flags
34
>>> r.pattern
'a'
>>> repr(r)
"<_sre.SRE_Pattern '(?i)a'>"

I'm not sure how to make it eval-able, unless you mean something more like:

>>> repr(r)
"re.Regex('(?i)a')"

where re.Regex == re.compile, which would be more meaningful than:

>>> repr(r)
"re.compile('(?i)a')"

--
Python tracker <http://bugs.python.org/issue13592>
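The idea of folding the flags into the pattern text can be sketched with the standard `re` module. The helper `pattern_with_inline_flags` and its flag table are hypothetical, not part of `re`; the sketch assumes a `str` pattern (so the default UNICODE bit is skipped) and does not handle patterns that already contain inline flags.

```python
import re

# Hypothetical table mapping flag bits to their inline letters.
_INLINE_LETTERS = [
    (re.ASCII, "a"),
    (re.IGNORECASE, "i"),
    (re.MULTILINE, "m"),
    (re.DOTALL, "s"),
    (re.VERBOSE, "x"),
]


def pattern_with_inline_flags(compiled):
    """Return the pattern text with its flags written as a (?...) prefix."""
    letters = "".join(
        letter for flag, letter in _INLINE_LETTERS if compiled.flags & flag
    )
    prefix = f"(?{letters})" if letters else ""
    return prefix + compiled.pattern


r = re.compile("a", re.I)
text = pattern_with_inline_flags(r)
print(text)

# Recompiling the text gives a pattern that behaves the same way.
rebuilt = re.compile(text)
print(rebuilt.match("A") is not None)
```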
[issue13592] repr(regex) doesn't include actual regex
Matthew Barnett added the comment:

I'm just adding this to the regex module and I've come up against a possible issue. The regex module supports named lists, which could be very big. Should the entire contents of those lists also be shown in the repr? They would have to be if the repr is to be eval-able.

--
Python tracker <http://bugs.python.org/issue13592>
[issue13652] Creating lambda functions in a loop has unexpected results when resolving variables used as arguments
Matthew Barnett added the comment:

That's not a bug.

This might help to explain what's going on:

What do (lambda) function closures capture in Python?
http://stackoverflow.com/questions/2295290/what-do-lambda-function-closures-capture-in-python

--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue13652>
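A short illustration of the behaviour being reported, as a sketch rather than the reporter's actual code (which is not quoted here): a lambda looks its free variables up when it is called, so every lambda created in the loop sees the final value.

```python
# Each lambda looks up `i` when it is called, not when it is created,
# so all three see the last value the loop variable took.
late = [lambda: i for i in range(3)]
late_values = [f() for f in late]
print(late_values)

# Binding the current value as a default argument captures it per lambda.
bound = [lambda i=i: i for i in range(3)]
bound_values = [f() for f in bound]
print(bound_values)
```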
[issue12177] re.match raises MemoryError
Matthew Barnett added the comment:

This also raises MemoryError:

re.match(r'()*?1', 'a1')

but none of these do:

re.match(r'()+1', 'a1')
re.match(r'()*1', 'a1')

--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12177>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

The new regex implementation is hosted here: https://code.google.com/p/mrab-regex-hg/

The span of m['a_thing'] is m.span('a_thing'), if that helps.

The named groups are listed on the pattern object, which can be accessed via m.re:

>>> m.re
<_regex.Pattern object at 0x0161DE30>
>>> m.re.groupindex
{'another_thing': 3, 'a_thing': 1}

so you can use that to create a reverse dict to go from the index to the name or None. (Perhaps the pattern object should have such a .group_name attribute.)

--
Python tracker <http://bugs.python.org/issue2636>
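The reverse mapping can be sketched as follows. This uses the standard `re` module, whose pattern objects also expose `groupindex`, rather than the regex module itself; the pattern and input are invented for the example.

```python
import re

m = re.match(r"(?P<a_thing>\w+) (\w+) (?P<another_thing>\w+)", "one two three")

# groupindex maps name -> index; invert it to go from index to name.
index_to_name = {index: name for name, index in m.re.groupindex.items()}

# Unnamed groups are simply absent from the dict, so .get() yields None.
names = [index_to_name.get(i) for i in range(1, m.re.groups + 1)]
print(names)
print(m.span("a_thing"))
```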
[issue12671] urlopen returning empty string
New submission from Matthew Barnett:

Someone over at StackOverflow had a problem with urlopen in Python 3.2.1:

http://stackoverflow.com/questions/6892573/problem-with-urlopen/6892843#6892843

This is the code:

from urllib.request import urlopen
f = urlopen('http://online.wsj.com/mdc/public/page/2_3020-tips.html?mod=topnav_2_3000')
page = f.read()
f.close()

With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the read returns an empty string.

--
components: Library (Lib)
messages: 141481
nosy: mrabarnett
priority: normal
severity: normal
status: open
title: urlopen returning empty string
type: behavior
versions: Python 3.2
Python tracker <http://bugs.python.org/issue12671>
[issue12671] urlopen returning empty string
Matthew Barnett added the comment:

Just been told this bug has already been reported as issue #12576.

--
resolution:  -> duplicate
Python tracker <http://bugs.python.org/issue12671>
[issue12671] urlopen returning empty string
Changes by Matthew Barnett:
--
status: open -> closed
Python tracker <http://bugs.python.org/issue12671>
[issue12728] Python re lib fails case insensitive matches on Unicode data
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12728>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12729>
[issue12730] Python's casemapping functions are untrustworthy due to narrow/wide build issues
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12730>
[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12731>
[issue12732] Can't portably use Unicode in Python identifiers
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12732>
[issue12733] Request for grapheme support in Python re lib
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12733>
[issue12734] Request for property support in Python re lib
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12734>
[issue12735] request full Unicode collation support in std python library
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12735>
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Changes by Matthew Barnett:
--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue12736>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett added the comment:

In a narrow build, a codepoint in the astral plane is encoded as a surrogate pair.

I could implement a workaround for it in the regex module, but I think that the proper place to fix it is in the language as a whole, perhaps by implementing PEP 393 ("Flexible String Representation").

--
Python tracker <http://bugs.python.org/issue12729>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett added the comment:

There are occasions when you want to do string slicing, often of the form:

pos = my_str.index(x)
endpos = my_str.index(y)
substring = my_str[pos : endpos]

To me that suggests that if UTF-8 is used then it may be worth profiling to see whether caching the last 2 positions would be beneficial.

--
Python tracker <http://bugs.python.org/issue12729>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett added the comment:

You're right about starting the second search from where the first finished. Caching the position would be an advantage there.

The memory cost of extra pointers wouldn't be so bad if UTF-8 took less space than the current format.

Regex isn't used as much as in Perl. BTW, the current re module was introduced in Python 1.5, the previous regex and regsub modules being removed in Python 2.5.

--
Python tracker <http://bugs.python.org/issue12729>
[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)
Matthew Barnett added the comment:

On a narrow build, "\N{MATHEMATICAL SCRIPT CAPITAL A}" is stored as 2 code units, and neither re nor regex recombine them when compiling a regex or looking for a match.

regex supports \xNN, \u and \U and \N{XYZ} itself, so they can be used in a raw string literal, but it doesn't recombine code units.

I could add recombination to regex at some point if time has passed and no further progress has been made in the language's support for Unicode.

--
Python tracker <http://bugs.python.org/issue12749>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett added the comment:

Have a look here: http://98.245.80.27/tcpc/OSCON2011/gbu/index.html

--
Python tracker <http://bugs.python.org/issue12729>
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett added the comment:

For what it's worth, I've had an idea about string storage, roughly based on how *nix stores data on disk.

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks.

This means that a large string doesn't need a single large allocation. The level of indirection can be increased as necessary.

For simplicity, all codepoint blocks contain the same number of codepoints, except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to store all of its codepoints.

This means that there are no surrogates and that different sections of the string can be stored in different widths to reduce memory usage.

--
Python tracker <http://bugs.python.org/issue12729>
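A rough sketch of the per-block width idea, written in Python purely for illustration. It is not the proposed implementation: the class name `BlockString`, the tiny block size, and the use of a flat list of blocks (a single level of indirection only) are all assumptions made for the example.

```python
BLOCK_SIZE = 4  # deliberately tiny so the example shows several blocks


def _min_width(chunk):
    """Smallest width in bytes (1, 2 or 4) that holds every codepoint."""
    largest = max(map(ord, chunk))
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


class BlockString:
    """Immutable string stored as fixed-size blocks of codepoints."""

    def __init__(self, text, block_size=BLOCK_SIZE):
        self._length = len(text)
        self._block_size = block_size
        self._blocks = []  # list of (width, raw bytes) pairs
        for start in range(0, len(text), block_size):
            chunk = text[start:start + block_size]
            width = _min_width(chunk)
            data = b"".join(ord(ch).to_bytes(width, "little") for ch in chunk)
            self._blocks.append((width, data))

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        # Every block but the last is full, so indexing is O(1).
        block, offset = divmod(index, self._block_size)
        width, data = self._blocks[block]
        raw = data[offset * width:(offset + 1) * width]
        return chr(int.from_bytes(raw, "little"))

    def widths(self):
        return [width for width, _ in self._blocks]


original = "abcd\u20acfgh\U00010337"
stored = BlockString(original)
print(stored.widths())
print("".join(stored[i] for i in range(len(stored))) == original)
```

Because all blocks except the last hold the same number of codepoints, an index maps to a block with one `divmod`, and no surrogates are needed at any width.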
[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace
Matthew Barnett added the comment:

For the "Line_Break" property, one of the possible values is "Inseparable", with 2 permitted aliases, the shorter "IN" (which is reasonable) and "Inseperable" (ouch!).

--
Python tracker <http://bugs.python.org/issue12753>
[issue12789] re.Scanner don't support more then 2 groups on regex
Matthew Barnett added the comment:

Even if this bug is fixed, it still won't work as you expect, and this is why.

The Scanner function accepts a list of 2-tuples. The first item of the tuple is a regex and the second is a function. For example:

re.Scanner([(r"\d+", number), (r"\w+", word)])

The Scanner function then builds a regex, using the given regexes as alternatives, each wrapped as a capture group:

r"(\d+)|(\w+)"

When matching, it sees which group captured and uses that to decide which function it should call, so, for example, if group 1 matched, it calls "number", and if group 2 matched, it calls "word".

When you introduce capture groups into the regexes, it gets confused. If your regex matches, it'll see that groups 1 and 2 match, so it'll try to call the second function, but there isn't one...

--
Python tracker <http://bugs.python.org/issue12789>
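The dispatch described above can be sketched in two steps: first the group-number lookup done by hand with a combined pattern, then the same lexicon passed to `re.Scanner`. Note that `re.Scanner` is undocumented, so its behaviour here is as observed in CPython rather than guaranteed; the `number` and `word` callbacks and the whitespace rule are invented for the example.

```python
import re


def number(scanner, token):
    return ("number", token)


def word(scanner, token):
    return ("word", token)


# By hand: lastindex says which alternative (capture group) matched,
# and that index selects the action.
combined = re.compile(r"(\d+)|(\w+)")
actions = [number, word]
m = combined.match("42abc")
picked = actions[m.lastindex - 1]
print(picked.__name__)

# The same lexicon through re.Scanner; an action of None drops the token.
scanner = re.Scanner([(r"\d+", number), (r"\w+", word), (r"\s+", None)])
tokens, remainder = scanner.scan("12 ab")
print(tokens, repr(remainder))
```

If a lexicon entry needs grouping, writing it with non-capturing groups such as `(?:...)` keeps the group numbering aligned with the list of actions.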
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Matthew Barnett added the comment:

There are some oddities in Unicode case-folding. Under full case-folding, both "\N{LATIN CAPITAL LETTER SHARP S}" and "\N{LATIN SMALL LETTER SHARP S}" fold to "ss", which means that those codepoints match each other. However, under simple case-folding, they fold to themselves, which means that those codepoints _don't_ match each other.

--
Python tracker <http://bugs.python.org/issue12736>
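Full case-folding can be tried out with `str.casefold`, which was added in Python 3.3, after this discussion. The sketch below only shows the full-folding side; it makes no claim about what simple folding produces.

```python
capital = "\N{LATIN CAPITAL LETTER SHARP S}"
small = "\N{LATIN SMALL LETTER SHARP S}"

# str.casefold() applies full case-folding, so both fold to "ss".
folded_capital = capital.casefold()
folded_small = small.casefold()
print(folded_capital, folded_small)

# Caseless comparison by folding both sides first.
same = "STRASSE".casefold() == "stra\N{LATIN SMALL LETTER SHARP S}e".casefold()
print(same)
```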
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Matthew Barnett added the comment:

The regex module currently uses simple case-folding, although I'm working towards full case-folding, as listed in http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.

--
Python tracker <http://bugs.python.org/issue12736>
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment:

The regex module supports nested sets and set operations, e.g. r"[[a-z]--[aeiou]]" (the letters from 'a' to 'z', except the vowels). This means that a literal '[' in a set needs to be escaped.

For example, the re module sees "[][()]..." as:

    [      start of set
    ]      literal ']'
    [()    literals '[', '(', ')'
    ]      end of set
    ...    ...

but the regex module sees it as:

    [      start of set
    ]      literal ']'
    [()]   nested set [()]
    ...    ...

Thus:

>>> s = u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[][()]) |(?!,) (?!\[,)(?=[][(),])', '', s)
u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub('(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])', '', s)
u'void foo(type arg1 [, type arg2])'

If it can't parse it as a nested set, it tries again as a non-nested set (like re), but there are bound to be regexes where it could be either.

--
Python tracker <http://bugs.python.org/issue2636>
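For readers without the regex module installed, here is a sketch using only the standard `re` module. It shows the flat reading of "[][()]" described above and emulates the set difference with a negative lookahead, which is a substitute technique and not how the regex module implements `--`.

```python
import re

# re reads "[][()]" as one flat set containing ']', '[', '(' and ')'.
brackets = re.findall(r"[][()]", "f(a[0])")
print(brackets)

# Emulating [[a-z]--[aeiou]]: a lowercase letter that is not a vowel.
consonant = re.compile(r"(?![aeiou])[a-z]")
consonants = consonant.findall("regex module")
print(consonants)
```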
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment:

I think I need a show of hands.

Should the default be old behaviour (like re) or new behaviour? (It might be old now, new later.)

Should there be a NEW flag (as at present), or an OLD flag, or a VERSION parameter (0=old, 1=new, 2=?)?

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment:

The least disruptive change would be to have a NEW flag for the new behaviour, as at present, and an OLD flag for the old behaviour. Currently the default is old behaviour, but in the future it will be new behaviour.

The differences would be:

    Old behaviour                    : New behaviour
    -------------                    : -------------
    Global inline flags              : Positional inline flags
    Can't split on zero-width match  : Can split on zero-width match
    Simple sets                      : Nested sets and set operations

The only change would be that nested sets wouldn't be supported in the old behaviour.

There are also additional escape sequences, e.g. \X is no longer treated as "X", but as they look like escape sequences you really shouldn't be relying on that. (It's similar to writing Windows paths in non-raw string literals: "\T" == "\\T", but "\t" == chr(9).)

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment:

So, VERSION0 and VERSION1, with "(?V0)" and "(?V1)" in the pattern?

--
Python tracker <http://bugs.python.org/issue2636>
[issue7951] Should str.format allow negative indexes when used for __getitem__ access?
Matthew Barnett added the comment:

I agree with Kamil and Germán. I would've expected negative indexes for sequences to work. Negative indexes for fields is a different matter.

--
Python tracker <http://bugs.python.org/issue7951>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100814.zip is a new version of the regex module.

I've added default Unicode word boundaries and renamed the Pattern and Match classes.

Over to you, Alex. :-)

--
Added file: http://bugs.python.org/file18532/issue2636-20100814.zip
Python tracker <http://bugs.python.org/issue2636>
[issue7255] "Default" word boundaries for Unicode data?
Matthew Barnett added the comment:

These have been added to the new 'regex' module. See issue #2636 or PyPI at: http://pypi.python.org/pypi/regex

--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue7255>
[issue7255] "Default" word boundaries for Unicode data?
Matthew Barnett added the comment:

If you're on Windows (x86, 32-bit) then compilation isn't necessary - just use the appropriate _regex.pyd.

--
Python tracker <http://bugs.python.org/issue7255>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100816.zip is a new version of the regex module.

Unfortunately I came across a bug in the handling of sets. More unit tests added.

--
Added file: http://bugs.python.org/file18541/issue2636-20100816.zip
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100824.zip is a new version of the regex module.

More speedups. Getting towards Perl speed now, depending on the regex. :-)

--
Added file: http://bugs.python.org/file18621/issue2636-20100824.zip
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100912.zip is a new version of the regex module.

More speedups. I've been comparing the speed against Perl wherever possible. In some cases Perl is lightning fast, probably because regex is built into the language and it doesn't have to parse method arguments (for some short regexes a large part of the processing time is spent in PyArg_ParseTupleAndKeywords!). In other cases, where it has to use Unicode codepoints outside the 8-bit range, or character properties such as \p{Alpha}, its performance is simply appalling! :-)

--
Added file: http://bugs.python.org/file18854/issue2636-20100912.zip
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

Another flag? Hmm.

How about this instead: if a scoped flag appears at the end of a regex (and would therefore normally have no effect) then it's treated as though it's at the start of the regex. Thus:

foo(?i)

is treated like:

(?i)foo

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

The tests for re include these regexes:

a.b(?s)
a.*(?s)b

I understand what Georg said previously about some people preferring to put them at the end, but I personally wouldn't do that because some regex implementations support scoped inline flags, although others, like re, don't.

I think that second regex is a bit perverse, though! :-)

On the other matter, I could make the Unicode script and block available through a couple of functions if you need them, e.g.:

# Using Python 3 here
>>> regex.script("A")
'Latin'
>>> regex.block("A")
'BasicLatin'

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

OK, so would it be OK if there was, say, a NEW (N) flag which made the inline flags (?flags) scoped and allowed splitting on zero-width matches?

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100913.zip is a new version of the regex module.

I've removed the ZEROWIDTH flag and added the NEW flag, which turns on the new behaviour such as splitting on zero-width matches and positional flags. If the NEW flag isn't turned on then the inline flags are global, like in the re module.

You were right about those bugs in the regex module, Vlastimil. :-(

I've left the permissiveness of the sets in, at least for the moment, or until someone complains about it!

Incidentally:

>>> re.findall(r"[\B]", "aBc")
[]
>>> re.findall(r"[\c]", "aBc")
['c']

so it is a bug in the re module (it's putting a non-word-boundary in a set).

--
Added file: http://bugs.python.org/file18865/issue2636-20100913.zip
Python tracker <http://bugs.python.org/issue2636>
[issue1708652] Exact matching
Matthew Barnett added the comment:

Does this request still stand? If so then I'll add it to the new regex module.

--
Python tracker <http://bugs.python.org/issue1708652>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20100918.zip is a new version of the regex module.

I've added 'pos' and 'endpos' arguments to regex.sub and regex.subn and refactored a little.

I can't think of any other features that need to be added or see any more speed improvements.

Have I missed anything important? :-)

--
Added file: http://bugs.python.org/file18913/issue2636-20100918.zip
Python tracker <http://bugs.python.org/issue2636>
[issue1708652] Exact matching
Matthew Barnett added the comment:

'$' matches at the end of the string or just before a newline at the end of the string (if multiline mode isn't turned on). '\Z' matches only at the end of the string.

If not even the OP is convinced of the need, then I have no objection to closing.

--
Python tracker <http://bugs.python.org/issue1708652>
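The difference between the two anchors is easy to check with the standard `re` module; the sample strings below are made up for the example.

```python
import re

text = "foo\n"

# Without MULTILINE, '$' also matches just before a trailing newline.
dollar = re.search(r"foo$", text)
print(dollar.span())

# '\Z' matches only at the very end, so the trailing newline blocks it.
strict = re.search(r"foo\Z", text)
print(strict)

# With MULTILINE, '$' matches before every newline as well.
lines = re.findall(r"^\w+$", "a\nb\n", re.M)
print(lines)
```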
[issue2027] Module containing C implementations of common text algorithms
Matthew Barnett added the comment:

I've started on a module called 'texttools'. So far it has Levenshtein and Porter (both coded in C). If there's interest I'll put it on PyPI.

Suggestions for other additions?

--
nosy: +mrabarnett
Python tracker <http://bugs.python.org/issue2027>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

I use Python 3, where len("\U00010337") == 2 on a narrow build.

Yes, wide Unicode on a narrow build is a problem:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.

I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.

--
Python tracker <http://bugs.python.org/issue2636>
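To see why the length was 2, the surrogate pair for U+10337 can be computed by hand. This is a sketch of the standard UTF-16 arithmetic; the function name `to_surrogate_pair` is made up, and on Python 3.3 and later (which no longer has narrow builds) the string itself has length 1.

```python
def to_surrogate_pair(codepoint):
    """Split a codepoint above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not an astral codepoint")
    offset = codepoint - 0x10000
    # Top 10 bits go in the high surrogate, bottom 10 in the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogate_pair(0x10337)
print(hex(high), hex(low))

# Cross-check against the built-in UTF-16 encoder.
encoded = "\U00010337".encode("utf-16-be")
print(encoded.hex())
```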
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

issue2636-20101009.zip is a new version of the regex module.

It appears from a posting in python-list and a closer look at the docs that string positions in the 're' module are limited to 32 bits, even on 64-bit builds. I think it's because of things like:

Py_BuildValue("i", ...)

where 'i' indicates the size of a C int, which, at least in Windows compilers, is 32 bits in both 32-bit and 64-bit builds.

The regex module shared the same problem. I've changed such code to:

Py_BuildValue("n", ...)

and so forth, which indicates Py_ssize_t.

Unfortunately I'm not able to confirm myself that this will fix the problem on 64 bits.

--
Added file: http://bugs.python.org/file19168/issue2636-20101009.zip
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

I am not able to build or test a 64-bit version. The update was to the source files to ensure that if it is compiled for 64 bits then the string positions will also be 64-bit.

This change was prompted by a poster who tried to use the re module of a 64-bit Python build on a 30GB memmapped file but found that the string positions were still limited to 32 bits.

It looked like a 64-bit build of the regex module would have the same limitation.

--
Python tracker <http://bugs.python.org/issue2636>
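The size mismatch can be illustrated from Python with `ctypes`, as a rough sketch rather than a test of the C code itself. The reported sizes depend on the platform, so they are printed and not assumed; the 30 GB offset and the helper `fits_signed` are made up for the example.

```python
import ctypes

# Widths of the C types behind the "i" and "n" format codes on this build.
int_bits = 8 * ctypes.sizeof(ctypes.c_int)
ssize_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)
print(int_bits, ssize_bits)


def fits_signed(value, bits):
    """Return True if value is representable as a signed integer of that width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# A position near the end of a 30 GB file.
offset = 30 * 2**30
print(fits_signed(offset, 32), fits_signed(offset, 64))
```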
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment:

That's a bug. I'll fix it as soon as I've reinstalled the SDK.

--
Python tracker <http://bugs.python.org/issue2636>
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101029.zip is a new version of the regex module. I've also added to the unit tests. -- Added file: http://bugs.python.org/file19419/issue2636-20101029.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101030.zip is a new version of the regex module. I've also added yet more to the unit tests. -- Added file: http://bugs.python.org/file19422/issue2636-20101030.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101030a.zip is a new version of the regex module. This bug was a bit more difficult to fix, but I think it's OK now! -- Added file: http://bugs.python.org/file19435/issue2636-20101030a.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101101.zip is a new version of the regex module. I hope it's finally fixed this time! :-) -- Added file: http://bugs.python.org/file19456/issue2636-20101101.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101102.zip is a new version of the regex module. -- Added file: http://bugs.python.org/file19460/issue2636-20101102.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101102a.zip is a new version of the regex module. msg120204 relates to issue #1519638 "Unmatched group in replacement". In 'regex' an unmatched group is treated as an empty string in a replacement template. This behaviour is more in keeping with regex implementations in other languages. msg120206 was caused by not all group references being made case-insensitive when they should be. -- Added file: http://bugs.python.org/file19469/issue2636-20101102a.zip
[issue10328] re.sub[n] doesn't seem to handle /Z replacements correctly in all cases
Matthew Barnett added the comment: It's a bug caused by trying to avoid getting stuck when a zero-width match is found. Basically the fix is to advance one character after a zero-width match, but that doesn't always give the correct result. There are a number of related issues like issue #1647489 ("zero-length match confuses re.finditer()").
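The "advance one character after a zero-width match" strategy can be sketched in a few lines. This is an illustration only, not the actual code of the re engine; `naive_finditer_spans` is a hypothetical helper name.

```python
import re

def naive_finditer_spans(pattern, string):
    """Yield match spans, stepping one character past each zero-width match."""
    compiled = re.compile(pattern)
    pos = 0
    while pos <= len(string):
        m = compiled.search(string, pos)
        if m is None:
            break
        yield m.span()
        # A zero-width match would be found again at the same position,
        # so move forward by one character to avoid looping forever.
        pos = m.end() + 1 if m.end() == m.start() else m.end()

spans = list(naive_finditer_spans(r"x*", "abxd"))
print(spans)  # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
```

The loop always terminates, but the forced step can skip a position where a different, non-empty match could have started, which is the kind of case the comment says does not always come out right.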
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101106.zip is a new version of the regex module. Fix for issue 10328, which regex also shared. -- Added file: http://bugs.python.org/file19514/issue2636-20101106.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: It looks like a similar problem to msg116252 and msg116276.
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101113.zip is a new version of the regex module. It now supports Unicode 6.0.0. -- Added file: http://bugs.python.org/file19597/issue2636-20101113.zip
[issue11307] re engine exhaustively explores more than necessary
Matthew Barnett added the comment: It's a known issue (see issue #1662581, for example). There's a new implementation at PyPI which doesn't have this problem: http://pypi.python.org/pypi/regex -- nosy: +mrabarnett
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: @Gregory: I've added you to the project. I'm currently trying to fix a problem with iterators shared across threads. As a temporary measure, the current release on PyPI doesn't enable multithreading for them. The mrab-regex-hg project doesn't have those sources yet. I'll update them later today, either to the release on PyPI, or to a fixed version if all goes well...
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: I've fixed the problem with iterators for both Python 3 and Python 2. They can now be shared safely across threads. I've updated the release on PyPI.
[issue6210] Exception Chaining missing method for suppressing context
Matthew Barnett added the comment: I've been looking through the list of current keywords and the best syntax I could come up with for suppressing the context is: try: x / y except ZeroDivisionError as e: raise as Exception( 'Invalid value for y' ) The rationale is that it's saying "forget about the original exception (if any), raise _as though_ this is the original exception".
[issue11665] Regexp findall freezes
Matthew Barnett added the comment: Alex is correct. This part: [^<>]* can match an empty string, and it's nested with a repeated group. It stalls, repeatedly matching an empty string. Incidentally, my regex implementation (available on PyPI) returns [].
[issue11733] Implement a `Counter.elements_count` method
Matthew Barnett added the comment: The name isn't meaningful to me. My preference would be for something like "total_count". -- nosy: +mrabarnett
[issue11775] `bool(Counter({'a': 0})) is True`
Matthew Barnett added the comment: It depends on what kind of object it's like. If it's like a dict then your example is clearly not empty, but if it's like a set then it /is/ empty, in which case it's empty if: all(count == 0 for count in my_counter.values()) -- nosy: +mrabarnett
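The two readings give different answers for the example in the issue title. A minimal sketch, using the expression from the comment; the variable names `dict_like_empty` and `multiset_empty` are only illustrative:

```python
from collections import Counter

my_counter = Counter({"a": 0})

# Dict-like view: there is one key, so the Counter is truthy (not empty).
dict_like_empty = not my_counter

# Set-like (multiset) view: empty when no element has a non-zero count.
multiset_empty = all(count == 0 for count in my_counter.values())

print(dict_like_empty, multiset_empty)  # False True
```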
[issue11947] re.IGNORECASE does not match literal "_" (underscore)
Matthew Barnett added the comment: help(re.sub) says: sub(pattern, repl, string, count=0) and re.IGNORECASE has a value of 2. Therefore this: re.sub("_", "X", subject, re.IGNORECASE) is telling it to replace at most 2 occurrences of "_". -- nosy: +mrabarnett
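The effect is easy to reproduce. The sketch below assumes a Python version whose re.sub accepts a `flags` keyword (the signature quoted above predates it); the sample string is made up for illustration, and the warning filter is there only because newer Python versions warn when `count` is passed positionally.

```python
import re
import warnings

subject = "a_b_c_d"

with warnings.catch_warnings():
    # Newer Python versions warn when count is passed positionally.
    warnings.simplefilter("ignore", DeprecationWarning)
    # The flag lands in the count slot: at most int(re.IGNORECASE) replacements.
    limited = re.sub("_", "X", subject, re.IGNORECASE)

# Passing the flag by keyword leaves count at its default of 0 (no limit).
unlimited = re.sub("_", "X", subject, flags=re.IGNORECASE)

print(int(re.IGNORECASE), limited, unlimited)  # 2 aXbXc_d aXbXcXd
```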
[issue11947] re.IGNORECASE does not match literal "_" (underscore)
Matthew Barnett added the comment: I don't know how much code that might break. It might not be that much; I can't remember when I last used re.sub without the default count.
[issue11957] re.sub confusion between count and flags args
Matthew Barnett added the comment: Something like "" may be more Pythonic.
[issue12078] re.sub() replaces only several matches
Matthew Barnett added the comment: Argument 4 of re.sub is the maximum number of replacements, NOT flags: Help on function sub in module re: sub(pattern, repl, string, count=0, flags=0) Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used. re.I is 2, so you're telling it to perform at most 2 replacements. -- nosy: +mrabarnett
[issue12130] regex 0.1.20110514 findall overlapped not working with 'start of string' expression
Matthew Barnett added the comment: Replied to the regex bug tracker.
[issue7132] Regexp: capturing groups in repetitions
Matthew Barnett added the comment: Earlier this week I discovered that .Net supports repeated capture and its API suggested a much cleaner approach than what Perl offered, so I'll be adding it to the regex module at: http://pypi.python.org/pypi/regex The new methods will follow the example of .group() & co. Given a match object m, m.group(i) returns the last match of group i (or None if there's no match), so I'll be adding m.captures(i) to return a tuple of the captures (an empty tuple if there's no match). I'll also be adding m.starts(i), m.ends(i) and m.spans(i). The issue for this work is #2636. Unit tests are welcome.
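A short sketch of the behaviour being described: with the standard re module only the last repetition of a group is kept, so the other repetitions have to be recovered by rescanning the matched text. The pattern and input are made up for illustration. The optional part at the end assumes the third-party regex package is installed and exposes the captures() and spans() methods named above; its output is not shown because the return type may differ from the tuple described in the comment.

```python
import re

m = re.match(r"(\w\d)+", "a1b2c3")

# m.group(1) keeps only the final repetition of the group.
last_capture = m.group(1)

# With the stdlib, the earlier repetitions must be found by rescanning.
all_captures = tuple(x.group(0) for x in re.finditer(r"\w\d", m.group(0)))

print(last_capture, all_captures)  # c3 ('a1', 'b2', 'c3')

try:
    import regex  # third-party package from PyPI, if available
except ImportError:
    regex = None

if regex is not None:
    rm = regex.match(r"(\w\d)+", "a1b2c3")
    # Every repetition of group 1, with the positions of each.
    print(rm.captures(1), rm.spans(1))
```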
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101120.zip is a new version of the regex module. The match object now supports additional methods which return information on all the successful matches of a repeated capture group. The API was inspired by that of .Net: matchobject.captures([group1, ...]) Returns a tuple of the strings matched in a group or groups. Compare with matchobject.group([group1, ...]). matchobject.starts([group]) Returns a tuple of the start positions. Compare with matchobject.start([group]). matchobject.ends([group]) Returns a tuple of the end positions. Compare with matchobject.end([group]). matchobject.spans([group]) Returns a tuple of the spans. Compare with matchobject.span([group]). -- Added file: http://bugs.python.org/file19651/issue2636-20101120.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101121.zip is a new version of the regex module. The captures didn't work properly with lookarounds or atomic groups. -- Added file: http://bugs.python.org/file19723/issue2636-20101121.zip
[issue1859] textwrap doesn't linebreak on "\n"
Matthew Barnett added the comment: I'd be interested in having a go if I knew what the desired behaviour was, ie unit tests to confirm what was 'correct'. How should it handle line breaks? Should it treat them like any other whitespace as at present, should it honour them, or should it get another option, eg 'honor_breaks' (if US spelling is the standard for Python's libraries)? -- nosy: +mrabarnett
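The two behaviours under discussion can be compared directly. The sketch below shows what textwrap.wrap does with its default settings, next to one possible reading of "honouring" line breaks. `wrap_honoring_breaks` is a hypothetical helper written for this illustration; it is not the attached patch and not an option that textwrap provides.

```python
import textwrap

def wrap_honoring_breaks(text, width=70):
    """Hypothetical helper: wrap each input line separately, keeping line breaks."""
    lines = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line; keep it as a blank line.
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return lines

text = "first line\nsecond"

as_whitespace = textwrap.wrap(text, width=40)   # newline treated as whitespace
as_breaks = wrap_honoring_breaks(text, width=40)  # newline kept as a break

print(as_whitespace)  # ['first line second']
print(as_breaks)      # ['first line', 'second']
```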
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101123.zip is a new version of the regex module. Oops, sorry, the weird behaviour of msg11 was a bug. :-( -- Added file: http://bugs.python.org/file19786/issue2636-20101123.zip
[issue1859] textwrap doesn't linebreak on "\n"
Matthew Barnett added the comment: textwrap_2010-11-23.diff is my attempt to provide a fix, if it's wanted/needed. -- Added file: http://bugs.python.org/file19791/textwrap_2010-11-23.diff
[issue10532] A bug related to matching the empty string
Matthew Barnett added the comment: The spans say this: >>> for m in re.finditer('((.d.)*)*', 'adb'): print(m.span()) (0, 3) (3, 3) There's a non-empty match followed by an empty match. IMHO, not a bug. -- nosy: +mrabarnett
[issue2650] re.escape should not escape underscore
Matthew Barnett added the comment: Re the regex module (issue #2636), would a good compromise be: regex.escape(user_input, special_only=True) to maintain compatibility? -- nosy: +mrabarnett
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101130.zip is a new version of the regex module. Added 'special_only' keyword parameter (default False) to regex.escape. When True, regex.escape escapes only 'special' characters, such as '?'. -- Added file: http://bugs.python.org/file19881/issue2636-20101130.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101207.zip is a new version of the regex module. It includes additional checks against pathological regexes. -- Added file: http://bugs.python.org/file19965/issue2636-20101207.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101210.zip is a new version of the regex module. I've extended the additional checks of the previous version. It has been tested with Python 2.5 to Python 3.2b1. -- Added file: http://bugs.python.org/file20001/issue2636-20101210.zip
[issue10704] Regex 0.1.20101210 Python 3.1 install problem Mac OS X 10.6.5
Matthew Barnett added the comment: I use Windows XP, so I can't help with Mac OS X. From the error log it looks like it doesn't like the sources for Python either!
[issue10703] Regex 0.1.20101210
Matthew Barnett added the comment: The regex module is intended to replace the re module, so its default behaviour is the same: in Python 2, regexes default to matching ASCII, and in Python 3, they default to matching Unicode. If you want to use a regex on a Unicode string in Python 2 then you need to set the Unicode flag, either by providing the UNICODE flag or by putting "(?u)" in the regex itself.
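The difference between the two defaults can be seen from the Python 3 side, where the situation is mirrored: str patterns match Unicode unless ASCII matching is requested. This sketch uses the standard re module rather than regex, and the sample word is made up for illustration.

```python
import re

text = "caf\u00e9"  # "café" with a precomposed e-acute

# Python 3 default for str patterns: \w matches Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# ASCII matching, requested by flag or by an inline "(?a)".
ascii_words = re.findall(r"\w+", text, re.ASCII)
inline_ascii = re.findall(r"(?a)\w+", text)

print(unicode_words, ascii_words, inline_ascii)  # ['café'] ['caf'] ['caf']
```

In Python 2 the roles are reversed: the ASCII result is the default, and the UNICODE flag or "(?u)" gives the first result.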
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101224.zip is a new version of the regex module. Case-insensitive matching is now faster. The matching functions and methods now accept a keyword argument to release the GIL during matching to enable other Python threads to run concurrently: matches = regex.findall(pattern, string, concurrent=True) This should be used only when it's guaranteed that the string won't change during matching. The GIL is always released when working on instances of the builtin (immutable) string classes because that's known to be safe. -- Added file: http://bugs.python.org/file20154/issue2636-20101224.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: I've been trying to push the history to Launchpad, completely without success; it just won't authenticate (no such account, even though I can log in!). I doubt that the history would be much use to you anyway.
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: It does have an SSH key. It's probably something simple that I'm missing. I think that the only change I'm likely to make is to a support script I use; it currently uses hard-coded paths, etc, to do its magic. :-)
[issue6210] Exception Chaining missing method for suppressing context
Changes by Matthew Barnett: -- nosy: +mrabarnett
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101228.zip is a new version of the regex module. Sorry for the delay, the fix took me a bit longer than I expected. :-) -- Added file: http://bugs.python.org/file20176/issue2636-20101228.zip
[issue6210] Exception Chaining missing method for suppressing context
Matthew Barnett added the comment: Regarding syntax, I'm undecided between: raise with new_exception and: raise new_exception with caught_exception I think that the second form is clearer: try: ... except SomeException as ex: raise SomeOtherException() with ex (I'd prefer 'with' to Steven's 'from') but the first form doesn't force you to provide a name: try: ... except SomeException: raise with SomeOtherException() and the syntax also means that you can't chain another exception like this: try: ... except SomeException as ex: raise SomeOtherException() with YetAnotherException() although perhaps Python should just rely on the programmer's good judgement. :-)
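None of the spellings proposed in this message is valid Python. For comparison, the sketch below uses the forms that Python 3.3 and later do accept: `raise ... from None` to suppress the context, and `raise ... from ex` to chain explicitly. The function names and the error message are illustrative only.

```python
def divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        # "from None" suppresses the implicit context in the traceback.
        raise ValueError("Invalid value for y") from None

def divide_chained(x, y):
    try:
        return x / y
    except ZeroDivisionError as ex:
        # "from ex" records the caught exception as the explicit cause.
        raise ValueError("Invalid value for y") from ex

try:
    divide(1, 0)
except ValueError as err:
    suppressed = err

try:
    divide_chained(1, 0)
except ValueError as err:
    chained = err

print(suppressed.__suppress_context__, suppressed.__cause__)  # True None
print(type(chained.__cause__).__name__)                       # ZeroDivisionError
```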
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101228a.zip is a new version of the regex module. It now compiles the pattern quickly. -- Added file: http://bugs.python.org/file20182/issue2636-20101228a.zip
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett added the comment: issue2636-20101229.zip is a new version of the regex module. It now compiles the pattern quickly. -- Added file: http://bugs.python.org/file20185/issue2636-20101229.zip