[issue25054] Capturing start of line '^'

2017-12-02 Thread Matthew Barnett
Matthew Barnett added the comment: findall() and finditer() consist of multiple uses of search(), basically, as do sub() and split(), so we want the same rule to apply to them all. -- ___ Python tracker <https://bugs.python.org/issue25
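
The relationship described above can be illustrated with a rough sketch. The helper name finditer_sketch is hypothetical, and the handling of empty matches is deliberately simplified, so this is not how the re module is actually implemented; it only shows that a loop of search() calls with a moving start position behaves like finditer() for non-empty matches, and that '^' therefore follows one rule across all of them.

```python
import re

def finditer_sketch(pattern, string):
    """Yield matches by calling search() repeatedly (simplified sketch)."""
    pos = 0
    while pos <= len(string):
        match = pattern.search(string, pos)
        if match is None:
            return
        yield match
        # Step past an empty match so the loop always advances.
        pos = match.end() if match.end() > match.start() else match.end() + 1

digits = re.compile(r"\d+")
found = [m.group() for m in finditer_sketch(digits, "a1b22c333")]
print(found)

# '^' only matches at the real start of the string, whatever 'pos' is.
caret = re.compile(r"^a")
caret_found = [m.group() for m in finditer_sketch(caret, "aaa")]
print(caret_found)
```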

[issue33566] re.findall() dead locked whent the expected ending char not occur until end of string

2018-05-18 Thread Matthew Barnett
Matthew Barnett added the comment: You don't give the value of 'newlines', but the problem is probably catastrophic backtracking, not deadlock. -- nosy: +mrabarnett ___ Python tracker <https://bugs.pyt

[issue33721] os.path.exists() ought to return False if pathname contains NUL

2018-05-31 Thread Matthew Barnett
Matthew Barnett added the comment: It also raises a ValueError on Windows. For other invalid paths on Windows it returns False. -- nosy: +mrabarnett ___ Python tracker <https://bugs.python.org/issue33

[issue33785] Crash caused by pasting 𐌈𐌖 into python

2018-06-06 Thread Matthew Barnett
Matthew Barnett added the comment: For clarity, the first is '\U00010308\U00010316' and the second is '\U00010306\U00010300\U0001030B'. The BMP is the Basic Multilingual Plane, which covers the codepoints in the range U+0000 to U+FFFF. Some software has a problem dea
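
A small sketch of what "outside the BMP" means for the two strings quoted above. The helper name outside_bmp is made up for illustration; the only assumption is that the strings are exactly the escapes given in the comment.

```python
def outside_bmp(text):
    """Return the characters of text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

first = "\U00010308\U00010316"
second = "\U00010306\U00010300\U0001030B"

print([hex(ord(ch)) for ch in outside_bmp(first)])
print([hex(ord(ch)) for ch in outside_bmp(second)])

# Each such character needs a surrogate pair (two 16-bit units) in UTF-16,
# which is where software limited to the BMP tends to run into trouble.
utf16_units = len(first.encode("utf-16-le")) // 2
print(utf16_units)
```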

[issue34605] Avoid master/slave terminology

2018-09-07 Thread Matthew Barnett
Matthew Barnett added the comment: Not all uses of the word "master" are associated with slavery, e.g. "master craftsman", "master copy", "master file table". I think it's best to avoid use of master/slave where practicable, but other u

[issue34738] Distutils: ZIP files don't include directory entries

2018-09-19 Thread Matthew Barnett
Matthew Barnett added the comment: I don't see a problem with this. If the zip file has 'dist/file1.py' then you know to create a directory when unzipping. If you want to indicate that there's an empty directory 'foo', then put 'foo/' in the
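
As a minimal sketch of the point above, using the standard zipfile module and an in-memory buffer: a file entry implies its parent directory, while an empty directory needs an explicit entry whose name ends in '/'. The file contents are placeholders.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # No 'dist/' entry is written; the path alone implies the directory.
    archive.writestr("dist/file1.py", "print('hello')\n")
    # An empty directory has to be listed explicitly.
    archive.writestr("foo/", "")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    foo_is_dir = archive.getinfo("foo/").is_dir()

print(names, foo_is_dir)
```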

[issue34763] Python lacks 0x4E17

2018-09-21 Thread Matthew Barnett
Matthew Barnett added the comment: Unicode 11.0.0 has 卅 (U+5345) as being numeric and having the value 30. What's the difference between that and U+4E17? I notice that they look a lot alike. Are they different variants, perhaps traditional vs simpl
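
One way to check this locally, sketched with the standard unicodedata module. The helper name numeric_or_none is hypothetical. What is reported for U+4E17 depends on the Unicode database bundled with the interpreter, so no expected value is shown for it.

```python
import unicodedata

def numeric_or_none(ch):
    """Return the numeric value of ch, or None if it has none."""
    return unicodedata.numeric(ch, None)

print(unicodedata.unidata_version)   # Unicode version of this interpreter
print(numeric_or_none("\u5345"))     # the character cited in the comment
print(numeric_or_none("\u4e17"))     # the character the issue is about
```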

[issue34763] Python lacks 0x4E17

2018-09-21 Thread Matthew Barnett
Change by Matthew Barnett : -- Removed message: https://bugs.python.org/msg326015 ___ Python tracker <https://bugs.python.org/issue34763> ___ ___ Python-bug

[issue34763] Python lacks 0x4E17

2018-09-21 Thread Matthew Barnett
Change by Matthew Barnett : -- Removed message: https://bugs.python.org/msg326014 ___ Python tracker <https://bugs.python.org/issue34763> ___ ___ Python-bug

[issue34763] Python lacks 0x4E17

2018-09-21 Thread Matthew Barnett
Change by Matthew Barnett : -- Removed message: https://bugs.python.org/msg326013 ___ Python tracker <https://bugs.python.org/issue34763> ___ ___ Python-bug

[issue34763] Python lacks 0x4E17

2018-09-21 Thread Matthew Barnett
Change by Matthew Barnett : -- Removed message: https://bugs.python.org/msg326012 ___ Python tracker <https://bugs.python.org/issue34763> ___ ___ Python-bug

[issue34694] Dismiss To Avoid Slave/Master wording cause it easier for non English spoken programmers

2018-09-26 Thread Matthew Barnett
Change by Matthew Barnett : -- nosy: -mrabarnett ___ Python tracker <https://bugs.python.org/issue34694> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue35072] re.sub does not play nice with chr(92)

2018-10-26 Thread Matthew Barnett
Matthew Barnett added the comment: @Ezio: the value of stringy_thingy is irrelevant because it never gets that far; it fails when it tries to parse the replacement, which occurs before attempting any matching. I can't reproduce the difference either. -- status: pending -&
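
A short sketch of the behaviour described above: the replacement template is parsed before any matching is attempted, so a lone backslash fails even when the pattern cannot match. The two workarounds shown (doubling the backslash, or passing a function) are common idioms and are not taken from the thread.

```python
import re

lone_backslash = chr(92)

try:
    # "x" never matches "abc", yet the template is still parsed first.
    re.sub("x", lone_backslash, "abc")
    template_error = None
except re.error as exc:
    template_error = exc
print(template_error)

# Escaping the backslash in the template inserts one literal backslash.
escaped = re.sub("a", lone_backslash * 2, "abc")
# A function's return value is used as-is, with no template parsing.
via_function = re.sub("a", lambda m: lone_backslash, "abc")
print(escaped, via_function)
```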

[issue7940] re.finditer and re.findall should support negative end positions

2013-05-25 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a patch. -- keywords: +patch Added file: http://bugs.python.org/file30377/issue7940.patch ___ Python tracker <http://bugs.python.org/i

[issue814253] Grouprefs in lookbehind assertions

2013-05-25 Thread Matthew Barnett
Matthew Barnett added the comment: Issue #2636 resulted in the regex module, which supports variable-length look-behinds. I don't know how much work it would take even to put a limited fixed-length look-behind fix for this into the re module, so I'm afraid the issue must r

[issue1693050] \w not helpful for non-Roman scripts

2013-05-26 Thread Matthew Barnett
Matthew Barnett added the comment: I had to check what re does in Python 3.3: >>> print(len(re.match(r'\w+', 'हिन्दी').group())) 1 Regex does this: >>> print(len(regex.match(r'\w+', 'हिन्दी').group())) 6 -- ___

[issue7940] re.finditer and re.findall should support negative end positions

2013-05-26 Thread Matthew Barnett
Matthew Barnett added the comment: Like the OP, I would've expected it to handle negative indexes the way that strings do. In practice, I wouldn't normally provide negative indexes; I'd use some string or regex method to determine the search limits, and then pass them to findit
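
For callers who want string-style negative indexes today, one possible approach is to normalise them before calling the pattern method. This is only a sketch; findall_sliced is a hypothetical helper, not part of re, and it relies on slice.indices() to apply the usual slicing rules.

```python
import re

def findall_sliced(pattern, string, pos=None, endpos=None):
    """Call pattern.findall() with pos/endpos interpreted like slice bounds."""
    start, stop, _ = slice(pos, endpos).indices(len(string))
    return pattern.findall(string, start, stop)

word = re.compile(r"[a-z]+")
print(findall_sliced(word, "abc def ghi", 0, -4))  # searches "abc def"
print(findall_sliced(word, "abc def ghi", -3))     # searches "ghi"
```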

[issue1693050] \w not helpful for non-Roman scripts

2013-05-28 Thread Matthew Barnett
Matthew Barnett added the comment: I'm not sure what you're saying. The re module in Python 3.3 matches only the first codepoint, treating the second codepoint as not part of a word, whereas the regex module matches all 6 codepoints, treating them all as part of a s
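
The difference comes down to the general category of each codepoint. The sketch below uses only the standard library (the third-party regex module is not imported) and prints the categories of the six codepoints of the word from the thread, given here by the escapes quoted in a later message. The length printed for re may differ between Python versions, so no expected value is shown for it.

```python
import re
import unicodedata

# The word discussed in the thread, written with explicit escapes.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Letters are 'Lo'; the vowel signs and the virama are combining marks.
categories = [unicodedata.category(ch) for ch in word]
print(categories)

# How many codepoints \w+ consumes in this interpreter's re module.
print(len(re.match(r"\w+", word).group()))
```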

[issue1693050] \w not helpful for non-Roman scripts

2013-05-29 Thread Matthew Barnett
Matthew Barnett added the comment: You could've obtained it from msg76556 or msg190100: >>> print(ascii('हिन्दी')) '\u0939\u093f\u0928\u094d\u0926\u0940' >>> import re, regex >>> print(ascii(re.match(r"\w+", >>> &#x

[issue1693050] \w not helpful for non-Roman scripts

2013-05-29 Thread Matthew Barnett
Matthew Barnett added the comment: UTF-16 has nothing to do with it, that's just an encoding (a pair of them actually, UTF-16LE and UTF-16BE). And I don't know why you thought I was using findall in msg190100 when the examples were u

[issue18190] RuntimeError raised with re.search + re.DOTALL on empty string

2013-06-11 Thread Matthew Barnett
Matthew Barnett added the comment: Also in Python 3.3.2, but not Python 3.2. I haven't tested Python 3.3.1 or Python 3.3.0. -- versions: +Python 3.3 ___ Python tracker <http://bugs.python.org/is

[issue18286] Python 3.3 - Slowing down computer

2013-07-02 Thread Matthew Barnett
Matthew Barnett added the comment: > with open('url_list.txt') as f: > > content = f.readlines() > content = ''.join(content) > Why are you reading all of the lines and then joining them together like that? Why not just do: content = f.read() >

[issue18406] unicodedata.itergraphemes / str.itergraphemes / str.graphemes

2013-07-09 Thread Matthew Barnett
Matthew Barnett added the comment: This is basically what the regex module does, written in Python: def get_grapheme_cluster_break(codepoint): """Gets the "Grapheme Cluster Break" property of a codepoint. The properties defined here:

[issue13083] _sre: getstring() releases the buffer before using it

2013-07-17 Thread Matthew Barnett
Matthew Barnett added the comment: It looks like this was fixed for issue #14212. -- ___ Python tracker <http://bugs.python.org/issue13083> ___ ___ Python-bug

[issue18468] re.group() should never return a bytearray

2013-07-18 Thread Matthew Barnett
Matthew Barnett added the comment: There's also the fact that the match object keeps a reference to the target string anyway: >>> import re >>> t = memoryview(b"a") >>> t >>> m = re.match(b"a", t) >>> m.string On that su

[issue16964] Add 'm' format specifier for mon_grouping etc.

2013-07-22 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached my attempt at a patch. -- keywords: +patch nosy: +mrabarnett Added file: http://bugs.python.org/file31009/issue16964.patch ___ Python tracker <http://bugs.python.org/is

[issue18556] ctypes' U_set() doesn't check return value of PyUnicode_AsWideChar()

2013-07-25 Thread Matthew Barnett
Matthew Barnett added the comment: Re msg193703: A little before that, 'value' is INCREF'ed, and then: wstr = PyUnicode_AsUnicodeAndSize(value, &size); if (wstr == NULL) return NULL; Shouldn't 'value' be DECREF'ed before r

[issue18614] Enhanced \N{} escapes for Unicode strings

2013-08-01 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a patch for this. -- keywords: +patch nosy: +mrabarnett Added file: http://bugs.python.org/file31112/issue18614.patch ___ Python tracker <http://bugs.python.org/is

[issue18647] re.error: nothing to repeat

2013-08-04 Thread Matthew Barnett
Matthew Barnett added the comment: Suppose you have a repeated pattern, such as "(?:...)*" or "(?:...){0,100}". If, after matching the subpattern, the text position hasn't changed, and none of the capture groups have changed, then there has been no progress, and the s

[issue18647] re.error: nothing to repeat

2013-08-04 Thread Matthew Barnett
Matthew Barnett added the comment: Python's current regex engine isn't so coded. That's the reason for the up-front check. -- ___ Python tracker <http://bugs.pyt

[issue18662] re.escape should not escape the hyphen

2013-08-05 Thread Matthew Barnett
Matthew Barnett added the comment: The help says: """>>> help(re.escape) Help on function escape in module re: escape(pattern) Escape all the characters in pattern except ASCII letters, numbers and '_'. """ The complementar

[issue18662] re.escape should not escape the hyphen

2013-08-06 Thread Matthew Barnett
Matthew Barnett added the comment: I can think of a real disadvantage with the current behaviour: it messes up Unicode graphemes. For example: >>> print('हिन्दी') हिन्दी >>> print(re.escape('हिन्दी')) \ह\ि\न\्\द\ी Of course, that's only a problem i

[issue18685] Restore re performance to pre-PEP393 level

2013-08-08 Thread Matthew Barnett
Matthew Barnett added the comment: It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower. I get somewhat different results (Windows XP Pro, 32-bit): C:\Python32\python.exe -m timeit -s "import re; f = re.compile(

[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Matthew Barnett
Matthew Barnett added the comment: @Antoine: Are you on the same OS as Serhiy? IIRC, wasn't the performance regression that wxjmfauth complained about in Python 3.3 apparent on Windows, but not on Linux? -- ___ Python tracker

[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Matthew Barnett
Matthew Barnett added the comment: With the patch the results are: C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)" 1 loops, best of 3: 113 usec per loop C:\Python34\python.exe -m timeit -s "

[issue18647] re.error: nothing to repeat

2013-08-11 Thread Matthew Barnett
Matthew Barnett added the comment: I think you're probably right. -- ___ Python tracker <http://bugs.python.org/issue18647> ___ ___ Python-bugs-list m

[issue18832] New regex module degrades re performance

2013-08-25 Thread Matthew Barnett
Matthew Barnett added the comment: The 'regex' module is not part of the CPython distribution, so it's not covered by this tracker. -- ___ Python tracker <http://bugs.pyt

[issue18986] Add a case-insensitive case-preserving dict

2013-09-09 Thread Matthew Barnett
Matthew Barnett added the comment: Surely a case-insensitive dict should use str.casefold, not str.lower? -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.org/issue18
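
A brief illustration of why the choice matters, using the well-known German sharp s case. The two helper names are made up for this sketch.

```python
def same_key_lower(a, b):
    """Compare two strings using str.lower() as the key function."""
    return a.lower() == b.lower()

def same_key_casefold(a, b):
    """Compare two strings using str.casefold() as the key function."""
    return a.casefold() == b.casefold()

# lower() leaves the sharp s alone; casefold() maps it to "ss".
print(same_key_lower("Stra\u00dfe", "STRASSE"))
print(same_key_casefold("Stra\u00dfe", "STRASSE"))
```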

[issue18986] Add a case-insensitive case-preserving dict

2013-09-09 Thread Matthew Barnett
Matthew Barnett added the comment: mappeddict? Re defaultdict, you could write a dict that does all of these things, called superdict! :-) -- ___ Python tracker <http://bugs.python.org/issue18

[issue14462] In re's named group the name cannot contain unicode characters

2012-04-29 Thread Matthew Barnett
Matthew Barnett added the comment: It doesn't work in regex, but it probably should. IMHO, if it's a valid identifier, then it should be allowed. -- ___ Python tracker <http://bugs.python.o

[issue14991] Option for regex groupdict() to show only matching names

2012-06-17 Thread Matthew Barnett
Matthew Barnett added the comment: @rhettinger: The problem with "nodefault" is that it's negative, so that "nodefault=False" means that you don't not want the default, if you see what I mean. I think that "suppress" would be better: mo.groupdict(

[issue15077] Regexp match goes into infinite loop

2012-06-28 Thread Matthew Barnett
Matthew Barnett added the comment: It's not a bug, it's a pathological regex (i.e. it causes catastrophic backtracking). It also works correctly in the "regex" module. -- ___ Python tracker <http://bug

[issue15216] Support setting the encoding on a text stream after creation

2012-06-30 Thread Matthew Barnett
Matthew Barnett added the comment: Would a "set_encoding" method be Pythonic? I would've preferred an "encoding" property which flushes the output when it's changed. -- nosy: +mrabarnett ___ Python tracker <

[issue15372] Python is missing alternative for common quoting character

2012-07-16 Thread Matthew Barnett
Matthew Barnett added the comment: A codepoint such as "é" ("\N{LATIN SMALL LETTER E WITH ACUTE}") can be decomposed to "\u0065\u0301" ("\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}"), but "\u201c" ("\N{LEFT DOUBLE QUOTATION
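
The comment is cut off in this listing, but the contrast it sets up can be checked with the standard unicodedata module: the accented letter has a canonical decomposition, whereas the left double quotation mark has no decomposition mapping at all. A small sketch:

```python
import unicodedata

e_acute = "\N{LATIN SMALL LETTER E WITH ACUTE}"
decomposed = unicodedata.normalize("NFD", e_acute)
print(ascii(decomposed))

left_quote = "\N{LEFT DOUBLE QUOTATION MARK}"
# An empty string means the character has no decomposition mapping.
quote_decomposition = unicodedata.decomposition(left_quote)
# Even compatibility normalisation leaves it unchanged.
quote_nfkd = unicodedata.normalize("NFKD", left_quote)
print(repr(quote_decomposition), ascii(quote_nfkd))
```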

[issue13592] repr(regex) doesn't include actual regex

2012-07-19 Thread Matthew Barnett
Matthew Barnett added the comment: Python 2.7 is the end of the Python 2 line, and it's closed except for security fixes. -- ___ Python tracker <http://bugs.python.org/is

[issue15515] Regular expression match does not return

2012-07-31 Thread Matthew Barnett
Matthew Barnett added the comment: That's because it uses a pathological regular expression (catastrophic backtracking). The problem lies here: (\\?[\w\.\-]+)+ -- ___ Python tracker <http://bugs.python.org/is
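
To see why that fragment is pathological, here is a hedged timing sketch. The trailing "$", the test subjects and the rewritten pattern are assumptions made for illustration; the original regex from the issue is not shown in this listing. The nested quantifier lets a run of word characters be split between iterations in exponentially many ways, all of which are tried when the overall match fails. Consuming one character per iteration removes the ambiguity while accepting the same strings.

```python
import re
import time

# Fragment quoted in the comment; the trailing "$" is an assumption added
# here to force the engine to try every way of splitting the run.
nested = re.compile(r"(\\?[\w\.\-]+)+$")
# Hypothetical rewrite: one character per iteration, so no ambiguity.
flat = re.compile(r"(?:\\?[\w.\-])+$")

def timed_match(pattern, subject):
    """Return (match result, elapsed seconds) for pattern.match(subject)."""
    started = time.perf_counter()
    result = pattern.match(subject)
    return result, time.perf_counter() - started

for length in (8, 12, 16):
    subject = "a" * length + "!"  # the "!" makes the overall match fail
    _, slow = timed_match(nested, subject)
    _, fast = timed_match(flat, subject)
    print(length, f"nested={slow:.6f}s", f"flat={fast:.6f}s")
```

The lengths are kept small on purpose; each extra character roughly doubles the time for the nested form.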

[issue15515] Regular expression match does not return

2012-07-31 Thread Matthew Barnett
Matthew Barnett added the comment: It's probably inappropriate for me to mention that the alternative 'regex' module on PyPI completes promptly, so I won't. :-) -- ___ Python tracker <http://bug

[issue15537] MULTILINE confuses re.split

2012-08-02 Thread Matthew Barnett
Matthew Barnett added the comment: There are actually 2 issues here: 1. The third argument is 'maxsplit', the fourth is 'flags'. 2. It never splits on a zero-width match. See issue 3262. -- ___ Python tracker <http://bug
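
The first point is easy to trip over because flags are integers: passing re.MULTILINE as the third positional argument silently sets maxsplit instead. A minimal sketch using keyword arguments, with a made-up sample text and a pattern that consumes characters so that zero-width matching does not come into it:

```python
import re

text = "a\n---\nb\n---\nc"

# The flag is just an integer, so in third position it reads as maxsplit.
print(int(re.MULTILINE))

# Passing both by keyword avoids the mix-up.
parts = re.split(r"^-+$\n", text, flags=re.MULTILINE)
limited = re.split(r"^-+$\n", text, maxsplit=1, flags=re.MULTILINE)
print(parts)
print(limited)
```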

[issue15606] re.VERBOSE doesn't ignore certain whitespace

2012-08-10 Thread Matthew Barnett
Matthew Barnett added the comment: Ideally, yes, that whitespace should be ignored. The question is whether it's worth fixing the code for the small case of when there's whitespace within "tokens", such as within "(?:". Usually those who use verbose mode use whit

[issue10076] Regex objects became uncopyable in 2.5

2012-08-26 Thread Matthew Barnett
Matthew Barnett added the comment: Is it necessary to actually copy it? Isn't the pattern object immutable? -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.org/is

[issue15956] backreference to named group does not work

2012-09-18 Thread Matthew Barnett
Matthew Barnett added the comment: There needed to be a way of referring to named groups in the replacement template. The existing form \groupnumber clearly wouldn't work. Other regex implementations, such as Perl, do have \g and also \k (for named groups). In my implementation I

[issue16203] Proposal: add re.fullmatch() method

2012-10-12 Thread Matthew Barnett
Matthew Barnett added the comment: '$' will match at the end of the string or just before the final '\n': >>> re.match(r'abc$', 'abc\n') <_sre.SRE_Match object at 0x00F15448> So shouldn't you be using r'\Z' instea

[issue16203] Proposal: add re.fullmatch() method

2012-10-13 Thread Matthew Barnett
Matthew Barnett added the comment: Tim, my point is that if the MULTILINE flag happens to be turned on, '$' won't just match at the end of the string (or slice), it'll also match at a newline, so wrapping the pattern in (?:...)$ in that case could give the wrong answer,
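
The pitfalls raised in this thread can be reproduced in a few lines. The sample strings are chosen here for illustration; re.fullmatch() is the method that eventually came out of this proposal.

```python
import re

# '$' also matches just before a final newline ...
dollar_before_newline = bool(re.match(r"abc$", "abc\n"))
# ... whereas '\Z' only matches at the very end of the string.
z_before_newline = bool(re.match(r"abc\Z", "abc\n"))

# With MULTILINE, wrapping the pattern in (?:...)$ stops at the first line.
wrapped_multiline = bool(re.match(r"(?:abc)$", "abc\ndef", re.MULTILINE))
# fullmatch() requires the whole string to match, whatever the flags.
full_multiline = bool(re.fullmatch(r"abc", "abc\ndef", re.MULTILINE))

print(dollar_before_newline, z_before_newline)
print(wrapped_multiline, full_multiline)
```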

[issue16203] Proposal: add re.fullmatch() method

2012-10-13 Thread Matthew Barnett
Matthew Barnett added the comment: It certainly appears to ignore the whitespace, even if the "(?x)" is at the end of the pattern or in the middle of a group. Another point we need to consider is that the user might want to use a pre-compil

[issue16203] Proposal: add re.fullmatch() method

2012-10-16 Thread Matthew Barnett
Matthew Barnett added the comment: I'm about to add this to my regex implementation and, naturally, I want it to have the same name for compatibility. However, I'm not that keen on "fullmatch" and would prefer "matchall"

[issue16203] Proposal: add re.fullmatch() method

2012-10-16 Thread Matthew Barnett
Matthew Barnett added the comment: re2's FullMatch method contrasts with its PartialMatch method, which re doesn't have! -- ___ Python tracker <http://bugs.python.o

[issue16203] Proposal: add re.fullmatch() method

2012-10-16 Thread Matthew Barnett
Matthew Barnett added the comment: OK, in order to avoid bikeshedding, "fullmatch" it is. -- ___ Python tracker <http://bugs.python.org/issue16203> ___ ___

[issue20998] fullmatch isn't matching correctly under re.IGNORECASE

2014-03-20 Thread Matthew Barnett
Matthew Barnett added the comment: FWIW, here's my own attempt at a patch. -- Added file: http://bugs.python.org/file34538/issue20998.patch ___ Python tracker <http://bugs.python.org/is

[issue20998] fullmatch isn't matching correctly under re.IGNORECASE

2014-04-04 Thread Matthew Barnett
Matthew Barnett added the comment: > > -(!ctx->match_all || ctx->ptr == state->end)) { > > +ctx->ptr == state->end) { > > Why this check is not needed anymore? > After stepping through the code for that regex that fails, I con

[issue21283] A escape character is used when a REGEXP is an argument of "strip" string function

2014-04-17 Thread Matthew Barnett
Matthew Barnett added the comment: The argument isn't a regex, it's a raw string literal consisting of the characters " (quote), \ (backslash), ' (apostrophe), < (less than) and > (greater than). -- ___ Python tracke
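
A sketch of what that means in practice. The exact literal from the issue is not shown in this listing, so the one below is an assumed reconstruction containing the five characters listed in the comment; str.strip() treats its argument as a set of characters to remove from both ends, not as a pattern.

```python
# Assumed reconstruction: a raw string holding the five characters
# named in the comment (quote, backslash, apostrophe, <, >).
chars = r'''"\'<>'''
print(sorted(chars))

sample = '<\'"a\\b"\'>'
stripped = sample.strip(chars)
# Only leading and trailing characters are removed; the backslash
# between 'a' and 'b' is left alone.
print(stripped)
```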

[issue21516] pathlib.Path(...).is_dir() crashes on some directories (Windows)

2014-05-16 Thread Matthew Barnett
Matthew Barnett added the comment: I wouldn't call it a crash. It's an exception. -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.o

[issue21551] Behavior of word boundaries in regexes unexpected

2014-05-21 Thread Matthew Barnett
Matthew Barnett added the comment: See also issue #852532, issue #3262 and issue #988761. -- ___ Python tracker <http://bugs.python.org/issue21551> ___ ___ Pytho

[issue11204] re module: strange behaviour of space inside {m, n}

2012-12-02 Thread Matthew Barnett
Matthew Barnett added the comment: Interesting. In my regex module (http://pypi.python.org/pypi/regex) I have: bool(regex.match(pat, "bb", regex.VERBOSE)) # True bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False because I thought that when the VERBOSE flag is turned

[issue11204] re module: strange behaviour of space inside {m, n}

2012-12-02 Thread Matthew Barnett
Matthew Barnett added the comment: The question is whether re should always treat 'b{1, 3}a' as a literal, even with the VERBOSE flag. I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_ makes it a literal, even with the 'x' flag (/b{1, 3}a/x

[issue16619] LOAD_GLOBAL used to load `None` under certain circumstances

2012-12-05 Thread Matthew Barnett
Matthew Barnett added the comment: The same problem occurs with both `False` and `True`. -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.org/issue16

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-14 Thread Matthew Barnett
Matthew Barnett added the comment: In function SRE_MATCH, the code for SRE_OP_GROUPREF (line 1290) contains this: while (p < e) { if (ctx->ptr >= end || SRE_CHARGET(state, ctx->ptr, 0) != SRE_CHARGET(state, p, 0)) RETURN_FAILURE; p += sta

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-15 Thread Matthew Barnett
Matthew Barnett added the comment: OK, here's a patch. -- keywords: +patch Added file: http://bugs.python.org/file28321/issue16688.patch ___ Python tracker <http://bugs.python.org/is

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-15 Thread Matthew Barnett
Matthew Barnett added the comment: I found another bug while looking through the source. On line 495 in function SRE_COUNT: if (maxcount < end - ptr && maxcount != 65535) end = ptr + maxcount*state->charsize; where 'end' and 'ptr' are of type &

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-15 Thread Matthew Barnett
Matthew Barnett added the comment: I haven't found any other issues, so here's the second patch. -- Added file: http://bugs.python.org/file28325/issue16688#2.patch ___ Python tracker <http://bugs.python.o

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-16 Thread Matthew Barnett
Matthew Barnett added the comment: Here are some tests for the issue. -- Added file: http://bugs.python.org/file28330/issue16688#3.patch ___ Python tracker <http://bugs.python.org/issue16

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-16 Thread Matthew Barnett
Matthew Barnett added the comment: Oops! :-( Now corrected. -- Added file: http://bugs.python.org/file28332/issue16688#3.patch ___ Python tracker <http://bugs.python.org/issue16

[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

2012-12-16 Thread Matthew Barnett
Changes by Matthew Barnett : Removed file: http://bugs.python.org/file28330/issue16688#3.patch ___ Python tracker <http://bugs.python.org/issue16688> ___ ___ Python-bug

[issue1075356] exceeding obscure weakproxy bug

2012-12-19 Thread Matthew Barnett
Matthew Barnett added the comment: The patch "issue1075356.patch" is my attempt to fix this bug. 'PyArg_ParseTuple', etc, eventually call 'convertsimple'. What this patch does is to insert some code at the start of 'convertsimple' that checks whether the

[issue16741] `int()`, `float()`, etc think python strings are null-terminated

2012-12-21 Thread Matthew Barnett
Matthew Barnett added the comment: Python takes a long way round when converting strings to int. It does the following (I'll be talking about Python 3.3 here): 1. In function 'fix_decimal_and_space_to_ascii', the different kinds of spaces are converted to " " and the

[issue16741] `int()`, `float()`, etc think python strings are null-terminated

2012-12-23 Thread Matthew Barnett
Matthew Barnett added the comment: It occurred to me that the truncation of the string when building the error message could cause a UnicodeDecodeError: >>> int("1".ljust(199) + "\u0100") Traceback (most recent call last): File "", line

[issue16741] `int()`, `float()`, etc think python strings are null-terminated

2012-12-29 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a patch. It now reports an invalid literal as-is: >>> int("#\N{ARABIC-INDIC DIGIT ONE}") Traceback (most recent call last): File "", line 1, in int("#\N{ARABIC-INDIC DIGIT ONE}") ValueError:

[issue16741] `int()`, `float()`, etc think python strings are null-terminated

2012-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a small additional patch for truncating the UTF-8. I don't know whether it's strictly necessary, but I don't know that it's unnecessary either! (Better safe than sorry.) -- Added file: http://bugs.python.org/fil

[issue16870] re fails to match ^ when start index is specified ?

2013-01-05 Thread Matthew Barnett
Matthew Barnett added the comment: The semantics of '^' are common to many different regex implementations, including those of Perl and C#. The 'pos' argument merely gives the starting position of the search (C# also lets you provide a starting position, and behaves in
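
A small sketch of the distinction, with made-up sample strings: 'pos' moves where the search starts, but '^' still refers to the real start of the string. Slicing the string, or using match() with a start position, gives the anchored behaviour the reporter may have expected.

```python
import re

starts_with_b = re.compile(r"^b")

# Searching from index 1 does not make index 1 "the start" for '^'.
at_pos = starts_with_b.search("ab", 1)
# Slicing creates a new string whose start really is the 'b'.
on_slice = starts_with_b.search("ab"[1:])
# match() anchors at 'pos' without needing '^' at all.
anchored = re.compile(r"b").match("ab", 1)

print(at_pos, on_slice, anchored)
```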

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-07 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a patch. -- keywords: +patch Added file: http://bugs.python.org/file28614/issue13899.patch ___ Python tracker <http://bugs.python.org/is

[issue9669] regexp: zero-width matches in MIN_UNTIL

2013-01-15 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached my attempt at a patch. -- keywords: +patch Added file: http://bugs.python.org/file28744/issue9669.patch ___ Python tracker <http://bugs.python.org/i

[issue17016] _sre: avoid relying on pointer overflow

2013-01-22 Thread Matthew Barnett
Matthew Barnett added the comment: Lines 1000 and 1084 will be a problem only if you're near the top of the address space. This is because: 1. ctx->pattern[1] will always be <= ctx->pattern[2]. 2. A value of 65535 in ctx->pattern[2] means unlimited, even though SRE_CODE i

[issue17016] _sre: avoid relying on pointer overflow

2013-01-23 Thread Matthew Barnett
Matthew Barnett added the comment: You're checking "int offset", but what happens with "unsigned int offset"? -- ___ Python tracker <http:

[issue13169] Regular expressions with 0 to 65536 repetitions raises OverflowError

2013-01-23 Thread Matthew Barnett
Matthew Barnett added the comment: IMHO, I don't think that MAXREPEAT should be defined in sre_constants.py _and_ SRE_MAXREPEAT defined in sre_constants.h. (In the latter case, why is it in decimal?) I think that it should be defined in one place, namely sre_constants.h, perhaps as: #d

[issue16203] Proposal: add re.fullmatch() method

2013-02-04 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached a patch. -- Added file: http://bugs.python.org/file28955/issue16203_mrab.patch ___ Python tracker <http://bugs.python.org/is

[issue16203] Proposal: add re.fullmatch() method

2013-02-05 Thread Matthew Barnett
Matthew Barnett added the comment: 3 of the tests expect None when using 'fullmatch'; they won't return None when using 'match'. -- ___ Python tracker <http:

[issue17047] Fix double double words words

2013-02-06 Thread Matthew Barnett
Matthew Barnett added the comment: These are the ones that I think are wrong: Doc/c-api/long.rst:206 Return a C :c:type:`size_t` representation of of *pylong*. *pylong* must be Doc/c-api/long.rst:218 Return a C :c:type:`unsigned PY_LONG_LONG` representation of of *pylong*. Doc

[issue17184] re.VERBOSE doesn't respect whitespace in '( ?P...)'

2013-02-11 Thread Matthew Barnett
Matthew Barnett added the comment: It does look like a duplicate to me. -- ___ Python tracker <http://bugs.python.org/issue17184> ___ ___ Python-bugs-list mailin

[issue19055] Regular expressions: * does not match as many repetitions as possible.

2013-09-19 Thread Matthew Barnett
Matthew Barnett added the comment: The behaviour is correct. Here's a summary of what's happening:- First iteration of the repeated group: Try the first branch. Can match "a". Second iteration of the repeated group: Try the first branch. Can't match "

[issue19279] UTF-7 to UTF-8 decoding crash

2013-10-17 Thread Matthew Barnett
Matthew Barnett added the comment: The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters. Also, the usual way to decode is by using the .decode method. I get this: >>> content = b"+1911\' rel=\'st

[issue19408] Regex with set of characters and groups raises error

2013-10-26 Thread Matthew Barnett
Matthew Barnett added the comment: The traceback says "bad character range" because ord('+') == 43 and ord('*') == 42. It's not surprising that it complains if the range isn't valid. -- ___ Python tra
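
The original pattern from the issue is not shown here, so the character set below is an assumed minimal reproduction: a hyphen between '+' and '*' is read as a range from codepoint 43 down to 42, which is invalid. Escaping the hyphen or moving it to the end of the set avoids the problem.

```python
import re

print(ord("+"), ord("*"))

try:
    re.compile(r"[+-*]")  # read as the range '+' to '*', which runs backwards
    range_error = None
except re.error as exc:
    range_error = exc
print(range_error)

hyphen_escaped = re.compile(r"[+\-*]")
hyphen_last = re.compile(r"[+*-]")
ops = hyphen_escaped.findall("1+2-3*4")
print(ops)
```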

[issue19443] add to dict fails after 1,000,000 items on py 2.7.5

2013-10-30 Thread Matthew Barnett
Matthew Barnett added the comment: Works for me: Python 2.7.5, 64-bit, Windows 8.1 -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.org/issue19

[issue16203] Proposal: add re.fullmatch() method

2013-11-16 Thread Matthew Barnett
Matthew Barnett added the comment: I don't know that it's not needed. -- ___ Python tracker <http://bugs.python.org/issue16203> ___ ___ Python-bugs-l

[issue19823] for-each on list aborts earlier than expected

2013-11-28 Thread Matthew Barnett
Matthew Barnett added the comment: This issue is best posted to python-list and only posted here if it's agreed that it's a bug. Anyway: 1. You have "self.flows" and "flows", but haven't said what they are. 2. It's recommended that you don't modi

[issue14460] In re's positive lookbehind assertion repetition works

2014-06-26 Thread Matthew Barnett
Matthew Barnett added the comment: Lookarounds can contain capture groups: >>> import re >>> re.search(r'a(?=(.))', 'ab').groups() ('b',) >>> re.search(r'(?<=(.))b', 'ab').groups() ('a',) so lookaro

[issue14460] In re's positive lookbehind assertion repetition works

2014-06-26 Thread Matthew Barnett
Matthew Barnett added the comment: Lookarounds can capture, but they don't consume. That lookbehind is matching the same part of the string every time. -- ___ Python tracker <http://bugs.python.org/is

[issue9529] Make re match object iterable

2014-08-01 Thread Matthew Barnett
Matthew Barnett added the comment: Match objects have a .groups method: >>> import re >>> m = re.match(r'(\w+):(\w+)', 'qwerty:asdfgh') >>> m.groups() ('qwerty', 'asdfgh') >>> k, v = m.groups() >>> k '

[issue22119] Some input chars (i.e. '++') break re.match

2014-08-01 Thread Matthew Barnett
Matthew Barnett added the comment: In a regex, '+' is a metacharacter meaning "repeated one or more times". "libstdc+" will match "libstd" followed by "c" repeated one or more times. "libstdc++" will match "libstd"
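
When the text is meant literally, re.escape() takes care of the metacharacters. A brief sketch with assumed sample inputs; how an unescaped "libstdc++" is treated varies between Python versions, so it is not exercised here.

```python
import re

name = "libstdc++"
escaped = re.escape(name)
print(escaped)

# The escaped form matches the two '+' characters literally.
matches_literal = re.match(escaped, "libstdc++-dev") is not None

# Unescaped, a single '+' means "one or more of the preceding 'c'".
plus_once = re.match(r"libstdc+", "libstdccc").group()
print(matches_literal, plus_once)
```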

[issue14076] sqlite3 module ignores placeholders in CREATE TRIGGER code

2014-09-09 Thread Matthew Barnett
Matthew Barnett added the comment: For comparison: Python 3.1.3: [(b'',)] Python 3.2.5: [(None,)] Python 3.3.5: [(b'',)] Python 3.4.1: sqlite3.OperationalError: trigger cannot use variables -- nosy: +mrabarnett ___ P

[issue22364] Unify error messages of re and regex

2014-09-09 Thread Matthew Barnett
Matthew Barnett added the comment: > re:Cannot process flags argument with a compiled pattern > regex: can't process flags argument with a compiled pattern Error messages usually start with a lowercase letter, and I think that all the other ones in the re module do. By the wa
