Matthew Barnett added the comment:
findall() and finditer() consist of multiple uses of search(), basically, as do
sub() and split(), so we want the same rule to apply to them all.
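
As a rough illustration of that point, a finditer-like loop can be sketched on top of repeated search() calls. The helper name finditer_via_search is hypothetical, and the handling of empty matches is deliberately simplified, so this is not how the re module actually implements it:

```python
import re

def finditer_via_search(pattern, text):
    """Yield successive matches by calling pattern.search() repeatedly.

    Simplified illustration only: the real finditer() has extra rules
    for empty matches that this loop does not reproduce.
    """
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        yield m
        # Resume after the match; step one further after an empty match
        # so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1

digits = re.compile(r'\d+')
text = 'a1b22c333'
via_search = [m.group() for m in finditer_via_search(digits, text)]
builtin = [m.group() for m in digits.finditer(text)]
print(via_search == builtin)
```

Because every result comes from search(), any rule that applies to search() carries over to the loop automatically.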
--
___
Python tracker
<https://bugs.python.org/issue25
Matthew Barnett added the comment:
You don't give the value of 'newlines', but the problem is probably
catastrophic backtracking, not deadlock.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.pyt
Matthew Barnett added the comment:
It also raises a ValueError on Windows. For other invalid paths on Windows it
returns False.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.org/issue33
Matthew Barnett added the comment:
For clarity, the first is '\U00010308\U00010316' and the second is
'\U00010306\U00010300\U0001030B'.
The BMP is the Basic Multilingual Plane, which covers the codepoints in the
range U+0000 to U+FFFF. Some software has a problem dea
Matthew Barnett added the comment:
Not all uses of the word "master" are associated with slavery, e.g. "master
craftsman", "master copy", "master file table".
I think it's best to avoid use of master/slave where practicable, but other
u
Matthew Barnett added the comment:
I don't see a problem with this. If the zip file has 'dist/file1.py' then you
know to create a directory when unzipping. If you want to indicate that there's
an empty directory 'foo', then put 'foo/' in the
Matthew Barnett added the comment:
Unicode 11.0.0 has 卅 (U+5345) as being numeric and having the value 30.
What's the difference between that and U+4E17?
I notice that they look a lot alike. Are they different variants, perhaps
traditional vs simpl
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326015
___
Python tracker
<https://bugs.python.org/issue34763>
___
___
Python-bug
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326014
___
Python tracker
<https://bugs.python.org/issue34763>
___
___
Python-bug
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326013
___
Python tracker
<https://bugs.python.org/issue34763>
___
___
Python-bug
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326012
___
Python tracker
<https://bugs.python.org/issue34763>
___
___
Python-bug
Change by Matthew Barnett :
--
nosy: -mrabarnett
___
Python tracker
<https://bugs.python.org/issue34694>
___
___
Python-bugs-list mailing list
Unsubscribe:
Matthew Barnett added the comment:
@Ezio: the value of stringy_thingy is irrelevant because it never gets that
far; it fails when it tries to parse the replacement, which occurs before
attempting any matching.
I can't reproduce the difference either.
--
status: pending -&
Matthew Barnett added the comment:
I've attached a patch.
--
keywords: +patch
Added file: http://bugs.python.org/file30377/issue7940.patch
___
Python tracker
<http://bugs.python.org/i
Matthew Barnett added the comment:
Issue #2636 resulted in the regex module, which supports variable-length
look-behinds.
I don't know how much work it would take even to put a limited fixed-length
look-behind fix for this into the re module, so I'm afraid the issue must
r
Matthew Barnett added the comment:
I had to check what re does in Python 3.3:
>>> print(len(re.match(r'\w+', 'हिन्दी').group()))
1
Regex does this:
>>> print(len(regex.match(r'\w+', 'हिन्दी').group()))
6
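
One plausible way to see where the difference comes from is to look at the Unicode general category of each codepoint in the sample word. The sketch below assumes that re's \w test is essentially "alphanumeric or underscore"; it only inspects the characters and does not call re or regex:

```python
import unicodedata

# The six codepoints of the sample word, written as escapes.
word = '\u0939\u093f\u0928\u094d\u0926\u0940'

for ch in word:
    # Letters are category Lo; the vowel signs and the virama are
    # combining marks (Mc / Mn), which str.isalnum() rejects.
    print('U+%04X' % ord(ch), unicodedata.category(ch), ch.isalnum())

categories = [unicodedata.category(ch) for ch in word]
```

Only the first codepoint is followed directly by a mark, which is consistent with a match of length 1 when marks are not treated as word characters.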
--
___
Matthew Barnett added the comment:
Like the OP, I would've expected it to handle negative indexes the way that
strings do.
In practice, I wouldn't normally provide negative indexes; I'd use some string
or regex method to determine the search limits, and then pass them to findit
Matthew Barnett added the comment:
I'm not sure what you're saying.
The re module in Python 3.3 matches only the first codepoint, treating the
second codepoint as not part of a word, whereas the regex module matches all 6
codepoints, treating them all as part of a s
Matthew Barnett added the comment:
You could've obtained it from msg76556 or msg190100:
>>> print(ascii('हिन्दी'))
'\u0939\u093f\u0928\u094d\u0926\u0940'
>>> import re, regex
>>> print(ascii(re.match(r"\w+",
>>>
Matthew Barnett added the comment:
UTF-16 has nothing to do with it, that's just an encoding (a pair of them
actually, UTF-16LE and UTF-16BE).
And I don't know why you thought I was using findall in msg190100 when the
examples were u
Matthew Barnett added the comment:
Also in Python 3.3.2, but not Python 3.2.
I haven't tested Python 3.3.1 or Python 3.3.0.
--
versions: +Python 3.3
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
> with open('url_list.txt') as f:
>
> content = f.readlines()
> content = ''.join(content)
>
Why are you reading all of the lines and then joining them together like that?
Why not just do:
content = f.read()
>
Matthew Barnett added the comment:
This is basically what the regex module does, written in Python:
def get_grapheme_cluster_break(codepoint):
"""Gets the "Grapheme Cluster Break" property of a codepoint.
The properties defined here:
Matthew Barnett added the comment:
It looks like this was fixed for issue #14212.
--
___
Python tracker
<http://bugs.python.org/issue13083>
___
___
Python-bug
Matthew Barnett added the comment:
There's also the fact that the match object keeps a reference to the target
string anyway:
>>> import re
>>> t = memoryview(b"a")
>>> t
>>> m = re.match(b"a", t)
>>> m.string
On that su
Matthew Barnett added the comment:
I've attached my attempt at a patch.
--
keywords: +patch
nosy: +mrabarnett
Added file: http://bugs.python.org/file31009/issue16964.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
Re msg193703: A little before that, 'value' is INCREF'ed, and then:
wstr = PyUnicode_AsUnicodeAndSize(value, &size);
if (wstr == NULL)
return NULL;
Shouldn't 'value' be DECREF'ed before r
Matthew Barnett added the comment:
I've attached a patch for this.
--
keywords: +patch
nosy: +mrabarnett
Added file: http://bugs.python.org/file31112/issue18614.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
Suppose you have a repeated pattern, such as "(?:...)*" or "(?:...){0,100}".
If, after matching the subpattern, the text position hasn't changed, and none
of the capture groups have changed, then there has been no progress, and the
s
Matthew Barnett added the comment:
Python's current regex engine isn't so coded. That's the reason for the
up-front check.
--
___
Python tracker
<http://bugs.pyt
Matthew Barnett added the comment:
The help says:
""">>> help(re.escape)
Help on function escape in module re:
escape(pattern)
Escape all the characters in pattern except ASCII letters, numbers and '_'.
"""
The complementar
Matthew Barnett added the comment:
I can think of a real disadvantage with the current behaviour: it messes up
Unicode graphemes.
For example:
>>> print('हिन्दी')
हिन्दी
>>> print(re.escape('हिन्दी'))
\ह\ि\न\्\द\ी
Of course, that's only a problem i
Matthew Barnett added the comment:
It appears that in your tests Python 3.2 is faster with Unicode than
bytestrings and that unpatched Python 3.4 is a lot slower.
I get somewhat different results (Windows XP Pro, 32-bit):
C:\Python32\python.exe -m timeit -s "import re; f = re.compile(
Matthew Barnett added the comment:
@Antoine: Are you on the same OS as Serhiy?
IIRC, wasn't the performance regression that wxjmfauth complained about in
Python 3.3 apparent on Windows, but not on Linux?
--
___
Python tracker
Matthew Barnett added the comment:
With the patch the results are:
C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search;
x = b'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s &qu
Matthew Barnett added the comment:
I think you're probably right.
--
___
Python tracker
<http://bugs.python.org/issue18647>
___
___
Python-bugs-list m
Matthew Barnett added the comment:
The 'regex' module is not part of the CPython distribution, so it's not covered
by this tracker.
--
___
Python tracker
<http://bugs.pyt
Matthew Barnett added the comment:
Surely a case-insensitive dict should use str.casefold, not str.lower?
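
A small sketch of why the choice matters, using three spellings that a case-insensitive mapping would presumably want to treat as one key (the sample keys are made up for illustration):

```python
keys = ['Straße', 'STRASSE', 'strasse']

# str.lower() leaves 'ß' alone, so two distinct keys survive.
lowered = {k.lower() for k in keys}

# str.casefold() maps 'ß' to 'ss', so all three collapse to one key.
folded = {k.casefold() for k in keys}

print(len(lowered), len(folded))
```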
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue18
Matthew Barnett added the comment:
mappeddict?
Re defaultdict, you could write a dict that does all of these things, called
superdict! :-)
--
___
Python tracker
<http://bugs.python.org/issue18
Matthew Barnett added the comment:
It doesn't work in regex, but it probably should. IMHO, if it's a valid
identifier, then it should be allowed.
--
___
Python tracker
<http://bugs.python.o
Matthew Barnett added the comment:
@rhettinger: The problem with "nodefault" is that it's negative, so that
"nodefault=False" means that you don't not want the default, if you see what I
mean. I think that "suppress" would be better:
mo.groupdict(
Matthew Barnett added the comment:
It's not a bug, it's a pathological regex (i.e. it causes catastrophic
backtracking).
It also works correctly in the "regex" module.
--
___
Python tracker
<http://bug
Matthew Barnett added the comment:
Would a "set_encoding" method be Pythonic? I would've preferred an "encoding"
property which flushes the output when it's changed.
--
nosy: +mrabarnett
___
Python tracker
<
Matthew Barnett added the comment:
A codepoint such as "é" ("\N{LATIN SMALL LETTER E WITH ACUTE}") can be
decomposed to "\u0065\u0301" ("\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE
ACCENT}"), but "\u201c" ("\N{LEFT DOUBLE QUOTATION
Matthew Barnett added the comment:
Python 2.7 is the end of the Python 2 line, and it's closed except for security
fixes.
--
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
That's because it uses a pathological regular expression (catastrophic
backtracking).
The problem lies here: (\\?[\w\.\-]+)+
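
To illustrate, the nested quantifier lets the engine split one run of characters between the inner and outer repeat in exponentially many ways when the overall match fails. A possible rewrite, offered here only as a sketch, consumes one character per iteration so there is a single way to split the input. The anchors and the name LINEAR are assumptions added for the demonstration, not part of the reported pattern:

```python
import re

# The fragment quoted above, anchored so that a failing match forces
# the engine to try every way of splitting the run of characters.
PATHOLOGICAL = re.compile(r'^(\\?[\w\.\-]+)+$')

# Hypothetical rewrite: one character (with an optional backslash in
# front) per iteration, so each input can be split in only one way.
LINEAR = re.compile(r'^(?:\\?[\w.\-])+$')

def agrees(text):
    """Check that both patterns accept or reject the same text."""
    return bool(PATHOLOGICAL.match(text)) == bool(LINEAR.match(text))

# Kept short on purpose so the pathological pattern finishes quickly.
short_inputs = ['abc', r'a\b.c-d', 'abc!', '', '\\', r'a\\b']
print(all(agrees(s) for s in short_inputs))

# A long non-matching input is only tried against the rewrite.
long_failing = 'a' * 5000 + '!'
print(LINEAR.match(long_failing))
```

The rewrite uses a non-capturing group, so code that relied on the captured text of the original group would need adjusting.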
--
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
It's probably inappropriate for me to mention that the alternative 'regex'
module on PyPI completes promptly, so I won't. :-)
--
___
Python tracker
<http://bug
Matthew Barnett added the comment:
There are actually 2 issues here:
1. The third argument is 'maxsplit', the fourth is 'flags'.
2. It never splits on a zero-width match. See issue 3262.
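
The first point can be sketched as follows. re.IGNORECASE has the integer value 2, so handing it over in the maxsplit position behaves like maxsplit=2 with no flags at all; the sample text is made up for illustration:

```python
import re

text = 'aXbxcxd'

# Intended call: the flag is passed by keyword.
by_flag = re.split('x', text, flags=re.IGNORECASE)

# What the mistaken call amounts to: the flag's integer value (2)
# is used as maxsplit and the match stays case-sensitive.
as_maxsplit = re.split('x', text, maxsplit=int(re.IGNORECASE))

print(by_flag)
print(as_maxsplit)
```

Passing both maxsplit and flags by keyword avoids the mix-up.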
--
___
Python tracker
<http://bug
Matthew Barnett added the comment:
Ideally, yes, that whitespace should be ignored.
The question is whether it's worth fixing the code for the small case of when
there's whitespace within "tokens", such as within "(?:". Usually those who use
verbose mode use whit
Matthew Barnett added the comment:
Is it necessary to actually copy it? Isn't the pattern object immutable?
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
There needed to be a way of referring to named groups in the replacement
template. The existing form \groupnumber clearly wouldn't work. Other regex
implementations, such as Perl, do have \g and also \k (for named groups).
In my implementation I
Matthew Barnett added the comment:
'$' will match at the end of the string or just before the final '\n':
>>> re.match(r'abc$', 'abc\n')
<_sre.SRE_Match object at 0x00F15448>
So shouldn't you be using r'\Z' instea
Matthew Barnett added the comment:
Tim, my point is that if the MULTILINE flag happens to be turned on, '$' won't
just match at the end of the string (or slice), it'll also match at a newline,
so wrapping the pattern in (?:...)$ in that case could give the wrong answer,
Matthew Barnett added the comment:
It certainly appears to ignore the whitespace, even if the "(?x)" is at the end
of the pattern or in the middle of a group.
Another point we need to consider is that the user might want to use a
pre-compil
Matthew Barnett added the comment:
I'm about to add this to my regex implementation and, naturally, I want it to
have the same name for compatibility.
However, I'm not that keen on "fullmatch" and would prefer "matchall"
Matthew Barnett added the comment:
re2's FullMatch method contrasts with its PartialMatch method, which re doesn't
have!
--
___
Python tracker
<http://bugs.python.o
Matthew Barnett added the comment:
OK, in order to avoid bikeshedding, "fullmatch" it is.
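
For reference, a short sketch of how fullmatch differs from match() combined with '$' or r'\Z', assuming a Python version in which re.fullmatch is available:

```python
import re

# '$' also matches just before a final newline.
dollar = bool(re.match(r'abc$', 'abc\n'))

# r'\Z' matches only at the very end of the string.
end_anchor = bool(re.match(r'abc\Z', 'abc\n'))

# fullmatch requires the whole string to match.
full_with_newline = bool(re.fullmatch(r'abc', 'abc\n'))
full_exact = bool(re.fullmatch(r'abc', 'abc'))

# With MULTILINE, '$' also matches before any newline, so wrapping a
# pattern in (?:...)$ no longer guarantees a match of the whole string.
wrapped_multiline = bool(re.match(r'(?:abc)$', 'abc\ndef', re.MULTILINE))
full_multiline = bool(re.fullmatch(r'abc', 'abc\ndef', re.MULTILINE))

print(dollar, end_anchor, full_with_newline, full_exact)
print(wrapped_multiline, full_multiline)
```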
--
___
Python tracker
<http://bugs.python.org/issue16203>
___
___
Matthew Barnett added the comment:
FWIW, here's my own attempt at a patch.
--
Added file: http://bugs.python.org/file34538/issue20998.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
> > -(!ctx->match_all || ctx->ptr == state->end)) {
> > +ctx->ptr == state->end) {
>
> Why this check is not needed anymore?
>
After stepping through the code for that regex that fails, I con
Matthew Barnett added the comment:
The argument isn't a regex, it's a raw string literal consisting of the
characters " (quote), \ (backslash), ' (apostrophe), < (less than) and >
(greater than).
--
___
Python tracke
Matthew Barnett added the comment:
I wouldn't call it a crash. It's an exception.
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.o
Matthew Barnett added the comment:
See also issue #852532, issue #3262 and issue #988761.
--
___
Python tracker
<http://bugs.python.org/issue21551>
___
___
Pytho
Matthew Barnett added the comment:
Interesting.
In my regex module (http://pypi.python.org/pypi/regex) I have:
bool(regex.match(pat, "bb", regex.VERBOSE)) # True
bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False
because I thought that when the VERBOSE flag is turned
Matthew Barnett added the comment:
The question is whether re should always treat 'b{1, 3}a' as a literal, even
with the VERBOSE flag.
I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_
makes it a literal, even with the 'x' flag (/b{1, 3}a/x
Matthew Barnett added the comment:
The same problem occurs with both `False` and `True`.
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue16
Matthew Barnett added the comment:
In function SRE_MATCH, the code for SRE_OP_GROUPREF (line 1290) contains this:
while (p < e) {
if (ctx->ptr >= end ||
SRE_CHARGET(state, ctx->ptr, 0) != SRE_CHARGET(state, p, 0))
RETURN_FAILURE;
p += sta
Matthew Barnett added the comment:
OK, here's a patch.
--
keywords: +patch
Added file: http://bugs.python.org/file28321/issue16688.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
I found another bug while looking through the source.
On line 495 in function SRE_COUNT:
if (maxcount < end - ptr && maxcount != 65535)
end = ptr + maxcount*state->charsize;
where 'end' and 'ptr' are of type &
Matthew Barnett added the comment:
I haven't found any other issues, so here's the second patch.
--
Added file: http://bugs.python.org/file28325/issue16688#2.patch
___
Python tracker
<http://bugs.python.o
Matthew Barnett added the comment:
Here are some tests for the issue.
--
Added file: http://bugs.python.org/file28330/issue16688#3.patch
___
Python tracker
<http://bugs.python.org/issue16
Matthew Barnett added the comment:
Oops! :-( Now corrected.
--
Added file: http://bugs.python.org/file28332/issue16688#3.patch
___
Python tracker
<http://bugs.python.org/issue16
Changes by Matthew Barnett :
Removed file: http://bugs.python.org/file28330/issue16688#3.patch
___
Python tracker
<http://bugs.python.org/issue16688>
___
___
Python-bug
Matthew Barnett added the comment:
The patch "issue1075356.patch" is my attempt to fix this bug.
'PyArg_ParseTuple', etc, eventually call 'convertsimple'. What this patch does
is to insert some code at the start of 'convertsimple' that checks whether the
Matthew Barnett added the comment:
Python takes a long way round when converting strings to int. It does the
following (I'll be talking about Python 3.3 here):
1. In function 'fix_decimal_and_space_to_ascii', the different kinds of spaces
are converted to " " and the
Matthew Barnett added the comment:
It occurred to me that the truncation of the string when building the error
message could cause a UnicodeDecodeError:
>>> int("1".ljust(199) + "\u0100")
Traceback (most recent call last):
File "", line
Matthew Barnett added the comment:
I've attached a patch.
It now reports an invalid literal as-is:
>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
File "", line 1, in
int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError:
Matthew Barnett added the comment:
I've attached a small additional patch for truncating the UTF-8.
I don't know whether it's strictly necessary, but I don't know that it's
unnecessary either! (Better safe than sorry.)
--
Added file: http://bugs.python.org/fil
Matthew Barnett added the comment:
The semantics of '^' are common to many different regex implementations,
including those of Perl and C#.
The 'pos' argument merely gives the starting position of the search (C# also lets
you provide a starting position, and behaves in
Matthew Barnett added the comment:
I've attached a patch.
--
keywords: +patch
Added file: http://bugs.python.org/file28614/issue13899.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
I've attached my attempt at a patch.
--
keywords: +patch
Added file: http://bugs.python.org/file28744/issue9669.patch
___
Python tracker
<http://bugs.python.org/i
Matthew Barnett added the comment:
Lines 1000 and 1084 will be a problem only if you're near the top of the
address space. This is because:
1. ctx->pattern[1] will always be <= ctx->pattern[2].
2. A value of 65535 in ctx->pattern[2] means unlimited, even though SRE_CODE i
Matthew Barnett added the comment:
You're checking "int offset", but what happens with "unsigned int offset"?
--
___
Python tracker
<http:
Matthew Barnett added the comment:
IMHO, I don't think that MAXREPEAT should be defined in sre_constants.py _and_
SRE_MAXREPEAT defined in sre_constants.h. (In the latter case, why is it in
decimal?)
I think that it should be defined in one place, namely sre_constants.h, perhaps
as:
#d
Matthew Barnett added the comment:
I've attached a patch.
--
Added file: http://bugs.python.org/file28955/issue16203_mrab.patch
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
3 of the tests expect None when using 'fullmatch'; they won't return None when
using 'match'.
--
___
Python tracker
<http:
Matthew Barnett added the comment:
These are the ones that I think are wrong:
Doc/c-api/long.rst:206
Return a C :c:type:`size_t` representation of of *pylong*. *pylong* must be
Doc/c-api/long.rst:218
Return a C :c:type:`unsigned PY_LONG_LONG` representation of of *pylong*.
Doc
Matthew Barnett added the comment:
It does look like a duplicate to me.
--
___
Python tracker
<http://bugs.python.org/issue17184>
___
___
Python-bugs-list mailin
Matthew Barnett added the comment:
The behaviour is correct.
Here's a summary of what's happening:-
First iteration of the repeated group:
Try the first branch. Can match "a".
Second iteration of the repeated group:
Try the first branch. Can't match "
Matthew Barnett added the comment:
The bytestring literal isn't valid. It starts with b" and later on has an
unescaped " followed by more characters.
Also, the usual way to decode is by using the .decode method.
I get this:
>>> content = b"+1911\' rel=\'st
Matthew Barnett added the comment:
The traceback says "bad character range" because ord('+') == 43 and ord('*') ==
42. It's not surprising that it complains if the range isn't valid.
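
A minimal reproduction, assuming the character class in question was something like [+-*] (the original pattern is not shown here, so the one below is a stand-in):

```python
import re

message = None
try:
    # '+' (43) to '*' (42) is a descending range, so this is rejected.
    re.compile(r'[+-*]')
except re.error as exc:
    message = str(exc)
print(message)

# Two common ways to make the hyphen literal: put it last, or escape it.
hyphen_last = re.compile(r'[+*-]')
hyphen_escaped = re.compile(r'[+\-*]')

print(hyphen_last.findall('1+2-3*4'))
print(hyphen_escaped.findall('1+2-3*4'))
```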
--
___
Python tra
Matthew Barnett added the comment:
Works for me: Python 2.7.5, 64-bit, Windows 8.1
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue19
Matthew Barnett added the comment:
I don't know that it's not needed.
--
___
Python tracker
<http://bugs.python.org/issue16203>
___
___
Python-bugs-l
Matthew Barnett added the comment:
This issue is best posted to python-list and only posted here if it's agreed
that it's a bug.
Anyway:
1. You have "self.flows" and "flows", but haven't said what they are.
2. It's recommended that you don't modi
Matthew Barnett added the comment:
Lookarounds can contain capture groups:
>>> import re
>>> re.search(r'a(?=(.))', 'ab').groups()
('b',)
>>> re.search(r'(?<=(.))b', 'ab').groups()
('a',)
so lookaro
Matthew Barnett added the comment:
Lookarounds can capture, but they don't consume. That lookbehind is matching
the same part of the string every time.
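
A small sketch of "capture without consuming", using made-up sample strings rather than the pattern from the original report:

```python
import re

# The lookahead captures 'b', but the match itself covers only 'a'.
m = re.search(r'a(?=(.))', 'ab')
print(m.group(0), m.group(1), m.span())

# Since nothing is consumed, successive matches can capture
# overlapping parts of the string.
overlapping = re.findall(r'(?=(..))', 'abcd')
print(overlapping)
```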
--
___
Python tracker
<http://bugs.python.org/is
Matthew Barnett added the comment:
Match objects have a .groups method:
>>> import re
>>> m = re.match(r'(\w+):(\w+)', 'qwerty:asdfgh')
>>> m.groups()
('qwerty', 'asdfgh')
>>> k, v = m.groups()
>>> k
'
Matthew Barnett added the comment:
In a regex, '+' is a metacharacter meaning "repeated one or more times".
"libstdc+" will match "libstd" followed by "c" repeated one or more times.
"libstdc++" will match "libstd"
Matthew Barnett added the comment:
For comparison:
Python 3.1.3:
[(b'',)]
Python 3.2.5:
[(None,)]
Python 3.3.5:
[(b'',)]
Python 3.4.1:
sqlite3.OperationalError: trigger cannot use variables
--
nosy: +mrabarnett
___
P
Matthew Barnett added the comment:
> re:Cannot process flags argument with a compiled pattern
> regex: can't process flags argument with a compiled pattern
Error messages usually start with a lowercase letter, and I think that all the
other ones in the re module do.
By the wa