Matthew Barnett added the comment:
They're not supported in string literals either:
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more inf
Change by Matthew Barnett :
--
stage: -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/issue46515>
___
___
Python-bugs-list
Matthew Barnett added the comment:
That pattern has:
(?P[^]]+)+
Is that intentional? It looks wrong to me.
--
___
Python tracker
<https://bugs.python.org/issue46
Matthew Barnett added the comment:
The expression is a repeated alternative where the first alternative is a
repeat. Repeated repeats can result in a lot of attempts and backtracking and
should be avoided.
Try this instead:
(0|1(01*0)*1
Matthew Barnett added the comment:
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is
equivalent to ".*".
For example:
>>> re.match(".*", "x" * 10).span()
(0, 10)
>>> re.match(".{0,65535}", "
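A quick check of that equivalence on a short input might look like the sketch below. The helper name `spans_agree` is made up for illustration. Only a 10-character string is checked; how the two patterns compare on inputs longer than 65535 characters depends on the interpreter version, so nothing is claimed for that case.

```python
import re

def spans_agree(text):
    """Return True if '.*' and '.{0,65535}' match the same span of text."""
    unbounded = re.match(".*", text).span()
    bounded = re.match(".{0,65535}", text).span()
    return unbounded == bounded

print(spans_agree("x" * 10))
```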
Matthew Barnett added the comment:
The limit is an implementation detail. The pattern is compiled into codes which
are then interpreted, and it just happens that the codes are (usually) 16 bits,
giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't
warn i
Matthew Barnett added the comment:
In reply to Ezio, the repr of a large string, list, tuple or dict is also long.
The repr of a compiled regex should probably also show the flags, but should it
just be the numeric value?
--
___
Python tracker
Matthew Barnett added the comment:
Actually, one possibility that occurs to me is to provide the flags within the
pattern. The .pattern attribute gives the original pattern, but repr could give
the flags in-line at the start of the pattern:
>>> # Assuming Python 3.
>>>
Matthew Barnett added the comment:
I'm just adding this to the regex module and I've come up against a possible
issue. The regex module supports named lists, which could be very big. Should
the entire contents of those lists also be shown in the repr? They would have to
be if the
Matthew Barnett added the comment:
That's not a bug.
This might help to explain what's going on:
What do (lambda) function closures capture in Python?
http://stackoverflow.com/questions/2295290/what-do-lambda-function-closures-capture-in-python
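The original report isn't quoted here, so the following is only a sketch of the general behaviour the linked question is about: a closure looks up a free variable when it is called, not when it is defined. The variable names are illustrative.

```python
# Each lambda looks up `i` when it is called, not when it is defined,
# so all three see the final value of the loop variable.
late = [lambda x: x + i for i in range(3)]
late_results = [f(10) for f in late]

# Binding the current value as a default argument captures it at
# definition time instead.
early = [lambda x, i=i: x + i for i in range(3)]
early_results = [f(10) for f in early]

print(late_results)   # [12, 12, 12]
print(early_results)  # [10, 11, 12]
```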
--
nosy: +
Matthew Barnett added the comment:
This also raises MemoryError:
re.match(r'()*?1', 'a1')
but none of these do:
re.match(r'()+1', 'a1')
re.match(r'()*1', 'a1')
--
nosy: +mrabarnett
___
Matthew Barnett added the comment:
The new regex implementation is hosted here:
https://code.google.com/p/mrab-regex-hg/
The span of m['a_thing'] is m.span('a_thing'), if that helps.
The named groups are listed on the pattern object, which can be accessed via
m.re:
>
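As a rough illustration, the same attributes also exist on match objects in the standard library's re module, which is what this sketch uses (the comment itself is about the regex module). The pattern and input string are made up.

```python
import re

m = re.search(r"(?P<a_thing>\d+)", "abc 123 def")

text = m["a_thing"]               # the matched text of the named group
span = m.span("a_thing")          # where that text sits in the input
names = dict(m.re.groupindex)     # named groups, via the pattern object

print(text, span, names)
```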
New submission from Matthew Barnett :
Someone over at StackOverflow had a problem with urlopen in Python 3.2.1:
http://stackoverflow.com/questions/6892573/problem-with-urlopen/6892843#6892843
This is the code:
from urllib.request import urlopen
f =
urlopen('http://online.ws
Matthew Barnett added the comment:
Just been told this bug has already been reported as issue #12576.
--
resolution: -> duplicate
___
Python tracker
<http://bugs.python.org/issu
Changes by Matthew Barnett :
--
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue12671>
___
___
Python-bugs-list mailing list
Unsubscri
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12728>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12729>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12730>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12731>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12732>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12733>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12734>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12735>
___
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue12736>
___
Matthew Barnett added the comment:
In a narrow build, a codepoint in the astral plane is encoded as a surrogate pair.
I could implement a workaround for it in the regex module, but I think that the
proper place to fix it is in the language as a whole, perhaps by implementing
PEP 393 ("Fle
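For reference, the surrogate-pair arithmetic can be sketched as follows. The function name `surrogate_pair` is hypothetical, and the result is cross-checked against Python's own UTF-16 encoder rather than against a narrow build.

```python
def surrogate_pair(codepoint):
    """Split an astral codepoint (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not an astral-plane codepoint")
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

high, low = surrogate_pair(0x10337)

# Cross-check against the two 16-bit code units the UTF-16 encoder produces.
encoded = chr(0x10337).encode("utf-16-be")
units = (int.from_bytes(encoded[:2], "big"), int.from_bytes(encoded[2:], "big"))

print(hex(high), hex(low), units == (high, low))
```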
Matthew Barnett added the comment:
There are occasions when you want to do string slicing, often of the form:
pos = my_str.index(x)
endpos = my_str.index(y)
substring = my_str[pos : endpos]
To me that suggests that if UTF-8 is used then it may be worth profiling to see
whether caching the
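A minimal sketch of that slicing pattern, with the second search started from where the first one finished. The sample string and the values of x and y are made up.

```python
my_str = "name=value;rest"
x, y = "=", ";"

pos = my_str.index(x)
# Start the second search at pos so the prefix is not scanned again.
endpos = my_str.index(y, pos)
substring = my_str[pos:endpos]

print(substring)  # =value
```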
Matthew Barnett added the comment:
You're right about starting the second search from where the first finished.
Caching the position would be an advantage there.
The memory cost of extra pointers wouldn't be so bad if UTF-8 took less space
than the current format.
Regex isn'
Matthew Barnett added the comment:
On a narrow build, "\N{MATHEMATICAL SCRIPT CAPITAL A}" is stored as 2 code
units, and neither re nor regex recombine them when compiling a regex or
looking for a match.
regex supports \xNN, \u and \U and \N{XYZ} itself, so they can be
Matthew Barnett added the comment:
Have a look here: http://98.245.80.27/tcpc/OSCON2011/gbu/index.html
--
___
Python tracker
<http://bugs.python.org/issue12
Matthew Barnett added the comment:
For what it's worth, I've had an idea about string storage, roughly based on how
*nix stores data on disk.
If a string is small, point to a block of codepoints.
If a string is medium-sized, point to a block of pointers to codepoint blocks.
If a
Matthew Barnett added the comment:
For the "Line_Break" property, one of the possible values is "Inseparable",
with 2 permitted aliases, the shorter "IN" (which is reasonable) and
"Inseperable" (ouch!).
--
_
Matthew Barnett added the comment:
Even if this bug is fixed, it still won't work as you expect, and this is why.
The Scanner function accepts a list of 2-tuples. The first item of the tuple is
a regex and the second is a function. For example:
re.Scanner([(r"\d+", number)
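A small runnable sketch of that interface follows. re.Scanner is undocumented in the standard library, so treat the details as observed behaviour rather than a guaranteed API. The body of `number`, the whitespace rule and the input string are assumptions made for illustration.

```python
import re

def number(scanner, token):
    # The callback receives the scanner and the matched text.
    return int(token)

scanner = re.Scanner([
    (r"\d+", number),   # (regex, function) pair
    (r"\s+", None),     # None as the action: match but emit nothing
])

# scan() returns the list of results and whatever text could not be matched.
tokens, remainder = scanner.scan("12 34 abc")
print(tokens, repr(remainder))
```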
Matthew Barnett added the comment:
There are some oddities in Unicode case-folding.
Under full case-folding, both "\N{LATIN CAPITAL LETTER SHARP S}" and "\N{LATIN
SMALL LETTER SHARP S}" fold to "ss", which means that those codepoints match
each other.
Howe
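The folding described above can be seen with str.casefold(), which applies full case-folding in Python 3. This is only an illustration with the built-in string method, not with re or regex.

```python
capital = "\N{LATIN CAPITAL LETTER SHARP S}"
small = "\N{LATIN SMALL LETTER SHARP S}"

# Both codepoints fold to the same two-character string.
print(capital.casefold(), small.casefold())
print(capital.casefold() == small.casefold())
```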
Matthew Barnett added the comment:
The regex module currently uses simple case-folding, although I'm working
towards full case-folding, as listed in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.
--
___
Python tracker
Matthew Barnett added the comment:
The regex module supports nested sets and set operations, eg.
r"[[a-z]--[aeiou]]" (the letters from 'a' to 'z', except the vowels). This
means that literal '[' in a set needs to be escaped.
For example, re module s
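For comparison, the standard re module has no set difference, but a similar effect can be approximated with a negative lookahead. This is a swapped-in technique for illustration, not the regex module's nested-set syntax.

```python
import re

# "A letter from 'a' to 'z' that is not a vowel", written without set operations.
consonant = re.compile(r"(?![aeiou])[a-z]")

found = consonant.findall("regex")
print(found)  # ['r', 'g', 'x']
```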
Matthew Barnett added the comment:
I think I need a show of hands.
Should the default be old behaviour (like re) or new behaviour? (It might be
old now, new later.)
Should there be a NEW flag (as at present), or an OLD flag, or a VERSION
parameter (0=old, 1=new, 2
Matthew Barnett added the comment:
The least disruptive change would be to have a NEW flag for the new behaviour,
as at present, and an OLD flag for the old behaviour.
Currently the default is old behaviour, but in the future it will be new
behaviour.
The differences would be:
Old
Matthew Barnett added the comment:
So, VERSION0 and VERSION1, with "(?V0)" and "(?V1)" in the pattern?
--
___
Python tracker
<http://bu
Matthew Barnett added the comment:
I agree with Kamil and Germán. I would've expected negative indexes for
sequences to work. Negative indexes for fields is a different matter.
--
___
Python tracker
<http://bugs.python.org/i
Matthew Barnett added the comment:
issue2636-20100814.zip is a new version of the regex module.
I've added default Unicode word boundaries and renamed the Pattern and Match
classes.
Over to you, Alex. :-)
--
Added file: http://bugs.python.org/file18532/issue2636-2010081
Matthew Barnett added the comment:
These have been added to the new 'regex' module. See issue #2636 or PyPI at:
http://pypi.python.org/pypi/regex
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.
Matthew Barnett added the comment:
If you're on Windows (x86, 32-bit) then compilation isn't necessary - just use
the appropriate _regex.pyd.
--
___
Python tracker
<http://bugs.python.
Matthew Barnett added the comment:
issue2636-20100816.zip is a new version of the regex module.
Unfortunately I came across a bug in the handling of sets. More unit tests added.
--
Added file: http://bugs.python.org/file18541/issue2636-20100816.zip
Matthew Barnett added the comment:
issue2636-20100824.zip is a new version of the regex module.
More speedups. Getting towards Perl speed now, depending on the regex. :-)
--
Added file: http://bugs.python.org/file18621/issue2636-20100824.zip
Matthew Barnett added the comment:
issue2636-20100912.zip is a new version of the regex module.
More speedups. I've been comparing the speed against Perl wherever possible. In
some cases Perl is lightning fast, probably because regex is built into the
language and it doesn't hav
Matthew Barnett added the comment:
Another flag? Hmm.
How about this instead: if a scoped flag appears at the end of a regex (and
would therefore normally have no effect) then it's treated as though it's at
the start of the regex. Thus:
foo(?i)
is treated like:
Matthew Barnett added the comment:
The tests for re include these regexes:
a.b(?s)
a.*(?s)b
I understand what Georg said previously about some people preferring to put
them at the end, but I personally wouldn't do that because some regex
implementations support scoped inline
Matthew Barnett added the comment:
OK, so would it be OK if there was, say, a NEW (N) flag which made the inline
flags (?flags) scoped and allowed splitting on zero-width matches?
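To show what splitting on a zero-width match means, here is a sketch using the standard re module, which accepts empty matches in re.split from Python 3.7 onward. It does not use the NEW flag discussed here, and the pattern and input are made up.

```python
import re

# The lookahead matches the empty string just before each capital letter,
# so the split points have zero width and no characters are consumed.
parts = re.split(r"(?=[A-Z])", "fooBarBaz")
print(parts)  # ['foo', 'Bar', 'Baz']
```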
--
___
Python tracker
<http://bugs.python.org/issue2
Matthew Barnett added the comment:
issue2636-20100913.zip is a new version of the regex module.
I've removed the ZEROWIDTH flag and added the NEW flag, which turns on the new
behaviour such as splitting on zero-width matches and positional flags. If the
NEW flag isn't turned o
Matthew Barnett added the comment:
Does this request still stand? If so then I'll add it to the new regex module.
--
___
Python tracker
<http://bugs.python.org/issu
Matthew Barnett added the comment:
issue2636-20100918.zip is a new version of the regex module.
I've added 'pos' and 'endpos' arguments to regex.sub and regex.subn and
refactored a little.
I can't think of any other features that need to be added or see any mor
Matthew Barnett added the comment:
'$' matches at the end of the string or just before a newline at the end of the
string (if multiline mode isn't turned on). '\Z' matches only at the end of the string.
If not even the OP is convinced of the need, then I have
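The difference between '$' and '\Z' can be checked with a short sketch like this one, using the standard re module. The sample strings are made up.

```python
import re

text = "line\n"

# '$' also matches just before a trailing newline.
dollar = re.search(r"line$", text)
# '\Z' matches only at the very end of the string.
end_only = re.search(r"line\Z", text)

# Without MULTILINE, '$' does not match before a newline in the middle.
middle = re.search(r"line$", "line\nmore")
middle_multiline = re.search(r"line$", "line\nmore", re.MULTILINE)

print(bool(dollar), bool(end_only), bool(middle), bool(middle_multiline))
```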
Matthew Barnett added the comment:
I've started on a module called 'texttools'. So far it has Levenshtein and
Porter (both coded in C).
If there's interest I'll put it on PyPI.
Suggestions for other additions?
--
nosy: +mrabarnett
_
Matthew Barnett added the comment:
I use Python 3, where len("\U00010337") == 2 on a narrow build.
Yes, wide Unicode on a narrow build is a problem:
>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337"
Matthew Barnett added the comment:
issue2636-20101009.zip is a new version of the regex module.
It appears from a posting in python-list and a closer look at the docs that
string positions in the 're' module are limited to 32 bits, even on 64-bit
builds. I think it's because
Matthew Barnett added the comment:
I am not able to build or test a 64-bit version. The update was to the source
files to ensure that if it is compiled for 64 bits then the string positions
will also be 64-bit.
This change was prompted by a poster who tried to use the re module of a 64-bit
Matthew Barnett added the comment:
That's a bug. I'll fix it as soon as I've reinstalled the SDK.
--
___
Python tracker
<http://bugs.py
Matthew Barnett added the comment:
issue2636-20101029.zip is a new version of the regex module.
I've also added to the unit tests.
--
Added file: http://bugs.python.org/file19419/issue2636-20101029.zip
___
Python tracker
<http://bugs.py
Matthew Barnett added the comment:
issue2636-20101030.zip is a new version of the regex module.
I've also added yet more to the unit tests.
--
Added file: http://bugs.python.org/file19422/issue2636-20101030.zip
___
Python tracker
Matthew Barnett added the comment:
issue2636-20101030a.zip is a new version of the regex module.
This bug was a bit more difficult to fix, but I think it's OK now!
--
Added file: http://bugs.python.org/file19435/issue2636-20101030a.zip
___
P
Matthew Barnett added the comment:
issue2636-20101101.zip is a new version of the regex module.
I hope it's finally fixed this time! :-)
--
Added file: http://bugs.python.org/file19456/issue2636-20101101.zip
___
Python tracker
Matthew Barnett added the comment:
issue2636-20101102.zip is a new version of the regex module.
--
Added file: http://bugs.python.org/file19460/issue2636-20101102.zip
___
Python tracker
<http://bugs.python.org/issue2
Matthew Barnett added the comment:
issue2636-20101102a.zip is a new version of the regex module.
msg120204 relates to issue #1519638 "Unmatched group in replacement". In
'regex' an unmatched group is treated as an empty string in a replacement
template. This behaviour is
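The standard re module behaves the same way from Python 3.5 onward, which makes for a small runnable illustration. The pattern and template below are made up; the sketch uses re, not regex.

```python
import re

# For each match only one of the two groups participates; the other is
# unmatched and contributes an empty string to the replacement.
result = re.sub(r"(a)|(b)", r"[\1\2]", "ab")
print(result)  # [a][b]
```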
Matthew Barnett added the comment:
It's a bug caused by trying to avoid getting stuck when a zero-width match is
found. Basically the fix is to advance one character after a zero-width match,
but that doesn't always give the correct result.
There are a number of related issues
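The "advance one character after a zero-width match" rule can be sketched in pure Python as below. This is a simplified model written for illustration, not the code used by re or regex, and the function name `naive_find_spans` is hypothetical.

```python
import re

def naive_find_spans(pattern, text):
    """Collect match spans, stepping one character past any empty match."""
    compiled = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.end() == m.start():
            pos = m.end() + 1   # zero-width match: avoid getting stuck
        else:
            pos = m.end()
    return spans

spans = naive_find_spans("x*", "axb")
print(spans)  # [(0, 0), (1, 2), (2, 2), (3, 3)]
```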
Matthew Barnett added the comment:
issue2636-20101106.zip is a new version of the regex module.
Fix for issue 10328, which regex also shared.
--
Added file: http://bugs.python.org/file19514/issue2636-20101106.zip
___
Python tracker
<h
Matthew Barnett added the comment:
It looks like a similar problem to msg116252 and msg116276.
--
___
Python tracker
<http://bugs.python.org/issue2636>
___
___
Matthew Barnett added the comment:
issue2636-20101113.zip is a new version of the regex module.
It now supports Unicode 6.0.0.
--
Added file: http://bugs.python.org/file19597/issue2636-20101113.zip
___
Python tracker
<http://bugs.python.
Matthew Barnett added the comment:
It's a known issue (see issue #1662581, for example).
There's a new implementation at PyPI which doesn't have this problem:
http://pypi.python.org/pypi/regex
--
nosy: +mrabarnett
___
Python
Matthew Barnett added the comment:
@Gregory: I've added you to the project.
I'm currently trying to fix a problem with iterators shared across threads. As
a temporary measure, the current release on PyPI doesn't enable multithreading
for them.
The mrab-regex-hg project doe
Matthew Barnett added the comment:
I've fixed the problem with iterators for both Python 3 and Python 2. They can
now be shared safely across threads.
I've updated the release on PyPI.
--
___
Python tracker
<http://bugs.python.
Matthew Barnett added the comment:
I've been looking through the list of current keywords and the best syntax I
could come up with for suppressing the context is:
try:
    x / y
except ZeroDivisionError as e:
    raise as Exception( 'Invalid value for y' )
T
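For comparison, the spelling that later Python versions (3.3 onward) accept for suppressing the context is `raise ... from None`, not the syntax proposed above. The function name `divide` and the use of ValueError in this sketch are made up.

```python
def divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        # "from None" hides the ZeroDivisionError from the traceback.
        raise ValueError("Invalid value for y") from None

try:
    divide(1, 0)
except ValueError as err:
    caught = err

print(caught.__cause__, caught.__suppress_context__)
```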
Matthew Barnett added the comment:
Alex is correct.
This part:
[^<>]*
can match an empty string, and it's nested with a repeated group. It stalls,
repeatedly matching an empty string.
Incidentally, my regex implementation (available on Py
Matthew Barnett added the comment:
The name isn't meaningful to me. My preference would be for something like
"total_count".
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.pyt
Matthew Barnett added the comment:
It depends on what kind of object it's like. If it's like a dict then your
example is clearly not empty, but if it's like a set then it /is/ empty, in
which case it's empty if:
all(count == 0 for count in my_counter.values
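The two notions of "empty" can be compared on a collections.Counter with a sketch like this. The helper name `is_empty_as_multiset` is hypothetical.

```python
from collections import Counter

def is_empty_as_multiset(counter):
    """Treat the counter like a set of counted items: empty if every count is 0."""
    return all(count == 0 for count in counter.values())

my_counter = Counter(a=0, b=0)

# Dict-like view: two keys are stored, so it is not empty.
print(len(my_counter), bool(my_counter))
# Set-like view: nothing has a non-zero count, so it is empty.
print(is_empty_as_multiset(my_counter))
```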
Matthew Barnett added the comment:
help(re.sub) says:
sub(pattern, repl, string, count=0)
and re.IGNORECASE has a value of 2.
Therefore this:
re.sub("_", "X", subject, re.IGNORECASE)
is telling it to replace at most 2 occurrences of "_".
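A short sketch of the mistake and the usual fix, which is to pass the flag by keyword. The sample subject string is made up, and the warnings filter is only there because some newer Python versions warn when count is passed positionally.

```python
import re
import warnings

subject = "a_b_c_d"

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    # The fourth positional argument is count, so this means count=2.
    wrong = re.sub("_", "X", subject, re.IGNORECASE)

# Passing the flag by keyword leaves count at its default of 0 (no limit).
right = re.sub("_", "X", subject, flags=re.IGNORECASE)

print(wrong)  # aXbXc_d
print(right)  # aXbXcXd
```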
Matthew Barnett added the comment:
I don't know how much code that might break. It might not be that much; I can't
remember when I last used re.sub without the default count.
--
___
Python tracker
<http://bugs.python.o
Matthew Barnett added the comment:
Something like "" may be more Pythonic.
--
___
Python tracker
<http://bugs.python.org/issue11957>
___
___
Python-b
Matthew Barnett added the comment:
Argument 4 of re.sub is the maximum number of replacements, NOT flags:
Help on function sub in module re:
sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in
Matthew Barnett added the comment:
Replied to the regex bug tracker.
--
___
Python tracker
<http://bugs.python.org/issue12130>
___
___
Python-bugs-list mailin
Matthew Barnett added the comment:
Earlier this week I discovered that .Net supports repeated capture and its API
suggested a much cleaner approach than what Perl offered, so I'll be adding it
to the regex module at:
http://pypi.python.org/pypi/regex
The new methods will follo
Matthew Barnett added the comment:
issue2636-20101120.zip is a new version of the regex module.
The match object now supports additional methods which return information on
all the successful matches of a repeated capture group.
The API was inspired by that of .Net:
matchobject.captures
Matthew Barnett added the comment:
issue2636-20101121.zip is a new version of the regex module.
The captures didn't work properly with lookarounds or atomic groups.
--
Added file: http://bugs.python.org/file19723/issue2636-20101121.zip
___
P
Matthew Barnett added the comment:
I'd be interested in having a go if I knew what the desired behaviour was, ie
unit tests to confirm what was 'correct'.
How should it handle line breaks? Should it treat them like any other
whitespace as at present, should it honour them, o
Matthew Barnett added the comment:
issue2636-20101123.zip is a new version of the regex module.
Oops, sorry, the weird behaviour of msg11 was a bug. :-(
--
Added file: http://bugs.python.org/file19786/issue2636-20101123.zip
___
Python tracker
Matthew Barnett added the comment:
textwrap_2010-11-23.diff is my attempt to provide a fix, if it's wanted/needed.
--
Added file: http://bugs.python.org/file19791/textwrap_2010-11-23.diff
___
Python tracker
<http://bugs.python.org/i
Matthew Barnett added the comment:
The spans say this:
>>> for m in re.finditer('((.d.)*)*', 'adb'):
...     print(m.span())
...
(0, 3)
(3, 3)
There's a non-empty match followed by an empty match.
IMHO, not a bug.
--
nosy: +mrabarnett
Matthew Barnett added the comment:
Re the regex module (issue #2636), would a good compromise be:
regex.escape(user_input, special_only=True)
to maintain compatibility?
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue2
Matthew Barnett added the comment:
issue2636-20101130.zip is a new version of the regex module.
Added 'special_only' keyword parameter (default False) to regex.escape. When
True, regex.escape escapes only 'special' characters, such as '?'.
--
Adde
Matthew Barnett added the comment:
issue2636-20101207.zip is a new version of the regex module.
It includes additional checks against pathological regexes.
--
Added file: http://bugs.python.org/file19965/issue2636-20101207.zip
___
Python tracker
Matthew Barnett added the comment:
issue2636-20101210.zip is a new version of the regex module.
I've extended the additional checks of the previous version.
It has been tested with Python 2.5 to Python 3.2b1.
--
Added file: http://bugs.python.org/file20001/issue2636-2010121
Matthew Barnett added the comment:
I use Windows XP, so I can't help with MacOS X.
From the error log it looks like it doesn't like the sources for Python either!
--
___
Python tracker
<http://bugs.pytho
Matthew Barnett added the comment:
The regex module is intended to replace the re module, so its default behaviour
is the same: in Python 2, regexes default to matching ASCII, and in Python 3,
they default to matching Unicode.
If you want to use a regex on a Unicode string in Python 2 then
Matthew Barnett added the comment:
issue2636-20101224.zip is a new version of the regex module.
Case-insensitive matching is now faster.
The matching functions and methods now accept a keyword argument to release the
GIL during matching to enable other Python threads to run concurrently
Matthew Barnett added the comment:
I've been trying to push the history to Launchpad, completely without success;
it just won't authenticate (no such account, even though I can log in!).
I doubt that the history would be much use to
Matthew Barnett added the comment:
It does have an SSH key. It's probably something simple that I'm missing.
I think that the only change I'm likely to make is to a support script I use;
it currently uses hard-coded paths, etc,
Changes by Matthew Barnett :
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue6210>
___
Matthew Barnett added the comment:
issue2636-20101228.zip is a new version of the regex module.
Sorry for the delay, the fix took me a bit longer than I expected. :-)
--
Added file: http://bugs.python.org/file20176/issue2636-20101228.zip
___
Python
Matthew Barnett added the comment:
Regarding syntax, I'm undecided between:
raise with new_exception
and:
raise new_exception with caught_exception
I think that the second form is clearer:
try:
    ...
except SomeException as ex:
    raise SomeOtherExce
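Neither form above is valid Python. The closest existing spelling for attaching a caught exception to a new one is `raise ... from ...`, sketched here with made-up exception names (`LowLevelError`, `HighLevelError`) and a made-up function.

```python
class LowLevelError(Exception):
    pass

class HighLevelError(Exception):
    pass

def convert():
    try:
        raise LowLevelError("low-level failure")
    except LowLevelError as ex:
        # The caught exception is recorded as the cause of the new one.
        raise HighLevelError("high-level failure") from ex

try:
    convert()
except HighLevelError as err:
    cause = err.__cause__

print(type(cause).__name__, cause)
```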
Matthew Barnett added the comment:
issue2636-20101228a.zip is a new version of the regex module.
It now compiles the pattern quickly.
--
Added file: http://bugs.python.org/file20182/issue2636-20101228a.zip
___
Python tracker
<h
Matthew Barnett added the comment:
issue2636-20101229.zip is a new version of the regex module.
It now compiles the pattern quickly.
--
Added file: http://bugs.python.org/file20185/issue2636-20101229.zip
___
Python tracker
<http://bugs.python.