from:"Matthew Barnett"

[issue46410] TypeError when parsing regexp with unicode named character sequence escape

2022-01-18 Thread Matthew Barnett



Matthew Barnett  added the comment:

They're not supported in string literals either:

Python 3.10.1 (tags/v3.10.1:2cd268a, Dec  6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> "\N{KEYCAP NUMBER SIGN}"
  File "<stdin>", line 1
    "\N{KEYCAP NUMBER SIGN}"
    ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-21: unknown Unicode character name
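
For comparison, a small sketch contrasting the lookup function with a string literal. It assumes Python 3.3 or later, where unicodedata.lookup is documented to accept named sequences; the variable names are illustrative only.

```python
import unicodedata

# unicodedata.lookup() accepts named sequences and returns a
# multi-character string rather than a single character.
sequence = unicodedata.lookup("KEYCAP NUMBER SIGN")
print([f"U+{ord(ch):04X}" for ch in sequence])

# A string literal using the same name is rejected at compile time.
source = r'"\N{KEYCAP NUMBER SIGN}"'
try:
    compile(source, "<demo>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False
print(literal_ok)
```

A multi-character result from the lookup is also why code that expects exactly one character can trip over such names.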

--

___
Python tracker 
<https://bugs.python.org/issue46410>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue46515] Benefits Of Phool Makhana

2022-01-25 Thread Matthew Barnett



Change by Matthew Barnett :


--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46515>
___

[issue46627] Regex hangs indefinitely

2022-02-03 Thread Matthew Barnett



Matthew Barnett  added the comment:

That pattern has:

(?P[^]]+)+

Is that intentional? It looks wrong to me.

--

___
Python tracker 
<https://bugs.python.org/issue46627>
___

[issue46825] slow matching on regular expression

2022-02-22 Thread Matthew Barnett



Matthew Barnett  added the comment:

The expression is a repeated alternative where the first alternative is a 
repeat. Repeated repeats can result in a lot of attempts and backtracking and 
should be avoided.

Try this instead:

(0|1(01*0)*1)+
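
A quick sanity check of the suggested pattern, under the assumption (not stated in the message) that the aim is to recognise binary numbers divisible by 3. The helper name `accepts` is hypothetical.

```python
import re

SUGGESTED = re.compile(r"(0|1(01*0)*1)+")


def accepts(bits):
    """Return True if the whole string of binary digits matches."""
    return SUGGESTED.fullmatch(bits) is not None


# Compare the pattern against plain arithmetic for small numbers.
mismatches = [
    n for n in range(1, 200)
    if accepts(format(n, "b")) != (n % 3 == 0)
]
print(mismatches)
```

An empty list of mismatches would suggest the rewritten pattern accepts the same inputs while avoiding a repeat nested directly inside another repeat.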

--

___
Python tracker 
<https://bugs.python.org/issue46825>
___

[issue13169] Regular expressions with 0 to 65536 repetitions raises OverflowError

2011-10-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is 
equivalent to ".*".

For example:

>>> re.match(".*", "x" * 10).span()
(0, 10)
>>> re.match(".{0,65535}", "x" * 10).span()
(0, 10)

but:

>>> re.match(".{0,65534}", "x" * 10).span()
(0, 65534)
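
A minimal sketch for checking the effect of an explicit upper bound on your own interpreter. It uses a string longer than the bound so the cap becomes visible; later Python versions may use a different "no limit" sentinel, so the 65535 case is deliberately left out.

```python
import re

text = "x" * 100000

# ".*" has no upper limit, so it consumes the whole string.
unbounded = re.match(".*", text).span()

# An explicit bound below the sentinel value is honoured as written.
capped = re.match(".{0,65534}", text).span()

print(unbounded, capped)
```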

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue13169>
___

[issue13169] Regular expressions with 0 to 65536 repetitions raises OverflowError

2011-10-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

The limit is an implementation detail. The pattern is compiled into codes which 
are then interpreted, and it just happens that the codes are (usually) 16 bits, 
giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't 
warn if you actually write 65535.

There's an alternative regex implementation here:

http://pypi.python.org/pypi/regex

--

___
Python tracker 
<http://bugs.python.org/issue13169>
___

[issue13592] repr(regex) doesn't include actual regex

2011-12-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

In reply to Ezio, the repr of a large string, list, tuple or dict is also long.

The repr of a compiled regex should probably also show the flags, but should it 
just be the numeric value?

--

___
Python tracker 
<http://bugs.python.org/issue13592>
___

[issue13592] repr(regex) doesn't include actual regex

2011-12-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

Actually, one possibility that occurs to me is to provide the flags within the 
pattern. The .pattern attribute gives the original pattern, but repr could give 
the flags in-line at the start of the pattern:

>>> # Assuming Python 3.
>>> r = re.compile("a", re.I)
>>> r.flags
34
>>> r.pattern
'a'
>>> repr(r)
"<_sre.SRE_Pattern '(?i)a'>"

I'm not sure how to make it eval-able, unless you mean something more like:

>>> repr(r)
"re.Regex('(?i)a')"

where re.Regex == re.compile, which would be more meaningful than:

>>> repr(r)
"re.compile('(?i)a')"

--
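
A hedged sketch of the two ideas discussed in this message: flags given in-line end up in .flags like flags passed as an argument, and an eval-able repr allows a round trip. It assumes a Python version (3.4 or later, as far as I know) whose compiled-pattern repr is eval-able; the exact repr text is not relied on.

```python
import re

r = re.compile("a", re.I)

# Flags written in-line are recorded on the pattern object as well.
inline = re.compile("(?i)a")
same_flags = bool(inline.flags & re.I) and bool(r.flags & re.I)

# Round-trip through repr(); this only works if repr() is eval-able.
clone = eval(repr(r), {"re": re})
print(repr(r), clone.pattern, clone.flags == r.flags)
```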

___
Python tracker 
<http://bugs.python.org/issue13592>
___

[issue13592] repr(regex) doesn't include actual regex

2011-12-22 Thread Matthew Barnett


Matthew Barnett  added the comment:

I'm just adding this to the regex module and I've come up against a possible 
issue. The regex module supports named lists, which could be very big. Should 
the entire contents of those lists also be shown in the repr? They would have to 
be if the repr is to be eval-able.

--

___
Python tracker 
<http://bugs.python.org/issue13592>
___

[issue13652] Creating lambda functions in a loop has unexpected results when resolving variables used as arguments

2011-12-22 Thread Matthew Barnett


Matthew Barnett  added the comment:

That's not a bug.

This might help to explain what's going on:

What do (lambda) function closures capture in Python?
http://stackoverflow.com/questions/2295290/what-do-lambda-function-closures-capture-in-python
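
A short illustration of the behaviour in question (the variable names are made up for this sketch): a lambda looks up its free variables when it is called, so all functions created in one loop see the final value unless the value is bound explicitly.

```python
# Each lambda looks up `i` when it is called, not when it is created.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# Binding the current value as a default argument captures it per iteration.
early = [lambda i=i: i for i in range(3)]
early_results = [f() for f in early]

print(late_results, early_results)
```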

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue13652>
___

[issue12177] re.match raises MemoryError

2011-05-25 Thread Matthew Barnett


Matthew Barnett  added the comment:

This also raises MemoryError:

re.match(r'()*?1', 'a1')

but none of these do:

re.match(r'()+1', 'a1')
re.match(r'()*1', 'a1')

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12177>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Matthew Barnett


Matthew Barnett  added the comment:

The new regex implementation is hosted here: 
https://code.google.com/p/mrab-regex-hg/

The span of m['a_thing'] is m.span('a_thing'), if that helps.

The named groups are listed on the pattern object, which can be accessed via 
m.re:

>>> m.re
<_regex.Pattern object at 0x0161DE30>
>>> m.re.groupindex
{'another_thing': 3, 'a_thing': 1}

so you can use that to create a reverse dict to go from the index to the name 
or None. (Perhaps the pattern object should have such a .group_name attribute.)
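
A minimal sketch of that reverse mapping. It uses the standard re module, whose pattern objects also expose groupindex, on the assumption that the regex module behaves the same way here; the sample pattern and the name `index_to_name` are illustrative.

```python
import re

m = re.match(r"(?P<a_thing>\w+) (\w+) (?P<another_thing>\w+)", "one two three")

# Invert name -> index into index -> name.
index_to_name = {index: name for name, index in m.re.groupindex.items()}

# Unnamed groups have no entry, so .get() yields None for them.
names = [index_to_name.get(i) for i in range(1, m.re.groups + 1)]
print(names, m.span("a_thing"))
```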

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue12671] urlopen returning empty string

2011-07-31 Thread Matthew Barnett


New submission from Matthew Barnett :

Someone over at StackOverflow had a problem with urlopen in Python 3.2.1:


http://stackoverflow.com/questions/6892573/problem-with-urlopen/6892843#6892843

This is the code:

from urllib.request import urlopen
f = urlopen('http://online.wsj.com/mdc/public/page/2_3020-tips.html?mod=topnav_2_3000')
page = f.read()
f.close()

With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
read returns an empty string.

--
components: Library (Lib)
messages: 141481
nosy: mrabarnett
priority: normal
severity: normal
status: open
title: urlopen returning empty string
type: behavior
versions: Python 3.2

___
Python tracker 
<http://bugs.python.org/issue12671>
___

[issue12671] urlopen returning empty string

2011-07-31 Thread Matthew Barnett


Matthew Barnett  added the comment:

Just been told this bug has already been reported as issue #12576.

--
resolution:  -> duplicate

___
Python tracker 
<http://bugs.python.org/issue12671>
___

[issue12671] urlopen returning empty string

2011-07-31 Thread Matthew Barnett


Changes by Matthew Barnett :


--
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue12671>
___

[issue12728] Python re lib fails case insensitive matches on Unicode data

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12728>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12730] Python's casemapping functions are untrustworthy due to narrow/wide build issues

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12730>
___

[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12731>
___

[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12732>
___

[issue12733] Request for grapheme support in Python re lib

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12733>
___

[issue12734] Request for property support in Python re lib

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12734>
___

[issue12735] request full Unicode collation support in std python library

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12735>
___

[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-12 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12736>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

In a narrow build, a codepoint in the astral plane is encoded as a surrogate pair.

I could implement a workaround for it in the regex module, but I think that the 
proper place to fix it is in the language as a whole, perhaps by implementing 
PEP 393 ("Flexible String Representation").
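
To make the surrogate-pair encoding concrete, here is a small sketch of the standard UTF-16 arithmetic. The function name `to_surrogate_pair` is hypothetical; the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(codepoint):
    """Split an astral codepoint into its UTF-16 high and low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not an astral codepoint")
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogate_pair(ord("\N{MATHEMATICAL SCRIPT CAPITAL A}"))
print(hex(high), hex(low))

# Cross-check against the built-in encoder (big-endian, no BOM).
encoded = "\N{MATHEMATICAL SCRIPT CAPITAL A}".encode("utf-16-be")
```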

--

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

There are occasions when you want to do string slicing, often of the form:

pos = my_str.index(x)
endpos = my_str.index(y)
substring = my_str[pos : endpos]

To me that suggests that if UTF-8 is used then it may be worth profiling to see 
whether caching the last 2 positions would be beneficial.
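
A minimal sketch of the same slicing idiom with the second search started from where the first one finished, which is the access pattern a position cache would benefit. The sample string and delimiters are invented for illustration.

```python
my_str = "key=value; other=thing"
x, y = "=", ";"

pos = my_str.index(x)
# Start the second search just past the first match instead of at 0.
endpos = my_str.index(y, pos + len(x))
substring = my_str[pos:endpos]
print(substring)
```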

--

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

You're right about starting the second search from where the first finished. 
Caching the position would be an advantage there.

The memory cost of extra pointers wouldn't be so bad if UTF-8 took less space 
than the current format.

Regex isn't used as much as in Perl. BTW, the current re module was introduced 
in Python 1.5, the previous regex and regsub modules being removed in Python 
2.5.

--

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

On a narrow build, "\N{MATHEMATICAL SCRIPT CAPITAL A}" is stored as 2 code 
units, and neither re nor regex recombine them when compiling a regex or 
looking for a match.

regex supports \xNN, \u and \U and \N{XYZ} itself, so they can be 
used in a raw string literal, but it doesn't recombine code units.

I could add recombination to regex at some point if time has passed and no 
further progress has been made in the language's support for Unicode.

--

___
Python tracker 
<http://bugs.python.org/issue12749>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

Have a look here: http://98.245.80.27/tcpc/OSCON2011/gbu/index.html

--

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-15 Thread Matthew Barnett


Matthew Barnett  added the comment:

For what it's worth, I've had an idea about string storage, roughly based on how 
*nix stores data on disk.

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks.

This means that a large string doesn't need a single large allocation.

The level of indirection can be increased as necessary.

For simplicity, all codepoint blocks contain the same number of codepoints, 
except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to 
store all of its codepoints.

This means that there are no surrogates and that different sections of the 
string can be stored in different widths to reduce memory usage.
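
A rough Python sketch of the idea, limited to a single level of indirection (a list of fixed-size codepoint blocks). The class name `BlockString`, the tiny block size and the byte layout are all assumptions made for illustration, not a proposed implementation.

```python
BLOCK = 4  # codepoints per block (deliberately tiny for the demo)


def _width(codepoints):
    """Smallest of 1, 2 or 4 bytes that can hold every codepoint."""
    largest = max(codepoints)
    return 1 if largest < 0x100 else 2 if largest < 0x10000 else 4


class BlockString:
    def __init__(self, text):
        cps = [ord(ch) for ch in text]
        self.length = len(cps)
        self.blocks = []  # list of (width, raw bytes) pairs
        for start in range(0, len(cps), BLOCK):
            chunk = cps[start:start + BLOCK]
            width = _width(chunk)
            data = b"".join(cp.to_bytes(width, "little") for cp in chunk)
            self.blocks.append((width, data))

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # All blocks but the last are full, so the block is found by division.
        width, data = self.blocks[index // BLOCK]
        offset = (index % BLOCK) * width
        return chr(int.from_bytes(data[offset:offset + width], "little"))


sample = "abcd\u20acxyz\U0001D49C!"
stored = BlockString(sample)
widths = [width for width, _ in stored.blocks]
rebuilt = "".join(stored[i] for i in range(len(stored)))
print(widths, rebuilt == sample)
```

Because every block except the last holds the same number of codepoints, indexing stays constant-time even though blocks differ in byte width.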

--

___
Python tracker 
<http://bugs.python.org/issue12729>
___

[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace

2011-08-19 Thread Matthew Barnett


Matthew Barnett  added the comment:

For the "Line_Break" property, one of the possible values is "Inseparable", 
with 2 permitted aliases, the shorter "IN" (which is reasonable) and 
"Inseperable" (ouch!).

--

___
Python tracker 
<http://bugs.python.org/issue12753>
___

[issue12789] re.Scanner don't support more then 2 groups on regex

2011-08-20 Thread Matthew Barnett


Matthew Barnett  added the comment:

Even if this bug is fixed, it still won't work as you expect, and this is why.

The Scanner function accepts a list of 2-tuples. The first item of the tuple is 
a regex and the second is a function. For example:

re.Scanner([(r"\d+", number), (r"\w+", word)])

The Scanner function then builds a regex, using the given regexes as 
alternatives, each wrapped as a capture group:

r"(\d+)|(\w+)"

When matching, it sees which group captured and uses that to decide which 
function it should call, so, for example, if group 1 matched, it calls 
"number", and if group 2 matched, it calls "word".

When you introduce capture groups into the regexes, it gets confused. If your 
regex matches, it'll see that groups 1 and 2 match, so it'll try to call the 
second function, but there isn't one...
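
One way to sidestep the confusion is to use non-capturing groups inside the individual regexes. The sketch below relies on re.Scanner, which is undocumented, so treat its exact behaviour as an assumption; the callback names and token labels are made up.

```python
import re


def number(scanner, token):
    return ("number", token)


def word(scanner, token):
    return ("word", token)


scanner = re.Scanner([
    (r"\d+(?:\.\d+)?", number),  # non-capturing group keeps group numbering intact
    (r"\w+", word),
    (r"\s+", None),              # None means the matched text is skipped
])

tokens, remainder = scanner.scan("3.14 pi 42")
print(tokens, repr(remainder))
```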

--

___
Python tracker 
<http://bugs.python.org/issue12789>
___

[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Matthew Barnett


Matthew Barnett  added the comment:

There are some oddities in Unicode case-folding.

Under full case-folding, both "\N{LATIN CAPITAL LETTER SHARP S}" and "\N{LATIN 
SMALL LETTER SHARP S}" fold to "ss", which means that those codepoints match 
each other.

However, under simple case-folding, they fold to themselves, which means that 
those codepoints _don't_ match each other.
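
The full case-folding result can be checked with str.casefold, on the assumption that it implements full case folding (Python 3.3 or later). Simple case-folding has no direct equivalent in the standard library, so it is not shown.

```python
capital = "\N{LATIN CAPITAL LETTER SHARP S}"
small = "\N{LATIN SMALL LETTER SHARP S}"

# Under full case-folding both codepoints fold to the same two-letter string.
folded = (capital.casefold(), small.casefold())
print(folded, capital.casefold() == small.casefold())
```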

--

___
Python tracker 
<http://bugs.python.org/issue12736>
___

[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-28 Thread Matthew Barnett


Matthew Barnett  added the comment:

The regex module currently uses simple case-folding, although I'm working 
towards full case-folding, as listed in 
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.

--

___
Python tracker 
<http://bugs.python.org/issue12736>
___

[issue2636] Adding a new regex module (compatible with re)

2011-09-01 Thread Matthew Barnett


Matthew Barnett  added the comment:

The regex module supports nested sets and set operations, eg. 
r"[[a-z]--[aeiou]]" (the letters from 'a' to 'z', except the vowels). This 
means that literal '[' in a set needs to be escaped.

For example, re module sees "[][()]..." as:

[      start of set
 ]     literal ']'
 [()   literals '[', '(', ')'
]      end of set
...    ...

but the regex module sees it as:

[      start of set
 ]     literal ']'
 [()]  nested set [()]
 ...   ...

Thus:

>>> s = u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[][()]) |(?!,) (?!\[,)(?=[][(),])', '', s)
u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub('(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])', '', s)
u'void foo(type arg1 [, type arg2])'

If it can't parse it as a nested set, it tries again as a non-nested set (like 
re), but there are bound to be regexes where it could be either.
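
Escaping the literal '[' removes the ambiguity for either parser. A small check with the standard re module, assuming the escaped form means the same thing there as in the regex module:

```python
import re

s = "void foo ( type arg1 [, type arg2 ] )"

# The literal '[' inside each set is written as \[ so it cannot be
# mistaken for the start of a nested set.
pattern = r"(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])"
result = re.sub(pattern, "", s)
print(result)
```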

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Adding a new regex module (compatible with re)

2011-09-01 Thread Matthew Barnett


Matthew Barnett  added the comment:

I think I need a show of hands.

Should the default be old behaviour (like re) or new behaviour? (It might be 
old now, new later.)

Should there be a NEW flag (as at present), or an OLD flag, or a VERSION 
parameter (0=old, 1=new, 2=?)?

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Adding a new regex module (compatible with re)

2011-09-02 Thread Matthew Barnett


Matthew Barnett  added the comment:

The least disruptive change would be to have a NEW flag for the new behaviour, 
as at present, and an OLD flag for the old behaviour.

Currently the default is old behaviour, but in the future it will be new 
behaviour.

The differences would be:

Old behaviour                   : New behaviour
------------------------------- : ------------------------------
Global inline flags             : Positional inline flags
Can't split on zero-width match : Can split on zero-width match
Simple sets                     : Nested sets and set operations

The only change would be that nested sets wouldn't be supported in the old 
behaviour.
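
As an illustration of the "split on zero-width match" row, the sketch below uses the standard re module. It assumes Python 3.7 or later, where re.split accepts patterns that can match an empty string; older versions behave differently.

```python
import re

# \b matches the empty string at word boundaries, so every boundary
# becomes a split point.
parts = re.split(r"\b", "one two")
print(parts)
```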

There are also additional escape sequences, eg \X is no longer treated as "X", 
but as they look like escape sequences you really shouldn't be relying on that. 
(It's similar to writing Windows paths in non-raw string literals: "\T" == 
"\\T", but "\t" == chr(9).)

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Adding a new regex module (compatible with re)

2011-09-02 Thread Matthew Barnett


Matthew Barnett  added the comment:

So, VERSION0 and VERSION1, with "(?V0)" and "(?V1)" in the pattern?

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue7951] Should str.format allow negative indexes when used for getitem access?

2010-08-11 Thread Matthew Barnett


Matthew Barnett  added the comment:

I agree with Kamil and Germán. I would've expected negative indexes for 
sequences to work. Negative indexes for fields is a different matter.

--

___
Python tracker 
<http://bugs.python.org/issue7951>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-08-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100814.zip is a new version of the regex module.

I've added default Unicode word boundaries and renamed the Pattern and Match 
classes.

Over to you, Alex. :-)

--
Added file: http://bugs.python.org/file18532/issue2636-20100814.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue7255] "Default" word boundaries for Unicode data?

2010-08-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

These have been added to the new 'regex' module. See issue #2636 or PyPI at:

http://pypi.python.org/pypi/regex

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue7255>
___

[issue7255] "Default" word boundaries for Unicode data?

2010-08-15 Thread Matthew Barnett


Matthew Barnett  added the comment:

If you're on Windows (x86, 32-bit) then compilation isn't necessary - just use 
the appropriate _regex.pyd.

--

___
Python tracker 
<http://bugs.python.org/issue7255>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-08-15 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100816.zip is a new version of the regex module.

Unfortunately I came across a bug in the handling of sets. More unit tests added.

--
Added file: http://bugs.python.org/file18541/issue2636-20100816.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-08-23 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100824.zip is a new version of the regex module.

More speedups. Getting towards Perl speed now, depending on the regex. :-)

--
Added file: http://bugs.python.org/file18621/issue2636-20100824.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-11 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100912.zip is a new version of the regex module.

More speedups. I've been comparing the speed against Perl wherever possible. In 
some cases Perl is lightning fast, probably because regex is built into the 
language and it doesn't have to parse method arguments (for some short regexes 
a large part of the processing time is spent in PyArg_ParseTupleAndKeywords!). 
In other cases, where it has to use Unicode codepoints outside the 8-bit range, 
or character properties such as \p{Alpha}, its performance is simply appalling! 
:-)

--
Added file: http://bugs.python.org/file18854/issue2636-20100912.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

Another flag? Hmm.

How about this instead: if a scoped flag appears at the end of a regex (and 
would therefore normally have no effect) then it's treated as though it's at 
the start of the regex. Thus:

foo(?i)

is treated like:

(?i)foo

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

The tests for re include these regexes:

a.b(?s)
a.*(?s)b

I understand what Georg said previously about some people preferring to put 
them at the end, but I personally wouldn't do that because some regex 
implementations support scoped inline flags, although others, like re, don't.

I think that second regex is a bit perverse, though! :-)

On the other matter, I could make the Unicode script and block available 
through a couple of functions if you need them, eg:

# Using Python 3 here
>>> regex.script("A")
'Latin'
>>> regex.block("A")
'BasicLatin'

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

OK, so would it be OK if there was, say, a NEW (N) flag which made the inline 
flags (?flags) scoped and allowed splitting on zero-width matches?

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100913.zip is a new version of the regex module.

I've removed the ZEROWIDTH flag and added the NEW flag, which turns on the new 
behaviour such as splitting on zero-width matches and positional flags. If the 
NEW flag isn't turned on then the inline flags are global, like in the re 
module.

You were right about those bugs in the regex module, Vlastimil. :-(

I've left the permissiveness of the sets in, at least for the moment, or until 
someone complains about it!

Incidentally:

>>> re.findall(r"[\B]", "aBc")
[]
>>> re.findall(r"[\c]", "aBc")
['c']

so it is a bug in the re module (it's putting a non-word-boundary in a set).

--
Added file: http://bugs.python.org/file18865/issue2636-20100913.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue1708652] Exact matching

2010-09-17 Thread Matthew Barnett


Matthew Barnett  added the comment:

Does this request still stand? If so then I'll add it to the new regex module.

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-17 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20100918.zip is a new version of the regex module.

I've added 'pos' and 'endpos' arguments to regex.sub and regex.subn and 
refactored a little.

I can't think of any other features that need to be added or see any more speed 
improvements.

Have I missed anything important? :-)

--
Added file: http://bugs.python.org/file18913/issue2636-20100918.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue1708652] Exact matching

2010-09-18 Thread Matthew Barnett


Matthew Barnett  added the comment:

'$' matches at the end of the string or just before a newline at the end of the 
string (if multiline mode isn't turned on). '\Z' matches only at the end of the 
string.
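
A short demonstration of the difference with the standard re module (the sample text is arbitrary):

```python
import re

text = "foo\n"

# '$' also matches just before a trailing newline.
dollar = re.search(r"foo$", text)

# '\Z' insists on the very end of the string, so the newline blocks it.
strict = re.search(r"foo\Z", text)

print(dollar is not None, strict is not None)
```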

If not even the OP is convinced of the need, then I have no objection to 
closing.

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___

[issue2027] Module containing C implementations of common text algorithms

2010-09-20 Thread Matthew Barnett


Matthew Barnett  added the comment:

I've started on a module called 'texttools'. So far it has Levenshtein and 
Porter (both coded in C).
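
For readers unfamiliar with the first of those, a pure-Python sketch of Levenshtein distance follows. It is a textbook two-row formulation written for illustration and makes no claim about how the C version in the module is implemented.

```python
def levenshtein(a, b):
    """Edit distance using the two-row dynamic programming formulation."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
print(distance)
```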

If there's interest I'll put it on PyPI.

Suggestions for other additions?

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue2027>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-09-21 Thread Matthew Barnett


Matthew Barnett  added the comment:

I use Python 3, where len("\U00010337") == 2 on a narrow build.

Yes, wide Unicode on a narrow build is a problem:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make 
things more complicated.

I suppose the moral is that if you want to use wide Unicode then you really 
should use a wide build.
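
For reference, the surrogate pair behind that length of 2 can be inspected on any build by encoding to UTF-16. A minimal sketch; note that Python 3.3 and later no longer have narrow builds, so there len() of the string itself reports 1:

```python
import struct

ch = "\U00010337"

# Encode to UTF-16 and split the bytes into 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = struct.unpack(">%dH" % (len(utf16) // 2), utf16)

# A character outside the BMP needs two code units: a high and a low surrogate.
print([hex(u) for u in units])  # ['0xd800', '0xdf37']
```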

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-08 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101009.zip is a new version of the regex module.

It appears from a posting in python-list and a closer look at the docs that 
string positions in the 're' module are limited to 32 bits, even on 64-bit 
builds. I think it's because of things like:

Py_BuildValue("i", ...)

where 'i' indicates the size of a C int, which, at least in Windows compilers, 
is 32 bits in both 32-bit and 64-bit builds.

The regex module shared the same problem. I've changed such code to:

Py_BuildValue("n", ...)

and so forth, which indicates Py_ssize_t.

Unfortunately I'm not able to confirm myself that this will fix the problem on 
64 bits.
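
The overflow itself can be illustrated from Python with the struct module. This is only an analogy: the '<i' and '<q' codes stand in for a 32-bit C int and a 64-bit Py_ssize_t, and the 5 GiB offset and the fits() helper are made up for the example:

```python
import struct

offset = 5 * 2**30  # hypothetical string position 5 GiB into a large buffer

def fits(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False

print(fits("<i", offset))  # False: does not fit in 32 bits
print(fits("<q", offset))  # True: fits in 64 bits
```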

--
Added file: http://bugs.python.org/file19168/issue2636-20101009.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

I am not able to build or test a 64-bit version. The update was to the source 
files to ensure that if it is compiled for 64 bits then the string positions 
will also be 64-bit.

This change was prompted by a poster who tried to use the re module of a 64-bit 
Python build on a 30GB memmapped file but found that the string positions were 
still limited to 32 bits.

It looked like a 64-bit build of the regex module would have the same 
limitation.

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett


Matthew Barnett  added the comment:

That's a bug. I'll fix it as soon as I've reinstalled the SDK.

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101029.zip is a new version of the regex module.

I've also added to the unit tests.

--
Added file: http://bugs.python.org/file19419/issue2636-20101029.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-29 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101030.zip is a new version of the regex module.

I've also added yet more to the unit tests.

--
Added file: http://bugs.python.org/file19422/issue2636-20101030.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-10-30 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101030a.zip is a new version of the regex module.

This bug was a bit more difficult to fix, but I think it's OK now!

--
Added file: http://bugs.python.org/file19435/issue2636-20101030a.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101101.zip is a new version of the regex module.

I hope it's finally fixed this time! :-)

--
Added file: http://bugs.python.org/file19456/issue2636-20101101.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-01 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101102.zip is a new version of the regex module.

--
Added file: http://bugs.python.org/file19460/issue2636-20101102.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-02 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101102a.zip is a new version of the regex module.

msg120204 relates to issue #1519638 "Unmatched group in replacement". In 
'regex' an unmatched group is treated as an empty string in a replacement 
template. This behaviour is more in keeping with regex implementations in other 
languages.
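
As a point of comparison, the standard re module has followed the same convention since Python 3.5, so the behaviour can be sketched without installing anything (the pattern and template are only illustrative):

```python
import re

# Each match fills only one of the two groups; the other is unmatched
# and is replaced by an empty string in the template.
result = re.sub(r"(a)|(b)", r"[\1\2]", "ab")

print(result)  # [a][b]
```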

msg120206 was caused by not all group references being made case-insensitive 
when they should be.

--
Added file: http://bugs.python.org/file19469/issue2636-20101102a.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue10328] re.sub[n] doesn't seem to handle /Z replacements correctly in all cases

2010-11-05 Thread Matthew Barnett


Matthew Barnett  added the comment:

It's a bug caused by trying to avoid getting stuck when a zero-width match is 
found. Basically the fix is to advance one character after a zero-width match, 
but that doesn't always give the correct result.

There are a number of related issues like issue #1647489 ("zero-length match 
confuses re.finditer()").
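
The "advance one character" approach can be sketched as a hand-written scanning loop. This is an illustration of the idea only, not the code in the re module, and scan_spans is a hypothetical helper name:

```python
import re

def scan_spans(pattern, text):
    """Collect match spans, stepping one character past each zero-width match."""
    compiled = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # A zero-width match would otherwise be found again at the same
        # position forever, so move on by one character in that case.
        pos = m.end() + 1 if m.start() == m.end() else m.end()
    return spans

print(scan_spans(r"x*", "abxd"))
```

Whether an empty match directly after a non-empty one (here, just after the "x") should be reported is the kind of edge case where implementations disagree.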

--

___
Python tracker 
<http://bugs.python.org/issue10328>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-05 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101106.zip is a new version of the regex module.

Fix for issue 10328, which regex also shared.

--
Added file: http://bugs.python.org/file19514/issue2636-20101106.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-11 Thread Matthew Barnett


Matthew Barnett  added the comment:

It looks like a similar problem to msg116252 and msg116276.

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101113.zip is a new version of the regex module.

It now supports Unicode 6.0.0.

--
Added file: http://bugs.python.org/file19597/issue2636-20101113.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue11307] re engine exhaustively explores more than necessary

2011-02-24 Thread Matthew Barnett


Matthew Barnett  added the comment:

It's a known issue (see issue #1662581, for example).

There's a new implementation at PyPI which doesn't have this problem:

http://pypi.python.org/pypi/regex

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue11307>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

@Gregory: I've added you to the project.

I'm currently trying to fix a problem with iterators shared across threads. As 
a temporary measure, the current release on PyPI doesn't enable multithreading 
for them.

The mrab-regex-hg project doesn't have those sources yet. I'll update them 
later today, either to the release on PyPI, or to a fixed version if all goes 
well...

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-03-15 Thread Matthew Barnett


Matthew Barnett  added the comment:

I've fixed the problem with iterators for both Python 3 and Python 2. They can 
now be shared safely across threads.

I've updated the release on PyPI.

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue6210] Exception Chaining missing method for suppressing context

2011-03-16 Thread Matthew Barnett


Matthew Barnett  added the comment:

I've been looking through the list of current keywords and the best syntax I 
could come up with for suppressing the context is:

try:
    x / y
except ZeroDivisionError as e:
    raise as Exception('Invalid value for y')

The rationale is that it's saying "forget about the original exception (if 
any), raise _as though_ this is the original exception".
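
For reference, the spelling that Python eventually adopted for suppressing the context (PEP 409, Python 3.3) is "raise ... from None" rather than the "raise as" form proposed here. A minimal sketch; safe_divide is a hypothetical function:

```python
def safe_divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        # 'from None' hides the original ZeroDivisionError in the traceback.
        raise ValueError("Invalid value for y") from None

try:
    safe_divide(1, 0)
except ValueError as err:
    print(err.__suppress_context__)  # True
```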

--

___
Python tracker 
<http://bugs.python.org/issue6210>
___

[issue11665] Regexp findall freezes

2011-03-25 Thread Matthew Barnett


Matthew Barnett  added the comment:

Alex is correct.

This part:

[^<>]*

can match an empty string, and it's nested with a repeated group. It stalls, 
repeatedly matching an empty string.
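
That the inner part can succeed without consuming anything is easy to confirm in isolation (the sample input is made up; the full pattern from the report is deliberately not run here):

```python
import re

# The next character is '<', which the class excludes, so '*' matches
# zero characters and the overall match is empty.
m = re.match(r"[^<>]*", "<b>")

print(m.span())  # (0, 0)
```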

Incidentally, my regex implementation (available on PyPI) returns [].

--

___
Python tracker 
<http://bugs.python.org/issue11665>
___

[issue11733] Implement a `Counter.elements_count` method

2011-03-31 Thread Matthew Barnett


Matthew Barnett  added the comment:

The name isn't meaningful to me. My preference would be for something like 
"total_count".

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue11733>
___

[issue11775] `bool(Counter({'a': 0})) is True`

2011-04-05 Thread Matthew Barnett


Matthew Barnett  added the comment:

It depends on what kind of object it's like. If it's like a dict then your 
example is clearly not empty, but if it's like a set then it /is/ empty, in 
which case it's empty if:

all(count == 0 for count in my_counter.values())
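
A short sketch of the two readings side by side, using the Counter from the issue title:

```python
from collections import Counter

my_counter = Counter({'a': 0})

# Dict-like view: there is one key, so the counter is not empty.
dict_like_empty = not my_counter

# Set-like (multiset) view: no element has a non-zero count.
set_like_empty = all(count == 0 for count in my_counter.values())

print(dict_like_empty)  # False
print(set_like_empty)   # True
```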

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue11775>
___

[issue11947] re.IGNORECASE does not match literal "_" (underscore)

2011-04-28 Thread Matthew Barnett


Matthew Barnett  added the comment:

help(re.sub) says:

sub(pattern, repl, string, count=0)

and re.IGNORECASE has a value of 2.

Therefore this:

re.sub("_", "X", subject, re.IGNORECASE)

is telling it to replace at most 2 occurrences of "_".
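
The effect is easier to see with keyword arguments. A minimal sketch with a made-up subject string; count is passed by keyword here so the example does not rely on positional use:

```python
import re

subject = "a_b_c_d"

# What the positional call actually means: the flag's value (2) used as count.
limited = re.sub("_", "X", subject, count=int(re.IGNORECASE))

# What was intended: the flag passed as flags.
intended = re.sub("_", "X", subject, flags=re.IGNORECASE)

print(limited)   # aXbXc_d
print(intended)  # aXbXcXd
```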

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue11947>
___

[issue11947] re.IGNORECASE does not match literal "_" (underscore)

2011-04-28 Thread Matthew Barnett


Matthew Barnett  added the comment:

I don't know how much code that might break. It might not be that much; I can't 
remember when I last used re.sub without the default count.

--

___
Python tracker 
<http://bugs.python.org/issue11947>
___

[issue11957] re.sub confusion between count and flags args

2011-05-06 Thread Matthew Barnett


Matthew Barnett  added the comment:

Something like "" may be more Pythonic.

--

___
Python tracker 
<http://bugs.python.org/issue11957>
___

[issue12078] re.sub() replaces only several matches

2011-05-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

Argument 4 of re.sub is the maximum number of replacements, NOT flags:

Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

re.I is 2, so you're telling it to perform at most 2 replacements.

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue12078>
___

[issue12130] regex 0.1.20110514 findall overlapped not working with 'start of string' expression

2011-05-20 Thread Matthew Barnett


Matthew Barnett  added the comment:

Replied to the regex bug tracker.

--

___
Python tracker 
<http://bugs.python.org/issue12130>
___

[issue7132] Regexp: capturing groups in repetitions

2010-11-18 Thread Matthew Barnett


Matthew Barnett  added the comment:

Earlier this week I discovered that .Net supports repeated capture and its API 
suggested a much cleaner approach than what Perl offered, so I'll be adding it 
to the regex module at:

http://pypi.python.org/pypi/regex

The new methods will follow the example of .group() & co.

Given a match object m, m.group(i) returns the last match of group i (or None 
if there's no match), so I'll be adding m.captures(i) to return a tuple of the 
captures (an empty tuple if there's no match). I'll also be adding m.starts(i), 
m.ends(i) and m.spans(i).

The issue for this work is #2636.

Unit tests are welcome.

--

___
Python tracker 
<http://bugs.python.org/issue7132>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-19 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101120.zip is a new version of the regex module.

The match object now supports additional methods which return information on 
all the successful matches of a repeated capture group.

The API was inspired by that of .Net:

matchobject.captures([group1, ...])

Returns a tuple of the strings matched in a group or groups. Compare 
with matchobject.group([group1, ...]).

matchobject.starts([group])

Returns a tuple of the start positions. Compare with 
matchobject.start([group]).

matchobject.ends([group])

Returns a tuple of the end positions. Compare with 
matchobject.end([group]).

matchobject.spans([group])

Returns a tuple of the spans. Compare with matchobject.span([group]).
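
To show what these methods add, here is a sketch using only the standard re module, which keeps just the last capture of a repeated group. The findall call is a rough stand-in for what captures() is described as returning, not the regex module's implementation, and the sample pattern is made up:

```python
import re

text = "1,22,333"

# A repeated capture group: the standard match object remembers only
# the last repetition.
m = re.match(r"(?:(\d+),?)+", text)
last_capture = m.group(1)

# Collecting every repetition needs a second pass over the string.
all_captures = re.findall(r"(\d+),?", text)

print(last_capture)  # 333
print(all_captures)  # ['1', '22', '333']
```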

--
Added file: http://bugs.python.org/file19651/issue2636-20101120.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-20 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101121.zip is a new version of the regex module.

The captures didn't work properly with lookarounds or atomic groups.

--
Added file: http://bugs.python.org/file19723/issue2636-20101121.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue1859] textwrap doesn't linebreak on "\n"

2010-11-22 Thread Matthew Barnett


Matthew Barnett  added the comment:

I'd be interested in having a go if I knew what the desired behaviour was, ie 
unit tests to confirm what was 'correct'.

How should it handle line breaks? Should it treat them like any other 
whitespace as at present, should it honour them, or should it get another 
option, eg 'honor_breaks' (if US spelling is the standard for Python's 
libraries)?
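
To make the two candidate behaviours concrete, here is a sketch. The first call shows how textwrap treats a newline today (as ordinary whitespace); wrap_honoring_breaks is a hypothetical helper, not an existing textwrap option, showing one way line breaks could be honoured:

```python
import textwrap

text = "one two\nthree four"

# Current behaviour: the newline is replaced by a space before wrapping.
as_whitespace = textwrap.wrap(text, width=40)

def wrap_honoring_breaks(text, width=70):
    """Wrap each existing line separately so explicit breaks survive."""
    lines = []
    for paragraph in text.splitlines():
        # wrap() returns [] for an empty line; keep it as a blank line.
        lines.extend(textwrap.wrap(paragraph, width=width) or [""])
    return lines

print(as_whitespace)                         # ['one two three four']
print(wrap_honoring_breaks(text, width=40))  # ['one two', 'three four']
```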

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue1859>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-23 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101123.zip is a new version of the regex module.

Oops, sorry, the weird behaviour of msg11 was a bug. :-(

--
Added file: http://bugs.python.org/file19786/issue2636-20101123.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue1859] textwrap doesn't linebreak on "\n"

2010-11-23 Thread Matthew Barnett


Matthew Barnett  added the comment:

textwrap_2010-11-23.diff is my attempt to provide a fix, if it's wanted/needed.

--
Added file: http://bugs.python.org/file19791/textwrap_2010-11-23.diff

___
Python tracker 
<http://bugs.python.org/issue1859>
___

[issue10532] A bug related to matching the empty string

2010-11-25 Thread Matthew Barnett


Matthew Barnett  added the comment:

The spans say this:

>>> for m in re.finditer('((.d.)*)*', 'adb'):
...     print(m.span())
...
(0, 3)
(3, 3)

There's a non-empty match followed by an empty match.

IMHO, not a bug.

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue10532>
___

[issue2650] re.escape should not escape underscore

2010-11-25 Thread Matthew Barnett


Matthew Barnett  added the comment:

Re the regex module (issue #2636), would a good compromise be:

regex.escape(user_input, special_only=True)

to maintain compatibility?

--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue2650>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-11-29 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101130.zip is a new version of the regex module.

Added 'special_only' keyword parameter (default False) to regex.escape. When 
True, regex.escape escapes only 'special' characters, such as '?'.
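
For comparison, the standard re.escape later moved in the same direction: from Python 3.7 it escapes only characters that can have special meaning in a pattern. A minimal sketch with a made-up input:

```python
import re

user_input = "a_b?"

# The underscore is left alone; the '?' is escaped.
escaped = re.escape(user_input)

print(escaped)  # a_b\?
```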

--
Added file: http://bugs.python.org/file19881/issue2636-20101130.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-06 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101207.zip is a new version of the regex module.

It includes additional checks against pathological regexes.

--
Added file: http://bugs.python.org/file19965/issue2636-20101207.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-10 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101210.zip is a new version of the regex module.

I've extended the additional checks of the previous version.

It has been tested with Python 2.5 to Python 3.2b1.

--
Added file: http://bugs.python.org/file20001/issue2636-20101210.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue10704] Regex 0.1.20101210 Python 3.1 install problem Mac OS X 10.6.5

2010-12-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

I use Windows XP, so I can't help with MacOS X.

From the error log it looks like it doesn't like the sources for Python either!

--

___
Python tracker 
<http://bugs.python.org/issue10704>
___

[issue10703] Regex 0.1.20101210

2010-12-14 Thread Matthew Barnett


Matthew Barnett  added the comment:

The regex module is intended to replace the re module, so its default behaviour 
is the same: in Python 2, regexes default to matching ASCII, and in Python 3, 
they default to matching Unicode.

If you want to use a regex on a Unicode string in Python 2 then you need to set 
the Unicode flag, either by providing the UNICODE flag or by putting "(?u)" in 
the regex itself.
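
The Python 3 side of this can be sketched with the standard re module, where str patterns match Unicode by default and re.ASCII restores ASCII-only matching (the sample text is made up):

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w matches only [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```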

--

___
Python tracker 
<http://bugs.python.org/issue10703>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-23 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101224.zip is a new version of the regex module.

Case-insensitive matching is now faster.

The matching functions and methods now accept a keyword argument to release the 
GIL during matching to enable other Python threads to run concurrently:

matches = regex.findall(pattern, string, concurrent=True)

This should be used only when it's guaranteed that the string won't change 
during matching.

The GIL is always released when working on instances of the builtin (immutable) 
string classes because that's known to be safe.

--
Added file: http://bugs.python.org/file20154/issue2636-20101224.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-24 Thread Matthew Barnett


Matthew Barnett  added the comment:

I've been trying to push the history to Launchpad, completely without success; 
it just won't authenticate (no such account, even though I can log in!).

I doubt that the history would be much use to you anyway.

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-24 Thread Matthew Barnett


Matthew Barnett  added the comment:

It does have an SSH key. It's probably something simple that I'm missing.

I think that the only change I'm likely to make is to a support script I use; 
it currently uses hard-coded paths, etc, to do its magic. :-)

--

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Matthew Barnett


Changes by Matthew Barnett :


--
nosy: +mrabarnett

___
Python tracker 
<http://bugs.python.org/issue6210>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101228.zip is a new version of the regex module.

Sorry for the delay, the fix took me a bit longer than I expected. :-)

--
Added file: http://bugs.python.org/file20176/issue2636-20101228.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Matthew Barnett


Matthew Barnett  added the comment:

Regarding syntax, I'm undecided between:

raise with new_exception

and:

raise new_exception with caught_exception

I think that the second form is clearer:

try:
    ...
except SomeException as ex:
    raise SomeOtherException() with ex

(I'd prefer 'with' to Steven's 'from') but the first form doesn't force you to 
provide a name:

try:
    ...
except SomeException:
    raise with SomeOtherException()

and the syntax also means that you can't chain another exception like this:

try:
    ...
except SomeException as ex:
    raise SomeOtherException() with YetAnotherException()

although perhaps Python should just rely on the programmer's good judgement. :-)
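
For comparison with the 'with' spellings above, explicit chaining in Python 3 as it stands is written "raise ... from ex", which records the caught exception as __cause__. A minimal sketch; ConfigError and load_port are hypothetical names:

```python
class ConfigError(Exception):
    """Raised when a required setting is missing (illustrative only)."""

def load_port(settings):
    try:
        return int(settings["port"])
    except KeyError as ex:
        # Chain explicitly: the KeyError becomes the new exception's cause.
        raise ConfigError("missing 'port' setting") from ex

try:
    load_port({})
except ConfigError as err:
    print(type(err.__cause__).__name__)  # KeyError
```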

--

___
Python tracker 
<http://bugs.python.org/issue6210>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101228a.zip is a new version of the regex module.

It now compiles the pattern quickly.

--
Added file: http://bugs.python.org/file20182/issue2636-20101228a.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-28 Thread Matthew Barnett


Matthew Barnett  added the comment:

issue2636-20101229.zip is a new version of the regex module.

It now compiles the pattern quickly.

--
Added file: http://bugs.python.org/file20185/issue2636-20101229.zip

___
Python tracker 
<http://bugs.python.org/issue2636>
___
