from:"brent s."

[issue9253] argparse: optional subparsers

2020-04-03 Thread brent s.



Change by brent s.:


--
nosy: +bsaner

___
Python tracker 
<https://bugs.python.org/issue9253>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37594] re does not honor matching trailing multiple periods

2019-07-14 Thread brent s.



New submission from brent s.:

(Sorry for the title; not quite sure how to summarize this)

SO! Have I got an interesting one for you.

ISSUE:
In release 3.7.3 (and possibly later), given a string such as 'a.b.', a pattern 
such as '\.*$' will successfully *match* any number of trailing periods. 
HOWEVER, when attempting to substitute those with actual character(s), it 
chokes. See the attached example.py.

NOTES:
- This *is a regression* from 2.6.6, 2.7.16, and 3.6.7 (other releases were not 
tested). This behaviour does not occur on those versions.
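A minimal sketch of what the substitution actually sees, assuming Python 3.7 or later (the version this report is about). Listing the spans that re.finditer() reports for '\.*$' shows that inputs ending in a dot produce two matches, the run of trailing dots and then an empty match at the very end of the string, and re.sub() replaces both. The list of inputs is chosen to mirror the cases in the report.

```python
import re

PATTERN = r'\.*$'  # zero or more dots, anchored at the end of the string

for text in ['a.b', 'a.b.', 'a.b..', 'a.b...']:
    # Every (start, end) span that re.sub() will replace, left to right.
    spans = [m.span() for m in re.finditer(PATTERN, text)]
    print(repr(text), spans, repr(re.sub(PATTERN, '.', text)))
```

On 3.7+ the 'a.b' case yields a single empty span at the end, while 'a.b.', 'a.b..' and 'a.b...' each yield two spans, which is where the doubled replacement comes from.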

--
components: Library (Lib)
files: example.py
messages: 347933
nosy: bsaner
priority: normal
severity: normal
status: open
title: re does not honor matching trailing multiple periods
versions: Python 3.7
Added file: https://bugs.python.org/file48481/example.py

___
Python tracker 
<https://bugs.python.org/issue37594>
___

[issue37594] re does not honor matching trailing multiple periods

2019-07-14 Thread brent s.



brent s. added the comment:

Sorry, by "chokes" I mean "substitutes in multiple replacements".

--


[issue37594] re does not honor matching trailing multiple periods

2019-07-14 Thread brent s.



brent s. added the comment:

WORKAROUND:

Obviously, str.rstrip('.') still works, but this is of course quite inflexible 
compared to a regex pattern.

--


[issue37594] re does not honor matching trailing multiple periods

2019-07-14 Thread brent s.



brent s. added the comment:

"'\.' is an invalid escape sequence. Could you try it with a raw string?"

Well, it's a valid regex escape, but right; point taken. I am under the 
impression, however, that since the value of ptrn (in example.py) is already a 
string, it should be interpreted as a raw string by re.compile(), no? Otherwise 
it'd be a dickens of a time getting a regex pattern that's dynamically/ 
programmatically assigned to a name, since there's no raw(), str.raw(), or 
str.encode('raw').

They both evaluate to the same, for what it's worth:

>>> repr('\.+$')
"'\\\\.+$'"
>>> repr(r'\.+$')
"'\\\\.+$'"
>>> ptrn = '\.+$'
>>> repr(ptrn)
"'\\\\.+$'"

So.
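A small check of the point above, as a sketch assuming Python 3. A "raw string" is only a way of writing a literal; at run time there is no separate raw type, so a pattern built or assigned dynamically is just a str and needs no conversion. The non-raw literal '\.+$' happens to produce the same four characters as r'\.+$' because Python keeps the backslash of an unrecognised escape, although it warns about it (DeprecationWarning on 3.6 to 3.11, SyntaxWarning on 3.12+). eval() is used here only so that the warning is raised at run time and can be captured.

```python
import re
import warnings

# Compile the report's non-raw literal at run time so its warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    non_raw = eval(r"'\.+$'")  # same source text as '\.+$' in the session above

raw = r'\.+$'

print(non_raw == raw)        # True: the unknown escape keeps its backslash
print(len(raw), repr(raw))   # 4 '\\.+$'
print(re.compile(non_raw).pattern == re.compile(raw).pattern)  # True
print(sorted({w.category.__name__ for w in caught}))
```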

"Also, it's not really clear to me what you're seeing, vs. what you expect to 
see. For one example that you think is incorrect, could you show what you get 
vs. what you expect to get? And, if that's different on different python 
versions, could you show what each version does?"

The comment from Serhiy clarifies that this was indeed something that was 
changed. You can see the difference pretty easily by running example.py under 
both python2 and python3.

--

"This change was intentional and documented. It fixed old bug in the Python 
implementation of RE and removed the discrepancy with other RE engines."

Okay, so I'm not going insane. That's good. Do you have the bug ID it fixes and 
where it's documented? Do you know which other RE engines were doing this? 
Because GNU sed, for instance, does not behave like this - it behaves as the 
"pre-bugfix" behaviour did:

$ echo 'a.b.' | sed -e 's/\.*$/./g'
a.b.
$ echo 'a.b...' | sed -e 's/\.*$/./g'
a.b.
$ echo 'a.b' | sed -e 's/\.*$/./g'
a.b.

"The pattern r'\.*$' matches not only a sequence of dots at the end of the 
line, but also an empty string at the end of line. If this is not what you 
want, use r'\.+$'."
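A quick comparison of the two patterns, assuming Python 3.7+, shows the trade-off being discussed: r'\.+$' never produces the extra empty match, but because it needs at least one dot to match at all, it leaves a name with no trailing period unchanged and so cannot add a missing one on its own.

```python
import re

for text in ['a.b', 'a.b.', 'a.b...']:
    star = re.sub(r'\.*$', '.', text)  # 3.7+: also replaces the empty match after the dots
    plus = re.sub(r'\.+$', '.', text)  # matches only when at least one trailing dot exists
    print(repr(text), repr(star), repr(plus))
```

Expected on 3.7+: 'a.b' gives 'a.b.' and 'a.b'; 'a.b.' gives 'a.b..' and 'a.b.'; 'a.b...' gives 'a.b..' and 'a.b.'.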

Right; it's to guarantee that there is one and only one period at the end of a 
line, whether the original string has no period, one period, or many periods 
(think e.g. enforcing RFC1025-compatible FQDNs).
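For the "exactly one trailing period" goal, here is a sketch of three ways to get the same result under Python 3.7+'s semantics. The function names are hypothetical, for illustration only. count=1 stops re.sub() after the first match, so the second, empty match is never replaced. The negative lookbehind (?<!\.) rejects any match that starts right after a dot, which rules out that empty match without limiting the count.

```python
import re

# Hypothetical helpers: each returns `name` with exactly one trailing '.'.

def with_rstrip(name: str) -> str:
    # The str.rstrip() workaround mentioned earlier in the thread.
    return name.rstrip('.') + '.'

def with_count(name: str) -> str:
    # Only the first (leftmost) match is replaced; the empty match that
    # follows a run of trailing dots is left alone.
    return re.sub(r'\.*$', '.', name, count=1)

def with_lookbehind(name: str) -> str:
    # (?<!\.) forbids a match starting immediately after a dot, so the
    # empty match at the very end cannot follow the dots it just replaced.
    return re.sub(r'(?<!\.)\.*$', '.', name)

for name in ['example.com', 'example.com.', 'example.com...']:
    results = {f(name) for f in (with_rstrip, with_count, with_lookbehind)}
    print(name, '->', results)
```

All three agree on these inputs; the regex variants keep the pattern in a form that can be adjusted or built dynamically, which is the flexibility str.rstrip() lacks.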

--


[issue37594] re does not honor matching trailing multiple periods

2019-07-14 Thread brent s.



brent s. added the comment:

Oh, for Pete's sake. I wish I could edit comments.

Eric-

To make it clear:

*

VERSION: 2.7.16 (default, Mar 11 2019, 18:59:25) 
[GCC 8.2.1 20181127]
PATTERN: \.*$

BEFORE: a.b
WITHOUT: a.b
DUMMY: a.bX
AFTER: a.b.
RSTRIP: a.b
==
BEFORE: a.b.
WITHOUT: a.b
DUMMY: a.bX
AFTER: a.b.
RSTRIP: a.b
==
BEFORE: a.b..
WITHOUT: a.b
DUMMY: a.bX
AFTER: a.b.
RSTRIP: a.b
==
BEFORE: a.b...
WITHOUT: a.b
DUMMY: a.bX
AFTER: a.b.
RSTRIP: a.b
==

*

VERSION: 3.7.3 (default, Jun 24 2019, 04:54:02) 
[GCC 9.1.0]
PATTERN: \.*$

BEFORE: a.b
WITHOUT: a.b
DUMMY: a.bX
AFTER: a.b.
RSTRIP: a.b
==
BEFORE: a.b.
WITHOUT: a.b
DUMMY: a.bXX
AFTER: a.b..
RSTRIP: a.b
==
BEFORE: a.b..
WITHOUT: a.b
DUMMY: a.bXX
AFTER: a.b..
RSTRIP: a.b
==
BEFORE: a.b...
WITHOUT: a.b
DUMMY: a.bXX
AFTER: a.b..
RSTRIP: a.b
==


Note the differences between versions for cases a.b., a.b.., and a.b... 
("BEFORE: ..." lines). Compare their "AFTER" and "DUMMY" lines between python2 
and python3.
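The contents of example.py are not reproduced in this thread. The following is only a guess at a script that prints the same BEFORE/WITHOUT/DUMMY/AFTER/RSTRIP report, assuming WITHOUT, DUMMY and AFTER are re.sub() calls with '', 'X' and '.' as the replacement. It avoids f-strings so that it runs unchanged under both python2 and python3 for comparison.

```python
import re
import sys

PATTERN = r'\.*$'

print('VERSION: {}'.format(sys.version))
print('PATTERN: {}'.format(PATTERN))
print('')
for before in ['a.b', 'a.b.', 'a.b..', 'a.b...']:
    print('BEFORE: {}'.format(before))
    print('WITHOUT: {}'.format(re.sub(PATTERN, '', before)))  # remove every match
    print('DUMMY: {}'.format(re.sub(PATTERN, 'X', before)))   # one 'X' per match
    print('AFTER: {}'.format(re.sub(PATTERN, '.', before)))   # intended: one trailing dot
    print('RSTRIP: {}'.format(before.rstrip('.')))
    print('==')
```

The DUMMY line is the useful one: the number of X characters is the number of matches re.sub() replaced.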



Serhiy-

Apologies; I meant RFC1035; I typo'd that. But as shown above, the difference 
is pretty distinct (and inconsistent with GNU sed behaviour).

--


[issue33480] Improvement suggestions for urllib.parse.urlparser

2018-05-13 Thread brent s.


New submission from brent s.:

Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a couple things:

0.) that ParseResult objects support dict emulation, e.g. one can run:

dict(parseresult_obj)

and get (using the example string above, with the classification corrected for 
RFC3986 compliance and common usage):

{'fragment': [('key4', 'value4')],
 'netloc': 'example.com',
 'params': [('key2', 'value2')],
 'path': '/foo',
 'query': [('key3', 'value3')],
 'scheme': 'http'}
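For comparison with the proposal, a sketch of what the standard library already offers, assuming Python 3.8+ (where _asdict() returns a plain dict). ParseResult is a named tuple, so _asdict() gives a name-to-string mapping, and parse_qsl() splits key=value strings into lists of pairs. The expanded dict below is a hypothetical illustration; it keeps urlparse()'s own classification, in which everything after the first '#' is the fragment.

```python
from urllib.parse import parse_qsl, urlparse

url = 'http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4'
parsed = urlparse(url)

# Named-tuple field name -> component string.
as_dict = parsed._asdict()

# Hypothetical "expanded" view: params and query split into (key, value) pairs.
expanded = {
    **as_dict,
    'params': parse_qsl(parsed.params),
    'query': parse_qsl(parsed.query),
}
print(expanded)
```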

Obviously, fragment, params, and query could instead be serialized into a 
nested dict. I'm not sure which is preferred in the Pythonic sense.

1.) Better RFC3986 compliance.
Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL 
can be further split into separate components. For instance, should "userinfo" 
(e.g. "http://user:password@...") be parsed, even though the "user:password" 
form is deprecated? At the very least, the port should be parsed out into a 
separate component from the netloc (or userinfo parsed out separately from the 
netloc); this would help when parsing host:port combinations in netlocs that 
contain both userinfo and an explicit port, and would allow the port to be 
given as an int, making it easier to use with e.g. the socket module.
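For reference, assuming a current Python 3: the result objects from urlparse() and urlsplit() already expose the userinfo parts, host and port as read-only attributes derived from netloc. They are not separate tuple fields, which is why they do not show up in the ParseResult repr above. The URL here is an illustrative example.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://user:password@example.com:8080/foo?key=value')

# The netloc field itself stays whole...
print(parts.netloc)    # user:password@example.com:8080
# ...but the derived attributes split it up.
print(parts.username)  # user
print(parts.password)  # password
print(parts.hostname)  # example.com (always lower-cased)
print(parts.port)      # 8080, as an int (None when no port is given)
```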

2.) If a component is not present, I suggest it be a None object instead of an 
empty string.
e.g.:

urlparse('http://example.com/foo')

Would return:

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params=None, query=None, fragment=None)

instead of

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='', query='', fragment='')
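A sketch of the proposed behaviour as a wrapper outside the library; urlparse_none is a hypothetical name, not part of urllib.parse. It uses the named tuple's _replace() to swap empty params, query and fragment for None.

```python
from urllib.parse import ParseResult, urlparse

def urlparse_none(url: str) -> ParseResult:
    """Hypothetical wrapper: like urlparse(), but empty params, query and
    fragment components are returned as None instead of ''."""
    parsed = urlparse(url)
    return parsed._replace(
        params=parsed.params or None,
        query=parsed.query or None,
        fragment=parsed.fragment or None,
    )

print(urlparse_none('http://example.com/foo'))
print(urlparse_none('http://example.com/foo?'))  # also None: '?' with nothing after it
```

A wrapper like this cannot tell "absent" from "present but empty": urlparse() already returns query='' for both 'http://example.com/foo' and 'http://example.com/foo?', so that distinction would have to be made inside the parser itself.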

--
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement

___
Python tracker 
<https://bugs.python.org/issue33480>
___
