[Python-Dev] IDLE colorizer

2018-04-01 Thread MRAB
A thread on python-ideas is talking about the prefixes of string 
literals, and the regex used in IDLE.


Line 25 of Lib\idlelib\colorizer.py is:

stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"

which looks slightly wrong to me.

The \b will apply only to the first choice.

Shouldn't it be more like:

stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"

?
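
A minimal sketch of the point above, not taken from IDLE itself: it appends a literal double quote to each of the two quoted patterns and searches a few short fragments. The helper name matched_text and the test fragments are assumptions made for illustration.

```python
import re

# The two patterns from the message, copied verbatim.
ORIGINAL = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
PROPOSED = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"


def matched_text(prefix_pattern, text):
    """Search for an optional prefix followed by a double quote.

    Hypothetical helper: returns the text matched by prefix + '"'.
    """
    return re.search(prefix_pattern + '"', text).group()


# In ORIGINAL, | splits the group into \br, u, f, ... so only "r" is
# guarded by the word boundary. In PROPOSED, \b guards every alternative.
for fragment in ('kr"', 'ku"', 'r"', 'U"'):
    print(fragment,
          repr(matched_text(ORIGINAL, fragment)),
          repr(matched_text(PROPOSED, fragment)))
```

With the original pattern the prefix is picked up in ku" but not in kr", while the proposed pattern treats both fragments the same way.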
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] IDLE colorizer

2018-04-01 Thread Tim Peters
[MRAB]
> A thread on python-ideas is talking about the prefixes of string literals,
> and the regex used in IDLE.
>
> Line 25 of Lib\idlelib\colorizer.py is:
>
> stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>
> which looks slightly wrong to me.
>
> The \b will apply only to the first choice.
>
> Shouldn't it be more like:
>
> stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>
> ?

I believe the change would capture its real intent.  It doesn't seem
to matter a whole lot, though - IDLE isn't a syntax checker, and
applies heuristics to color on the fly based on best guesses.  As is,
if you type this fragment into an IDLE shell:

kr"sdf"

only the last 5 characters get "string colored", presumably because of
the leading \br in the original regexp.  But if you type in

ku"sdf"

the last 6 characters get "string colored", because - as you pointed
out - the \b part of the original regexp has no effect on anything
other than the r following \b.

But in neither case is the fragment legit Python.  If you do type in
legit Python, it makes no difference (legit string literals always
start at a word boundary, regardless of whether the regexp checks for
that).
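
The behavior described above can be reproduced outside IDLE with a rough sketch. The double-quoted string pattern below is a simplified stand-in, assumed for illustration, and is not claimed to be the exact pattern in colorizer.py; the helper colored_span is likewise hypothetical.

```python
import re

# Prefix pattern as quoted in the thread.
STRINGPREFIX = r"(?i:\br|u|f|fr|rf|b|br|rb)?"

# Simplified double-quoted string pattern (an assumption, not IDLE's own):
# an opening quote, ordinary characters or backslash escapes, and an
# optional closing quote.
DQSTRING = STRINGPREFIX + r'"[^"\\\n]*(?:\\.[^"\\\n]*)*"?'


def colored_span(text):
    """Return (start, end) of the first region that would be string colored."""
    return re.search(DQSTRING, text).span()


for fragment in ('kr"sdf"', 'ku"sdf"', 'x = r"sdf"', 'x = u"sdf"'):
    start, end = colored_span(fragment)
    print(f"{fragment!r}: {end - start} characters -> {fragment[start:end]!r}")
```

Under these assumptions the two invalid fragments give 5 and 6 colored characters respectively, and the two valid assignments are colored identically, prefix included.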


Re: [Python-Dev] IDLE colorizer

2018-04-01 Thread Terry Reedy

On 4/1/2018 10:20 PM, Tim Peters wrote:

> [MRAB]
>> A thread on python-ideas is talking about the prefixes of string literals,
>> and the regex used in IDLE.
>>
>> Line 25 of Lib\idlelib\colorizer.py is:
>>
>> stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>>
>> which looks slightly wrong to me.

This must be a holdover from years ago, before I was involved.  I have
wondered about it but left it as is.  Thanks for confirming that it is
not right.

>> The \b will apply only to the first choice.
>>
>> Shouldn't it be more like:
>>
>> stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>>
>> ?

See below.

> I believe the change would capture its real intent.  It doesn't seem
> to matter a whole lot, though - IDLE isn't a syntax checker, and
> applies heuristics to color on the fly based on best guesses.  As is,
> if you type this fragment into an IDLE shell:
>
> kr"sdf"
>
> only the last 5 characters get "string colored", presumably because of
> the leading \br in the original regexp.  But if you type in
>
> ku"sdf"
>
> the last 6 characters get "string colored", because - as you pointed
> out - the \b part of the original regexp has no effect on anything
> other than the r following \b.

I tested with uf versus ur, which are both plausibly legal but are not.

> But in neither case is the fragment legit Python.  If you do type in
> legit Python, it makes no difference (legit string literals always
> start at a word boundary, regardless of whether the regexp checks for
> that).


I want uniform behavior.  I decided to drop the \b because I prefer 
coloring the maximal legal string rather than the minimum.  I think the 
contrast between two chars legal by themselves, but differently colored 
when put together, makes the bug more obvious.
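
A hedged sketch of what "uniform behavior" means for the uf and ur cases. NO_BOUNDARY is simply the quoted pattern with \b deleted, which is an assumption about the shape of the change rather than the committed fix; DQ_BODY is a simplified string pattern used only for illustration.

```python
import re

# Simplified double-quoted string body (an assumption, not IDLE's own).
DQ_BODY = r'"[^"\\\n]*(?:\\.[^"\\\n]*)*"?'

ORIGINAL = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
# Hypothetical variant: the original pattern with the \b removed.
NO_BOUNDARY = r"(?i:r|u|f|fr|rf|b|br|rb)?"


def colored(prefix, text):
    """Return the first substring that prefix + DQ_BODY would color."""
    return re.search(prefix + DQ_BODY, text).group()


for fragment in ('uf"x"', 'ur"x"'):
    print(fragment,
          repr(colored(ORIGINAL, fragment)),
          repr(colored(NO_BOUNDARY, fragment)))
```

With the original pattern the f in uf"x" is colored but the r in ur"x" is not; without \b both fragments color the longest legal prefix plus the string.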


https://bugs.python.org/issue33204

--
Terry Jan Reedy



Re: [Python-Dev] IDLE colorizer

2018-04-01 Thread Guido van Rossum
My question for you: how on earth did you find this?! Speaking of a needle
in a haystack. Did you run some kind of analysis program that looks for
regexps? (We've received some good reports from someone who did that
looking for possible DoS attacks.)

On Sun, Apr 1, 2018 at 6:49 PM, MRAB wrote:

> A thread on python-ideas is talking about the prefixes of string literals,
> and the regex used in IDLE.
>
> Line 25 of Lib\idlelib\colorizer.py is:
>
> stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>
> which looks slightly wrong to me.
>
> The \b will apply only to the first choice.
>
> Shouldn't it be more like:
>
> stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>
> ?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Nuking wstr [Re: How can we use 48bit pointer safely?]

2018-04-01 Thread INADA Naoki
Some of the APIs are documented as "Deprecated since version 3.3, will be
removed in version 4.0:".

e.g. https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AS_UNICODE

So we will remove them (and wstr) in Python 4.0.


Re: [Python-Dev] Nuking wstr [Re: How can we use 48bit pointer safely?]

2018-04-01 Thread INADA Naoki
>
> Of course, the question is whether all this matters.  Is it important
> to save 8 bytes on each unicode object?  Only testing would tell.
>

Last year, I profiled the memory usage of a web application at my company.

https://gist.github.com/methane/ce723adb9a4d32d32dc7525b738d3c31#investigating-overall-memory-usage

Without the -OO option, str is the largest memory consumer, and its
average size is about 109 bytes.
(Note: SQLAlchemy uses docstrings very heavily.)

With the -OO option, str is the third-largest memory consumer, and its
average size is about 73 bytes.

So I think 8 bytes per string object is not negligible.

But, of course, it varies by application and library.
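
As a rough back-of-the-envelope check, the snippet below expresses 8 bytes as a share of the two average sizes quoted above. It assumes the averages are per str object and that exactly 8 bytes would be saved on each one; the function name saving_fraction is hypothetical.

```python
# Bytes assumed to be saved per str object (figure taken from the thread).
SAVED_BYTES = 8


def saving_fraction(average_size):
    """Return SAVED_BYTES as a fraction of the average str size in bytes."""
    return SAVED_BYTES / average_size


# Average str sizes reported above for the profiled application.
for label, average_size in (("without -OO", 109), ("with -OO", 73)):
    print(f"{label}: {saving_fraction(average_size):.1%} of an average str")
```

Under those assumptions the saving is roughly 7% of an average str without -OO and about 11% with it.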

-- 
INADA Naoki