[Python-Dev] IDLE colorizer
A thread on python-ideas is talking about the prefixes of string literals, and the regex used in IDLE.

Line 25 of Lib\idlelib\colorizer.py is:

    stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"

which looks slightly wrong to me.

The \b will apply only to the first choice.

Shouldn't it be more like:

    stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"

?
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] IDLE colorizer
[MRAB]
> A thread on python-ideas is talking about the prefixes of string literals,
> and the regex used in IDLE.
>
> Line 25 of Lib\idlelib\colorizer.py is:
>
>     stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>
> which looks slightly wrong to me.
>
> The \b will apply only to the first choice.
>
> Shouldn't it be more like:
>
>     stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>
> ?

I believe the change would capture its real intent. It doesn't seem to matter a whole lot, though - IDLE isn't a syntax checker, and applies heuristics to color on the fly based on best guesses.

As is, if you type this fragment into an IDLE shell:

    kr"sdf"

only the last 5 characters get "string colored", presumably because of the leading \br in the original regexp. But if you type in

    ku"sdf"

the last 6 characters get "string colored", because - as you pointed out - the \b part of the original regexp has no effect on anything other than the r following \b.

But in neither case is the fragment legit Python. If you do type in legit Python, it makes no difference (legit string literals always start at a word boundary, regardless of whether the regexp checks for that).
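The difference between the two prefix patterns can be checked outside IDLE with a small script. This is only a sketch: `BODY` is a simplified stand-in for a double-quoted string body and `colored_span` is a hypothetical helper, neither of which is taken from colorizer.py; only the two `stringprefix` values come from the thread.

```python
import re

# Simplified stand-in for a double-quoted string body. This is an assumption
# for illustration, not the pattern IDLE actually uses.
BODY = r'"[^"\n]*"'

# The two prefix patterns quoted in the thread.
ORIGINAL = r"(?i:\br|u|f|fr|rf|b|br|rb)?"      # \b binds only to the first "r"
PROPOSED = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"  # \b applies to every prefix


def colored_span(prefix_pattern, text):
    """Return the text matched by prefix + body, or None if nothing matches."""
    match = re.search(prefix_pattern + BODY, text)
    return match.group(0) if match else None


for fragment in ('kr"sdf"', 'ku"sdf"', 'r"sdf"'):
    print(fragment, colored_span(ORIGINAL, fragment), colored_span(PROPOSED, fragment))
```

Under these assumptions the original pattern matches 5 characters of `kr"sdf"` but 6 characters of `ku"sdf"`, while the proposed pattern matches 5 characters in both cases. Both patterns match all of `r"sdf"`.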
Re: [Python-Dev] IDLE colorizer
On 4/1/2018 10:20 PM, Tim Peters wrote:
> [MRAB]
>> A thread on python-ideas is talking about the prefixes of string literals,
>> and the regex used in IDLE.
>>
>> Line 25 of Lib\idlelib\colorizer.py is:
>>
>>     stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>>
>> which looks slightly wrong to me.

This must be a holdover from years ago, before I was involved. I have wondered about it but left it as is. Thanks for confirming that it is not right.

>> The \b will apply only to the first choice.
>>
>> Shouldn't it be more like:
>>
>>     stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>>
>> ?

See below.

> I believe the change would capture its real intent. It doesn't seem to
> matter a whole lot, though - IDLE isn't a syntax checker, and applies
> heuristics to color on the fly based on best guesses.
>
> As is, if you type this fragment into an IDLE shell:
>
>     kr"sdf"
>
> only the last 5 characters get "string colored", presumably because of
> the leading \br in the original regexp. But if you type in
>
>     ku"sdf"
>
> the last 6 characters get "string colored", because - as you pointed
> out - the \b part of the original regexp has no effect on anything
> other than the r following \b.

I tested with uf versus ur, which are both plausibly legal but are not.

> But in neither case is the fragment legit Python. If you do type in
> legit Python, it makes no difference (legit string literals always
> start at a word boundary, regardless of whether the regexp checks for
> that).

I want uniform behavior. I decided to drop the \b because I prefer coloring the maximal legal string rather than the minimum. I think the contrast between two chars legal by themselves, but differently colored when put together, makes the bug more obvious.

https://bugs.python.org/issue33204

--
Terry Jan Reedy
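The effect of dropping the \b can be sketched with the uf and ur fragments mentioned above. The `NO_BOUNDARY` pattern below is an assumption: it is simply the original pattern with \b removed, which may not be the exact text committed for issue33204. `BODY` and `matched_text` are hypothetical helpers for illustration, not code from colorizer.py.

```python
import re

# Simplified double-quoted string body; an assumption, not IDLE's real pattern.
BODY = r'"[^"\n]*"'

ORIGINAL = r"(?i:\br|u|f|fr|rf|b|br|rb)?"   # as quoted in the thread
NO_BOUNDARY = r"(?i:r|u|f|fr|rf|b|br|rb)?"  # assumed form with \b dropped


def matched_text(prefix_pattern, text):
    """Return the text matched by prefix + body, or None if nothing matches."""
    match = re.search(prefix_pattern + BODY, text)
    return match.group(0) if match else None


for fragment in ('uf"sdf"', 'ur"sdf"'):
    print(fragment, matched_text(ORIGINAL, fragment), matched_text(NO_BOUNDARY, fragment))
```

With the original pattern the match keeps the `f` in `uf"sdf"` but not the `r` in `ur"sdf"`. With the boundary removed, the last single-character prefix is included in both cases, which is the uniform "maximal legal string" behavior described above.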
Re: [Python-Dev] IDLE colorizer
My question for you: how on earth did you find this?! Speaking of a needle in a haystack. Did you run some kind of analysis program that looks for regexps? (We've received some good reports from someone who did that looking for possible DoS attacks.)

On Sun, Apr 1, 2018 at 6:49 PM, MRAB wrote:
> A thread on python-ideas is talking about the prefixes of string literals,
> and the regex used in IDLE.
>
> Line 25 of Lib\idlelib\colorizer.py is:
>
>     stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
>
> which looks slightly wrong to me.
>
> The \b will apply only to the first choice.
>
> Shouldn't it be more like:
>
>     stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
>
> ?

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Nuking wstr [Re: How can we use 48bit pointer safely?]
Some APIs are documented as "Deprecated since version 3.3, will be removed in version 4.0:", e.g. https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AS_UNICODE

So we will remove them (and wstr) in Python 4.0.
Re: [Python-Dev] Nuking wstr [Re: How can we use 48bit pointer safely?]
> Of course, the question is whether all this matters. Is it important
> to save 8 bytes on each unicode object? Only testing would tell.

Last year, I tried to profile the memory usage of a web application at my company.

https://gist.github.com/methane/ce723adb9a4d32d32dc7525b738d3c31#investigating-overall-memory-usage

Without the -OO option, str is the largest memory consumer, and the average size is about 109 bytes. (Note: SQLAlchemy uses docstrings very heavily.)

With the -OO option, str is the third-largest memory consumer, and the average size was about 73 bytes.

So I think 8 bytes for each string object is not negligible. But, of course, it varies by application and library.

--
INADA Naoki
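To put the 8 bytes in proportion, the figures quoted above can be turned into a rough ratio. This is only back-of-the-envelope arithmetic on the two reported averages; `saving_fraction` is a hypothetical helper, and the result says nothing about total process memory.

```python
def saving_fraction(saved_bytes, average_size):
    """Fraction of an average object that the saved bytes represent."""
    return saved_bytes / average_size


# Average str sizes reported in the message above (bytes).
for label, average in (("without -OO", 109), ("with -OO", 73)):
    print(f"{label}: {saving_fraction(8, average):.1%} of the average str")
```

On these numbers, 8 bytes is roughly 7% of the average str without -OO and roughly 11% with -OO.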
