[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2019-03-22 Thread Terry J. Reedy
Change by Terry J. Reedy: assignee: -> terry.reedy; components: +IDLE

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2015-12-15 Thread Tal Einat
Tal Einat added the comment: It turns out that staying with str.translate in PyParse paid off! An optimization in 3.5 (issue21118) has made it much, much faster on ASCII-only inputs, which are the vast majority in this case.

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-16 Thread Terry J. Reedy
Changes by Terry J. Reedy: resolution: -> fixed; stage: test needed -> resolved; status: open -> closed

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-16 Thread Tal Einat
Tal Einat added the comment: Fix committed to 3.4 and merged to default. (My first commit!) Not back-porting this to 2.7 despite PEP 434, because support for non-ASCII identifiers only exists in 3.x. Close this issue as fixed!

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-16 Thread Roundup Robot
Roundup Robot added the comment: New changeset 8b3f7aecdf85 by Tal Einat in branch '3.4': Issue #21765: Add support for non-ascii identifiers to HyperParser http://hg.python.org/cpython/rev/8b3f7aecdf85 New changeset 73a8c614af4d by Tal Einat in branch 'default': Issue #21765: Add support for no

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-16 Thread Martin v. Löwis
Martin v. Löwis added the comment: LGTM

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-15 Thread Tal Einat
Changes by Tal Einat: Added file: http://bugs.python.org/file35960/taleinat.20140716.IDLE_HyperParser_unicode_ids_v2.patch

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-15 Thread Tal Einat
Changes by Tal Einat: Removed file: http://bugs.python.org/file35959/taleinat.20140716.IDLE_HyperParser_unicode_ids.patch

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-15 Thread Tal Einat
Tal Einat added the comment: I'm attaching a patch which really fixes this issue, along with additional tests for idlelib.HyperParser. I did indeed have to fix PyParse as well. I got it working with re.subn() as Martin suggested, but the performance was much worse (between 100x and 1000x slow

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-11 Thread Terry J. Reedy
Terry J. Reedy added the comment: I did not open another issue yet because I did not want to split the general discussion of parsing 3.x unicode-based python. We might also need another issue for idlelib/PyParse, and that might need to come first. What I think is that Idle should have at 1 def

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-11 Thread Tal Einat
Tal Einat added the comment: If you think ColorDelegator and UndoDelegator should be fixed as well, I'd be happy to take a look, but we should open a separate tracker issue for it.

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-10 Thread Terry J. Reedy
Terry J. Reedy added the comment: I just noticed that ColorDelegator has idprog = re.compile(r"\s+(\w+)", re.S) which will recognize unicode 'words', if not exactly Python 'identifiers'. However, UndoDelegator has alphanumeric = string.ascii_letters + string.digits + "_" which is the same
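The difference between a regex "word" and a Python identifier can be seen with a small sketch. The `idprog` pattern is the one quoted above; the helper `first_word` and the sample inputs are hypothetical and only illustrate the point, they are not taken from idlelib.

```python
import re

# Pattern quoted from ColorDelegator in the comment above.
idprog = re.compile(r"\s+(\w+)", re.S)

def first_word(text):
    """Return the first \\w+ run that follows leading whitespace, or None."""
    m = idprog.match(text)
    return m.group(1) if m else None

# Text as it might appear right after a "def" keyword (hypothetical inputs).
for tail in (" café", " 9lives"):
    word = first_word(tail)
    # \w+ accepts both words; str.isidentifier() accepts only the first.
    print(repr(word), word.isidentifier())
```

Both inputs match the pattern, but only a word that also passes `str.isidentifier()` is a legal Python name.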

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-07 Thread Tal Einat
Tal Einat added the comment: @Martin:
> 1. This issue is only about identifiers. So processing of
> string literals is technically out of scope.
I added a test with a non-ASCII string literal only for good measure, while I was already adding a test with a non-ASCII identifier. The patch doesn'

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-07 Thread Martin v. Löwis
Martin v. Löwis added the comment: Two observations:
1. This issue is only about identifiers. So processing of string literals is technically out of scope.
2. I'd suggest replacing .translate with regular expressions:
    py> re.sub('[^(){}\[\]]','','foo(b[a]{r}≠)')
    '([]{})'
I'm sure people inter
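For comparison, the two approaches discussed in this thread can be sketched side by side. This is only an illustration: `KeepOnly`, `strip_with_translate` and `strip_with_re` are hypothetical names, and the dict subclass with `__missing__` is one possible way to make `str.translate` cope with arbitrary non-ASCII input, not necessarily what the committed patch does.

```python
import re

BRACKETS = "(){}[]"

class KeepOnly(dict):
    """Translation table: listed code points map to themselves,
    every other code point maps to None, which str.translate deletes."""
    def __missing__(self, key):
        return None

_table = KeepOnly({ord(c): ord(c) for c in BRACKETS})

def strip_with_translate(s):
    # Works for any code point because unknown keys fall back to __missing__.
    return s.translate(_table)

def strip_with_re(s):
    # The regular-expression variant suggested above.
    return re.sub(r"[^(){}\[\]]", "", s)

sample = "foo(b[a]{r}≠)"
print(strip_with_translate(sample))
print(strip_with_re(sample))
```

Both functions should give the same result on any input; which one is faster depends on the Python version and the data, so it is worth timing them on representative source text.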

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-07-06 Thread Tal Einat
Tal Einat added the comment: Indeed, I seem to have been misinterpreting the grammar, despite taking care and reading it several times. This strengthens my opinion that we should use str.isidentifier() rather than attempt to correctly re-implement just the parts that we need. Attached is a pa

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-22 Thread Martin v. Löwis
Martin v. Löwis added the comment: I think you are misinterpreting the grammar. Your code declares that U+00B2 (SUPERSCRIPT TWO, ²) is an identifier character. Its category is No, so it is actually not. However, its normalization is U+0032 (DIGIT TWO, 2), which is an identifier character - but
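The U+00B2 case can be checked directly with the standard library. A minimal sketch, assuming NFKC is the normalization form meant here (it is the form Python applies to identifiers):

```python
from unicodedata import category, normalize

ch = "\u00b2"  # SUPERSCRIPT TWO

# General category of the raw character.
print(category(ch))

# Its NFKC normalization is the ASCII digit two.
print(normalize("NFKC", ch))

# str.isidentifier() looks at the raw character, not its normalized form.
print(("a" + ch).isidentifier())
print(("a" + normalize("NFKC", ch)).isidentifier())
```

A predicate that normalizes a character first and then checks the category of the result will therefore disagree with `str.isidentifier()` for this character.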

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-22 Thread Tal Einat
Tal Einat added the comment:
> Note that the proposed patch only manages to replicate the
> ID_Start and ID_Continue properties.
Is this just because of the mishandling of the Other_ID_Start and Other_ID_Continue properties, or something else as well? I based my code on the definitions in: ht

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-22 Thread Martin v. Löwis
Martin v. Löwis added the comment: Tal: If you want to verify your is_id_char function, you could use the code
    for i in range(65536):
        c = chr(i)
        c2 = 'a'+c
        if is_id_char(c) != c2.isidentifier():
            print('\\u%.4x'%i,is_id_char(c),c2.isidentifier())
Alternatively, you could use
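A self-contained version of this check might look as follows. The `is_id_char` below is a hypothetical category-based stand-in (it deliberately ignores Other_ID_Start and Other_ID_Continue), so that the loop has something to compare against; `find_mismatches` is likewise an illustrative name.

```python
from unicodedata import category

# Hypothetical, simplified predicate: general categories only.
_ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
_ID_CONTINUE = _ID_START | {"Mn", "Mc", "Nd", "Pc"}

def is_id_char(c):
    return c == "_" or category(c) in _ID_CONTINUE

def find_mismatches(predicate, limit=65536):
    """Compare predicate(c) with ('a' + c).isidentifier() for each code point."""
    mismatches = []
    for i in range(limit):
        c = chr(i)
        expected = ("a" + c).isidentifier()
        got = predicate(c)
        if got != expected:
            mismatches.append((i, got, expected))
    return mismatches

mismatches = find_mismatches(is_id_char)
print(len(mismatches), "mismatches in the BMP")
for code_point, got, expected in mismatches[:10]:
    print("\\u%.4x" % code_point, got, expected)
```

Any code points reported are places where the hand-written predicate and the interpreter's own rule disagree; the exact list depends on the Unicode database shipped with the running Python.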

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-22 Thread Martin v. Löwis
Martin v. Löwis added the comment: The reason the Unicode consortium made this list (Other_ID_Start) is that they want to promise 100% backwards compatibility: if some programming language had been using UAX#31, changes to the Unicode database might break existing code. To avoid this, UAX#31 g

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-21 Thread Ezio Melotti
Ezio Melotti added the comment:
> It's an optimization. Assuming the majority of characters will be
> ASCII, most non-identifier characters will fail this test, thus
> avoiding the more involved generic Unicode check.
I don't know what kind of characters are usually received as input. If things

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-21 Thread Tal Einat
Tal Einat added the comment:
> What's the reason for checking if the ord is >= 128?
It's an optimization. Assuming the majority of characters will be ASCII, most non-identifier characters will fail this test, thus avoiding the more involved generic Unicode check.
> However, I would propose th
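The fast path described here can be sketched roughly as below. This is not the code from the patch: `is_id_char` and `_ASCII_ID_CHARS` are hypothetical names, and the non-ASCII branch simply delegates to `str.isidentifier()` as discussed elsewhere in this thread.

```python
import string

# ASCII identifier characters, checked with a cheap set lookup.
_ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def is_id_char(c):
    """Return True if c may appear inside a Python identifier."""
    if ord(c) < 128:
        # Fast path: most source characters are ASCII.
        return c in _ASCII_ID_CHARS
    # Slow path: let the interpreter apply the full Unicode rules.
    return ("a" + c).isidentifier()

for c in "a-é²≠":
    print(repr(c), is_id_char(c))
```

Prefixing `"a"` in the slow path turns the question into "is this a valid continuation character", since `isidentifier()` on the bare character would apply the stricter start-character rule.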

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-21 Thread Martin v. Löwis
Martin v. Löwis added the comment: The Other_ID_Start property is defined in http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt It currently includes 4 characters. However, I would propose that methods .isidstart and .isidcontinue get added to the str type if there is a need for them.

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-21 Thread Ezio Melotti
Ezio Melotti added the comment:
> _ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
>                         "Other_ID_Start"}
> _ID_CATEGORIES = _ID_FIRST_CATEGORIES | {"Mn", "Mc", "Nd", "Pc",
>                                          "Other_ID_Continue"}
Note that "Other_ID_Start"

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-21 Thread Tal Einat
Tal Einat added the comment: Alright, so I'm going to use the equivalent of the following code, unless someone can tell me that something is wrong:
    from keyword import iskeyword
    from unicodedata import category, normalize
    _ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-19 Thread Ezio Melotti
Ezio Melotti added the comment:
> I'm not sure what the "Other_ID_Start property" mentioned in [1] and
> [2] means, though. Can we get someone with more in-depth knowledge of
> unicode to help with this?
See http://www.unicode.org/reports/tr31/#Backward_Compatibility. Basically they were consid

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-15 Thread Tal Einat
Tal Einat added the comment: AutoComplete isn't doing hidden checks. My concern is that auto-completion happens automatically and the parsing is done synchronously, so if the parsing takes a significant amount of time it can cause a delay long enough to be noticeable by users. We should also c

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-15 Thread Terry J. Reedy
Terry J. Reedy added the comment: I checked for usage: _id(_first)_chars is only used in _eat_identifier, which is used in one place in get_expression. That is called once each in AutoComplete and CallTips. Both are seldom visible except as requested (by waiting, for calltips). Calltips is onl

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-14 Thread Tal Einat
Tal Einat added the comment: Bah, I messed up the code sample in my previous message. It was supposed to be:
    from unicodedata import normalize, category
    norm_char_first = normalize(char)[0]
    is_id_first_char = (
        norm_char_first == '_' or
        category(norm_char_first) in {"Lu", "Ll", "Lt", "Lm

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-14 Thread Tal Einat
Tal Einat added the comment: It seems that the unicodedata module already supplies relevant functions which can be used for this. For example, we can replace "char in self._id_first_chars" with something like:
    from unicodedata import normalize, category
    norm_char = normalize(char)[0]
    is_id_fir

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: #21686 adds the test file that a new test would go in. dependencies: +IDLE - Test hyperparser

[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

2014-06-14 Thread Terry J. Reedy
New submission from Terry J. Reedy: idlelib.HyperParser.HyperParser has these lines
    _whitespace_chars = " \t\n\\"
    _id_chars = string.ascii_letters + string.digits + "_"
    _id_first_chars = string.ascii_letters + "_"
used in _eat_identifier() and get_expression. At least the latter two s