[issue46572] Unicode identifiers not necessarily unique

Diego Argueta Sat, 29 Jan 2022 09:06:22 -0800

New submission from Diego Argueta <[email protected]>:

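Python 3 appears to mishandle identifiers that contain characters from the
Mathematical Alphanumeric Symbols block: a name spelled with these characters
is treated as the same identifier as its ASCII spelling. I didn't test the
entire range of U+1D400 through U+1D59F, but every character I spot-checked
there shows the behavior: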


    Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
    [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> foo = 1234567890
    >>> bar = 1234567890
    >>> foo is bar
    False
    >>> 𝖇𝖆𝖗 = 1234567890

    >>> foo is 𝖇𝖆𝖗
    False
    >>> bar is 𝖇𝖆𝖗
    True

    >>> 𝖇𝖆𝖗 = 0
    >>> bar
    0


This differs from the behavior with other non-ASCII characters. For example, 
ASCII 'a' and Cyrillic 'a' are properly treated as different identifiers:

    >>> а = 987654321    # Cyrillic lowercase 'a', U+0430
    >>> a = 123456789    # ASCII 'a'
    >>> а        # Cyrillic
    987654321
    >>> a        # ASCII
    123456789
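
The two cases can also be compared outside the REPL. This is only a minimal
sketch, and it assumes the collapse comes from NFKC normalization of
identifiers (which PEP 3131 specifies for Python 3). `same_identifier` is a
hypothetical helper written for this illustration, not part of any library:

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Return True if both spellings have the same NFKC-normalized form.

    Assumption: the parser normalizes identifiers to NFKC before comparing
    them, so equal NFKC forms mean the names refer to the same variable.
    """
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# Mathematical bold Fraktur small letters b, a, r (U+1D587, U+1D586, U+1D597)
fraktur_bar = "\U0001D587\U0001D586\U0001D597"
# Cyrillic small letter a (U+0430)
cyrillic_a = "\u0430"

print(same_identifier(fraktur_bar, "bar"))  # True: normalizes to ASCII "bar"
print(same_identifier(cyrillic_a, "a"))     # False: no compatibility mapping
```

Under that assumption, the Fraktur spelling and ASCII `bar` share one
normalized form, while Cyrillic and ASCII 'a' stay distinct, which matches
both sessions above.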


Although this is a somewhat pathological case, it is a nasty surprise. It may
be a symptom of a larger bug in the way identifiers are resolved.

This is similar but not identical to https://bugs.python.org/issue46555

Note: I did not find this myself; I give credit to Cooper Stimson 
(https://github.com/6C1) for finding this bug. I merely reported it.

----------
components: Parser, Unicode
messages: 412084
nosy: da, ezio.melotti, lys.nikolaou, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: Unicode identifiers not necessarily unique
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue46572>
_______________________________________