New submission from Max Bachmann <kont...@maxbachmann.de>:

I noticed that when using the Unicode character \U00010900 when inserting the 
character as character:
Here is the result on the Python console both for 3.6 and 3.9:
```
>>> s = '0𐤀00'
>>> s
'0𐤀00'
>>> ls = list(s)
>>> ls
['0', '𐤀', '0', '0']
>>> s[0]
'0'
>>> s[1]
'𐤀'
>>> s[2]
'0'
>>> s[3]
'0'
>>> ls[0]
'0'
>>> ls[1]
'𐤀'
>>> ls[2]
'0'
>>> ls[3]
'0'
```

It appears that for some reason in this specific case the character is actually 
stored in a different position that shown when printing the complete string. 
Note that the string is already behaving strange when marking it in the 
console. When marking the special character it directly highlights the last 3 
characters (probably because it already thinks this character is in the second 
position).

The same behavior does not occur when directly using the unicode point
```
>>> s='000\U00010900'
>>> s
'000𐤀'
>>> s[0]
'0'
>>> s[1]
'0'
>>> s[2]
'0'
>>> s[3]
'𐤀'
```

This was tested using the following Python versions:
```
Python 3.6.0 (default, Dec 29 2020, 02:18:14) 
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux

Python 3.9.6 (default, Jul 16 2021, 00:00:00) 
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
```
on Fedora 34

----------
components: Unicode
messages: 401078
nosy: ezio.melotti, maxbachmann, vstinner
priority: normal
severity: normal
status: open
title: Incorrect handling of unicode character \U00010900
type: behavior
versions: Python 3.6, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45105>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to