On 1/18/2023 5:43 AM, Stephen Tucker wrote:
Thanks for these responses.

I was encouraged to read that I'm not the only one to find this all
confusing.

I have investigated a little further.

1. I produced the following IDLE log:

mylongstr = ""
for thisCP in range (1, 256):
mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "


print mylongstr
1, 2, 3, 4, 5, 6, 7, 8, 9,
  10, 11, 12,
  13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
, 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
126, 127, タ 128, チ 129, ツ 130, テ 131, ト 132, ナ 133, ニ 134, ヌ 135, ネ 136, ノ
137, ハ 138, ヒ 139, フ 140, ヘ 141, ホ 142, マ 143, ミ 144, ム 145, メ 146, モ 147,
ヤ 148, ユ 149, ヨ 150, ラ 151, リ 152, ル 153, レ 154, ロ 155, ワ 156, ン 157, ゙
158, ゚ 159, ᅠ 160, ᄀ 161, ᄁ 162, ᆪ 163, ᄂ 164, ᆬ 165, ᆭ 166, ᄃ 167, ᄄ 168,
ᄅ 169, ᆰ 170, ᆱ 171, ᆲ 172, ᆳ 173, ᆴ 174, ᆵ 175, ᄚ 176, ᄆ 177, ᄇ 178, ᄈ
179, ᄡ 180, ᄉ 181, ᄊ 182, ᄋ 183, ᄌ 184, ᄍ 185, ᄎ 186, ᄏ 187, ᄐ 188, ᄑ 189,
ᄒ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
ý 253, þ 254, ÿ 255,


2. I copied and pasted the IDLE log into a text file and ran a program on
it that told me about every byte in the log.

3. I discovered the following:

Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;

Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
characters whose codepoints were FF00 hex more than the byte values (hence
the strange glyphs);

Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
characters - without any offset being added to their codepoints in the
meantime!

I thought you might just be interested in this - there does seem to be some
method in IDLE's mind, at least.

This has nothing to do with IDLE. The UTF-8 encoding of those code points uses two bytes instead of one. See

https://stackoverflow.com/questions/8732025/why-degree-symbol-differs-from-utf-8-from-unicode#:~:text=UTF-8%20encodes%20the%20value%200xB0%20as%20two%20consecutive,on%20endianness%20(I%20suppose%20other%20orderings%20are%20possible).coding-in-vs-code-on-ubuntu-leading-to-unicode-error/62652695#62652695




Stephen Tucker.








On Wed, Jan 18, 2023 at 9:41 AM Peter J. Holzer <hjp-pyt...@hjp.at> wrote:

On 2023-01-17 22:58:53 -0500, Thomas Passin wrote:
On 1/17/2023 8:46 PM, rbowman wrote:
On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
2. Does the IDLE in Python 3.x behave the same way?

fwiw

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license()" for more
information.
str = ""
for c in range(140, 169):
      str += chr(c) + " "

print(str)
Œ   Ž     ‘ ’ “ ” • – — ˜ ™ š › œ   ž Ÿ   ¡ ¢ £ ¤ ¥
¦ § ¨


I don't know how this will appear since Pan is showing the icon for a
character not in its set.  However, even with more undefined characters
the printable one do not change. I get the same output running Python3
from the terminal so it's not an IDLE thing.

I'm not sure what explanation is being asked for here.  Let's take
Python3,
so we can be sure that the strings are in unicode.  The font being used
by
the console isn't mentioned, but there's no reason it should have glyphs
for
any random unicode character.

Also note that the characters between 128 (U+0080) and 159 (U+009F)
inclusive aren't printable characters. They are control characters.

         hp

--
    _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
--
https://mail.python.org/mailman/listinfo/python-list


--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to