Quentin Wenger <wenger.quen...@bluewin.ch> added the comment:

> > this limitation to the latin-1 subset is not compatible with the
> > documentation, which says that valid Python identifiers are valid group
> > names.
>
> Not all latin-1 characters are valid identifier, for example:
>
> >>> '\x94'.encode('latin1')
> b'\x94'
> >>> '\x94'.isidentifier()
> False

True, but that's not the point. Δ is a valid Python identifier but not a valid group name in bytes regexes, because it lies outside the latin-1 range. The documentation does not mention this.

> There is a workaround, you can convert `bytes` to `str` with "latin-1"
> decoder before processing, IIRC there will be no extra overhead
> (memory/speed) during processing, then the name and content are the same
> type. :)

I am not looking for a workaround for my current code. And the simplest workaround is to latin-1-convert the names back to bytes, because re should not latin-1-convert them to strings in the first place. Are you saying that the proper way to use bytes regexes is to use string regexes instead?

> Please look at these:
>
> >>> orig_name = "Ř"
> >>> orig_ch = orig_name.encode("cp1250") # Because why not?
> >>> orig_ch
> b'\xd8'
> >>> name = list(re.match(b"(?P<" + orig_ch + b">)", b"").groupdict().keys())[0]
> >>> name
> 'Ø' # '\xd8'
> >>> name == orig_name
> False
> >>> name.encode("latin-1")
> b'\xd8'
> >>> name.encode("latin-1") == orig_ch
> True
>
> "Ř" (\u0158) --cp1250--> b'\xd8'
> "Ø" (\u00d8) --latin-1--> b'\xd8'

That's no surprise, I carefully crafted this example. :-)
Rather, that is exactly my point: several different strings (all of which can be valid Python identifiers) can have the same single-byte representation, simply by means of different encodings (duh). So why convert group names to strings when returning them from matches, when you don't know where the bytes came from, or even whether they ever were strings? That should be left to the programmer.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________
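
As a footnote to the message above, here is a minimal sketch of the encoding facts the argument rests on. The helper `latin1_encodable` is a hypothetical name introduced only for illustration. The regex part deliberately uses an ASCII group name, on the assumption that non-ASCII group names in bytes patterns may be rejected depending on the Python version, so it only shows the type mismatch between group names and matched content.

```python
import re


def latin1_encodable(name: str) -> bool:
    """Hypothetical helper: True if every character of `name` fits in latin-1."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Being a valid identifier and being latin-1-encodable are independent:
# Δ is a valid identifier, yet it has no latin-1 representation.
delta_is_identifier = "Δ".isidentifier()
delta_fits_latin1 = latin1_encodable("Δ")

# One and the same byte decodes to different identifiers depending on the
# encoding the programmer had in mind.
raw = b"\xd8"
as_latin1 = raw.decode("latin-1")  # U+00D8
as_cp1250 = raw.decode("cp1250")   # U+0158

# With a bytes pattern the matched content is bytes, but the group name
# comes back as str.
match = re.match(b"(?P<name>x)", b"x")
key = next(iter(match.groupdict()))
value = match.group("name")

print(delta_is_identifier, delta_fits_latin1)
print(ascii(as_latin1), ascii(as_cp1250))
print(type(key).__name__, type(value).__name__)
```

Both decoded strings are valid identifiers, so the byte alone cannot tell which name was meant; only the code that produced the pattern knows the encoding.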