On Fri, Jul 14, 2017 at 8:59 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Chris Angelico <ros...@gmail.com>:
>
>> On Fri, Jul 14, 2017 at 6:53 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> Chris Angelico <ros...@gmail.com>:
>>> Then, why bother with Unicode to begin with? Why not just use bytes?
>>> After all, Python3's strings have the very same pitfalls:
>>>
>>>  - you don't know the length of a text in characters
>>>  - chr(n) doesn't return a character
>>>  - you can't easily find the 7th character in a piece of text
>>
>> First you have to define "character".
>
> I'm referring to the
>
>    Grapheme clusters, a.k.a. real characters
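
For readers following along, the three pitfalls quoted above can be made concrete with a minimal sketch. It assumes "character" means grapheme cluster, as Marko defines it, and uses only the standard library. The sample strings are made up for illustration and are not from the thread.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (two code points).
decomposed = "e\u0301"
# The same visible character as a single precomposed code point.
composed = "\u00e9"

# Pitfall 1: len() counts code points, not grapheme clusters.
code_point_counts = (len(composed), len(decomposed))

# Pitfall 2: chr(n) returns one code point, which may be only a combining mark.
mark = chr(0x0301)
mark_is_combining = unicodedata.combining(mark) != 0

# Pitfall 3: indexing is per code point, so it can split a grapheme cluster.
text = "cafe\u0301s"   # displays as "cafés": 5 grapheme clusters, 6 code points
fourth_item = text[3]  # the bare "e"; its accent sits separately at text[4]

print(code_point_counts, mark_is_combining, len(text), repr(fourth_item))
```

Counting or indexing by grapheme cluster needs a segmentation algorithm that the standard library does not provide, which is part of what the disagreement here is about.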

Okay. Just as long as you know that that's not the only valid definition.

>> Yes, you can. For most purposes, textual equality should be defined in
>> terms of NFC or NFD normalization. Python already gives you that. You
>> could argue that a string should always be stored NFC (or NFD, take
>> your pick), and then the equality operator would handle this; but I'm
>> not sure the benefit is worth it.
>
> As I said, Python3's strings are neither here nor there. They don't
> quite solve the problem Python2's strings had. They will push the
> internationalization problems a bit farther out but fall short of the
> mark.
>
> The developer still has to worry a lot. Unicode seemingly solved one
> problem only to present the developer with a bagful of new problems.
>
> And if Python3's strings are a half-measure, why not stick to bytes?

Python's float type can't represent all possible non-integer values.
If it's such a half-measure, why not stick to integers and do all your
own fraction handling?

>> If you're trying to use strings as identifiers in any way (say, file
>> names, or document lookup references), using the NFC/NFD normalized
>> form of the string should be sufficient.
>
> Show me ten Python3 database applications, and I'll show you ten Python3
> database applications that don't normalize their primary keys.

I don't have ten open source ones handy, but I can tell you for sure
that I've worked with far more than ten that don't NEED to normalize
their primary keys. Why? Because they are *by definition* normal
already. Mostly because they use integers for keys. Tada!
Normalization is unnecessary.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
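
As a rough illustration of the normalization point made in this message, here is a minimal sketch of comparing and looking up strings by their NFC form. `normalize_key` and the `"doc-1"` value are hypothetical names invented for this example. `unicodedata.normalize` is the standard-library call that presumably stands behind "Python already gives you that".

```python
import unicodedata


def normalize_key(s: str) -> str:
    """Return the NFC form of s (hypothetical helper for this example)."""
    return unicodedata.normalize("NFC", s)


composed = "\u00e9"     # "é" as one code point
decomposed = "e\u0301"  # "é" as "e" plus a combining acute accent

# Plain == compares code points, so the two spellings differ.
raw_equal = composed == decomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = normalize_key(composed) == normalize_key(decomposed)

# A dict keyed on normalized strings treats both spellings as one key.
lookup = {normalize_key(composed): "doc-1"}
found = lookup.get(normalize_key(decomposed))

print(raw_equal, nfc_equal, found)
```

Integer keys, as noted above, never need this step.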