On Friday, March 6, 2015 at 10:50:35 AM UTC+5:30, Chris Angelico wrote:
> On Fri, Mar 6, 2015 at 3:53 PM, Rustom Mody wrote:
> > My conclusion: Early adopters of unicode -- Windows and Java -- were punished
> > for their early adoption. You can blame the unicode consortium, you can
> > blame the babel of human languages, particularly that some use characters
> > and some only (the equivalent of) what we call words.
> >
> > Or you can skip the blame-game and simply note the fact that large segments of
> > extant code-bases are currently in bug-prone or plain buggy state.
>
> For most of the 1990s, I was writing code in REXX, on OS/2. An even
> earlier adopter, REXX didn't have Unicode support _at all_, but
> instead had facilities for working with DBCS strings. You can't get
> everything right AND be the first to produce anything. Python didn't
> make Unicode strings the default until 3.0, but that's not Unicode's
> fault.
>
> > This includes not just bug-prone-system code such as Java and Windows but
> > seemingly working code such as python 3.
> >
> > Here is Roy's Smith post that first started me thinking that something may
> > be wrong with SMP
> > https://groups.google.com/d/msg/comp.lang.python/loYWMJnPtos/GHMC0cX_hfgJ
> >
> > Some parts are here some earlier and from my memory.
> > If details wrong please correct:
> > - 200 million records
> > - Containing 4 strings with SMP characters
> > - System made with python and mysql. SMP works with python, breaks mysql.
> > So whole system broke due to those 4 in 200,000,000 records
> >
> > I know enough (or not enough) of unicode to be chary of statistical conclusions
> > from the above.
> > My conclusion is essentially an 'existence-proof':
>
> Hang on hang on. Why are you blaming Python or SMP characters for
> this? The problem here is MySQL, which doesn't adequately cope with
> the full Unicode range. (Or, didn't then, or doesn't with its default
> settings. I believe you can configure current versions of MySQL to
> work correctly, though I haven't actually checked. PostgreSQL gets it
> right, that's good enough for me.)
>
> > SMP-chars can break systems.
> > The breakage is costly-fied by the combination
> > - layman statistical assumptions
> > - BMP → SMP exercises different code-paths
>
> Broken systems can be shown up by anything. Suppose you have a program
> that breaks when it gets a NUL character (not unknown in C code); is
> the fault with the Unicode consortium for allocating something at
> codepoint 0, or the code that can't cope with a perfectly normal
> character?

Strawman.

Let's please stick to UTF-16, shall we?

Now tell me:
- Is it broken or not?
- Is it widely used or not?
- Should programmers be careful of it or not?
- Should programmers be warned about it or not?
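
To make the UTF-16 question concrete, here is a minimal sketch of the BMP/SMP difference, assuming Python 3.3 or later (where len() counts code points). The helper names utf16_units and has_smp are mine, purely for illustration; they are not from any library.

```python
def utf16_units(s):
    """Number of 16-bit code units needed to encode s as UTF-16."""
    # utf-16-le emits no byte-order mark, so bytes // 2 is the unit count.
    return len(s.encode("utf-16-le")) // 2


def has_smp(s):
    """True if s contains any code point above U+FFFF (outside the BMP)."""
    return any(ord(ch) > 0xFFFF for ch in s)


bmp_char = "\u00e9"        # U+00E9, inside the BMP
smp_char = "\U0001F600"    # U+1F600, outside the BMP

for label, ch in (("BMP", bmp_char), ("SMP", smp_char)):
    print(label,
          "code points:", len(ch),
          "UTF-16 units:", utf16_units(ch),
          "UTF-8 bytes:", len(ch.encode("utf-8")),
          "outside BMP:", has_smp(ch))
```

One code point outside the BMP takes two UTF-16 code units (a surrogate pair) and four UTF-8 bytes, so any code that counts or indexes by UTF-16 unit, or that assumes at most three UTF-8 bytes per character, goes down a different path for it. A check along the lines of has_smp is one way to find how many such strings a data set really contains instead of guessing.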