On Tue, 27 Aug 2013 22:57:45 -0700, David M. Cotter wrote:

> I am very sorry that I have offended you to such a degree you feel it
> necessary to publicly eviscerate me.

You know, David, you are right. I did over-react, and I apologise for
that. I am sorry, I was excessively confrontational. (Although I think
"eviscerate" is a bit strong.)

Putting aside my earlier sarcasm, the basic message remains the same:
Python byte strings are not designed to work with Unicode characters, and
if they do work, it is an accident, not defined behaviour.

> Perhaps I could have worded it like this: "So far I have not seen any
> troubles including unicode characters in my strings, they *seem* to be
> fine for my use-case. What kind of trouble has been seen with this by
> others?"

Exactly the same sort of trouble you were having earlier when you were
inadvertently decoding the source file as MacRoman rather than UTF-8.
Mojibake, garbage characters in your text, corrupted data.

http://en.wikipedia.org/wiki/Mojibake

The point is, you might not see these errors, because by accident all the
relevant factors conspire to give you the correct result. You might test
it on a Mac and on Windows and it all works well. You might even test it
on a dozen different machines, and it works fine on all of them. But since
you're relying on an accident of implementation, none of this is
guaranteed.

And then in eighteen months' time, *something* changes -- a minor update
to Python, a different version of Mac OS X, an unusual Registry setting in
Windows, who knows what? -- and all of a sudden the factors no longer line
up to give you the correct results and it all comes tumbling down in a big
stinking mess. If you are lucky you will get a nice clear exception
telling you something is broken, but more likely you'll just get corrupted
data and mojibake, and you, or the poor guy who maintains the code after
you, will have no idea why. And you'll probably come here asking for our
help to solve it.

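A minimal sketch of that failure mode, for illustration only. The sample word and the variable names are made up, not taken from the code under discussion. It encodes one accented word as UTF-8 and then decodes the same bytes with two other codecs, which raises no exception and quietly yields different characters:

```python
from __future__ import print_function

# Hypothetical sample data: a four-character Unicode string.
text = u"caf\u00e9"
# The same word as a byte string: five bytes, since the accented
# letter takes two bytes in UTF-8.
raw = text.encode("utf-8")

# Decoding with the codec the bytes were written in round-trips cleanly.
assert raw.decode("utf-8") == text

# Decoding the same bytes with a different codec does not fail.
# It silently produces other characters, i.e. mojibake.
as_macroman = raw.decode("mac_roman")
as_latin1 = raw.decode("latin-1")

print(len(text), len(raw))   # 4 5
print(repr(as_macroman))
print(repr(as_latin1))
```

Nothing in the byte string records which codec is the right one, so whether the text comes out correctly depends on whatever the surrounding environment happens to assume.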
If you came back and said "I tried it with the u prefix, and it broke a
bunch of other code, and I don't have time to fix it now so I'm reverting
to the u-less byte string form", I wouldn't *like* it but I could *accept*
it as one of those sub-optimal compromises people make in Real Life. I've
done the same thing myself, we probably all have: written code we knew was
broken, but fixing it was too hard or too low a priority.

> Really, I wonder why you are so angry at me for having made a mistake?
> I'm going to guess that you don't have kids.

What do kids have to do with this? Are you an adult or a child? *wink*

You didn't offend me so much as frustrate me. You had multiple people
telling you the same thing, don't embed Unicode characters in a byte
string, but you chose not just to ignore them but effectively to declare
that they were all wrong to give that advice: not just the people here but
essentially the entire Python development community responsible for adding
Unicode strings to the language. Can you blame me for feeling that your
reply seemed rather arrogant?

In any case, I'm glad you responded with a little more restraint than I
did. I hope you can see my point of view, and that I haven't soured you on
this forum.

-- 
Steven