On 11/24/15 09:03, Gerd Hoffmann wrote:
> Hi,
>
>> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
>> anywhere in patches, except maybe the notes section?)
>
> Sure. We don't want limit people names (in signed-off etc) to us-ascii.
>
> See "eb8934b vnc: fix memory corruption (CVE-2015-5225)" for a name
> written in kanji.

I'm very sorry, but this is something I expressly disagree with. (I
didn't want to bring this up on my own, but since you did...)

International engineering / science / research etc. are being done in
English. We use English because we expect people to learn one common
language (native English speakers have it easy, but that's a side
point), so that nobody has to learn every possible language.

The same point applies *much more* to writing systems / alphabets. You
(the generic you) can't expect me (the generic me) to read Kanji,
Sanskrit, Thai script, Cyrillic script, and so on, even if your name is
written in that script natively. You come up with an approximation in
Latin script, and use that.

Is your purpose to feel pleased about the faithful representation of
your name in the commit message (which the international community is
unable to read, not even approximately), or is your goal to allow the
community to read your (approximate) name?

I bet everyone who is involved in international development, and travels
occasionally, has business cards in Latin script *too*. I bet whoever
does research and publishes papers in English puts their name (or at
least an official approximation of it) on the front page in Latin script
*too*.

Specifically about the commit you mention, the email of the reporter is:

  zuozhi....@alibaba-inc.com

I'm absolutely sure that "zuozhi" is the official Pinyin transliteration
of the reporter's name (or a part of it). Now, while Pinyin has its own
separate pronunciation rules, I *can* (and occasionally do) look up
those rules. So Pinyin allows me to *work* with the name with relative
safety, and it even gives me a fleeting chance at getting the
pronunciation right, should we meet.

My name is László Érsek. I've dropped the accents for the purpose of
international exchange in advance; I just write Laszlo Ersek, even when
I sign physical documents that are in English.
Even that way, I've seen the larger community abuse my name endlessly;
in particular I've seen all permutations (= reordering) and variations
(= missing characters) of the substring "szl" in "Laszlo".

If the larger community fails to get such a simple ASCII name right --
and yes I'm at fault too, I have occasionally left off the second "n" of
your last name --, then why am I (or anyone else) expected to struggle
with names written in non-Latin script? They are much harder, and have
exactly zero value, as far as international collaboration is concerned.

The development is being done in English, and the script of English is
Latin. Stick with it.

>> I'd recommend o_O.
>
> Heh, it's 2015, not 1995 ...

Sure, and the Internet Standards are still being written in pure ASCII.

https://en.wikipedia.org/wiki/Request_for_Comments#Obtaining_RFCs

Laszlo