On Fri 27 Mar 2020 at 12:42:17 (-0400), Greg Wooledge wrote:
> On Fri, Mar 27, 2020 at 11:12:49AM -0500, David Wright wrote:
> > On Fri 27 Mar 2020 at 17:35:19 (+0300), Reco wrote:
> > > I'm not that familiar with the languages to qualify Romanian as a Latin
> > > or a non-Latin language,
> >
> > I think we can agree that the Romans spoke Latin!
>
> Err... Romanian is not the language that was spoken in Rome.

I don't think anyone has said that it was. The Romans (in the sense
that I think we're also agreed on, ie those living in Rome a couple of
millennia ago) spoke Latin. As deloptes has pointed out, they didn't
speak the literary language, just as most English people don't speak
literary English. They spoke Vulgar Latin, and that outlived the
Empire, evolving and dividing into the Romance languages.

But it's strange how an aside can kill the actual discussion of
email headers.

> It is, however, considered a "Romance language", meaning it has Latin
> as one of its primary roots.
>
> That said, I think the actual question was which character set it
> requires.

The Romanian language only came up because Andrei used it to write an
example of a filename. AIUI what's important in deciding whether to
encode it is the type of Latin, ie Latin-1 vs anything else. Pointing
out that it's been encoded here in Unicode is obvious (it's prefixed
by utf-8), but AFAICT it could have been encoded in Latin-10 just as
well. (Andrei would have to adjudicate on Latin-10's suitability for
the 1st and 10th letters, which have comma below. I think Latin-2 has
the cedilla form, and Andrei's utf-8 version avoided that.)

(BTW I'm not sure about Reco's use of \uc899. Does \u mean that c899
is in utf-8, or should it be followed by a Unicode codepoint, as in
U+c899? If the latter, then \uc899 is way off my charts.)

However, the actual problem that Russell introduced was how a
character set, any character set, should be encoded in the email
header parameter's value. And the RFC answer is "not in Base64", which
is for unstructured fields, as illustrated by the header of my
previous post. Mutt, as expected, writes conformant values but can be
instructed to decode particular non-conformant ones.

> According to
> <https://en.wikipedia.org/wiki/Romanian_alphabet#Digital_typography>,
> it originally used ISO 8859-2.

… aka Latin-2.
Not being Romanian, I can't comment on the relative popularity of that
and Latin-10 (ISO 8859-16) or whether the latter was still-born. And…

> Of course, it would probably use UTF-8
> on most modern systems.

Yes, that also seems more up-to-date and expressive, with support for
distinguishing obscure (to me) variants like cedilla vs comma below.

Cheers,
David.
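
For what it's worth, the cedilla vs comma-below point can be checked
with a minimal Python sketch. It assumes Python's standard codecs
iso8859_2 (Latin-2) and iso8859_16 (Latin-10); the two letters are
illustrative picks, not taken from Andrei's filename, and the helper
names are made up for the example. It also bears on the \uc899 aside:
c8 99 is the UTF-8 byte sequence for U+0219, whereas read as a code
point U+C899 lands among the Hangul syllables.

```python
import unicodedata

# Both letters are illustrative picks, not taken from Andrei's filename.
S_COMMA = "\u0219"    # s with comma below
S_CEDILLA = "\u015f"  # s with cedilla


def can_encode(text, codec):
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def describe(ch):
    """Summarise how one character fares in UTF-8, Latin-2 and Latin-10."""
    return {
        "name": unicodedata.name(ch),
        "utf8_hex": ch.encode("utf-8").hex(),
        "latin2": can_encode(ch, "iso8859_2"),
        "latin10": can_encode(ch, "iso8859_16"),
    }


if __name__ == "__main__":
    for ch in (S_COMMA, S_CEDILLA):
        print(describe(ch))
    # Read as a code point rather than as UTF-8 bytes, C899 is elsewhere.
    print(unicodedata.name("\uc899"))
```

Under those assumptions the comma-below letter encodes in Latin-10 but
not Latin-2, and the cedilla letter the other way round, while UTF-8
carries both.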

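On the header question, here is a rough sketch of the two encodings
being contrasted, using Python's standard email package rather than
Mutt. I'm assuming the relevant documents are RFC 2231 (parameter
values, the form prefixed by utf-8) and RFC 2047 (encoded-words, the
Base64-style form for unstructured fields); the filename and the
function names are hypothetical.

```python
from email.header import Header
from email.message import Message
from email.utils import encode_rfc2231

# Hypothetical filename: "s with comma below" followed by ".txt".
FILENAME = "\u0219.txt"


def parameter_form(name):
    """RFC 2231 style value: charset, empty language, percent-encoded bytes."""
    return encode_rfc2231(name, charset="utf-8")


def encoded_word_form(name):
    """RFC 2047 encoded-word, meant for unstructured text such as Subject."""
    return Header(name, charset="utf-8").encode()


def disposition_header(name):
    """Build a Content-Disposition header carrying the filename parameter."""
    msg = Message()
    # A (charset, language, value) tuple asks for the filename*= form.
    msg.add_header("Content-Disposition", "attachment",
                   filename=("utf-8", "", name))
    return msg


if __name__ == "__main__":
    print(parameter_form(FILENAME))
    print(encoded_word_form(FILENAME))
    msg = disposition_header(FILENAME)
    print(msg["Content-Disposition"])
    print(msg.get_filename())
```

The first form is what belongs in a parameter value; the second is
what a non-conformant sender might put there instead, which is the
case a reader has to be told to decode specially.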
