On 9/13/22, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:
>> Or look at the Unicode standard, where real great minds with
>> incredible multi-national professional life careers are involved,
>> get the official PDF (hr-hrm, i have not updated since Unicode
>> 13..), combined words are separated with hyphen-minus, _not_
>> hyphen.
>
> I am dubious of this claim. I would like to see how you verified it.

http://en.wikipedia.org/wiki/Hyphen-minus#Description also makes this
claim: "Though the Unicode Standard states that the U+2010 hyphen is
'preferred' over the hyphen-minus, the Standard itself uses the
hyphen-minus as its hyphen character." It has citations for both these
statements for anyone who wants to dig further.

It seems an odd choice, but is hardly the Unicode committee's worst
sin. (That distinction I still reserve for their recommendation to use
U+2019 as both a closing single quotation mark and an apostrophe, an
asinine overloading that obscures the vast semantic difference between
the two.)

> However, by default groff does _not_ break after en dashes. I
> don't know why this is the case; it has been true for a long time.

I don't know why either, but I can speculate that for most hyphen and
em dash usages, a break following the dash is acceptable, whereas for
one common use of the en dash -- indicating a number range -- a break
would look odd.

> My hypothesis is that less(1) treats '.' as standing for any single code
> point rather than any single byte in the input stream.

My version (less 563) doesn't behave that way, even in a UTF-8 locale.
Maybe it's the older version I'm using, or maybe some other
environmental factor.

> [3] The "hyphen-minus" was, I gather, an entity unknown to typographers
> in, say, 1970. It exists because early computer character encoding
> standards, like ASCII, had limited glyph repertoires and overloaded
> many glyphs.

There is also the fact that computer keyboards were largely based on
typewriter keyboards, which going back decades earlier had only one
midlevel horizontal bar that had to pull duty as a hyphen, minus sign,
and (typed multiple times) dash. So even had they had more slots to
work with, the ASCII designers had only the one all-purpose input
character to encode.
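
For anyone who wants to poke at the code-point-versus-byte distinction
raised in the less(1) hypothesis above, here is a minimal sketch. It
uses Python's re module, not less, so it says nothing about what any
version of less actually does; it only shows how the two readings of
'.' differ on a U+2010 hyphen. The describe() helper and the sample
string are my own illustration.

```python
import re
import unicodedata

# Dash-like characters mentioned in this thread.
DASHES = "\u002d\u2010\u2013\u2014\u2212"

def describe(ch):
    """Return (code point label, Unicode name, UTF-8 length in bytes)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))

for ch in DASHES:
    print(*describe(ch))

text = "a\u2010b"            # 'a', U+2010 HYPHEN, 'b'
data = text.encode("utf-8")  # the same text as a UTF-8 byte stream

# Matching over code points: one '.' covers the whole hyphen.
print(bool(re.fullmatch(r"a.b", text)))
# Matching over bytes: one '.' covers one byte, and U+2010 is three bytes.
print(bool(re.fullmatch(rb"a.b", data)))
print(bool(re.fullmatch(rb"a...b", data)))
```

Only the hyphen-minus fits in a single byte; the other four take three
bytes each in UTF-8, which is why a byte-oriented '.' needs three tries
to get past one of them.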