If encoding italics means reencoding normal linguistic usage, then no. We already have the nightmares caused by the partial encoding of Latin and Greek (plus a few Hebrew characters) for mathematical and IPA notation, but those are restricted to a well-delimited scope of use and a fixed subset, and at least they have relevant scientific sources and reviewers for what serious publications actually need (these subsets may continue to evolve, but only very slowly). We could accept exceptions for chemical or electrical notations if there are standards bodies supporting them. But for linguistic usage there is no universal agreement and no single authority.

Characters are added according to common use (by statistical survey, or because some national standard promotes them and sometimes makes their use mandatory with defined, occasionally legally binding, meanings). For everything else, languages are unconstrained: users around the world invent their own letterforms and styles, with no limit at all. If we start accepting such reencoding, the situation would in fact become worse for interoperability, because no one can support zillions of variants unless they are explicitly encoded separately as surrounding styles, or as scoping characters where needed (contextual characters, possibly variation selectors when the variants are mostly isolated). But italics encoded as variation selectors would just pollute everything; and in any case "italic" is not a single universal convention and does not apply equally to all scripts. The semantics attached to italic styles also vary from document to document, the same semantics follow different typographic conventions depending on the author, and there is no agreed meaning for the distinctions they encode. For these reasons, "italic/oblique/cursive/handwriting..."
should remain a matter of styling. (Note also that even the italic transform is variable: it could later become a user preference, where readers adjust the degree of slanting to suit their reading comfort, or its orientation if they are left-handed, to match how they write themselves, or if the writer is a native RTL writer. The context of use in BiDi may also affect the slanting direction; for example, Latin text inserted into Arabic could have its italic letters slanted backward, to better match the slant of Arabic itself and avoid collisions between Latin and Arabic glyphs at BiDi boundaries.) One could still propose a contextual control character, but it would remain insufficient to represent the many possible stylistic variants: we have better languages for that now, and CSS (or even HTML) handles it better, including for accessibility requirements. Note that there is no way to translate such italics correctly for Braille readers, for example; Braille and audio readers rely on heuristics to reduce the number of contextual words or symbols they must insert around each run, and VSn characters would complicate that, whereas they already process the standard HTML/CSS conventions much more simply. Direct native encoding of italic characters for linguistic use would fail if it only covered English: it would worsen language coverage if people were then told to drop the diacritics essential to their language merely because of partial coverage of their alphabet.
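The existing Mathematical Alphanumeric Symbols already illustrate these failure modes concretely: they are compatibility characters that normalization folds back to plain letters, and their coverage is incomplete even for unaccented basic Latin. A quick sketch using Python's standard unicodedata module; the helper names `is_assigned` and `strip_variation_selectors` are mine, and the stripping function is purely hypothetical, showing only what every consumer of an italics-via-VSn scheme would have to do:

```python
import unicodedata

# U+1D44E MATHEMATICAL ITALIC SMALL A: NFKC normalization folds it
# back to ASCII 'a', silently discarding the "italic" information.
math_italic_a = "\U0001D44E"
assert unicodedata.normalize("NFKC", math_italic_a) == "a"

def is_assigned(cp: int) -> bool:
    """True if the code point has a character name (helper for this sketch)."""
    try:
        unicodedata.name(chr(cp))
        return True
    except ValueError:
        return False

# Coverage is incomplete even for basic Latin: U+1D455, where
# MATHEMATICAL ITALIC SMALL H would sit, is permanently unassigned
# because PLANCK CONSTANT (U+210E) was already encoded.
print(is_assigned(0x1D44E))  # True
print(is_assigned(0x1D455))  # False: the math-italic 'h' gap

# A hypothetical italic-via-variation-selector scheme would likewise
# burden every processor (search, collation, Braille) with stripping:
def strip_variation_selectors(s: str) -> str:
    """Remove VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    return "".join(ch for ch in s
                   if not (0xFE00 <= ord(ch) <= 0xFE0F
                           or 0xE0100 <= ord(ch) <= 0xE01EF))

assert strip_variation_selectors("a\uFE00b\uFE01") == "ab"
```

And there is no math-italic counterpart at all for accented letters such as 'é': any text beyond unaccented English must already fall back to combining marks or mixed styles.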
I don't think this is worth the effort. It would in fact cause a lot of maintenance and severely complicate the addition of new missing letters. And let's not forget common ligatures, and correct typographic features like kerning, which would no longer be supported and would render ugly text when many new kerning pairs are missing from fonts. Many fonts in use today would no longer work properly; we would have fewer stylistic options and fewer usable fonts, and we would fall into the trap of proprietary solutions with a single provider. It would be too difficult for any font designer to define a usable font sellable on various markets: such fonts would be reduced to niches and could no longer be economically created and maintained at reasonable cost.

Consider the problem orthogonally: if you use CSS/HTML styles in the document encoding (rather than in the plain-text character encoding), you can also supply the additional semantics clearly in that document, encode the intent of the author, and supply enough information to permit alternate renderings (for accessibility, or for technical reasons such as small font sizes on low-resolution devices, or for people with limited vision). The same applies to color, whose meaning is not clear except in specific notations supported by well-known authorities, or by a long tradition shared by many authors and kept in archives or important text corpora, such as literature, legal texts, and publications that have fallen into the public domain after their initial publisher disappeared and its proprietary assets were dissolved: the original documents remain reliable sources, sharable by many, which can guide reuse as an established convention that no longer needs much explanation.
The same argument applies to the other common styles: monospaced, bold, double-struck, hollow, shadowed, 3D-like, underlining/strikethrough/overlining, and generic subscripts and superscripts. (I don't like the partial encoding of Latin letters in subscript/superscript that works only for basic modern English; it is an abuse of what was defined mostly for a few well-known abbreviations and notations with a long multilingual tradition.) Authors have much more creative freedom using separate styles, encoded in an upper-layer protocol.

However, we can admit that for documents not intended to be rendered visually but used technically, some contextual control characters would be needed (just like those for BiDi when HTML/CSS is not usable). These are needed only for compatibility with technical constraints, provided there is application support for them and the application is not vendor-specific but sponsored by a well-known standard (which should then be made explicit in Unicode, probably via character properties, just like the additional properties for CJK characters specifying their dictionary sources). That referenced standard should be open and readable at least by all (even if not republishable), and the standards body should keep an open channel with the community and hold regular meetings to resolve incoming issues by defining policies, best practices, or the current "state of the art" (if research is still continuing), as well as rules for making the transition and maintaining a good level of compatibility if the standard evolves or is replaced by another supported standard.
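The patchiness of the existing superscript letters is easy to demonstrate: only 'i' and 'n' ever received dedicated SUPERSCRIPT characters in the Superscripts and Subscripts block; the look-alikes elsewhere are phonetic modifier letters with different purposes and properties. A quick check, assuming only Python's standard unicodedata module:

```python
import unicodedata

# Which lowercase Latin letters have a dedicated SUPERSCRIPT character?
# (The look-alikes in the Phonetic Extensions blocks are MODIFIER LETTER
# characters, encoded for IPA/UPA transcription, not for general
# superscripting of text.)
covered = []
for c in "abcdefghijklmnopqrstuvwxyz":
    try:
        unicodedata.lookup(f"SUPERSCRIPT LATIN SMALL LETTER {c.upper()}")
        covered.append(c)
    except KeyError:
        pass

print(covered)  # ['i', 'n'] -- U+2071 and U+207F; the other 24 are missing
```

So even before considering diacritics or non-Latin scripts, the "plain text superscript" repertoire cannot spell most English words, let alone serve other languages.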
On Thu, 17 Jan 2019 at 04:51, James Kass via Unicode <unicode@unicode.org> wrote:
>
> Victor Gaultney wrote,
>
> > Treating italic like punctuation is a win for a lot of people:
>
> Italic Unicode encoding is a win for a lot of people regardless of approach. Each of the listed wins remains essentially true whether treated as punctuation, encoded atomically, or selected with VS.
>
> > My main point in suggesting that Unicode needs these characters is that italic has been used to indicate specific meaning - this text is somehow special - for over 400 years, and that content should be preserved in plain text.
>
> ( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf )
>
> "Plain text must contain enough information to permit the text to be rendered legibly, and nothing more."
>
> The argument is that italic information can be stripped yet still be read. A persuasive argument towards encoding would need to negate that; it would have to be shown that removing italic information results in a loss of meaning.
>
> The decision makers at Unicode are familiar with italic use conventions such as those shown in "The Chicago Manual of Style" (first published in 1906). The question of plain-text italics has arisen before on this list and has been quickly dismissed.
>
> Unicode began with the idea of standardizing existing code pages for the exchange of computer text using a unique double-byte encoding rather than relying on code page switching. Latin was "grandfathered" into the standard. Nobody ever submitted a formal proposal for Basic Latin. There was no outreach to establish contact with the user community -- the actual people who used the script as opposed to the "computer nerds" who grew up with ANSI limitations and subsequent ISO code pages. Because that's how Unicode rolled back then. Unicode did what it was supposed to do WRT Basic Latin.
>
> When someone points out that italics are used for disambiguation as well as stress, the replies are consistent.
>
> "That's not what plain-text is for." "That's not how plain-text works." "That's just styling and so should be done in rich-text." "Since we do that in rich-text already, there's no reason to provide for it in plain-text." "You can already hack it in plain-text by enclosing the string with slashes." And so it goes.
>
> But if variant letter form information is stripped from a string like "Jackie Brown", the primary indication that the string represents either a person's name or a Tarantino flick title is also stripped. "Thorstein Veblen" is either a dead economist or the name of a fictional yacht in the Travis McGee series. And so forth.
>
> Computer text tradition aside, nobody seems to offer any legitimate reason why such information isn't worthy of being preservable in plain-text. Perhaps there isn't one.
>
> I'm not qualified to assess the impact of italic Unicode inclusion on the rich-text world as mentioned by David Starner. Maybe another list member will offer additional insight or a second opinion.