On Tue, 1 Apr 2014 20:20:13 +0000 "Whistler, Ken" <[email protected]> wrote:
> I don’t think the answer is directly deduced from UAX #9, because
> it involves deciding where to insert a visible hyphen for display.
> However, I think the correct answer here is your number two guess,
> i.e. (in a RTL paragraph context):
>
> -car SI TORRAC
>
> A way to think about this, rather than starting from the BN nature
> of U+00AD, is to ask what would happen if there was an *explicit*
> hyphen-minus at the same position.

Is it legitimate to truncate the context to a single line? The BiDi
algorithm is attempting to interpret unlabelled text as embedded text
(it's not an arbitrary dance), and in just one line there is no
indicator of whether the hyphen is part of the LTR text embedded in
RTL text. However, the very next character is 'r', which tells us
that the left-to-right run contains the hyphen.

I also think the HYPHEN-MINUS is the wrong character to consider - the
analogy should be with U+2010 HYPHEN (class ON) rather than with
U+2212 MINUS SIGN (class ES), let alone the ambiguous HYPHEN-MINUS,
for which ES is merely the interpretation most likely to work.

I found a similar example, but with Hebrew embedded in the Latin
script, in the introduction to the Stuttgart Bible. The corresponding
character was U+05BE HEBREW PUNCTUATION MAQAF, though in this case the
class is R (because one doesn't expect MAQAF to be used with
left-to-right scripts), and it is therefore not as good an example as
I would have hoped for. The BiDi algorithm then happily places the
MAQAF internally, making the analogy 'car- SI TORRAC'. (I
metaphorically embedded the quote, so I don't get 'SI TORRAC car-',
which is plain wrong.)

Now, a valid opposing view is that the graphical representation of
soft hyphens says, "When written out as one very long line, there is
no space between successive lines", as opposed to "This apparent word
is actually continued by text on the next line".
If you take the interpretation in which the marks operate at the level
of lines, then '-car SI TORRAC' is reasonable. As English has the
hyphen as a half-way house between one word and two words, English
very naturally works at the word level. I am not sure about other
languages.

Richard.
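
A minimal sketch for checking the bidirectional classes cited above,
assuming Python's standard unicodedata module. The CHARS table and the
bidi_classes helper are names made up for this illustration, and the
values reported depend on the Unicode Character Database version
bundled with the interpreter. It only looks up classes; it does not
run the reordering steps of UAX #9.

```python
import unicodedata

# Characters discussed in this thread (labels are for display only).
CHARS = {
    "U+00AD SOFT HYPHEN": "\u00ad",
    "U+002D HYPHEN-MINUS": "\u002d",
    "U+2010 HYPHEN": "\u2010",
    "U+2212 MINUS SIGN": "\u2212",
    "U+05BE HEBREW PUNCTUATION MAQAF": "\u05be",
}


def bidi_classes(chars=CHARS):
    """Map each label to the Bidi_Class value reported by unicodedata."""
    return {label: unicodedata.bidirectional(ch) for label, ch in chars.items()}


if __name__ == "__main__":
    for label, bidi_class in bidi_classes().items():
        print(f"{label}: {bidi_class}")
```

With a recent Unicode database this should report BN for the soft
hyphen, ES for HYPHEN-MINUS and MINUS SIGN, ON for HYPHEN, and R for
MAQAF, matching the classes quoted in the message.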

