Emacs claims to fully conform to the Unicode Bidirectional Algorithm 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26 'Bidirectional Display' of the Emacs manual), yet I have noticed some behavior that makes me question this claim.
I'd appreciate the opinions of others, one way or the other. For each of the following three situations, I wish to know: Is Emacs' behavior consistent with the UBA? If it is, I'd like to know whether you find this behavior in line with the 'spirit' of the UBA, and with common sense.

1. Paragraph boundaries. According to the Emacs manual (section 22.19): "Paragraph boundaries are empty lines, i.e., lines consisting entirely of whitespace characters." The following screenshot shows this behavior in action: http://imgur.com/3eyrUfA

2. Visualization of explicit bidi characters. According to the Emacs manual (section 22.19): "In a GUI session, the lrm and rlm characters display as very thin blank characters; on text terminals they display as blanks." The following screenshot shows this behavior in action. There are three bidi marks (LRI, PDI, LRM) between the two left-most x's. http://imgur.com/VD3Lvsn

3. Line wrapping. The following screenshot shows the line-breaking algorithm in action. The paragraph starts with two Hebrew words followed by the beginning of Abraham Lincoln's Gettysburg Address. The English text flows from the bottom to the top. http://imgur.com/Bckn7zP

Possible reasons why these behaviors are reasonable and consistent with the standard:

1. Paragraph boundaries. The UBA allows applications to employ higher-level protocols when deciding on the base paragraph direction. See section 4.3, and specifically clause HL1 there.

2. Visualization of explicit bidi characters.
   (a) The UBA also allows the bidi characters to be displayed. See section 5.2.
   (b) This is just the default; it can be customized like every other character's glyph.

3. Line wrapping. The remedy is simple: break long lines into shorter ones by inserting newlines.
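
For anyone who wants to reproduce situation 2 outside of Emacs, here is a minimal Python sketch that builds a string with the three bidi marks between two x's and prints the bidi class that the Unicode character database assigns to each character. The `sample` string and the `describe` helper are my own illustration, not the exact text from the screenshot, and the classes reported depend on the Unicode version bundled with your Python.

```python
import unicodedata

# Explicit bidi characters named in situation 2.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Assumed test string: three bidi marks between two x's.
# The exact text in the screenshot may differ.
sample = "x" + LRI + PDI + LRM + "x"


def describe(text):
    """Return (code point, name, bidi class) for every character in text."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.bidirectional(ch),
        )
        for ch in text
    ]


if __name__ == "__main__":
    for row in describe(sample):
        print(*row)
```

Pasting the resulting string into an Emacs buffer should make it easy to compare how the three marks are displayed in a GUI session and on a text terminal.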