09/04/2014 12:47, Vincent van Ravesteijn:
Urdu uses the same script, so the same arabic_table can be reused,
just as the table is also used for Farsi.

Some specific Urdu letters are not implemented in this table, but it
is almost trivial to fix that.

... or we could do like the following patch, (for the str-metrics branch). This patch uses Qt for drawing RTL strings and _seems_ to work. However, I did not commit it because
1/ I do not know what I am doing
and
2/ I have a "better" version (see the code), but it does not work.

The tests for RTL characters (check whether it is in the Arabic or Hebrew ranges) is probably a bit rough around the edges, but I
think it is possible to get it right. The trick is that we take into
account the layout direction of the characters themselves, not only of
the language.

The nice and scary property of this patch is that it will lead us to kill our carefully hand tuned Arabic and Hebrew support (for display). I would appreciate to testing or some comments before applying it to my branch. I only tried it visually on one Hebrew and one Farsi document.

Note that at this point, I do not intend to implement perfect bidi handling, but only to re-implement what we already have using string-level metrics. So please do not try to lure me into implementing implementing the full specs. The goal is just (for now) to find what I'd call a Gudinov algorithm, that is something that does not regress wrt the current implementation.

Comments welcome.

JMarc


>From 77e5fc4009bdf978bdde9081a585b1cc7fe564bc Mon Sep 17 00:00:00 2001
From: Jean-Marc Lasgouttes <lasgout...@lyx.org>
Date: Thu, 10 Apr 2014 16:37:21 +0200
Subject: [PATCH] Also draw right-to-left text string-wise

We rely on Qt built-in unicode support for handling Arabic and Hebrew
compose characters.

This  should provide a nice speedup at a low cost and
will eventually allow us to get rid of:
 * most of our Arabic/Hebrew machinery in Encodings.cpp,
 * Paragraph::transformChar,
 * and probably more.
---
 00README_STR_METRICS_BRANCH |   11 ++++---
 src/Encoding.cpp            |    6 ++++
 src/Encoding.h              |    2 ++
 src/rowpainter.cpp          |   74 +++++++++++++++++--------------------------
 src/rowpainter.h            |    2 +-
 src/support/lstrings.cpp    |   23 ++++++++++++++
 src/support/textutils.h     |    3 ++
 7 files changed, 70 insertions(+), 51 deletions(-)

diff --git a/00README_STR_METRICS_BRANCH b/00README_STR_METRICS_BRANCH
index 82026e8..762759a 100644
--- a/00README_STR_METRICS_BRANCH
+++ b/00README_STR_METRICS_BRANCH
@@ -11,7 +11,7 @@ what we have with force_paint_single_char. Moreover there has been
 some code factorization in TextMetrics, where the same row-breaking
 algorithm was basically implemented 3 times.
 
-Currently everything is supposed to work for LTR text, and RTL text
+Currently everything is supposed to work for LTR text, and RtL text
 should work too except possibly metrics with Arabic and Hebrew fonts.
 
 When KEEP_OLD_METRICS_CODE is defined in TextMetrics.cpp, the new code
@@ -40,12 +40,13 @@ What is done:
   lyxrc.force_paint_single_char is false. In this case, remove also
   useless workarounds which disable kerning and ligatures.
 
+* when lyxrc.force_paint_single_char is false, draw also RtL text
+  string-wise. This both speed-up drawing and prepare for code
+  removel, since we now rely on Qt to do things we use to do by
+  ourselves (see isArabic* and isHebrew* code in Encodings.cpp).
 
-Next steps:
 
-* check what happens with Arabic and/or Hebrew text. There may be some
-  problems related to compose characters. Investigate whether RtL
-  strings could be drawn on a string-wise basis.
+Next steps:
 
 * investigate whether strings could be cut at separators in RowPainter
   only in justified text. This would speed-up painting in other cases
diff --git a/src/Encoding.cpp b/src/Encoding.cpp
index a231106..7167d7b 100644
--- a/src/Encoding.cpp
+++ b/src/Encoding.cpp
@@ -704,6 +704,12 @@ docstring Encodings::fromLaTeXCommand(docstring const & cmd, int cmdtype,
 }
 
 
+bool Encodings::isHebrewChar(char_type c)
+{
+	return c >= 0x0590 && c <= 0x05ff;
+}
+
+
 bool Encodings::isHebrewComposeChar(char_type c)
 {
 	return c <= 0x05c2 && c >= 0x05b0 && c != 0x05be && c != 0x05c0;
diff --git a/src/Encoding.h b/src/Encoding.h
index 5f788f9..171b4f1 100644
--- a/src/Encoding.h
+++ b/src/Encoding.h
@@ -269,6 +269,8 @@ public:
 	///
 	static bool isHebrewComposeChar(char_type c);
 	///
+	static bool isHebrewChar(char_type c);
+	///
 	static bool isArabicComposeChar(char_type c);
 	///
 	static bool isArabicSpecialChar(char_type c);
diff --git a/src/rowpainter.cpp b/src/rowpainter.cpp
index 5b69701..6871e6e 100644
--- a/src/rowpainter.cpp
+++ b/src/rowpainter.cpp
@@ -226,7 +226,7 @@ void RowPainter::paintArabicComposeChar(pos_type & vpos, FontInfo const & font)
 
 
 void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
-			    bool hebrew, bool arabic)
+			    bool arabic)
 {
 	// This method takes up 70% of time when typing
 	pos_type pos = bidi_.vis2log(vpos);
@@ -244,7 +244,7 @@ void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
 			c = ')';
 		else if (c == ')')
 			c = '(';
-		str[0] = par_.transformChar(c, pos);
+		str[0] = c;
 	}
 
 	pos_type const end = row_.endpos();
@@ -260,6 +260,12 @@ void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
 	bool const spell_state =
 		lyxrc.spellcheck_continuously && par_.isMisspelled(pos);
 
+	// are we building a RtL string? 
+	//FIXME: I would like to use the new isRTL() from textutils.h,
+	// but it does not give the same results for some reason I do
+	// not understand.
+	bool const rtl = Encodings::isArabicChar(str[0]) || Encodings::isHebrewChar(str[0]);
+
 	// collect as much similar chars as we can
 	for (++vpos ; vpos < end ; ++vpos) {
 		if (lyxrc.force_paint_single_char)
@@ -287,37 +293,19 @@ void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
 
 		char_type c = par_.getChar(pos);
 
-		if (c == '\t')
-			break;
-
-		if (!isPrintableNonspace(c))
-			break;
-
-		/* Because we do our own bidi, at this point the strings are
-		 * already in visual order. However, Qt also applies its own
-		 * bidi algorithm to strings that it paints to the screen.
-		 * Therefore, if we were to paint Hebrew/Arabic words as a
-		 * single string, the letters in the words would get reversed
-		 * again. In order to avoid that, we don't collect Hebrew/
-		 * Arabic characters, but rather paint them one at a time.
-		 * See also http://thread.gmane.org/gmane.editors.lyx.devel/79740
-		 */
-		if (hebrew)
+		//FIXME: I would like to use the new isRTL() from textutils.h,
+		// but it does not give the same results for some reason I do
+		// not understand.
+		bool const new_rtl = Encodings::isArabicChar(c) || Encodings::isHebrewChar(c);
+		if (new_rtl != rtl)
+			// String direction has changed
 			break;
 
-		/* FIXME: these checks are irrelevant, since 'arabic' and
-		 * 'hebrew' alone are already going to trigger a break.
-		 * However, this should not be removed completely, because
-		 * if an alternative solution is found which allows grouping
-		 * of arabic and hebrew characters, then these breaks may have
-		 * to be re-applied.
-
-		if (arabic && Encodings::isArabicComposeChar(c))
+		if (c == '\t')
 			break;
 
-		if (hebrew && Encodings::isHebrewComposeChar(c))
+		if (!isPrintableNonspace(c))
 			break;
-		*/
 
 		// FIXME: Why only round brackets and why the difference to
 		// Hebrew? See also Paragraph::getUChar
@@ -326,9 +314,6 @@ void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
 				c = ')';
 			else if (c == ')')
 				c = '(';
-			c = par_.transformChar(c, pos);
-			/* see comment in hebrew, explaining why we break */
-			break;
 		}
 
 		str.push_back(c);
@@ -337,6 +322,18 @@ void RowPainter::paintChars(pos_type & vpos, FontInfo const & font,
 
 	docstring s(&str[0], str.size());
 
+	/* Because we do our own bidi, at this point the strings are
+	 * already in visual order. However, Qt also applies its own
+	 * bidi algorithm to strings that it paints to the screen.
+	 * Therefore, if we were to paint Hebrew/Arabic words as a
+	 * single string, the letters in the words would get reversed
+	 * again. In order to avoid that, we reverse the string in advance.
+	 * See also http://thread.gmane.org/gmane.editors.lyx.devel/79740
+	 * for an earlier thread on the subject
+	 */
+	if (rtl)
+		reverse(s.begin(), s.end());
+
 	if (s[0] == '\t')
 		s.replace(0,1,from_ascii("    "));
 
@@ -397,12 +394,8 @@ void RowPainter::paintFromPos(pos_type & vpos, bool changed)
 	Font const orig_font = text_metrics_.displayFont(pit_, pos);
 	double const orig_x = x_;
 
-	// usual characters, no insets
-	char_type const c = par_.getChar(pos);
-
 	// special case languages
 	string const & lang = orig_font.language()->lang();
-	bool const hebrew = lang == "hebrew";
 	bool const arabic = lang == "arabic_arabtex" || lang == "arabic_arabi" ||
 						lang == "farsi";
 
@@ -410,16 +403,7 @@ void RowPainter::paintFromPos(pos_type & vpos, bool changed)
 	bool const misspelled =
 		lyxrc.spellcheck_continuously && par_.isMisspelled(pos);
 
-	// draw as many chars as we can
-	if ((!hebrew && !arabic)
-		|| (hebrew && !Encodings::isHebrewComposeChar(c))
-		|| (arabic && !Encodings::isArabicComposeChar(c))) {
-		paintChars(vpos, orig_font.fontInfo(), hebrew, arabic);
-	} else if (hebrew) {
-		paintHebrewComposeChar(vpos, orig_font.fontInfo());
-	} else if (arabic) {
-		paintArabicComposeChar(vpos, orig_font.fontInfo());
-	}
+	paintChars(vpos, orig_font.fontInfo(), arabic);
 
 	paintForeignMark(orig_x, orig_font.language());
 
diff --git a/src/rowpainter.h b/src/rowpainter.h
index 644cb37..986c3f4 100644
--- a/src/rowpainter.h
+++ b/src/rowpainter.h
@@ -64,7 +64,7 @@ private:
 	void paintHebrewComposeChar(pos_type & vpos, FontInfo const & font);
 	void paintArabicComposeChar(pos_type & vpos, FontInfo const & font);
 	void paintChars(pos_type & vpos, FontInfo const & font,
-			bool hebrew, bool arabic);
+			bool arabic);
 	int paintAppendixStart(int y);
 	void paintFromPos(pos_type & vpos, bool changed);
 	void paintInset(Inset const * inset, pos_type const pos);
diff --git a/src/support/lstrings.cpp b/src/support/lstrings.cpp
index 7af8aae..db4304a 100644
--- a/src/support/lstrings.cpp
+++ b/src/support/lstrings.cpp
@@ -158,6 +158,29 @@ bool isNumber(char_type c)
 }
 
 
+bool isRTL(char_type c)
+{
+	if (!is_utf16(c))
+		// assume that no non-utf16 character is right-to-left
+		// c outside the UCS4 range is catched as well
+		return false;
+	QChar::Direction direction = ucs4_to_qchar(c).direction();
+	/**
+	 * See for example:
+	 *  http://en.wikipedia.org/wiki/Template:Bidi_Class_%28Unicode%29.
+	 * Here we only handle the easy cases:
+	 *  * R:  Hebrew alphabet and related punctuation
+	 *  * AL: Arabic, Thaana and Syriac alphabets, and most
+	 *        punctuation specific to those scripts
+	 *
+	 * FIXME: testing show that this does not work (see
+	 * RowPainter::paintChars), but my knowledge of unicode is too
+	 * poor to understand why.
+	 */
+	return direction == QChar::DirR || direction ==QChar::DirAL;
+}
+
+
 bool isDigitASCII(char_type c)
 {
 	return '0' <= c && c <= '9';
diff --git a/src/support/textutils.h b/src/support/textutils.h
index 78e34cb..4cb304a 100644
--- a/src/support/textutils.h
+++ b/src/support/textutils.h
@@ -44,6 +44,9 @@ bool isSpace(char_type c);
 /// return true if a unicode char is a numeral.
 bool isNumber(char_type c);
 
+/// return true is a unicode char uses a right-to-left direction for layout
+bool isRTL(char_type c);
+
 /// return whether \p c is a digit in the ASCII range
 bool isDigitASCII(char_type c);
 
-- 
1.7.9.5

Reply via email to