On 06/11/2016 at 14:30, Jean-Marc Lasgouttes wrote:
This is a more radical approach than what I have in mind, and I do not
know whether it is safe. My idea was to modify the Row building code and
replace the character with some visual cue (in addition to the row
breaking), because I am not confident in sending this character to Qt
string rendering functions.

I'll propose something shortly.

In the end, I convinced myself that your approach is correct if we want to keep the breaks. In the following patch I add some on-screen hints of what is going on. I could color the characters, but I am not sure what to do: these are actual characters, not insets. A solution could be to add a frame around the characters.

The next problem is running LaTeX. By default, these characters are not accepted. Could our local LaTeX+Unicode experts tell us whether it makes any sense to handle these characters in LaTeX, or whether nobody cares and they should be ignored on output?

I suspect that adding them to lib/unicodesymbols would do more harm than good.

I am not sure that the approach of removing them when converting from plain text (paste or insert) is worth it, since we have to handle the characters anyway. But then again, at times it seems right to me to handle them there.

For example, this hints that we should handle them like (CR)LF:
http://stackoverflow.com/questions/3072152/what-is-unicode-character-2028-ls-line-separator-used-for
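
To make the "(CR)LF" idea concrete, here is a minimal sketch of what normalizing these characters at paste/insert time could look like. It is only an illustration: normalizeSeparators is a hypothetical helper, not an existing LyX function, and it works on std::u32string rather than on docstring so that it stays self-contained.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LyX): treat U+2028 LINE SEPARATOR and
// U+2029 PARAGRAPH SEPARATOR like (CR)LF by mapping all of them to '\n'.
std::u32string normalizeSeparators(std::u32string const & in)
{
	std::u32string out;
	out.reserve(in.size());
	for (std::size_t i = 0; i < in.size(); ++i) {
		char32_t const c = in[i];
		if (c == U'\r') {
			// CRLF and a lone CR both become a single '\n'
			if (i + 1 < in.size() && in[i + 1] == U'\n')
				++i;
			out.push_back(U'\n');
		} else if (c == 0x2028 || c == 0x2029)
			// Assumption: both separators are flattened to '\n'
			out.push_back(U'\n');
		else
			out.push_back(c);
	}
	return out;
}
```

Mapping U+2029 to a plain newline is a simplification; real code might prefer to start a new paragraph there.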

JMarc

From 1d5ae75919e70c2b93a471bde9024c3738a9b13f Mon Sep 17 00:00:00 2001
From: Jean-Marc Lasgouttes <lasgout...@lyx.org>
Date: Mon, 7 Nov 2016 10:14:39 +0100
Subject: [PATCH] Properly handle Unicode paragraph/line breaks

They are shown on screen as an arrow or a pilcrow symbol and cause a line break.

They are still not handled in LaTeX output, though.
---
 src/Paragraph.cpp   |    5 +++++
 src/TextMetrics.cpp |   19 ++++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/Paragraph.cpp b/src/Paragraph.cpp
index 8afa475..05c10b5 100644
--- a/src/Paragraph.cpp
+++ b/src/Paragraph.cpp
@@ -3147,6 +3147,11 @@ bool Paragraph::isHfill(pos_type pos) const
 
 bool Paragraph::isNewline(pos_type pos) const
 {
+	// U+2028 LINE SEPARATOR
+	// U+2029 PARAGRAPH SEPARATOR
+	char_type const c = d->text_[pos];
+	if (c == 0x2028 || c == 0x2029)
+		return true;
 	Inset const * inset = getInset(pos);
 	return inset && inset->lyxCode() == NEWLINE_CODE;
 }
diff --git a/src/TextMetrics.cpp b/src/TextMetrics.cpp
index 8f7ac82..17ee2e4 100644
--- a/src/TextMetrics.cpp
+++ b/src/TextMetrics.cpp
@@ -864,7 +864,23 @@ bool TextMetrics::breakRow(Row & row, int const right_margin) const
 		} else if (c == '\t')
 			row.addSpace(i, theFontMetrics(*fi).width(from_ascii("    ")),
 				     *fi, par.lookupChange(i));
-		else {
+		else if (c == 0x2028 || c == 0x2029) {
+			/**
+			 * U+2028 LINE SEPARATOR
+			 * U+2029 PARAGRAPH SEPARATOR
+
+			 * These are special unicode characters that break
+			 * lines/paragraphs. Not handling them leads to trouble wrt
+			 * Qt QTextLayout formatting. We add a visible character
+			 * on screen so that the user can see that something is
+			 * happening.
+			*/
+			row.finalizeLast();
+			// ⤶ U+2936 ARROW POINTING DOWNWARDS THEN CURVING LEFTWARDS
+			// ¶ U+00B6 PILCROW SIGN
+			char_type const screen_char = (c == 0x2028) ? 0x2936 : 0x00B6;
+			row.add(i, screen_char, *fi, par.lookupChange(i));
+		} else {
 			// FIXME: please someone fix the Hebrew/Arabic parenthesis mess!
 			// see also Paragraph::getUChar.
 			if (fi->language()->lang() == "hebrew") {
@@ -925,6 +941,7 @@ bool TextMetrics::breakRow(Row & row, int const right_margin) const
 		BufferParams const & bparams
 			= text_->inset().buffer().params();
 		f.setLanguage(par.getParLanguage(bparams));
+		// ¶ U+00B6 PILCROW SIGN
 		row.addVirtual(end, docstring(1, char_type(0x00B6)), f, Change());
 	}
 
-- 
1.7.9.5
