This small patch makes most of the plain text output readable again (in UTF-8).
Questions:
1) Is it intentional that the functions in unicode.h convert only between
std::vectors of characters and C strings, but not to or from std::string/docstring?
I think we should have variants for these as well. Or are we always supposed
to use constructs like the one in the patch?
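A sketch of the kind of variant question 1 asks for: a function that encodes one UCS-4 code point directly into a std::string. The name `ucs4_to_utf8_string` is hypothetical, and `char32_t` stands in for whatever character type unicode.h actually uses. This is a hand-written encoder following the UTF-8 bit layout, not a wrapper around the existing LyX function.

```cpp
#include <string>

// Hypothetical std::string-returning counterpart to ucs4_to_utf8().
// Assumption: char32_t stands in for LyX's UCS-4 character type.
std::string ucs4_to_utf8_string(char32_t c)
{
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);                          // 0xxxxxxx
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (c & 0x3F));          // 10xxxxxx
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x110000) {
        out += static_cast<char>(0xF0 | (c >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    // Values above U+10FFFF are not Unicode and yield an empty string.
    // Surrogates (U+D800..U+DFFF) are not rejected in this sketch.
    return out;
}
```

With such a variant, the `default:` case in the patch could shrink to `word += ucs4_to_utf8_string(c);`, without the temporary vector and the explicit `'\0'`.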
2) Do we agree that we should use lyx::docstring for all internal methods
that store parts of the document, i.e. change
std::string const TocBackend::Item::str() const;
to
lyx::docstring const TocBackend::Item::str() const;
and convert to UTF-8 where needed (in this case for plain text output)? Or
should we keep the type unchanged and use UTF-8 as its encoding instead? I believe
the former is safer.
Georg
Index: src/output_plaintext.C
===================================================================
--- src/output_plaintext.C (Revision 14695)
+++ src/output_plaintext.C (Arbeitskopie)
@@ -23,6 +23,7 @@
#include "ParagraphParameters.h"
#include "support/lstrings.h"
+#include "support/unicode.h"
#include <fstream>
@@ -232,8 +233,10 @@ void asciiParagraph(Buffer const & buf,
"writeAsciiFile: NULL char in structure." << endl;
break;
- default:
- word += c;
+ default: {
+ std::vector<char> tmp = ucs4_to_utf8(c);
+ tmp.push_back('\0');
+ word += &tmp[0];
if (runparams.linelen > 0 &&
currlinelen + word.length() > runparams.linelen)
{
@@ -244,6 +247,7 @@ void asciiParagraph(Buffer const & buf,
}
break;
}
+ }
}
os << word;
}
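In the `default:` case above, the `'\0'` is pushed only so that `&tmp[0]` can be read as a C string. As a sketch, the bytes returned by `ucs4_to_utf8()` could instead be appended as a range with `std::string::append(first, last)`, which copies exactly the given bytes and needs no terminator. The helper name `appendBytes` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: append an already encoded byte sequence, such as
// the std::vector<char> returned by ucs4_to_utf8(), to a std::string.
// append(first, last) copies exactly [first, last), so the vector does
// not need a terminating '\0'.
void appendBytes(std::string & word, std::vector<char> const & bytes)
{
    word.append(bytes.begin(), bytes.end());
}
```

In the patch this would read `appendBytes(word, ucs4_to_utf8(c));`, or simply the `append` call inline.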