On Saturday, 2 December 2006 15:03, Abdelrazak Younes wrote:
> Please, let's just concentrate on the 16bits range for now and make sure
> that LyX is fine with that. And to answer Georg's post in another thread,
> I've replaced the unicode conversion with simple cast because:
Oh, I knew the reasons, but I still did not like it.
> 1) we don't know yet how to deal with characters of more than 16 bits in
> Qt4. From my initial reading of the Qt docs, it is not clear how this
> could be supported and I don't have the time to investigate more.
It is simple: Just do a proper conversion to QString. If André is right
then everything else will be done by Qt.
> 2) the iconv conversions were awfully slow! At least on Windows, so a
> solution using iconv in ucs4_to_qchar() is out of the question.
>
> IMHO a rock solid and fast LyX supporting the 16 bit range is more than
> enough for 1.5. But if you or Georg want to handle full ucs4, be my
> guest.
What I did not like about your solution was that you replaced a slow but
working solution with a fast but only partially working one. My rule is
always to make it work first and optimize afterwards.
Because I am a nice guy, I did not only complain but also implemented a
solution that is correct and should also be fast. I had no time for
measurements, but I tested the versions for Qt 4.2.x and 4.1.x, and they
both work. Do you see any speed difference with and without this patch? I
expect only a very small one, if any: the only differences are some
additional if clauses, so the code should be basically the same for
characters below 65536 (those that fit into a single QChar).
At the same time, the patch gets rid of the ucs2 conversions that were used
for qt3.
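
To make the idea concrete, here is a minimal standalone sketch of the same strategy (it is not the patch itself). It uses std::u16string instead of QString so that it compiles without Qt, and the name ucs4_to_utf16_sketch is made up for this illustration. Code points below 65536 are simply cast. Only when a larger one shows up does the code fall back to a real conversion, which here builds the surrogate pairs by hand instead of calling QString::fromUcs4() or iconv.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of the patch.
// Converts UCS-4 code points to UTF-16 code units. It does not validate
// its input (lone surrogates or values above 0x10FFFF are not rejected).
inline std::u16string ucs4_to_utf16_sketch(char32_t const * str, std::size_t ls)
{
	assert(str != nullptr || ls == 0);

	std::u16string out(ls, u'\0');

	// Fast path: as long as every code point is below 0x10000,
	// a simple cast is a correct conversion.
	std::size_t i = 0;
	for (; i < ls && str[i] < 0x10000; ++i)
		out[i] = static_cast<char16_t>(str[i]);
	if (i == ls)
		return out;

	// Slow path: keep what was already converted and encode the rest,
	// splitting code points >= 0x10000 into a surrogate pair.
	out.resize(i);
	for (; i < ls; ++i) {
		char32_t const c = str[i];
		if (c < 0x10000) {
			out.push_back(static_cast<char16_t>(c));
		} else {
			char32_t const v = c - 0x10000;
			out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
			out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
		}
	}
	return out;
}
```

For text that stays below 65536, the only extra cost compared to the pure cast version is one comparison per character, which is why I do not expect a measurable difference.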
> Besides...
> 1) this has been reduced to 10 MB now and this number is only for a
> document using lots of fonts (UserGuide.lyx), typical documents won't be
> affected that much and this does not depend on the number of opened
> documents.
> 2) The cache is enabled only on Mac and Windows. Speaking for Windows,
> 37 MB when dealing with a document as complex as the UserGuide is quite
> reasonable IMHO.
I agree with Jean-Marc that it is too much. LyX should also be usable on
machines with less than 512 MB RAM.
> 3) If this really is a problem, then we can switch to a QHash solution
> or even the map solution used in 1.4. The speed impact is very
> negligible.
This is really an argument for switching to the QHash. It is good that you
did it; please also update the comment.
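
As a rough illustration of why a hash-based cache keeps memory low, here is a small standalone sketch. It is not the GuiFontMetrics code: the class name WidthCacheSketch and the measure callback are made up, and std::unordered_map stands in for QHash so that it compiles without Qt. An entry is created only for characters that are actually looked up, so memory grows with the number of distinct characters used and not with the size of the character range.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a per-font width cache; for illustration only.
class WidthCacheSketch {
public:
	// 'measure' plays the role of the expensive font metrics call.
	explicit WidthCacheSketch(std::function<int(char32_t)> measure)
		: measure_(std::move(measure))
	{
		assert(static_cast<bool>(measure_));
	}

	// Returns the cached width, measuring the character on first use.
	int width(char32_t c)
	{
		auto const it = cache_.find(c);
		if (it != cache_.end())
			return it->second;
		int const w = measure_(c);
		cache_.emplace(c, w);
		return w;
	}

	// Number of distinct characters measured so far.
	std::size_t size() const { return cache_.size(); }

private:
	std::function<int(char32_t)> measure_;
	std::unordered_map<char32_t, int> cache_;
};
```

Because the key is a full 32-bit code point, the same cache also works for characters above 65535 without any special casing.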
Georg
Index: src/frontends/qt4/qt_helpers.C
===================================================================
--- src/frontends/qt4/qt_helpers.C (Revision 16138)
+++ src/frontends/qt4/qt_helpers.C (Arbeitskopie)
@@ -19,6 +19,7 @@
#include "support/lstrings.h"
#include "support/convert.h"
+#include "support/unicode.h"
#include "debug.h"
@@ -111,30 +112,50 @@ void lengthToWidgets(QLineEdit * input,
}
-void ucs4_to_qstring(lyx::docstring const & str, QString & s)
+#if QT_VERSION < 0x040200
+void ucs4_to_qstring_helper(char_type const * str, size_t ls, QString & s)
{
- int i = static_cast<int>(str.size());
- s.resize(i);
- for (; --i >= 0;)
- s[i] = ucs4_to_qchar(str[i]);
+ std::vector<unsigned short> const utf16 = ucs4_to_utf16(str, ls);
+ s.setUtf16(&utf16[0], utf16.size());
}
+#endif
-QString const toqstr(docstring const & ucs4)
+void ucs4_to_qstring(docstring const & str, QString & s)
{
- QString s;
- ucs4_to_qstring(ucs4, s);
- return s;
+ int i = static_cast<int>(str.size());
+ s.resize(i);
+ for (; --i >= 0;) {
+ if (str[i] < 65536) {
+ s[i] = QChar(static_cast<unsigned short>(str[i]));
+ } else {
+ // A simple cast is not possible, so we need to resort to
+ // the slow full conversion.
+#if QT_VERSION >= 0x040200
+ s = QString::fromUcs4(reinterpret_cast<uint const *>(str.data()), str.length());
+#else
+ std::vector<unsigned short> const utf16 = ucs4_to_utf16(str.data(), str.length());
+ s.setUtf16(&utf16[0], utf16.size());
+#endif
+ return;
+ }
+ }
}
docstring const qstring_to_ucs4(QString const & qstr)
{
+#if QT_VERSION >= 0x040200
+ QVector<uint> const ucs4 = qstr.toUcs4();
+ return docstring(ucs4.begin(), ucs4.end());
+#else
+ // This does not properly convert surrogate pairs
int const ls = qstr.size();
docstring ucs4;
for (int i = 0; i < ls; ++i)
ucs4 += static_cast<char_type>(qstr[i].unicode());
return ucs4;
+#endif
}
Index: src/frontends/qt4/qt_helpers.h
===================================================================
--- src/frontends/qt4/qt_helpers.h (Revision 16138)
+++ src/frontends/qt4/qt_helpers.h (Arbeitskopie)
@@ -76,26 +76,54 @@ inline QString const toqstr(std::string
inline char_type const qchar_to_ucs4(QChar const & qchar) {
return static_cast<char_type>(qchar.unicode());
}
+// no QChar const ucs4_to_qchar(char_type) because not all UCS4 code points
+// can be translated to one QChar
-inline QChar const ucs4_to_qchar(char_type const ucs4) {
- // FIXME: The following cast is not a real conversion but it work
- // for the ucs2 subrange of unicode. Instead of an assertion we should
- // return some special characters that indicates that its display is
- // not supported.
- BOOST_ASSERT(ucs4 < 65536);
- return QChar(static_cast<unsigned short>(ucs4));
+
+void ucs4_to_qstring(docstring const & str, QString & s);
+
+
+inline QString const toqstr(docstring const & ucs4)
+{
+ // If possible we let qt do the work, since this version does not
+ // need to be superfast.
+#if QT_VERSION >= 0x040200
+ return QString::fromUcs4(reinterpret_cast<uint const *>(ucs4.data()), ucs4.length());
+#else
+ QString s;
+ ucs4_to_qstring(ucs4, s);
+ return s;
+#endif
}
-QString const toqstr(docstring const & ucs4);
-void ucs4_to_qstring(docstring const & str, QString & s);
+#if QT_VERSION < 0x040200
+/// This slow variant needs to be used for ucs4 characters above 65535
+/// Don't use this directly, only through ucs4_to_qstring!
+void ucs4_to_qstring_helper(char_type const * str, size_t ls, QString & s);
+#endif
+
+/// This one needs to be superfast, because it is used in metrics calculation.
+/// Therefore we try to do without real conversion if possible.
inline void ucs4_to_qstring(char_type const * str, size_t ls, QString & s)
{
int i = static_cast<int>(ls);
s.resize(i);
- for (; --i >= 0;)
- s[i] = ucs4_to_qchar(str[i]);
+ for (; --i >= 0;) {
+ if (str[i] < 65536)
+ s[i] = QChar(static_cast<unsigned short>(str[i]));
+ else {
+ // A simple cast is not possible, so we need to resort to
+ // the slow full conversion.
+#if QT_VERSION >= 0x040200
+ s = QString::fromUcs4(reinterpret_cast<uint const *>(str), ls);
+#else
+ ucs4_to_qstring_helper(str, ls, s);
+#endif
+ return;
+ }
+ }
}
Index: src/frontends/qt4/GuiFontMetrics.C
===================================================================
--- src/frontends/qt4/GuiFontMetrics.C (Revision 16138)
+++ src/frontends/qt4/GuiFontMetrics.C (Arbeitskopie)
@@ -52,15 +52,29 @@ int GuiFontMetrics::maxDescent() const
int GuiFontMetrics::lbearing(char_type c) const
{
- return metrics_.leftBearing(ucs4_to_qchar(c));
+ if (c < 65536)
+ return metrics_.leftBearing(QChar(static_cast<unsigned short>(c)));
+ else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ BOOST_ASSERT(s.size() > 0);
+ return metrics_.leftBearing(s[0]);
+ }
}
int GuiFontMetrics::rbearing(char_type c) const
{
// Qt rbearing is from the right edge of the char's width().
- QChar sc = ucs4_to_qchar(c);
- return metrics_.width(sc) - metrics_.rightBearing(sc);
+ if (c < 65536) {
+ QChar const sc(static_cast<unsigned short>(c));
+ return metrics_.width(sc) - metrics_.rightBearing(sc);
+ } else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ BOOST_ASSERT(s.size() > 0);
+ return metrics_.width(s) - metrics_.rightBearing(s[s.size() - 1]);
+ }
}
@@ -159,33 +173,63 @@ void GuiFontMetrics::buttonText(docstrin
int GuiFontMetrics::ascent(char_type c) const
{
- QRect const & r = metrics_.boundingRect(ucs4_to_qchar(c));
- return -r.top();
+ if (c < 65536) {
+ QRect const & r = metrics_.boundingRect(QChar(static_cast<unsigned short>(c)));
+ return -r.top();
+ } else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ QRect const & r = metrics_.boundingRect(s);
+ return -r.top();
+ }
}
int GuiFontMetrics::descent(char_type c) const
{
- QRect const & r = metrics_.boundingRect(ucs4_to_qchar(c));
- return r.bottom() + 1;
+ if (c < 65536) {
+ QRect const & r = metrics_.boundingRect(QChar(static_cast<unsigned short>(c)));
+ return r.bottom() + 1;
+ } else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ QRect const & r = metrics_.boundingRect(s);
+ return r.bottom() + 1;
+ }
}
#else
void GuiFontMetrics::fillMetricsCache(char_type c) const
{
- QRect const & r = metrics_.boundingRect(ucs4_to_qchar(c));
- AscendDescend ad = { -r.top(), r.bottom() + 1};
- // We could as well compute the width but this is not really
- // needed for now as it is done directly in width() below.
- metrics_cache_.insert(c, ad);
+ if (c < 65536) {
+ QRect const & r = metrics_.boundingRect(QChar(static_cast<unsigned short>(c)));
+ AscendDescend ad = { -r.top(), r.bottom() + 1};
+ // We could as well compute the width but this is not really
+ // needed for now as it is done directly in width() below.
+ metrics_cache_.insert(c, ad);
+ } else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ QRect const & r = metrics_.boundingRect(s);
+ AscendDescend ad = { -r.top(), r.bottom() + 1};
+ // We could as well compute the width but this is not really
+ // needed for now as it is done directly in width() below.
+ metrics_cache_.insert(c, ad);
+ }
}
int GuiFontMetrics::width(char_type c) const
{
if (!width_cache_.contains(c)) {
- width_cache_.insert(c, metrics_.width(ucs4_to_qchar(c)));
+ if (c < 65536) {
+ width_cache_.insert(c, metrics_.width(QChar(static_cast<unsigned short>(c))));
+ } else {
+ QString s;
+ ucs4_to_qstring(&c, 1, s);
+ width_cache_.insert(c, metrics_.width(s));
+ }
}
return width_cache_.value(c);
Index: src/support/unicode.C
===================================================================
--- src/support/unicode.C (Revision 16138)
+++ src/support/unicode.C (Arbeitskopie)
@@ -24,14 +24,23 @@
using std::endl;
+namespace {
+
+#ifdef WORDS_BIGENDIAN
+ char const * utf16_codeset = "UTF16-BE";
+#else
+ char const * utf16_codeset = "UTF16-LE";
+#endif
+
+}
+
+
namespace lyx {
#ifdef WORDS_BIGENDIAN
char const * ucs4_codeset = "UCS-4BE";
- char const * ucs2_codeset = "UCS-2BE";
#else
char const * ucs4_codeset = "UCS-4LE";
- char const * ucs2_codeset = "UCS-2LE";
#endif
static const iconv_t invalid_cd = (iconv_t)(-1);
@@ -219,52 +228,18 @@ utf8_to_ucs4(char const * utf8str, size_
}
-lyx::char_type
-ucs2_to_ucs4(unsigned short c)
-{
- return ucs2_to_ucs4(&c, 1)[0];
-}
-
-
-std::vector<lyx::char_type>
-ucs2_to_ucs4(std::vector<unsigned short> const & ucs2str)
-{
- if (ucs2str.empty())
- return std::vector<lyx::char_type>();
-
- return ucs2_to_ucs4(&ucs2str[0], ucs2str.size());
-}
-
-
-std::vector<lyx::char_type>
-ucs2_to_ucs4(unsigned short const * ucs2str, size_t ls)
-{
- static IconvProcessor processor(ucs4_codeset, ucs2_codeset);
- return iconv_convert<lyx::char_type>(processor, ucs2str, ls);
-}
-
-
-unsigned short
-ucs4_to_ucs2(lyx::char_type c)
-{
- return ucs4_to_ucs2(&c, 1)[0];
-}
-
-
-std::vector<unsigned short>
-ucs4_to_ucs2(std::vector<lyx::char_type> const & ucs4str)
+std::vector<char_type>
+utf16_to_ucs4(unsigned short const * s, size_t ls)
{
- if (ucs4str.empty())
- return std::vector<unsigned short>();
-
- return ucs4_to_ucs2(&ucs4str[0], ucs4str.size());
+ static IconvProcessor processor(ucs4_codeset, utf16_codeset);
+ return iconv_convert<char_type>(processor, s, ls);
}
std::vector<unsigned short>
-ucs4_to_ucs2(lyx::char_type const * s, size_t ls)
+ucs4_to_utf16(char_type const * s, size_t ls)
{
- static IconvProcessor processor(ucs2_codeset, ucs4_codeset);
+ static IconvProcessor processor(utf16_codeset, ucs4_codeset);
return iconv_convert<unsigned short>(processor, s, ls);
}
Index: src/support/unicode.h
===================================================================
--- src/support/unicode.h (Revision 16138)
+++ src/support/unicode.h (Arbeitskopie)
@@ -67,24 +67,13 @@ std::vector<lyx::char_type> utf8_to_ucs4
std::vector<lyx::char_type> utf8_to_ucs4(char const * utf8str, size_t ls);
-// ucs2_to_ucs4
+// utf16_to_ucs4
-lyx::char_type ucs2_to_ucs4(unsigned short c);
+std::vector<char_type> utf16_to_ucs4(unsigned short const * s, size_t ls);
-std::vector<lyx::char_type>
-ucs2_to_ucs4(std::vector<unsigned short> const & ucs2str);
-
-std::vector<lyx::char_type>
-ucs2_to_ucs4(unsigned short const * ucs2str, size_t ls);
-
-// ucs4_to_ucs2
-
-unsigned short ucs4_to_ucs2(lyx::char_type c);
-
-std::vector<unsigned short>
-ucs4_to_ucs2(std::vector<lyx::char_type> const & ucs4str);
+// ucs4_to_utf16
-std::vector<unsigned short> ucs4_to_ucs2(lyx::char_type const * s, size_t ls);
+std::vector<unsigned short> ucs4_to_utf16(char_type const * s, size_t ls);
// ucs4_to_utf8
@@ -105,7 +94,6 @@ std::vector<char>
ucs4_to_eightbit(lyx::char_type const * ucs4str, size_t ls, std::string const & encoding);
extern char const * ucs4_codeset;
-extern char const * ucs2_codeset;
} // namespace lyx