Hi

I've begun researching how best to upgrade the old ICU library we use to
newer versions, and it does not seem easy: recent versions require "C11 &
C++17", which AOO code won't build with, and the MSVC compiler in
particular needs an upgrade to build it.

However, ICU exports both a C API and a C++ API. Our heaviest ICU user,
i18npool, currently uses the C++ API, but C is a language we can also
consume, and it is far more compatible: C++ code can consume headers written
for almost any C language version, but only headers written for a
compatible C++ language version.

On FreeBSD with the very recent ICU 76.1 (from 1 October 2024), I tried to
build using --with-system-icu, and my build of course failed, because the
ICU header files themselves now use newer C++ features and can't even be
parsed. But I saw that main/i18npool's gencoll_rule is a small standalone
executable that uses ICU, so I patched it to use ICU's C API instead of the
ICU C++ API, and it built successfully against ICU 76.1. It works against
ICU 4.2.1 as well, which is at least 15 years of compatibility!

------------------------
How is it done?
------------------------
The patch demonstrating the change is attached. (I am not completely happy
with it, and may want to make further changes before committing:
UParseError may need to be freed, logging needs a review, there is a typo,
etc.)

Before including ICU header files:
#define U_SHOW_CPLUSPLUS_API 0
#define U_SHOW_CPLUSPLUS_HEADER_API 0
will hide the C++ declarations inside ICU's headers.

Then just use the C functions instead of the C++ classes and methods, e.g.
ucol_openRules() instead of "new RuleBasedCollator", ucol_cloneBinary()
instead of RuleBasedCollator::cloneRuleData(), and call ucol_close()
instead of "delete" on the RuleBasedCollator pointer.

Since C won't throw exceptions, unlike C++, and resources obtained from the
C API have to be released by hand, exception safety should be taken into
account: nothing that throws exceptions should be called while such a
resource is held. For example, I used malloc()/free() instead of
new[]/delete[], as new[] throws an exception (std::bad_alloc) when memory
runs out, while malloc() returns NULL.
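
As a rough illustration of that malloc()/free() approach, here is a minimal
sketch that compiles without ICU. demo_clone_binary() and demo_get_data()
are made-up names, not ICU functions: the first merely imitates a C function
that returns the required size and fills the buffer only when it is large
enough, which is the calling convention the patch relies on for
ucol_cloneBinary().

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a C function with a "size first, then fill"
// convention (NOT an ICU function). It always returns the number of bytes
// required, and copies them only if the caller's buffer is big enough.
int32_t demo_clone_binary(uint8_t *buffer, int32_t capacity)
{
    static const uint8_t payload[] = { 0xDE, 0xAD, 0xBE, 0xEF };
    const int32_t needed = (int32_t)sizeof(payload);
    if (buffer != NULL && capacity >= needed)
        memcpy(buffer, payload, sizeof(payload));
    return needed;
}

// Returns a malloc()ed copy of the data and stores its size in *out_len,
// or returns NULL on failure. Nothing in here can throw: malloc() reports
// an out-of-memory condition by returning NULL instead of raising
// std::bad_alloc. The caller releases the result with free().
uint8_t *demo_get_data(int32_t *out_len)
{
    assert(out_len != NULL);
    *out_len = 0;

    // First call: ask only for the required size.
    const int32_t needed = demo_clone_binary(NULL, 0);
    if (needed <= 0)
        return NULL;

    uint8_t *data = (uint8_t *)malloc((size_t)needed);
    if (data == NULL)
        return NULL;    // out of memory, reported without an exception

    // Second call: fill the buffer we just allocated.
    *out_len = demo_clone_binary(data, needed);
    return data;
}
```

A caller would check the returned pointer for NULL, use the data, and then
call free() on it, in the same way that the patch pairs ucol_openRules()
with ucol_close().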

-----------------------------------------------
What is the scale of the change?
-----------------------------------------------
Other than "icu" which is the module's own directory and "scp2" where it's
packaged, the modules using it as per their prj/build.lst include only
i18npool, linguistic, and vcl. However, searching through makefiles also
finds cui, editeng, lingucomponent, sc, svtools, svx, and sw.

From a quick look through these 10 modules:
- cui: includes unicode/ubidi.h in precompiled headers but appears not to
use ICU at all.
- editeng: includes unicode/ubidi.h and uses the C functions from it.
- i18npool: heavy use of ICU, including collators, calendars, regex, and
more, in C++.
- lingucomponent: does not appear to use ICU at all.
- linguistic: minimal use of unicode/uscript.h in one file. Already uses C
API.
- sc: includes unicode/uchar.h in source/core/tool/interpr1.cxx
- svtools: includes unicode/ubidi.h and uses a couple of C functions in
source/edit/texteng.cxx.
- svx: precompiled header includes unicode/ubidi.h and
source/dialog/fntctrl.cxx includes unicode/uchar.h and calls u_charType()
once.
- sw: includes unicode/ubidi.h and unicode/uchar.h in 7 files, calls
u_charDirection(), u_charType() and some ubidi functions. Only C API.
- vcl: includes a mixture of unicode/ubidi.h, unicode/uchar.h and
unicode/uscript.h in 3 files, uses C APIs.

So it seems like:
- only i18npool uses the ICU C++ API.
- most ICU usage is in C already.
- ICU is used relatively lightly in AOO: only 10 (or fewer) of our 185
modules use it, and in those modules only a small number of files call a
small number of ICU functions.

--------------------------------------------------------------------------------------
What can we expect if we start using the C API for ICU instead?
--------------------------------------------------------------------------------------
Linux and FreeBSD could use --with-system-icu even with much newer ICU
versions, and system ICU upgrades would not require AOO upgrades.

When we are building with Clang or GCC, we might be able to build ICU by
using -std=gnu++98 for other AOO code, and -std=<something else> for ICU.

What about Windows? ICU provides prebuilt binaries for both Win32 and
Win64, which we could use instead of building our own, hopefully allowing
us to link against them from our older MSVC compiler/linker. That precludes
the use of patches to the source code though. Or we could use Clang to
build ICU and MSVC to build the rest of AOO.

Anyway, let me know what you think.

Regards
Damjan
diff --git a/main/i18npool/source/collator/gencoll_rule.cxx b/main/i18npool/source/collator/gencoll_rule.cxx
index 2295d79b35..66a1b7962a 100644
--- a/main/i18npool/source/collator/gencoll_rule.cxx
+++ b/main/i18npool/source/collator/gencoll_rule.cxx
@@ -30,8 +30,10 @@
 #include <sal/main.h>
 #include <sal/types.h>
 #include <rtl/ustrbuf.hxx>
-
+#define U_SHOW_CPLUSPLUS_API 0
+#define U_SHOW_CPLUSPLUS_HEADER_API 0
 #include "warnings_guard_unicode_tblcoll.h"
+#include "unicode/ucol.h"
 
 U_CAPI void U_EXPORT2 uprv_free(void *mem);
 
@@ -107,30 +109,41 @@ SAL_IMPLEMENT_MAIN_WITH_ARGS(argc, argv)
 	fclose(fp);
 
     UErrorCode status = U_ZERO_ERROR;
-    //UParseError parseError;
-    //UCollator *coll = ucol_openRules(Obuf.getStr(), Obuf.getLength(), UCOL_OFF, 
-    //        UCOL_DEFAULT_STRENGTH, &parseError, &status);
+    UParseError parseError;
+    UCollator *coll = ucol_openRules(reinterpret_cast<const UChar *>(Obuf.getStr()), -1, UCOL_OFF, 
+            UCOL_DEFAULT_STRENGTH, &parseError, &status);
+
+    //RuleBasedCollator *coll = new RuleBasedCollator(reinterpret_cast<const UChar *>(Obuf.getStr()), status);	// UChar != sal_Unicode in MinGW
 
-    RuleBasedCollator *coll = new RuleBasedCollator(reinterpret_cast<const UChar *>(Obuf.getStr()), status);	// UChar != sal_Unicode in MinGW
 
     if (U_SUCCESS(status)) {
 
         int32_t len = 0;
-        uint8_t *data = coll->cloneRuleData(len, status);
-
-        if (U_SUCCESS(status) && data != NULL)
-            data_write(argv[2], argv[3], data, len);
-        else {
+        status = U_ZERO_ERROR;
+        len = ucol_cloneBinary(coll, NULL, 0, &status);
+        if (len > 0 && status == U_BUFFER_OVERFLOW_ERROR) {
+            uint8_t* data = (uint8_t*)malloc(len);
+            if (data != NULL) {
+                status = U_ZERO_ERROR;
+                len = ucol_cloneBinary(coll, data, len, &status);
+                if (U_SUCCESS(status))
+                    data_write(argv[2], argv[3], data, len);
+                else {
+                    printf("Could not get rule data from collator\n");
+                }
+                free(data);
+            } else {
+                printf("Out of memory getting rule data from collator\n");
+            }
+        } else {
             printf("Could not get rule data from collator\n");
         }
-
-	if (data) uprv_free(data);
     } else {
         printf("\nRule parsering error\n");
     }
 
     if (coll)
-        delete coll;
+        ucol_close(coll); //delete coll;
 
     return U_SUCCESS(status) ? 0 : 1;
 }	// End of main