Hello Damjan, All,

On Sun, Apr 27, 2025 at 05:45:51PM +0200, Damjan Jovanovic wrote:
> Hi
>
> I've begun researching how best to upgrade the old ICU library we use to
> newer versions, and it does not seem easy: recent versions require "C11 &
> C++17", which AOO code won't build with, and the MSVC compiler in
> particular needs an upgrade to build it.
>
> However ICU exports both a C API and a C++ API. We currently use only the
> C++ API, but C is a language we can also consume, and is far more
> compatible: C++ can consume code from almost any C language version, but
> only compatible C++ language versions.
>
> On FreeBSD with the very recent ICU 76.1 (from 1 October 2024), I tried to
> build using --with-system-icu, and my build of course failed, because even
> the ICU header files using new C++ versions can't be parsed. But I saw that
> main/i18npool's gencoll_rule is a small standalone executable that uses
> ICU, and I tried to patch it to use ICU's C API instead of the ICU C++ API,
> and I managed to get it to build successfully. Against ICU 76.1. From 1
> October 2024. And it works against ICU 1.4.2 as well, which is at least 15
> years of compatibility!

Wonderful!

> ------------------------
> How is it done?
> ------------------------
> The patch demonstrating the change is attached. (I am not completely happy
> with it, and may want to make further changes before committing:
> UParseError may need to be freed, logging needs a review, there is a typo,
> etc.)
>
> Before including ICU header files:
> #define U_SHOW_CPLUSPLUS_API 0
> #define U_SHOW_CPLUSPLUS_HEADER_API 0
> will hide the C++ declarations inside ICU's headers.
>
> Then just use the C functions instead of the C++ classes and methods, eg.
> ucol_openRules() instead of "new RuleBasedCollator", ucol_cloneBinary()
> instead of RuleBasedCollator::cloneRuleData(), and call ucol_close()
> instead of "delete" on the RuleBasedCollator pointer.
>
> Since C won't throw exceptions, unlike C++, exception safety should be
> taken into account, and nothing else that throws exceptions should be
> called. For example, I used malloc()/free() instead of new[]/delete[], as
> new[] throws an exception when memory runs out, while malloc() returns NULL.

...we could also implement our own RAII classes... just in case the C
interface becomes _too_ tedious.

> -----------------------------------------------
> What is the scale of the change?
> -----------------------------------------------
> Other than "icu" which is the module's own directory and "scp2" where it's
> packaged, the modules using it as per their prj/build.lst include only
> i18npool, linguistic, and vcl. However searching through makefiles also
> finds cui, editeng, lingucomponent, sc, svtools, svx, and sw.
>
> From a quick look through these 10 modules:
> - cui: includes unicode/ubidi.h in precompiled headers but appears not to
> use ICU at all.
> - editeng: includes unicode/ubidi.h and uses the C functions from it.
> - i18npool: heavy use of ICU, including collators, calendars, regex, and
> more, in C++.
> - lingucomponent: does not appear to use ICU at all.
> - linguistic: minimal use of unicode/uscript.h in one file. Already uses C
> API.
> - sc: includes unicode/uchar.h in source/core/tool/interpr1.cxx
> - svtools: includes unicode/ubidi.h and uses a couple of C functions in
> source/edit/texteng.cxx.
> - svx: precompiled header includes unicode/ubidi.h and
> source/dialog/fntctrl.cxx includes unicode/uchar.h and calls u_charType()
> once.
> - sw: includes unicode/ubidi.h and unicode/uchar.h in 7 files, calls
> u_charDirection(), u_charType() and some ubidi functions. Only C API.
> - vcl: includes a mixture of unicode/ubidi.h, unicode/uchar.h and
> unicode/uscript.h in 3 files, uses C APIs.
>
> So it seems like:
> - only i18npool uses the ICU C++ API.
> - most ICU usage is in C already.
> - ICU is used relatively lightly in AOO, only 10 (or fewer) out of our 185
> modules use it, and in those modules only a small number of files call a
> small number of ICU functions.
>
> --------------------------------------------------------------------------------------
> What can we expect if we start using the C API for ICU instead?
> --------------------------------------------------------------------------------------
> Linux and FreeBSD could use --with-system-icu even with much newer ICU
> versions, and system ICU upgrades would not require AOO upgrades.
>
> When we are building with Clang or GCC, we might be able to build ICU by
> using -std=gnu++98 for other AOO code, and -std=<something else> for ICU.
>
> What about Windows? ICU provides prebuilt binaries for both Win32 and
> Win64, that we could use instead of building our own, hopefully allowing us
> to link against them from our older MSVC compiler/linker. That precludes
> the use of patches to the source code though. Or we could use Clang to
> build ICU and MSVC to build the rest of AOO.

We have a patch to the file source/layout/ArabicShaping.cpp that I do not
fully understand, so I suggest we keep it. (We probably need an
Arabic-proficient developer to fully understand what it does, and
unfortunately I do not speak Arabic.)

But I think that adapting C code to an older standard is much better than
doing the same for C++; we have already done it for other modules!

> Anyway, let me know what you think?

I think that changing C++ calls into C calls, as you proposed in your
message, is a great idea and could be the quickest path to an updated ICU.

Probably only starting the ``porting'' of the i18npool module will tell us
how expensive this transition will be.

Adapting the ICU C interface to our current MSVC requirements should be an
easier task. Boring and possibly long, but easy.
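
To make the RAII idea mentioned earlier a bit more concrete, here is a rough sketch of what such a wrapper could look like while staying within C++98. All names in it are hypothetical (CHandle, DemoHandle, demo_open, demo_close, useHandle, openCount); the demo_* functions are only a stand-in for an ICU open/close pair so that the example compiles without ICU.

```cpp
#include <stdlib.h>

// Hypothetical C++98-compatible owner for a handle returned by a C API.
// Release is the C function that frees the handle.
template <typename T, void (*Release)(T*)>
class CHandle
{
public:
    explicit CHandle(T* p = 0) : m_p(p) {}
    ~CHandle() { if (m_p) Release(m_p); }

    T* get() const { return m_p; }
    bool isNull() const { return m_p == 0; }

private:
    // Non-copyable, declared but not defined (the C++98 idiom).
    CHandle(const CHandle&);
    CHandle& operator=(const CHandle&);

    T* m_p;
};

// Stand-in C API so the sketch does not depend on ICU.
struct DemoHandle { int id; };
static int g_openHandles = 0;

DemoHandle* demo_open(int id)
{
    DemoHandle* h = (DemoHandle*) malloc(sizeof(DemoHandle));
    if (h) { h->id = id; ++g_openHandles; }
    return h;
}

void demo_close(DemoHandle* h)
{
    if (h) { --g_openHandles; free(h); }
}

typedef CHandle<DemoHandle, demo_close> DemoOwner;

// Every return path releases the handle without an explicit close call.
int useHandle(int id)
{
    DemoOwner owner(demo_open(id));
    if (owner.isNull())
        return -1;               // nothing was opened, nothing to release
    return owner.get()->id * 2;  // destructor calls demo_close() on exit
}

int openCount() { return g_openHandles; }
```

With ICU the intent would be something along the lines of typedef CHandle<UCollator, ucol_close> CollatorOwner; I have not tried that against our compilers, so treat it as an assumption. The wrapper allocates nothing and throws nothing itself, which keeps it in line with the "no exceptions" constraint above.
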

Best regards,
-- 
Arrigo