Hello Damjan, All,

On Sun, Apr 27, 2025 at 05:45:51PM +0200, Damjan Jovanovic wrote:

> Hi
> 
> I've begun researching how best to upgrade the old ICU library we use to
> newer versions, and it does not seem easy: recent versions require "C11 &
> C++17", which AOO code won't build with, and the MSVC compiler in
> particular needs an upgrade to build it.
> 
> However ICU exports both a C API and a C++ API. We currently use only the
> C++ API, but C is a language we can also consume, and is far more
> compatible: C++ can consume code from almost any C language version, but
> only compatible C++ language versions.
> 
> On FreeBSD with the very recent ICU 76.1 (from 1 October 2024), I tried to
> build using --with-system-icu, and my build of course failed, because even
> the ICU header files using new C++ versions can't be parsed. But I saw that
> main/i18npool's gencoll_rule is a small standalone executable that uses
> ICU, and I tried to patch it to use ICU's C API instead of the ICU C++ API,
> and I managed to get it to build successfully. Against ICU 76.1. From 1
> October 2024. And it works against ICU 1.4.2 as well, which is at least 15
> years of compatibility!

Wonderful!

> ------------------------
> How is it done?
> ------------------------
> The patch demonstrating the change is attached. (I am not completely happy
> with it, and may want to make further changes before committing:
> UParseError may need to be freed, logging needs a review, there is a typo,
> etc.)
> 
> Before including ICU header files:
> #define U_SHOW_CPLUSPLUS_API 0
> #define U_SHOW_CPLUSPLUS_HEADER_API 0
> will hide the C++ declarations inside ICU's headers.
> 
> Then just use the C functions instead of the C++ classes and methods, eg.
> ucol_openRules() instead of "new RuleBasedCollator", ucol_cloneBinary()
> instead of RuleBasedCollator::cloneRuleData(), and call ucol_close()
> instead of "delete" on the RuleBasedCollator pointer.
> 
> Since C won't throw exceptions, unlike C++, exception safety should be
> taken into account, and nothing else that throws exceptions should be
> called. For example, I used malloc()/free() instead of new[]/delete[], as
> new[] throws an exception when memory runs out, while malloc() returns NULL.
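
A minimal sketch of that allocation pattern, written to compile as C++98.
The helper name copyBytes is hypothetical and is not taken from the patch;
it only illustrates reporting failure through a NULL return instead of an
exception, so that a caller holding an open C handle can still close it on
the error path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the patch): copy `len` bytes into a freshly
// malloc()'d buffer. Returns NULL on allocation failure instead of throwing.
// The caller owns the result and must release it with free().
static unsigned char* copyBytes(const unsigned char* src, std::size_t len)
{
    assert(src != NULL || len == 0);

    // malloc(0) may legally return NULL, so request at least one byte;
    // that way NULL always means "out of memory".
    unsigned char* buf =
        static_cast<unsigned char*>(std::malloc(len != 0 ? len : 1));
    if (buf == NULL)
        return NULL;

    if (len != 0)
        std::memcpy(buf, src, len);
    return buf;
}
```

If new[] is preferred for other reasons, new (std::nothrow) unsigned char[n]
also returns a null pointer on failure rather than throwing.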

...we could also implement our own RAII classes... just in case the C
interface becomes _too_ tedious.
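
As a rough idea of what such a class could look like, here is a sketch of
a C++98-compatible guard that closes a C handle on scope exit. The names
CHandleGuard, FakeCollator and fakeClose are all hypothetical; the fake
type and close function are stand-ins so the sketch compiles without ICU.

```cpp
#include <cassert>
#include <cstddef>

// Minimal C++98-compatible RAII guard for a handle owned through a C API.
// Handle is the opaque C type; the close function is passed at construction.
template <typename Handle>
class CHandleGuard
{
public:
    typedef void (*CloseFn)(Handle*);

    CHandleGuard(Handle* handle, CloseFn closeFn)
        : m_handle(handle), m_close(closeFn) {}

    // Closes the handle, if one is still owned, when the guard leaves scope.
    ~CHandleGuard() { reset(); }

    Handle* get() const { return m_handle; }
    bool isValid() const { return m_handle != NULL; }

    // Give up ownership without closing; the caller must close it later.
    Handle* release()
    {
        Handle* h = m_handle;
        m_handle = NULL;
        return h;
    }

    // Close the handle now; safe to call more than once.
    void reset()
    {
        if (m_handle != NULL)
        {
            m_close(m_handle);
            m_handle = NULL;
        }
    }

private:
    // Non-copyable (C++98 idiom: declared private, never defined).
    CHandleGuard(const CHandleGuard&);
    CHandleGuard& operator=(const CHandleGuard&);

    Handle* m_handle;
    CloseFn m_close;
};

// Stand-ins for a C library (hypothetical, for illustration only).
struct FakeCollator { int dummy; };
static int g_closeCalls = 0;
static void fakeClose(FakeCollator*) { ++g_closeCalls; }
```

With ICU, the intended use would be along the lines of
CHandleGuard<UCollator> coll(ucol_openRules(...), ucol_close); I have not
tried this against the real headers, so take it as an assumption. The
destructor itself throws nothing, provided the close function does not.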

> -----------------------------------------------
> What is the scale of the change?
> -----------------------------------------------
> Other than "icu" which is the module's own directory and "scp2" where it's
> packaged, the modules using it as per their prj/build.lst include only
> i18npool, linguistic, and vcl. However searching through makefiles also
> finds cui, editeng, lingucomponent, sc, svtools, svx, and sw.
> 
> From a quick look through these 10 modules:
> - cui: includes unicode/ubidi.h in precompiled headers but appears not to
> use ICU at all.
> - editeng: includes unicode/ubidi.h and uses the C functions from it.
> - i18npool: heavy use of ICU, including collators, calendars, regex, and
> more, in C++.
> - lingucomponent: does not appear to use ICU at all.
> - linguistic: minimal use of unicode/uscript.h in one file. Already uses C
> API.
> - sc: includes unicode/uchar.h in source/core/tool/interpr1.cxx
> - svtools: includes unicode/ubidi.h and uses a couple of C functions in
> source/edit/texteng.cxx.
> - svx: precompiled header includes unicode/ubidi.h and
> source/dialog/fntctrl.cxx includes unicode/uchar.h and calls u_charType()
> once.
> - sw: includes unicode/ubidi.h and unicode/uchar.h in 7 files, calls
> u_charDirection(), u_charType() and some ubidi functions. Only C API.
> - vcl: includes a mixture of unicode/ubidi.h, unicode/uchar.h and
> unicode/uscript.h in 3 files, uses C APIs.
> 
> So it seems like:
> - only i18npool uses the ICU C++ API.
> - most ICU usage is in C already.
> - ICU is used relatively lightly in AOO, only 10 (or fewer) out of our 185
> modules use it, and in those modules only a small number of files call a
> small number of ICU functions.
> 
> --------------------------------------------------------------------------------------
> What can we expect if we start using the C API for ICU instead?
> --------------------------------------------------------------------------------------
> Linux and FreeBSD could use --with-system-icu even with much newer ICU
> versions, and system ICU upgrades would not require AOO upgrades.
> 
> When we are building with Clang or GCC, we might be able to build ICU by
> using -std=gnu++98 for other AOO code, and -std=<something else> for ICU.
> 
> What about Windows? ICU provides prebuilt binaries for both Win32 and
> Win64, that we could use instead of building our own, hopefully allowing us
> to link against them from our older MSVC compiler/linker. That precludes
> the use of patches to the source code though. Or we could use Clang to
> build ICU and MSVC to build the rest of AOO.

We have a patch to the file source/layout/ArabicShaping.cpp that I do not
fully understand, so I suggest we keep it.

(we probably need an Arabic-proficient developer to fully understand
its meaning, and I do not speak Arabic, unfortunately)

But I think that adapting C code to an older standard is much easier than
adapting C++ code; we have already done it for other modules!

> Anyway, let me know what you think?

I think that changing C++ calls into C calls, as you proposed in your
message, is a great idea and could be the quickest path to get to an
updated ICU.

Perhaps only by actually starting to port the i18npool module will we
find out how expensive this transition will be.

Adapting the ICU C interface to our current MSVC requirements should
be an easier task. Boring and possibly long, but easy.

Best regards,
-- 
Arrigo
