On Sun, Apr 27, 2025 at 7:07 PM Arrigo Marchiori <ard...@apache.org> wrote:

> Hello Damjan, All,
>
> On Sun, Apr 27, 2025 at 05:45:51PM +0200, Damjan Jovanovic wrote:
>
> > Hi
> >
> > I've begun researching how best to upgrade the old ICU library we use to
> > newer versions, and it does not seem easy: recent versions require "C11 &
> > C++17", which AOO code won't build with, and the MSVC compiler in
> > particular needs an upgrade to build it.
> >
> > However ICU exports both a C API and a C++ API. We currently use only the
> > C++ API, but C is a language we can also consume, and is far more
> > compatible: C++ can consume code from almost any C language version, but
> > only compatible C++ language versions.
> >
> > On FreeBSD with the very recent ICU 76.1 (from 1 October 2024), I tried to
> > build using --with-system-icu, and my build of course failed, because even
> > the ICU header files using new C++ versions can't be parsed. But I saw that
> > main/i18npool's gencoll_rule is a small standalone executable that uses
> > ICU, and I tried to patch it to use ICU's C API instead of the ICU C++ API,
> > and I managed to get it to build successfully. Against ICU 76.1. From 1
> > October 2024. And it works against ICU 1.4.2 as well, which is at least 15
> > years of compatibility!
>
> Wonderful!
>
> > ------------------------
> > How is it done?
> > ------------------------
> > The patch demonstrating the change is attached. (I am not completely happy
> > with it, and may want to make further changes before committing:
> > UParseError may need to be freed, logging needs a review, there is a typo,
> > etc.)
> >
> > Before including ICU header files:
> > #define U_SHOW_CPLUSPLUS_API 0
> > #define U_SHOW_CPLUSPLUS_HEADER_API 0
> > will hide the C++ declarations inside ICU's headers.
> >
> > Then just use the C functions instead of the C++ classes and methods, e.g.
> > ucol_openRules() instead of "new RuleBasedCollator", ucol_cloneBinary()
> > instead of RuleBasedCollator::cloneRuleData(), and call ucol_close()
> > instead of "delete" on the RuleBasedCollator pointer.
> >
> > Since C code won't throw exceptions, unlike C++, exception safety has to
> > be taken into account, and nothing else that throws exceptions should be
> > called. For example, I used malloc()/free() instead of new[]/delete[], as
> > new[] throws an exception when memory runs out, while malloc() returns NULL.
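To illustrate the malloc()/free() point, here is a minimal sketch of a non-throwing allocation helper. The name copy_bytes() is hypothetical and not taken from the patch; it only shows the pattern of reporting an allocation failure through a NULL return value instead of an exception, and it sticks to C++98 so it fits our current compiler settings.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not from the patch: returns a malloc()ed copy of
// `len` bytes from `src`, or NULL if the allocation fails. It never throws,
// so it can sit between C API calls that know nothing about exceptions.
unsigned char* copy_bytes(const unsigned char* src, std::size_t len)
{
    assert(src != NULL || len == 0);

    // malloc(0) may legally return NULL, so always request at least 1 byte.
    unsigned char* buf =
        static_cast<unsigned char*>(std::malloc(len != 0 ? len : 1));
    if (buf == NULL)
        return NULL;                 // failure is reported via the return value

    if (len != 0)
        std::memcpy(buf, src, len);
    return buf;                      // the caller releases it with std::free()
}
```

The caller checks the result for NULL and calls std::free() on every exit path, much as it would check a status code after a call into the C API.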
>
> ...we could also implement our own RAII classes... just in case the C
> interface becomes _too_ tedious.
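A minimal sketch of what such an RAII class could look like, written in C++98 style. Everything here is an assumption for illustration: DemoHandle, demo_open() and demo_close() are stand-ins so the sketch builds without ICU installed, and ScopedHandle is a hypothetical name. With ICU, the open/close pair would be something like ucol_openRules() and ucol_close().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-ins for a C API's handle type and open/close pair (hypothetical).
struct DemoHandle { int id; };
static int g_open_handles = 0;   // counts live handles, for demonstration only

DemoHandle* demo_open(int id)
{
    DemoHandle* h = static_cast<DemoHandle*>(std::malloc(sizeof(DemoHandle)));
    if (h != NULL) { h->id = id; ++g_open_handles; }
    return h;                    // NULL on failure, as a C API would report it
}

void demo_close(DemoHandle* h)
{
    if (h != NULL) { --g_open_handles; std::free(h); }
}

// C++98-style scoped owner: closes the handle in its destructor and
// forbids copying, so the handle is released exactly once.
template <typename T, void (*Close)(T*)>
class ScopedHandle
{
public:
    explicit ScopedHandle(T* h) : m_handle(h) {}
    ~ScopedHandle() { if (m_handle != NULL) Close(m_handle); }
    T* get() const { return m_handle; }
    bool isValid() const { return m_handle != NULL; }
private:
    ScopedHandle(const ScopedHandle&);             // not copyable
    ScopedHandle& operator=(const ScopedHandle&);  // not assignable
    T* m_handle;
};

// Usage: the handle is closed on every exit path from the function.
int use_handle(int id)
{
    ScopedHandle<DemoHandle, demo_close> h(demo_open(id));
    if (!h.isValid())
        return -1;
    return h.get()->id;
}
```

Passing the close function as a template argument keeps the wrapper free of virtual calls and extra state, at the cost of one instantiation per handle type.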
>
> > -----------------------------------------------
> > What is the scale of the change?
> > -----------------------------------------------
> > Other than "icu" which is the module's own directory and "scp2" where it's
> > packaged, the modules using it as per their prj/build.lst include only
> > i18npool, linguistic, and vcl. However searching through makefiles also
> > finds cui, editeng, lingucomponent, sc, svtools, svx, and sw.
> >
> > From a quick look through these 10 modules:
> > - cui: includes unicode/ubidi.h in precompiled headers but appears not to
> > use ICU at all.
> > - editeng: includes unicode/ubidi.h and uses the C functions from it.
> > - i18npool: heavy use of ICU, including collators, calendars, regex, and
> > more, in C++.
> > - lingucomponent: does not appear to use ICU at all.
> > - linguistic: minimal use of unicode/uscript.h in one file. Already uses
> > the C API.
> > - sc: includes unicode/uchar.h in source/core/tool/interpr1.cxx
> > - svtools: includes unicode/ubidi.h and uses a couple of C functions in
> > source/edit/texteng.cxx.
> > - svx: precompiled header includes unicode/ubidi.h and
> > source/dialog/fntctrl.cxx includes unicode/uchar.h and calls u_charType()
> > once.
> > - sw: includes unicode/ubidi.h and unicode/uchar.h in 7 files, calls
> > u_charDirection(), u_charType() and some ubidi functions. Only C API.
> > - vcl: includes a mixture of unicode/ubidi.h, unicode/uchar.h and
> > unicode/uscript.h in 3 files, uses C APIs.
> >
> > So it seems like:
> > - only i18npool uses the ICU C++ API.
> > - most ICU usage is in C already.
> > - ICU is used relatively lightly in AOO: only 10 (or fewer) of our 185
> > modules use it, and in those modules only a small number of files call a
> > small number of ICU functions.
> >
> > --------------------------------------------------------------------------------------
> > What can we expect if we start using the C API for ICU instead?
> > --------------------------------------------------------------------------------------
> > Linux and FreeBSD could use --with-system-icu even with much newer ICU
> > versions, and system ICU upgrades would not require AOO upgrades.
> >
> > When we are building with Clang or GCC, we might be able to build ICU by
> > using -std=gnu++98 for other AOO code, and -std=<something else> for ICU.
> >
> > What about Windows? ICU provides prebuilt binaries for both Win32 and
> > Win64, that we could use instead of building our own, hopefully allowing us
> > to link against them from our older MSVC compiler/linker. That precludes
> > the use of patches to the source code though. Or we could use Clang to
> > build ICU and MSVC to build the rest of AOO.
>
> We have a patch to file source/layout/ArabicShaping.cpp, that I do not
> fully understand, and therefore I suggest we keep it.
>
> (we probably need an Arabic-proficient developer to fully understand
> its meaning, and I do not speak Arabic, unfortunately)
>
> But I think that adapting C code to an older standard is much easier than
> adapting C++; we did it already for other modules!
>
> > Anyway, let me know what you think?
>
> I think that changing C++ calls into C calls, as you proposed in your
> message, is a great idea and could be the quickest path to get to an
> updated ICU.
>
> Maybe only starting the ``porting'' of the i18npool module will tell
> how expensive this transition will be.
>
> Adapting the ICU C interface to our current MSVC requirements should
> be an easier task. Boring and possibly long, but easy.
>
> Best regards,
> --
> Arrigo
>
>
Hi

Thank you.

I've now managed to patch OpenOffice to only use the C API to ICU, and it
almost fully works: it successfully compiles against (system) ICU 76.1,
starts up, opens foreign language documents, special characters can be
inserted, searching and regular expression searching work with foreign
characters, etc. (not sure whether ICU is used in those tests though).
However, the massively customized and outdated break iterators and some
collators (for Korean and Chinese?) aren't working, and the ICU layout
engine no longer exists in newer versions.

I'll send a longer email later with more details.

Regards
Damjan
