As you have no doubt figured out, for input and output I am converting, as best I can, from the system locale to CP1252 for "ASCII" and CP1140 for EBCDIC.
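
For anyone following along, here is roughly what that kind of conversion looks like when reduced to bare iconv calls. This is only a minimal sketch, not the code in symbols.cc: the helper name `convert` is made up for illustration, and it assumes the installed iconv recognizes the names "UTF-8" and "CP1252". It checks the descriptor returned by iconv_open before using it, since a failed open returns (iconv_t)-1.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of gcobol): convert `input` from the
// `fromcode` encoding to the `tocode` encoding using POSIX iconv.
std::string convert(const std::string& input,
                    const char* tocode, const char* fromcode) {
  iconv_t cd = iconv_open(tocode, fromcode);
  if (cd == (iconv_t)-1) {
    // iconv_open reports failure with (iconv_t)-1; such a descriptor
    // must never be handed to iconv().
    throw std::runtime_error(std::string("iconv_open failed: ") +
                             fromcode + " -> " + tocode);
  }

  // Generous upper bound on the output size for common encodings.
  std::string output(input.size() * 4 + 4, '\0');

  char* in = const_cast<char*>(input.data());
  size_t in_left = input.size();
  char* out = &output[0];
  size_t out_left = output.size();

  size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
  iconv_close(cd);
  if (rc == size_t(-1)) {
    // For example, a character with no representation in the target code page.
    throw std::runtime_error("iconv conversion failed");
  }

  // Trim the buffer down to the bytes actually written.
  output.resize(output.size() - out_left);
  return output;
}
```

With that, convert("\xC3\xA9", "CP1252", "UTF-8") turns the two-byte UTF-8 encoding of "é" into the single byte 0xE9, which is the one-byte-per-character form the COBOL side expects.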

We can't use UTF-8 internally for most purposes. COBOL goes back to a time before the Cuban Missile Crisis, and it is built around the assumption that a character is a byte is a character. Wide characters, and the variable-length encodings of UTF-8, need not apply. The language does have provision for that newfangled stuff, but we haven't implemented it yet.

So, please, stick with the default 1252 for existing code -- as you noted, changing the code page breaks some tests.

Handling additional code pages is a can we've been kicking down the road. (Actually, I have a "can cannon" that I use to launch cans over the horizon. But don't tell anybody.)

> -----Original Message-----
> From: Iain Sandoe <iains....@gmail.com>
> Sent: Friday, March 21, 2025 06:06
> To: rdub...@symas.com; gcc-patches@gcc.gnu.org
> Subject: [PATCH] cobol: Address some iconv issues.
>
> Darwin/macOS installed libiconv does not accept // trailers on
> conversion codes; (it does accept // with TRANSLIT etc after it)
> Anyway the current setting causes the init_iconv to fail - and
> then that SEGVs later. So let's at least print a warning if we
> fail to init the conversion.
>
> Secondly, using Windows code page 1252 as a default seems overly
> restrictive. Ideally, we should be using something like "char"
> which represents the prevailing charset for the locale. However
> that causes testsuite fails, since the tests are expecting CP1252
> or similar - for Apple/Darwin, we should use ISO-8859-1 (the actual
> system, in common with most modern systems uses UTF-8).
>
> NOTE I seem to be unable to use LC_ALL= to override this (but I
> did not attempt to sort that out so far). This is just a patch to
> allow build to succeed on Darwin/macOS.
>
> gcc/cobol/ChangeLog:
>
> 	* symbols.cc : Initialise standard_internal to ISO8859-1
> 	for Apple/Dawin platforms.
> 	(cbl_field_t::internalize): Print a warning if we fail to
> 	initialise iconv.
>
> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
> ---
>  gcc/cobol/symbols.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/cobol/symbols.cc b/gcc/cobol/symbols.cc
> index e078412e4ea..ebabcfd3070 100644
> --- a/gcc/cobol/symbols.cc
> +++ b/gcc/cobol/symbols.cc
> @@ -3566,7 +3566,12 @@ cbl_field_t::is_ascii() const {
>   * compilation, if it moves off the default, it adjusts only once, and
>   * never reverts.
>   */
> -static const char standard_internal[] = "CP1252//";
> +static const char standard_internal[] =
> +#if __APPLE__
> +"ISO8859-1";
> +#else
> +"CP1252//";
> +#endif
>  extern os_locale_t os_locale;
>
>  static const char *
> @@ -3594,6 +3599,10 @@ cbl_field_t::internalize() {
>    static iconv_t cd = iconv_open(tocode, fromcode);
>    static const size_t noconv = size_t(-1);
>
> +  if (cd == (iconv_t)-1) {
> +    yywarn("failed iconv_open tocode = '%s' fromcode = %s", tocode, fromcode);
> +  }
> +
>    // Sat Mar 16 11:45:08 2024: require temporary environment for testing
>    if( getenv( "INTERNALIZE_NO") ) return data.initial;
>
> --
> 2.39.2 (Apple Git-143)