Paul Eggert wrote:
> Why would gnulib itself need to care
> about the difference between (2) and (4)?  Either way, gnulib can
> easily look for '/' and '\' in path names.  Isn't it up to the
> supplier of the underlying system-call implementation, and/or the
> gnulib user, to decide whether (2) or (4) is in use?  In other words,
> can't gnulib itself be agnostic about (2) versus (4)?

Let's take an example contained in gnulib. (You can find many more
examples which are half contained in gnulib and half contained in
coreutils/findutils/...) Take localcharset.c. (Forget for one moment
that the code is currently not used in Woe32, for different reasons.)
The code currently is essentially

    dir = relocate (LIBDIR);

    /* Concatenate dir and base into freshly allocated file_name.  */
    {
      size_t dir_len = strlen (dir);
      size_t base_len = strlen (base);
      int add_slash = (dir_len > 0 && !ISSLASH (dir[dir_len - 1]));
      file_name = (char *) malloc (dir_len + add_slash + base_len + 1);
      if (file_name != NULL)
        {
          memcpy (file_name, dir, dir_len);
          if (add_slash)
            file_name[dir_len] = DIRECTORY_SEPARATOR;
          memcpy (file_name + dir_len + add_slash, base, base_len + 1);
        }
    }

    fp = fopen (file_name, "r");

In approach (2), LIBDIR will be a UTF-8 encoded pathname. The ISSLASH
operation will therefore work correctly. However, fopen() expects a
string in locale encoding, not in UTF-8 encoding. Therefore we have to
replace the last line with

    char *real_file_name = u8_conv_to_locale (file_name);
    fp = fopen (real_file_name, "r");
    free (real_file_name);

Or, alternatively, replace the whole set of libc functions dealing with
pathnames with wrappers that take a UTF-8 string:

    fp = u8_fopen (file_name, "r");

Whereas in approach (4), we can leave the code as it is.

> For example, EUC-JP is also safe.  Or perhaps you're not
> mentioning this because Microsoft doesn't support EUC-JP?  (I'm not
> familiar with their support for various encodings.)

I'm not familiar with it either. But the most comprehensive charset
aliases table
  http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?rev=1.115
shows that EUC-JP is unknown as a CP<nnn> encoding, whereas UTF-8 is
known as CP1208 and as CP65001.
It is also mentioned in
  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_17si.asp

Therefore I think it's not possible to recommend an EUC-JP encoded
locale to Windows users.

Bruno
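
For illustration, the two replacements discussed under approach (2) could
be sketched roughly as follows. This is only a sketch under assumptions,
not existing gnulib code: u8_conv_to_locale and u8_fopen are the
hypothetical names used above, and the sketch assumes that POSIX iconv()
and nl_langinfo (CODESET) are available and that the program has already
called setlocale (LC_ALL, ""). The 4-bytes-per-input-byte output bound is
likewise an assumption that holds for common locale encodings.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a UTF-8 string to the locale encoding.
   Returns a freshly allocated string, or NULL on failure.  */
char *
u8_conv_to_locale (const char *u8_name)
{
  assert (u8_name != NULL);

  iconv_t cd = iconv_open (nl_langinfo (CODESET), "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (u8_name);
  /* Assumed bound: at most 4 output bytes per input byte, plus slack
     for a final reset sequence and the terminating NUL.  */
  size_t out_size = 4 * in_left + 16;
  char *result = malloc (out_size);
  if (result != NULL)
    {
      char *in_ptr = (char *) u8_name;   /* iconv does not modify it */
      char *out_ptr = result;
      size_t out_left = out_size - 1;

      /* Convert, then flush any pending shift state.  */
      if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1
          || iconv (cd, NULL, NULL, &out_ptr, &out_left) == (size_t) -1)
        {
          free (result);
          result = NULL;
        }
      else
        *out_ptr = '\0';
    }

  iconv_close (cd);
  return result;
}

/* Hypothetical wrapper: like fopen(), but takes a UTF-8 file name.  */
FILE *
u8_fopen (const char *u8_name, const char *mode)
{
  assert (mode != NULL);

  char *real_file_name = u8_conv_to_locale (u8_name);
  if (real_file_name == NULL)
    return NULL;

  FILE *fp = fopen (real_file_name, mode);
  free (real_file_name);
  return fp;
}
```

With such a wrapper, the last line of the localcharset.c excerpt would
become fp = u8_fopen (file_name, "r"); and every other pathname-taking
libc function would need a similar wrapper, which is the cost of
approach (2) that approach (4) avoids.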