rms said ok (a while ago, sorry) to our text about quote characters, which I'll append below for one final check. But before I check it in, we should add the promised documentation to Gnulib about the quote and quotearg modules.
So I wrote the section below, basically just formatting a message Paul sent to the list back on May 29. Seems fine as far as it goes, but we need to say something about locales. Is it true that if some environment variable (LANG?) is set to something (e.g., fr_FR?), a "translation" of ` and ' will be used? Looking at quotearg.h, it seems so. Do any translations in fact do this?

Thanks,
k

-- (gnulib.texi text)

@node Quoting
@section Quoting

@cindex Quoting
@findex quote
@findex quotearg

Gnulib provides @samp{quote} and @samp{quotearg} modules to help with
quoting text, such as file names, in messages to the user.  Here's an
example of using @samp{quote}:

@example
#include <quote.h>
 ...
  error (0, errno, _("cannot change owner of %s"), quote (fname));
@end example

This differs from

@example
  error (0, errno, _("cannot change owner of `%s'"), fname);
@end example

@noindent in that @code{quote} escapes unusual characters in
@code{fname}, e.g., @samp{'} and control characters like @samp{\n}.

@findex quote_n
However, a caveat: @code{quote} reuses the storage that it returns.
Hence if you need more than one thing quoted at the same time, you
need to use @code{quote_n}.

@findex quotearg_alloc
Also, the quote module is not suited for multithreaded applications.
In that case, you have to use @code{quotearg_alloc}, defined in the
@samp{quotearg} module, which is decidedly less convenient.

-- (standards.texi text)

@node Character set
@section Character set

@cindex character set
@cindex encodings
@cindex ASCII characters
@cindex non-ASCII characters

Sticking to the ASCII character set (plain text, 7-bit characters) is
preferred in GNU source code comments, text documents, and other
contexts, unless there is good reason to do something else because of
the application domain.  For example, if source code deals with the
French Revolutionary calendar, it is OK if its literal strings contain
accented characters in month names like ``Flor@'eal''.
Also, it is OK to use non-ASCII characters to represent proper names of
contributors in change logs (@pxref{Change Logs}).

If you need to use non-ASCII characters, you should normally stick with
one encoding, as one cannot in general mix encodings reliably.

@node Quote characters
@section Quote characters

@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for quotation
characters in messages to users: preferably 0x60 (@samp{`}) for left
quotes and 0x27 (@samp{'}) for right quotes.  It is OK, but not
required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and
@code{quotearg} modules provide a reasonably straightforward way to
support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify
how it does quoting, if different from the preferred method of @samp{`}
and @samp{'}.  This is especially important if the output of your
program is ever likely to be parsed by another program.

Quotation characters are a difficult area in the computing world at
this time: there are no true left or right quote characters in ASCII,
or even Latin1; the @samp{`} character we use was standardized as a
grave accent.  Moreover, Latin1 is still not universally usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with [EMAIL PROTECTED]
However, Unicode and UTF-8 are not universally well-supported, either.

This may change over the next few years, and then we will revisit
this.

_______________________________________________
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib