On Wed, Dec 24, 2025 at 12:59:00AM +0100, Patrice Dumas wrote:
> On Tue, Dec 23, 2025 at 10:32:25PM +0000, Gavin Smith wrote:
> > On Sun, Dec 21, 2025 at 10:08:58PM +0000, Gavin Smith wrote:
> > > Here's a patch:
> > 
> > Here's a more complete patch.  To avoid changing the output for
> > HTML, DocBook and one other output format ("Texinfo XML"), when the
> > input was not UTF-8, I had to remove the default OUTPUT_ENCODING_NAME
> > UTF-8 setting.  Otherwise these formats would be forced to UTF-8 as
> > well.
> 
> > I don't think that the OUTPUT_ENCODING_NAME defaults did very much,
> > but I'm not certain.  It's possible these default values stemmed from
> > a time before UTF-8 was the default input encoding for Texinfo.  (For
> > example, "git blame" tracks the setting in DocBook.pm to a commit on
> > 2012-09-14 (49aa00da6ae37), whereas UTF-8 only became the default input
> > encoding in 2019.)
> 
> My recollection, though it seems to be wrong, is that the default values were
> not related to the default input encoding, they were the default for the
> output encoding.  More precisely, it seemed to me that DocBook had
> always preferred the output to be UTF-8, independently of the
> @documentencoding, and the OUTPUT_ENCODING_NAME was there to enforce
> that.  Seems like it is not actually the case, and the documentation
> actually states that the DocBook output is based on the document input.

Yes, if the input encoding was ISO-8859-1, the output encoding for DocBook
would be ISO-8859-1.  OUTPUT_ENCODING_NAME would be set to "utf-8" from
%defaults, but this would then be immediately overridden by the input
encoding when set_document is called.  That's how it works currently: it's
possible it has worked differently in the past.
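
As a rough illustration of that precedence, here is a small Python sketch.
It is not the actual Perl code in texi2any: the names DEFAULTS and
resolve_output_encoding are made up for the example, and it assumes that a
user setting of OUTPUT_ENCODING_NAME takes priority over both the default
and the input encoding.

```python
# Hypothetical model of the precedence described above; not Texinfo's code.
DEFAULTS = {"OUTPUT_ENCODING_NAME": "utf-8"}


def resolve_output_encoding(input_encoding=None, user_settings=None):
    """Return the effective output encoding name for a converter."""
    # 1. Start from the converter defaults (the %defaults value).
    conf = dict(DEFAULTS)
    # 2. When the document is attached (the set_document step), its input
    #    encoding replaces the default.
    if input_encoding is not None:
        conf["OUTPUT_ENCODING_NAME"] = input_encoding
    # 3. Assumption: an explicit user setting wins over both.
    if user_settings and "OUTPUT_ENCODING_NAME" in user_settings:
        conf["OUTPUT_ENCODING_NAME"] = user_settings["OUTPUT_ENCODING_NAME"]
    return conf["OUTPUT_ENCODING_NAME"]


if __name__ == "__main__":
    print(resolve_output_encoding("iso-8859-1"))
    print(resolve_output_encoding("iso-8859-1",
                                  {"OUTPUT_ENCODING_NAME": "utf-8"}))
```

In this model the "utf-8" default only survives when no input encoding is
known, which is why removing it should make little difference in practice.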

We could prefer UTF-8 for DocBook and HTML output too.  There doesn't
seem to be a strong reason to prefer the input encoding.  (Users could
still override the output encoding by setting OUTPUT_ENCODING_NAME on the
command line.)  There's just a small chance that users are processing input
files for some niche case where they want to preserve the input encoding.
But the change doesn't seem as necessary here, as the output is not actually
broken for these output formats.  I think we should make it only if there are
problems (e.g. if DocBook processors refuse to process non-UTF-8 input).

> It is not clear to me what the best interface could be.  We could
> imagine using something similar to the file names encoding, i.e. have a
> variable like
>   DOC_ENCODING_FOR_OUTPUT_ENCODING_NAME
> and if it is set to 0, the default OUTPUT_ENCODING_NAME would be left as
> is.
> 
> But your approach is ok too.

I don't see the need for a new variable for this.  Users can already set
OUTPUT_ENCODING_NAME if they need to.
