On Wed, Dec 24, 2025 at 12:23:08AM +0000, Gavin Smith wrote:
> Yes, if the input encoding was ISO-8859-1, the output encoding for DocBook
> would be ISO-8859-1. OUTPUT_ENCODING_NAME would be set to "utf-8" from
> %defaults, but this would then be immediately overridden by the input
> encoding when set_document is called. That's how it works currently: it's
> possible it has worked differently in the past.

I do not think so; I tested with Texinfo 5 and it is the same.

> We could prefer UTF-8 for DocBook and HTML output too. There doesn't
> seem to be a strong reason to prefer the input encoding. (Users could
> still override the output encoding by setting OUTPUT_ENCODING_NAME on the
> command line.) There's just a small chance that users are processing input
> files for some niche case where they want to preserve the input encoding.
> But it doesn't seem as necessary as the output is not actually broken for
> these output formats. I think we should change it only if there are problems
> (e.g. if DocBook processors refuse to process non-UTF-8 input).

To me there are two use cases.

* Users could prefer an encoding for the manual. In that case it makes
  sense to use this encoding for the output formats too.
* There are legacy manuals in encodings that were used before but have
  been superseded by UTF-8. In that case it would be better to have UTF-8
  as the output encoding, independently of the input encoding. The user
  may not even have the tools to process files in this other encoding.

> > It is not clear to me what the best interface could be. We could
> > imagine using something similar to the file names encoding, ie have a
> > variable like
> > DOC_ENCODING_FOR_OUTPUT_ENCODING_NAME
> > and if it is set to 0, the default OUTPUT_ENCODING_NAME would be left as
> > is.
> >
> > But your approach is ok too.
>
> I don't see the need for a new variable for this. Users can already set
> OUTPUT_ENCODING_NAME if they need to.

It is not exactly the same. A variable could allow specifying one of the
use cases above in a more generic way. However, the output encoding that
would be used is UTF-8 for all the formats that can specify it, and UTF-8
is also the default encoding. Moreover, there is not much point in using
anything other than UTF-8, and it is pretty easy to change the encoding of
a manual. So I agree that there is no need to add another variable.

-- 
Pat
