On Sat, Sep 13, 2025 at 05:48:28PM +0100, Gavin Smith wrote:
> On Wed, Sep 10, 2025 at 09:21:29AM +0200, Patrice Dumas wrote:
> > At the same time, I would like to keep the possibility to use the
> > current code, which adds checks on languages, possibly, mappings of
> > language, as well as default command line options. Also for
> > source-highlight, the process is quite different, with separators and
> > counts and only one file.
>
> It is hard to comment without knowing in detail what highlight_syntax.pm
> does and how it works. The functions are missing any comments explaining
> what they do or the data structures they access. I am looking at
> highlight_syntax.pm to understand how it works better. Some notes:
>
> * The language name mapping is extremely rudimentary:
>
> my %highlight_type_languages_name_mappings = (
>   'source-highlight' => {
>     'C++' => 'C',
>     'Perl' => 'perl',
>   },
>   'highlight' => {
>     'C++' => 'c++',
>   },
>   'pygments' => {
>     'C++' => 'c++',
>   }
> );
>
> Is this useful or necessary for us to maintain on a program-by-program
> basis?
It is a pragmatic approach. When I set up the tests, I used one
language name for the @example block, and it did not work for all the
highlighting programs: the different programs needed different
mappings. I agree that it may not be such a good idea, however, since
the language name used in the Texinfo code could correspond to different
languages depending on user preferences. For example, should 'C' be
mapped to c++ or c?

Regarding those mappings, my idea was to wait for user feedback and
add entries when relevant.
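
To make the discussion concrete, here is a minimal sketch of how such a
per-program mapping could be consulted. The hash is the one quoted above;
the helper name mapped_language_name and the fallback to the unchanged
Texinfo name are assumptions for illustration, not necessarily what
highlight_syntax.pm does.

```perl
use strict;
use warnings;

# Mapping copied from the excerpt of highlight_syntax.pm quoted above.
my %highlight_type_languages_name_mappings = (
  'source-highlight' => {
    'C++' => 'C',
    'Perl' => 'perl',
  },
  'highlight' => {
    'C++' => 'c++',
  },
  'pygments' => {
    'C++' => 'c++',
  },
);

# Hypothetical helper: return the language name to pass to the
# highlighting program.  Falling back to the name used in the Texinfo
# source when there is no entry is an assumption of this sketch.
sub mapped_language_name {
  my ($highlight_type, $texinfo_name) = @_;
  my $mappings = $highlight_type_languages_name_mappings{$highlight_type};
  if (defined($mappings) and exists($mappings->{$texinfo_name})) {
    return $mappings->{$texinfo_name};
  }
  return $texinfo_name;
}

print mapped_language_name('source-highlight', 'C++'), "\n"; # prints C
print mapped_language_name('pygments', 'Perl'), "\n";        # prints Perl
```

With a fallback of this kind, an entry is needed only where a program
disagrees with the name used in the Texinfo source, which fits adding
entries as users report them.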
> * I am cautious about "generic" code that ends up not being generic at
> all.
The code related to extracting @example blocks with the language
information and putting back the highlighted code is generic, in my
opinion. By contrast, the way a highlighting program's list of languages
is obtained, the way the command is launched and the way the result is
obtained are specific to each highlighting program.
> Currently highlight_syntax.pm has three options ("source-highlight",
> "highlight" and "pygments") with checks throughout the code which one
> is being used. In 'highlight_process', there is one block of code for
> "highlight" and "pygments" with further conditionals inside for which
> one is being used.
That's because those two are handled similarly: each @example block
leads to a call of the command, with the example block text being fed to
the standard input and the result obtained on standard output.
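
As an illustration of that one-call-per-block scheme, here is a minimal
sketch using the core IPC::Open2 module. The helper name
filter_through_command is hypothetical, and 'cat' is only a stand-in for
the highlighting command; this is not the code in highlight_syntax.pm.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: feed one block of text to a command on its
# standard input and return what the command writes on its standard
# output.  All the input is written before any output is read, which is
# fine for small blocks but could block on very large ones.
sub filter_through_command {
  my ($text, @command) = @_;
  my $pid = open2(my $chld_out, my $chld_in, @command);
  print $chld_in $text;
  close($chld_in);
  # Slurp the whole output of the command.
  my $result = do { local $/; <$chld_out> };
  close($chld_out);
  waitpid($pid, 0);
  return $result;
}

# 'cat' stands in for the real highlighting command here.
my $example_text = "int main (void) { return 0; }\n";
print filter_through_command($example_text, 'cat');
```

Passing the command as a list avoids going through the shell, so the
language name and any extra options are not subject to shell quoting.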
> Presumably what follows is the code for
> "source-highlight". This kind of code is not easily extensible to further
> options. It often yields more straightforward code to duplicate code
> and eliminate the conditionals. (You may remember the C implementation
> of makeinfo which had conditionals for Info, HTML and XML throughout
> the code, which although it was before I was involved with Texinfo, seemed
> to me one of the problems with the maintainability of that program,
> with code originally written for one output format (Info) complexified
> to deal with other output formats.)
You may be right; I do not have a clear opinion on this. I tried to have
some abstraction through the %commands data structures, both in terms of
@-commands, to allow for something other than @example in the future,
and in terms of highlighting programs. I think that it is a good thing to
prepare for other commands than @example. Also, after reviewing the
code, it seems to me that the data structure is suitable for all the
highlighting programs/possibilities.

I am not sure about having a different .pm file loaded for different
highlighting programs, though, because of the common code. I think that
it is better if extensions like highlight_syntax.pm that use the HTML
customization API are more or less self-contained. Is that what you
proposed?
> * The purpose of 'highlight_preformatted_command' (one of the callbacks)
> is not easy to understand without reading the entire function
> line-by-line. I would have appreciated a comment explaining at a high
> level what it does. It appears to save information and retrieve
> information from the module variable %commands (which is an array of
> complex data structures, which are not documented), and do checks and
> issue warnings. There is no option-specific code in this function.
I will add comments. I do not think that the difficulty of that
function lies so much in the access to the %commands structure, but
rather in the fact that it contains some formatting code cut and pasted
from the 'formatted' tree element formatting code.
> If as you say, the "source-highlight" option uses more of the code
> in highlight_syntax.pm perhaps it is more useful to keep most of the
> code for that option?
It is clear that "source-highlight" uses more code: it is possible to
have all the highlighting done with a single call, but this requires
more preparation and more code.
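
A rough sketch of the kind of preparation involved in a single call,
with separators and counts: join the blocks in one text, process it
once, split the result and check the number of pieces. The separator
string, the helper names and the identity stand-in for the highlighting
call are all assumptions; the actual code differs.

```perl
use strict;
use warnings;

# Hypothetical separator line; the real one is not shown in this thread.
my $separator = "_______________________ sketch separator";

# Join all the blocks in one text, with a separator line between blocks.
# Each block is expected to end with a newline.
sub join_blocks {
  my @blocks = @_;
  return join("$separator\n", @blocks);
}

# Split the processed text back into blocks and check that the number
# of blocks is the expected one.  Returns an empty list on mismatch.
sub split_result {
  my ($result, $expected_count) = @_;
  my @pieces = split(/^\Q$separator\E\n/m, $result, -1);
  if (scalar(@pieces) != $expected_count) {
    warn "got ".scalar(@pieces)." blocks, expected $expected_count\n";
    return ();
  }
  return @pieces;
}

my @blocks = ("int i;\n", "print 1;\n");
my $joined = join_blocks(@blocks);
# Stand-in for the single call to the highlighting program: the text
# is left unchanged here.
my $processed = $joined;
my @highlighted = split_result($processed, scalar(@blocks));
print scalar(@highlighted), "\n"; # prints 2
```

The count check is what catches the case where the highlighting program
alters or drops a separator, which is part of the extra code a
one-call approach needs.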
> There is another approach which is for the syntax highlighting program
> to post-process the HTML output from texi2any. This should be possible
> as the @example blocks are marked with appropriate classes.
It does not seem possible to me, as the result is already HTML?
> > Therefore, it would be nice to have a way to specify that the built-in
> > code should be used. The syntax could be, for the HIGHLIGHT_SYNTAX
> > customization option:
> >
> > default:highlight
> > txi:highlight
> > preset::source-highlight
> >
> > or also with another separator than :.
> >
> > Any idea, preference, comment?
>
> I am not a fan of overloading a single variable with special syntaxes.
> I prefer the idea of having multiple variables, as in Carlos Maniero's
> original patch, if we are providing a feature that is not simply
> implemented with a variable with a value of a shell command.
No problem with that, and to me it is orthogonal to the implementation:
there could still be common code between the command-line case and the
"pygments" and "highlight" cases, as they are quite similar.
> In the thread I linked above, somebody wanted to pass extra flags to
> pygmentize. So there you have a mixture, using the existing code but
> adding command-line options. Would the existing code be useful for this
> use case or is it mainly for "source-highlight"?
It would be: you can see in the commit
https://cgit.git.savannah.gnu.org/cgit/texinfo.git/commit/?id=5a772156f803490fa29391e8862bbbe58ded5c93
that it was relatively easy to implement as a case similar to pygments
and highlight. However, this could still be a bad idea, in case the
reused data structures are too complex for the needs of this
possibility. I think that it is not the case, but I may be wrong.

When reviewing the code, I noticed some data structures that are shared
but should not be; I will try to fix as many of these as possible. I
will also properly document the %commands data structure (and probably
rename it).
--
Pat