Re: [PATCH] syntax highlight: allows any program to be used as a syntax highlighter

Patrice Dumas Sat, 23 Nov 2024 11:20:09 -0800

On Sat, Nov 23, 2024 at 12:16:14AM +0000, Carlos Maniero wrote:
> > I agree, it would be better to use HIGHLIGHT_SYNTAX.  If the value is
> > not a known value, then the command would be called.  I prefer a call
> > with fixed information rather than %X argument replacement.  That
> > means that users would have to use wrappers to translate the arguments
> > to a command doing the highlighting, but that does not seem to be an
> > issue to me.
> 
> The suggestion is to follow the behavior of my patch but to use the existing
> variable, right?


I think that the code in tp/ext/highlight_syntax.pm could be directly
extended.

>  If "source-highlight --src-lang" is set as
> HIGHLIGHT_SYNTAX, the command that will be called for an "@example perl"
> will be "source-highlight --src-lang perl".  The @example text will be
> used as stdin of the command and the command stdout will be used as the
> highlighted HTML.  Is that correct?

It is the idea, yes (for the first mode).

> > There are two modes for HIGHLIGHT_SYNTAX.  Pass fragments on stdin and
> > get results on stdout.  In that case, I think that the only argument
> > should be the $language.  The other possibility is to go through input
> > and output files, with fragments to be highlighted separated by
> >
> > _______________________ $counter\n
> >
> > with line ranges passed on the command line, and the output file is supposed
> > to use a range separator passed on the command line.  In that case, more
> > arguments should be passed, the language, the input and output files,
> > the line ranges and the range separator.
> 
> Considering that you are fine with users having to write their own
> wrappers, WDYT about just using the stdin and stdout instead of also
> accepting input/output files?  So the new variable you suggested can be
> used to format the data sent over stdin.

> Example "-c HIGHLIGHT_SYNTAX_BULK_SEPARATOR=-->8--":
> 
>   stdin:
> 
>     my example fragment 1
>     -->8--
>     my example fragment 2
> 
>   expected stdout:
> 
>     <b>my example fragment 1</b>
>     -->8--
>     <b>my example fragment 2</b>
> 
> So we could use the same separator to identify the highlighted fragments,
> expecting that they come out in the same order.

That is very similar to the mode used for source-highlight, but with
stdin and stdout instead of an input and an output file.  I think
that there is no need to implement that as a separate case of the second
mode.  Indeed, what I proposed is that the command be called like

$HIGHLIGHT_SYNTAX $language $input $output $separator $line_ranges...

Therefore the user could write a wrapper like this (not tested, just to
give the idea):

# ================================================
#! /bin/sh
language=$1
input=$2
output=$3

sed 's/^_______________________ [0-9][0-9]*/-->8--/' "$input" | real_command > "$output"

# ================================================
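
On the calling side, the output then has to be mapped back to fragments.
A minimal sketch, assuming the output contains one separator line between
fragments and keeps the input order; split_fragments and the
fragment_N.html file names are hypothetical and not part of
highlight_syntax.pm:

```shell
#! /bin/sh
# split_fragments SEPARATOR DIR (hypothetical helper):
# read the highlighted output on stdin and write each fragment to
# DIR/fragment_N.html, with N starting at 1.
split_fragments () {
  awk -v sep="$1" -v dir="$2" '
    BEGIN { n = 1 }
    # A line equal to the separator ends the current fragment.
    $0 == sep { close(dir "/fragment_" n ".html"); n++; next }
    # Any other line belongs to the current fragment.
    { print > (dir "/fragment_" n ".html") }
  '
}
```

It could be used as: real_command < "$input" | split_fragments '-->8--' "$dir"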

> > Maybe using a user-specified command could be first implemented for
> > the first mode only.
> 
> Sorry, I didn't understand what you meant.

I meant that it could be possible to implement user-defined HIGHLIGHT_SYNTAX
for the first mode only, where each fragment is sent on stdin, the result
comes on stdout, and the only argument $HIGHLIGHT_SYNTAX is called with is
$language.
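
To give an idea of what a user command for this first mode could look
like, here is a sketch.  It assumes the calling convention proposed above
(language as the only argument, fragment on stdin, HTML on stdout), which
is not implemented yet.  highlight_fragment is a hypothetical name, and
the sed call only escapes HTML as a stand-in for a real highlighter:

```shell
#! /bin/sh
# Sketch of a first-mode user command.
# $1 is the language; the fragment is read from stdin and the resulting
# HTML is written to stdout.
highlight_fragment () {
  language=$1
  printf '<pre class="example-%s">' "$language"
  # Stand-in for a real highlighter: escape the HTML special characters.
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
  printf '</pre>\n'
}
```

A real wrapper script would end with a call such as
highlight_fragment "$1" and replace the sed line with the actual
highlighter.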

> > No problem with breaking HIGHLIGHT_SYNTAX (reasonably), this is marked
> > clearly as experimental.
> 
> If breaking changes are not an issue, do you think we can stop receiving
> the programs and expect customers to update their HIGHLIGHT_SYNTAX
> config with a command-line to be executed?

My view was that for known values passed through HIGHLIGHT_SYNTAX the
specific code that is already implemented would be run, but if the value
passed through HIGHLIGHT_SYNTAX is unknown, it is interpreted as a
command to run.

We could wait for an actual need to implement the use of input and
output files and a call like
 $HIGHLIGHT_SYNTAX $language $input $output $separator $line_ranges...
similar to what is used currently for source-highlight that you aptly
called 'bulk optimization'.

> Because considering we start accepting a command-line, users will be
> able to replace "-c HIGHLIGHT_SYNTAX=source-highlight" with
> "-c HIGHLIGHT_SYNTAX=source-highlight --src-lang" and they will have
> almost the same behaviour, except that:
> 
> 1. They will not have the bulk optimization
> 2. They will not have the language map translation (currently, it swaps
>    C++ to C since source-highlight does not expect C++ as --src-lang).

I see no specific reason to remove those two features, they seem to be
useful to me.  Another feature is the check of the availability of
the language.

What could be changed is the syntax, maybe the recognized values could
be

 default:highlight

 default:pygments

 default:source-highlight

so that a plain value such as source-highlight would be called as a
user-defined program.
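
The distinction could be sketched as follows.  This only illustrates the
proposed syntax and is not the Perl code in tp/ext/highlight_syntax.pm;
highlighter_kind is a hypothetical name:

```shell
#! /bin/sh
# highlighter_kind VALUE (hypothetical helper):
# print "builtin NAME" for a recognized default:NAME value, and
# "command VALUE" for anything else, which would be run as a user command.
highlighter_kind () {
  case $1 in
    default:highlight|default:pygments|default:source-highlight)
      # Strip the default: prefix to get the built-in highlighter name.
      echo "builtin ${1#default:}" ;;
    *)
      echo "command $1" ;;
  esac
}
```

With this, default:pygments would select the existing specific code,
while "source-highlight --src-lang" would be taken as a command line.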

> The same may be achievable by the other highlighters (I have only tested
> with source-highlight so far).
> 
> Regarding the optimization, the document may need to have a huge number
> of examples for this to become an issue.  For users for whom performance is
> sensitive, they have the option to use the bulk option and write their
> own wrapper.
> 
> This will significantly simplify the highlight logic and texi2any
> will no longer have so much knowledge of the external tools used to
> highlight.
> 
> But if you think it is better to keep the current behaviour and
> treat unknown programs as commands to be executed, I can
> just extend the current implementation to support it.

I think that it would be better, indeed, possibly with a change in the
values that trigger the use of built-in knowledge of the external tool.

-- 
Pat
