Re: Warn on semantic newlines

Alejandro Colomar Thu, 27 Apr 2023 02:49:34 -0700

[CC -= Ingo, as requested]

Hi Bjarni,

On 4/27/23 03:40, Bjarni Ingi Gislason wrote:
> 
>   "groff" is not the right tool for such things, but "grep" is.

It could work for an initial implementation.  It would only have
some false positives for things like defining your own macros at
the top of a page, but that's not something I do, so I could live
with it.

> 
>   The attachment contains a shell script that tests various cases of
> defects in man pages.
> 
>   It can test for just one or few cases or all of them.
> 
>   For example create a file with
> 
> foo. bar
> foo.  bar
> foo.  Bar
> foo. Bar
> 
>   or more examples
> 
> and run 
> 
> <name of script> all <file>

$ ./semantic_newlines all ./foo.man 
checking test case nr 7
./semantic_newlines: line 803: 3*/4: syntax error: operand expected (error 
token is "/4")
./semantic_newlines: line 1450: mjd: command not found
./semantic_newlines: line 1470: /semantic_newlines.all.diff.new..112419: 
Permission denied
sed: couldn't open file /home/alx/bin/groff.comment.sed: No such file or 
directory
sed: couldn't open file /home/alx/bin/groff.comment.sed: No such file or 
directory
Input file is ./foo.man, case 1

I guess this script has dependencies that I don't know about?

$ apt-file find bin/mjd
$ 

> 
>   Later you can use the reported test numbers to just run those tests.
> 
>   The script can (still) produce a lot of wrong positive results.
> 

Regarding the script itself (and its documentation), here goes some
review:

> #!/bin/bash
> # Input
> # 1) one number, one or more files
> # 2) "all", one or more files

What's the meaning of the number?  I'd appreciate a man-page-like
documentation, hopefully available via -h.  See what I mean:

<https://github.com/nginx/unit/blob/1a485fed6a8353ecc09e6c0f050e44c0a2d30419/tools/setup-unit#L827>

> 
> # In $SEDLIB: "groff.comment.sed", "groff.TH.sed", "groff.hyphen-minus.sed",
> # "check_manuals", "strings_gt"
> #
> # "chk_manuals" uses: "in_out_put.sh", "mandoc", "groff.lint", and
> # "roff.singleword.sed" 
> 
> # Environmental variable: MANWIDTH (see man(1)) with 'm' unit
> #
> # Instead of "test-groff" (in the git repository)

I don't understand the above (expect for MANWIDTH).

> 
> *) if test "$type" -gt $total; then
>      echo "$Cmd_name: test number \"$type\" is greater than defined 
> \($total\)" >&2
>      exit 1
>    else
>      diff_file=${TMPDIR}/type.${type}.diff.new.$time
>      prof_listi="$type"
>    fi

else is unnecessary after exit.  Conditional code is harder to read,
since the brain needs to remember one more thing, so I suggest making
that code unconditional.

> 
>     if eval eval \""\${command[${prof}]}"\" "$input" > "$tempfile"  ; then
> #      eval printf "'%s\n\n'" \"Test nr. ${prof}: \${do_what[${prof}]}\"
> #      if test  "$old_filename" != "$file" ; then
>       if test "$filename_printed" = 'no'; then
>         printf '%s\n' "Input file is $input, case 1"
>         filename_printed=yes
>       fi
>       if test "$print_test_nr" = 'yes'; then
>         printf '%s\n' "Test nr. ${prof}:"
>       fi
>       eval printf "'\n%s\n\n' \"\${patch_explain[${prof}]}\""
>       eval printf "'%s\n\n' \"\${do_what[${prof}]}\""  >> 
> "$TMPDIR/${file##*/}".summary
>       case "$1" in
>         file) :
>       ;;
>         test)
>           if [ -s "$tempfile" ] ; then
>           :
> #            printf '####\n\n%s\n\n' "Input file is $file, case 2"
> #             printf '%s\n' '#### evaluate: case "test"'
>           fi
>       ;;
>         *) echo "$Cmd_name"': function evaluate: case "'"$1"'" is missing' >&2
>            exit 1
>       ;;
>       esac
> #      eval printf "'%s\n\n'" \"\${patch_explain[${prof}]}\"\
> #        | tee -a  "$diff_file"
>       cat "$tempfile"
>       printf '\n#####\n\n'
>       return 0
>     else

At this point, I already forgot what the condition was.  Cache misses
hurt in meatware too.  When writing 2-branch conditionals, prefer
writing the short branch first.  It will help human readers avoid
cache misses.  Hopefully, it will also help the CPU avoid similar
problems.

Also, for when both branches are similarly long, putting the
exceptional condition in the first branch usually results in faster
and more readable code, since the else branch usually loads less
instructions[1].

>       return 1
>     fi

On top of that, this branch has a return, so when reversed, the
other branch wouldn't even need to be within an else, which
reduces the complexity even more.

[1]:  

if (A == exceptional_value)
        f();
else
        g();

translates into

        load A;
        if not exceptional_value, jump to else;

        f();
        jump to done;
else:
        g();
done:

This kind of micro-optimization is probably too much to be considered
sane, but I find that it also optimizes for human brains reading the
code (at least for mine).

Cheers!
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

OpenPGP_signature
Description: OpenPGP digital signature

Re: Warn on semantic newlines

Reply via email to