On Mon, 1 Jun 2026 at 23:15, John Kacur <[email protected]> wrote:
>
> The timerlat hist command fails to parse short options with attached
> numeric arguments (e.g., -p100) due to conflicts between digit characters
> used as option values and numeric arguments to other options.
>
> This issue was discovered when testing rtla 7.1.0-rc6 with rteval,
> which passes arguments in the compact -p100 format. The rteval tests
> failed with the confusing error "no-irq and no-thread set, there is
> nothing to do here" even though neither option was specified.
>
> The root cause is two-fold:
>
> 1. Digit characters ('0'-'9') were used as short option values for
> long-only options like --no-irq, --no-thread, etc. This caused
> getopt_auto() to generate an option string like 'a:b:...:u0123456:7:8:9'
> which made getopt treat digits as valid option characters.
>
> 2. The two-phase option parsing approach (alternating calls between
> common_parse_options() and local option parsing) confused getopt's
> internal state when encountering arguments like -p100.
>
What actually happens is this. The getopt_long() call in
common_parse_options() does not recognize -p, but it still advances
getopt's internal static variable nextchar by one character before
control falls back to the getopt_long() call in
timerlat_hist_parse_args(). As a result, -p is lost entirely and
timerlat_hist_parse_args() only sees -100.
Note that a numeric option argument is not required to trigger the
bug; grouping two flags is enough:
$ rtla timerlat hist -nu -c 0 -d 1s
# RTLA timerlat histogram
# Time unit is microseconds (us)
# Duration: 0 00:00:02
(rtla 7.0)
vs:
$ rtla timerlat hist -nu -c 0 -d 1s
# RTLA timerlat histogram
# Time unit is nanoseconds (ns)
# Duration: 0 00:00:02
(rtla 6.19)
Again, the -n (nanosecond) option gets dropped, because
common_parse_options() has already advanced nextchar past it.
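A small userspace model of the alternating loop may help illustrate
this. It is a sketch under stated assumptions, not the rtla code: the
optstrings COMMON_SHORT and LOCAL_SHORT and the function two_phase()
are hypothetical, and the model assumes the common parser restores
optind after an unknown option (which would explain why "-p 100" keeps
working). Only optind can be rolled back this way; nextchar cannot.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Empty long-option table: only short options matter for this bug. */
static const struct option no_long[] = { { NULL, 0, NULL, 0 } };

/* Hypothetical stand-ins for the real rtla option strings. */
#define COMMON_SHORT "c:d:"   /* options the common parser knows   */
#define LOCAL_SHORT  "p:nu"   /* options only the tool itself knows */

/*
 * Model of two-phase parsing: a "common" getopt_long() call runs first;
 * if it returns '?', optind is restored (assumed behavior) and a "local"
 * getopt_long() call runs on the same argv. Every option the local
 * parser reports goes into trace; the argument of -p is returned, or
 * NULL if the local parser never saw -p.
 */
static const char *two_phase(int argc, char **argv, char *trace, size_t tlen)
{
	const char *period = NULL;
	size_t n = 0;

	assert(argv[argc] == NULL && tlen > 0);
	opterr = 0;   /* no "invalid option" noise on stderr */
	optind = 0;   /* glibc/musl: reinitialize for a fresh scan */

	for (;;) {
		/* optind 0 means "not started yet"; the scan begins at 1. */
		int saved = optind ? optind : 1;
		int c = getopt_long(argc, argv, COMMON_SHORT, no_long, NULL);

		if (c == -1)
			break;
		if (c != '?')
			continue;   /* a common option, handled elsewhere */

		/*
		 * Unknown to the common parser: optind can be rolled back,
		 * but getopt's private nextchar has already moved past the
		 * unknown character and cannot be.
		 */
		optind = saved;
		c = getopt_long(argc, argv, LOCAL_SHORT, no_long, NULL);
		if (c == -1)
			break;
		if (c == 'p')
			period = optarg;
		if (n + 1 < tlen)
			trace[n++] = (char)(c == '?' ? optopt : c);
	}
	trace[n] = '\0';
	return period;
}
```

Under these assumptions, "-p100" never delivers -p to the local parser,
and "-nu" delivers only -u. The space-separated forms "-p 100" and
"-n -u" still arrive intact, because each option ends at an argv
element boundary, where restoring optind is enough.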
> When a user passed -p100, getopt would incorrectly parse it as
> separate options: -p, -1, -0, and -0, silently setting no_irq and
> no_thread flags instead of recognizing "100" as the period argument.
>
> The two-phase parsing was introduced in commit 850cd24cb6d6 ("tools/rtla:
> Add common_parse_options()") which first appeared in v7.0-rc1. Prior to
> that commit, -p100 worked correctly. The digit characters as option
> values existed since the original timerlat implementation, but only
> became problematic when combined with the two-phase parsing approach.
>
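The digit half of the problem can be reproduced with one ordinary
getopt_long() pass. This is a minimal sketch: WITH_DIGITS is a trimmed,
hypothetical version of the optstring quoted above, WITHOUT_DIGITS is
the same string with the digits removed, and scan() is an illustrative
helper, not an rtla function.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical optstrings, trimmed from the one quoted above. */
#define WITH_DIGITS    "p:u0123456:7:8:9"
#define WITHOUT_DIGITS "p:u"

static const struct option no_long[] = { { NULL, 0, NULL, 0 } };

/*
 * One single-pass getopt_long() scan. Each return value is stored as a
 * character in out ('?' for an unrecognized option), and the argument
 * of -p, if any, is returned.
 */
static const char *scan(int argc, char **argv, const char *optstring,
			char *out, size_t outlen)
{
	const char *period = NULL;
	size_t n = 0;
	int c;

	assert(argv[argc] == NULL && outlen > 0);
	opterr = 0;  /* keep stderr quiet for unknown options */
	optind = 0;  /* glibc/musl: start a completely fresh scan */
	while ((c = getopt_long(argc, argv, optstring, no_long, NULL)) != -1) {
		if (c == 'p')
			period = optarg;
		if (n + 1 < outlen)
			out[n++] = (char)c;
	}
	out[n] = '\0';
	return period;
}
```

With WITH_DIGITS, a leftover "-100" is silently accepted as the three
options '1', '0', '0'. With WITHOUT_DIGITS, the same input yields three
'?' errors instead. In a single pass, "-p100" is parsed correctly as -p
with argument "100" even when the digits are present, which is
consistent with the digits only becoming harmful once two-phase parsing
could strip the -p.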
Note that the RTLA documentation only ever mentions the "-p 100"
syntax. Nevertheless, this is a real regression: it's not unreasonable
for users to assume that the form without the space also works, as it
does for most commands on Un*x (for example, gcc's -I/include/path
syntax).
> Fix this by:
>
> 1. Eliminating digit characters from the option string by filtering them
> out in getopt_auto(). This prevents conflicts with numeric arguments.
>
> 2. Refactoring timerlat_hist_parse_args() to use single-pass option
> parsing. Instead of alternating between common_parse_options() and
> local parsing, merge all options (common and local) into a single
> option table and parse them in one pass. This matches the approach
> used by cyclictest and other tools.
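The single-pass approach described above can be sketched as follows.
The tables, the option letters and long names, and the helpers
merge_opts(), build_optstring() and parse_single_pass() are all
hypothetical illustrations, not the patch's code or rtla's
getopt_auto(). The idea is to concatenate the common and local tables,
derive the short-option string from letters only, and run one
getopt_long() loop.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 32

/* Hypothetical tables; option names and values are illustrative only. */
static const struct option common_opts[] = {
	{ "cpus",     required_argument, NULL, 'c' },
	{ "duration", required_argument, NULL, 'd' },
	{ NULL, 0, NULL, 0 }
};

static const struct option local_opts[] = {
	{ "period", required_argument, NULL, 'p' },
	{ "nano",   no_argument,       NULL, 'n' },
	{ "user",   no_argument,       NULL, 'u' },
	{ "no-irq", no_argument,       NULL, '0' },  /* long-only */
	{ NULL, 0, NULL, 0 }
};

static size_t count_opts(const struct option *o)
{
	size_t n = 0;

	while (o[n].name)
		n++;
	return n;
}

/* Concatenate two NULL-terminated tables into out (capacity cap). */
static void merge_opts(const struct option *a, const struct option *b,
		       struct option *out, size_t cap)
{
	size_t na = count_opts(a), nb = count_opts(b);

	assert(na + nb + 1 <= cap);
	memcpy(out, a, na * sizeof(*a));
	memcpy(out + na, b, nb * sizeof(*b));
	out[na + nb] = (struct option){ NULL, 0, NULL, 0 };
}

/* Build the short-option string, skipping any val that is not a letter. */
static void build_optstring(const struct option *o, char *buf, size_t cap)
{
	size_t n = 0;

	for (; o->name; o++) {
		int v = o->val;

		if (o->flag || !((v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')))
			continue;   /* e.g. '0' for --no-irq stays long-only */
		assert(n + 4 <= cap);   /* letter + up to "::" + NUL */
		buf[n++] = (char)v;
		if (o->has_arg != no_argument)
			buf[n++] = ':';
		if (o->has_arg == optional_argument)
			buf[n++] = ':';
	}
	buf[n] = '\0';
}

struct parsed {
	const char *period;  /* argument of -p/--period, or NULL */
	int nano, user, no_irq;
};

/* One getopt_long() loop over the merged table; optstring is caller storage. */
static struct parsed parse_single_pass(int argc, char **argv,
				       char *optstring, size_t cap)
{
	struct option all[MAX_OPTS];
	struct parsed r = { NULL, 0, 0, 0 };
	int c;

	merge_opts(common_opts, local_opts, all, MAX_OPTS);
	build_optstring(all, optstring, cap);

	opterr = 0;
	optind = 0;  /* glibc/musl: fresh scan */
	while ((c = getopt_long(argc, argv, optstring, all, NULL)) != -1) {
		switch (c) {
		case 'p': r.period = optarg; break;
		case 'n': r.nano = 1; break;
		case 'u': r.user = 1; break;
		case '0': r.no_irq = 1; break;  /* only via --no-irq */
		default: break;  /* -c/-d and errors: out of scope here */
		}
	}
	return r;
}
```

Because there is only one getopt_long() loop, nextchar is never shared
between parsers that use different optstrings. "-p100", "-nu" and
"--no-irq" all parse as intended, and "-100" no longer sets any flag.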
This is a partial revert of the common_parse_options() patchset [1];
it does fix the bug, but only for one tool (timerlat hist).
getopt_long() provides no way to save and restore its internal
nextchar variable. It can be reset (by setting optind = 0 before the
next call, not 1 as the documentation says), but making that work
would take a lot of effort, because we'd have to calculate and restore
the original nextchar position ourselves. It might be easiest to revert
the entire consolidation patchset [1], if that's worth it.
[1]
https://lore.kernel.org/linux-trace-kernel/[email protected]/T/#u
<truncated>
Tomas