Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py

Peter Maydell Fri, 12 Jan 2018 04:03:07 -0800

On 18 December 2017 at 17:45, Richard Henderson
<richard.hender...@linaro.org> wrote:
> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
> It would need additional work to extend to insn sizes other than 32-bit.
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>


I have some comments here (mostly about the syntax, error checking
and generated code, rather than the script itself), but overall I
like this approach and I think we can drop the "RFC" from the patchset.

> ---
>  scripts/decodetree.py | 984 
> ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 984 insertions(+)
>  create mode 100755 scripts/decodetree.py
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> new file mode 100755
> index 0000000000..acb0243915
> --- /dev/null
> +++ b/scripts/decodetree.py
> @@ -0,0 +1,984 @@
> +#!/usr/bin/env python

I asked on #qemu-devel for some review from people who are more
familiar with Python than I am. One of the suggestions (from
Marc-André Lureau) was to run pycodestyle on this and fix the
(mostly coding style nits) reported by it. (pycodestyle may
be called 'pep8' on older distros.)

> +#
> +# Generate a decoding tree from a specification file.
> +#
> +# The tree is built from instruction "patterns".  A pattern may represent
> +# a single architectural instruction or a group of same, depending on what
> +# is convenient for further processing.
> +#
> +# Each pattern has "fixedbits" & "fixedmask", the combination of which
> +# describes the condition under which the pattern is matched:
> +#
> +#   (insn & fixedmask) == fixedbits
> +#
> +# Each pattern may have "fields", which are extracted from the insn and
> +# passed along to the translator.  Examples of such are registers,
> +# immediates, and sub-opcodes.
> +#
> +# In support of patterns, one may declare fields, argument sets, and
> +# formats, each of which may be re-used to simplify further definitions.
> +#
> +## Field syntax:
> +#
> +# field_def    := '%' identifier ( unnamed_field )+ ( !function=identifier )?
> +# unnamed_field := number ':' ( 's' ) number
> +#
> +# For unnamed_field, the first number is the least-significant bit position 
> of
> +# the field and the second number is the length of the field.  If the 's' is
> +# present, the field is considered signed.  If multiple unnamed_fields are
> +# present, they are concatenated.  In this way one can define disjoint 
> fields.

This syntax lets you specify that fields other than the first one in
a concatenated set are signed, like
    10:5 | 3:s5
That doesn't seem to me like it's meaningful. Shouldn't the signedness
or otherwise be a property of the whole extracted field, rather than
an individual component of it? (In practice creating a signed combined
value is implemented by doing the most-significant component as sextract,
obviously.)

> +#
> +# If !function is specified, the concatenated result is passed through the
> +# named function, taking and returning an integral value.
> +#
> +# FIXME: the fields of the structure into which this result will be stored
> +# is restricted to "int".  Which means that we cannot expand 64-bit items.
> +#
> +# Field examples:
> +#
> +#   %disp   0:s16          -- sextract(i, 0, 16)
> +#   %imm9   16:6 10:3      -- extract(i, 16, 6) << 3 | extract(i, 10, 3)
> +#   %disp12 0:s1 1:1 2:10  -- sextract(i, 0, 1) << 11
> +#                             | extract(i, 1, 1) << 10
> +#                             | extract(i, 2, 10)
> +#   %shimm8 5:s8 13:1 !function=expand_shimm8
> +#                          -- expand_shimm8(sextract(i, 5, 8) << 1
> +#                                           | extract(i, 13, 1))

Do we syntax-check for accidentally specifying a field-def whose
components overlap (eg "3:5 0:5")? I think we should, but I didn't
see a check in a quick scan through the parsing code.

> +#
> +## Argument set syntax:
> +#
> +# args_def    := '&' identifier ( args_elt )+
> +# args_elt    := identifier
> +#
> +# Each args_elt defines an argument within the argument set.
> +# Each argument set will be rendered as a C structure "arg_$name"
> +# with each of the fields being one of the member arguments.
> +#
> +# Argument set examples:
> +#
> +#   &reg3       ra rb rc
> +#   &loadstore  reg base offset
> +#
> +## Format syntax:
> +#
> +# fmt_def      := '@' identifier ( fmt_elt )+
> +# fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
> +# fixedbit_elt := [01.]+
> +# field_elt    := identifier ':' 's'? number
> +# field_ref    := '%' identifier | identifier '=' '%' identifier
> +# args_ref     := '&' identifier
> +#
> +# Defining a format is a handy way to avoid replicating groups of fields
> +# across many instruction patterns.
> +#
> +# A fixedbit_elt describes a contiguous sequence of bits that must
> +# be 1, 0, or "." for don't care.
> +#
> +# A field_elt describes a simple field only given a width; the position of
> +# the field is implied by its position with respect to other fixedbit_elt
> +# and field_elt.
> +#
> +# If any fixedbit_elt or field_elt appear then all 32 bits must be defined.
> +# Padding with a fixedbit_elt of all '.' is an easy way to accomplish that.

What is a format that doesn't specify the full 32 bits useful for?
Do you have an example of one?

> +#
> +# A field_ref incorporates a field by reference.  This is the only way to
> +# add a complex field to a format.  A field may be renamed in the process
> +# via assignment to another identifier.  This is intended to allow the
> +# same argument set be used with disjoint named fields.
> +#
> +# A single args_ref may specify an argument set to use for the format.
> +# The set of fields in the format must be a subset of the arguments in
> +# the argument set.  If an argument set is not specified, one will be
> +# inferred from the set of fields.
> +#
> +# It is recommended, but not required, that all field_ref and args_ref
> +# appear at the end of the line, not interleaving with fixedbit_elf or
> +# field_elt.
> +#
> +# Format examples:
> +#
> +#   @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
> +#   @opi    ...... ra:5 lit:8    1 ....... rc:5
> +#
> +## Pattern syntax:
> +#
> +# pat_def      := identifier ( pat_elt )+
> +# pat_elt      := fixedbit_elt | field_elt | field_ref
> +#               | args_ref | fmt_ref | const_elt
> +# fmt_ref      := '@' identifier
> +# const_elt    := identifier '=' number
> +#
> +# The fixedbit_elt and field_elt specifiers are unchanged from formats.
> +# A pattern that does not specify a named format will have one inferred
> +# from a referenced argument set (if present) and the set of fields.
> +#
> +# A const_elt allows a argument to be set to a constant value.  This may
> +# come in handy when fields overlap between patterns and one has to
> +# include the values in the fixedbit_elt instead.
> +#
> +# The decoder will call a translator function for each pattern matched.
> +#
> +# Pattern examples:
> +#
> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
> +#
> +# which will, in part, invoke
> +#
> +#   trans_addl_r(ctx, &arg_opr, insn)
> +# and
> +#   trans_addl_i(ctx, &arg_opi, insn)
> +#

I notice in the generated code that all the trans_FOO functions
are global, not file-local. That seems like it's going to lead
to name clashes down the line, especially if/when we ever get
to supporting multiple different target architectures in a single
QEMU binary.

Also from the generated code, "arg_incdec2_pred" &c don't follow
our coding style preference for CamelCase for typedef names. On
the other hand it's not immediately obvious how best to pick
a camelcase approach for them...

> +if sys.version_info >= (3, 0):
> +    re_fullmatch = re.fullmatch
> +else:
> +    def re_fullmatch(pat, str):
> +        return re.match('^' + pat + '$', str)
> +
> +def output_autogen():
> +    output('/* This file is autogenerated.  */\n\n')

"autogenerated by decodetree.py" might assist some future
person in tracking down how it got generated.

> +
> +def parse_generic(lineno, is_format, name, toks):
> +    """Parse one instruction format from TOKS at LINENO"""
> +    global fields
> +    global arguments
> +    global formats
> +    global patterns
> +    global re_ident
> +
> +    fixedmask = 0
> +    fixedbits = 0
> +    width = 0
> +    flds = {}
> +    arg = None
> +    fmt = None
> +    for t in toks:
> +        # '&Foo' gives a format an explcit argument set.

"explicit"

> +
> +def main():
> +    global arguments
> +    global formats
> +    global patterns
> +    global translate_prefix
> +    global output_file
> +
> +    h_file = None
> +    c_file = None
> +    decode_function = 'decode'
> +
> +    long_opts = [ 'decode=', 'translate=', 'header=', 'output=' ]
> +    try:
> +        (opts, args) = getopt.getopt(sys.argv[1:], 'h:o:', long_opts)
> +    except getopt.GetoptError as err:
> +        error(0, err)
> +    for o, a in opts:
> +        if o in ('-h', '--header'):
> +            h_file = a
> +        elif o in ('-o', '--output'):
> +            c_file = a
> +        elif o == '--decode':
> +            decode_function = a
> +        elif o == '--translate':
> +            translate_prefix = a
> +        else:
> +            assert False, 'unhandled option'

A --help would be helpful (as would documenting the command
line syntax in the comment at the top of the file).

thanks
-- PMM

Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py

Reply via email to