On 16/01/2023 6:10 pm, Anthony PERARD wrote:
> The get-fields.sh script, which generates all the include/compat/.xlat/*.h
> headers, is quite slow. It takes, for example, nearly 3 seconds to
> generate platform.h on a recent machine, or 2.3 seconds for memory.h.
>
> Rewriting the mix of shell/sed/python into a single python script makes
> the generation of those files a lot faster.
>
> No functional change, the headers generated are identical.
>
> Signed-off-by: Anthony PERARD <anthony.per...@citrix.com>

I'll happily trust the testing you've done, and that the outputs are
equivalent.  However, I have some general python suggestions.

> diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
> new file mode 100644
> index 0000000000..c1b361ac56
> --- /dev/null
> +++ b/xen/tools/compat-xlat-header.py
> @@ -0,0 +1,468 @@
> +#!/usr/bin/env python
> +
> +from __future__ import print_function
> +import re
> +import sys
> +
> +typedefs = []
> +
> +def removeprefix(s, prefix):
> +    if s.startswith(prefix):
> +        return s[len(prefix):]
> +    return s
> +
> +def removesuffix(s, suffix):
> +    if s.endswith(suffix):
> +        return s[:-len(suffix)]
> +    return s
> +
> +def get_fields(looking_for, header_tokens):
> +    level = 1
> +    aggr = 0
> +    fields = []
> +    name = ''
> +
> +    for token in header_tokens:
> +        if token in ['struct', 'union']:

('struct', 'union')

A tuple here rather than a list is marginally more efficient.
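
i.e. (only the literal changes; the surrounding logic is untouched):

    if token in ('struct', 'union'):    # constant tuple, as suggested

and the same goes for the other "token in [...]" tests further down.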

> +            if level == 1:
> +                aggr = 1
> +                fields = []
> +                name = ''
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1 and name == looking_for:
> +                fields.append(token)
> +                return fields
> +        elif re.match(r'^[a-zA-Z_]', token):

Here and in many other places, you've got re.{search,match} with repeated
regexes.  This can end up being quite expensive, and we already had one
build system slowdown caused by it.

Up near typedefs at the top, you want something like:

token_re = re.compile('[a-zA-Z_]')

to prepare the regex once at the start of day, and use 'elif
token_re.match(token)' here.

Another thing to note: the difference between re.search and re.match is
that match has an implicit '^' anchor.  You need to be aware of this
when compiling one regex to be used with both.
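
As a rough sketch of the shape I mean (token_re/handle_re/classify are
illustrative names, not anything in the patch):

    import re

    # Compiled once at import time, alongside the global typedefs list.
    token_re = re.compile(r'[a-zA-Z_]')                 # match() implies the '^' anchor
    handle_re = re.compile(r'^COMPAT_HANDLE\((.*)\)$')

    def classify(token):
        """Illustrative only - shows reusing the compiled patterns."""
        m = handle_re.match(token)
        if m:
            return 'handle:' + m.group(1)
        if token_re.match(token):
            return 'identifier'
        return 'other'

    print(classify('COMPAT_HANDLE(compat_ulong_t)'))    # handle:compat_ulong_t
    print(classify('foo_bar'))                          # identifier
    print(classify('{'))                                # other

Each pattern then gets parsed exactly once, rather than taking a trip through
re's internal cache on every token.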


All of this said, where is 0-9 in the token regex?  Have we just got
extremely lucky with having no embedded digits in identifiers thus far?

> +            if not (aggr == 0 or name != ''):
> +                name = token
> +
> +        if aggr != 0:
> +            fields.append(token)
> +
> +    return []
> +
> +def get_typedefs(tokens):
> +    level = 1
> +    state = 0
> +    typedefs = []

This shadows the global typedefs list, but the result of calling this
function is simply assigned to it.  Looking at the code, the global list
is used in several places.

It would be better to "global typedefs" here, and make this function
void, unless you want to modify handle_field() and build_body() to take
it in as a regular parameter.

I'm pretty sure typedefs actually wants to be a dict rather than a list
(will have better "id in typedefs" performance lower down), but that
wants matching with code changes elsewhere, and probably wants doing
separately.
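
For the avoidance of doubt, the shape I have in mind is roughly this (a
sketch only - the real state machine in get_typedefs() is replaced by a
trivial filter just to show the structure):

    typedefs = {}                      # dict (or set) => O(1) membership tests

    def get_typedefs(tokens):
        global typedefs                # fill in the module-level object directly
        typedefs = {t: True for t in tokens if t not in ('typedef', ';')}

    get_typedefs(['typedef', 'compat_ulong_t', ';'])
    print('compat_ulong_t' in typedefs)        # True

    # Alternatively, drop the global and thread the result through explicitly,
    # e.g. handle_field(..., typedefs=...) and likewise for build_body().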

> +    for token in tokens:
> +        if token == 'typedef':
> +            if level == 1:
> +                state = 1
> +        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +            if not (level != 1 or state != 1):
> +                state = 2
> +        elif token in ['[', '{']:
> +            level += 1
> +        elif token in [']', '}']:
> +            level -= 1
> +        elif token == ';':
> +            if level == 1:
> +                state = 0
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            if not (level != 1 or state != 2):
> +                typedefs.append(token)
> +    return typedefs
> +
> +def build_enums(name, tokens):
> +    level = 1
> +    kind = ''
> +    named = ''
> +    fields = []
> +    members = []
> +    id = ''
> +
> +    for token in tokens:
> +        if token in ['struct', 'union']:
> +            if not level != 2:
> +                fields = ['']
> +            kind = "%s;%s" % (token, kind)
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1:
> +                subkind = kind
> +                (subkind, _, _) = subkind.partition(';')
> +                if subkind == 'union':
> +                    print("\nenum XLAT_%s {" % (name,))
> +                    for m in members:
> +                        print("    XLAT_%s_%s," % (name, m))
> +                    print("};")
> +                return
> +            elif level == 2:
> +                named = '?'
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            id = token
> +            k = kind
> +            (_, _, k) = k.partition(';')
> +            if named != '' and k != '':
> +                if len(fields) > 0 and fields[0] == '':
> +                    fields.pop(0)
> +                build_enums("%s_%s" % (name, token), fields)
> +                named = '!'
> +        elif token == ',':
> +            if level == 2:
> +                members.append(id)
> +        elif token == ';':
> +            if level == 2:
> +                members.append(id)
> +            if named != '':
> +                (_, _, kind) = kind.partition(';')
> +            named = ''
> +        if len(fields) != 0:
> +            fields.append(token)
> +
> +def handle_field(prefix, name, id, type, fields):
> +    if len(fields) == 0:
> +        print(" \\")
> +        if type == '':
> +            print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='')
> +        else:
> +            k = id.replace('.', '_')
> +            print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='')
> +    elif not re.search(r'[{}]', ' '.join(fields)):
> +        tag = ' '.join(fields)
> +        tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag)
> +        print(" \\")
> +        print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), 
> end='')
> +    else:
> +        func_id = id
> +        func_tokens = fields
> +        kind = ''
> +        array = ""
> +        level = 1
> +        arrlvl = 1
> +        array_type = ''
> +        id = ''
> +        type = ''
> +        fields = []
> +        for token in func_tokens:
> +            if token in ['struct', 'union']:
> +                if level == 2:
> +                    fields = ['']
> +                if level == 1:
> +                    kind = token
> +                    if kind == 'union':
> +                        tmp = func_id.replace('.', '_')
> +                        print(" \\")
> +                        print("%sswitch (%s) {" % (prefix, tmp), end='')
> +            elif token == '{':
> +                level += 1
> +                id = ''
> +            elif token == '}':
> +                level -= 1
> +                id = ''
> +                if level == 1 and kind == 'union':
> +                    print(" \\")
> +                    print("%s}" % (prefix,), end='')
> +            elif token == '[':
> +                if level != 2 or arrlvl != 1:
> +                    pass
> +                elif array == '':
> +                    array = ' '
> +                else:
> +                    array = "%s;" % (array,)
> +                arrlvl += 1
> +            elif token == ']':
> +                arrlvl -= 1
> +            elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +                if level == 2 and id == '':
> +                    m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token)
> +                    type = m.groups()[0]
> +                    type = removeprefix(type, 'compat_')

So this pattern is what led to the introduction of the := operator in
Python 3.8, but we obviously can't use it yet.

At least pre-compiling the regexes will reduce the hit from the double
evaluation here.
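
Pre-3.8, the usual workaround is to do the match once into a local and test
that, e.g. (a sketch reusing the illustrative handle_re from above and the
patch's removeprefix(); the token is made up):

    token = 'COMPAT_HANDLE(compat_ulong_t)'

    m = handle_re.match(token)                 # evaluated exactly once
    if m is not None:
        type = removeprefix(m.group(1), 'compat_')
        print(type)                            # ulong_t

which at least keeps the regex to a single evaluation per token, even without
the walrus operator.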

~Andrew

P.S. I probably don't want to know why we have to special case evtchn
port, argo port and domain handle.  I think it says more about this
bodge of a parser than anything else...
