On Wed, Apr 17, 2024 at 5:29 PM Luca Vizzarro <luca.vizza...@arm.com> wrote:
>
> On 17/04/2024 15:25, Luca Vizzarro wrote:
> > On 17/04/2024 14:22, Juraj Linkeš wrote:
> >>> I'll
> >>> experiment with some look ahead constructs. The easiest solution is to
> >>> match everything that is not * ([^*]+) but can we be certain that there
> >>> won't be any asterisk in the actual information?
> >>
> >> We can't. But we can be reasonably certain there won't be five
> >> consecutive asterisks, so maybe we can work with that.
> >
> > We can work with that by using look ahead constructs as mentioned, which
> > can be quite intensive. For example:
> >
> >    /(?<=\n\*).*?(?=\n\*|$)/gs
> >
> > looks for the start delimiter and for the start of the next block or the
> > end. This works perfectly! But it's performing 9576 steps (!) for just
> > two ports. The current solution only takes 10 steps in total.
>
> Thinking about it... we are not really aiming for performance, so I guess
> if it simplifies things and it's justifiable, then it's not a problem,
> especially since this command shouldn't be called continuously.
>

We have to weigh the pros and cons on an individual basis. In this
case, the output is going to be short so basically any solution is
going to be indistinguishable from any other, performance wise.

> The equivalent /\n\*.+?(?=\n\*|$)/gs (but slightly more optimised) takes
> approximately 3*input_length steps to run (according to regex101 at
> least). If that's reasonable enough, I can do this:
>
>    matches = re.finditer(r"\n\*.+?(?=\n\*|$)", input, re.S)
>    return [TestPmdPortInfo.parse(match.group(0)) for match in matches]
>
> Another optimization is artificially adding a `\n*` delimiter at the end
> before feeding it to the regex, thus removing the alternative case (|$),
> and making it 2*len steps:
>
>    input += "\n*"
>    matches = re.finditer(r"\n\*.+?(?=\n\*)", input, re.S)
>    return [TestPmdPortInfo.parse(match.group(0)) for match in matches]
>

I like this second one a bit more. How does the performance change if
we try to match four asterisks, "\n\*{4}.+?(?=\n\*{4})"? Four asterisks
shouldn't randomly appear in the output, as that's basically another
delimiter.
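One rough way to check the four-asterisk variant locally is sketched here, assuming Python 3.9+. The sample is hypothetical and abbreviated, only loosely modelled on testpmd's port info output, and `split_blocks`, `ONE_STAR` and `FOUR_STARS` are illustrative names, not DTS code. Note that it measures wall-clock time in Python's `re` engine, which is a different metric from regex101's step count.

```python
import re
import timeit

# Hypothetical, abbreviated sample loosely modelled on testpmd port info
# output; the real field list is much longer.
SAMPLE = (
    "\n********************* Infos for port 0  *********************\n"
    "MAC address: 00:00:00:00:00:01\n"
    "Driver name: net_null\n"
    "\n********************* Infos for port 1  *********************\n"
    "MAC address: 00:00:00:00:00:02\n"
    "Driver name: net_null\n"
)

# A block starts at a line beginning with one (or four) asterisks and runs
# lazily up to, but not including, the next such line.
ONE_STAR = re.compile(r"\n\*.+?(?=\n\*)", re.S)
FOUR_STARS = re.compile(r"\n\*{4}.+?(?=\n\*{4})", re.S)


def split_blocks(pattern: re.Pattern[str], output: str, sentinel: str) -> list[str]:
    # The appended sentinel closes the last block, so no "|$" branch is needed.
    return [m.group(0) for m in pattern.finditer(output + sentinel)]


one = split_blocks(ONE_STAR, SAMPLE, "\n*")
four = split_blocks(FOUR_STARS, SAMPLE, "\n****")

# Time each variant; the lambda is invoked by timeit within the same loop
# iteration, so it sees the current pattern and sentinel.
for name, pattern, sentinel in (("one", ONE_STAR, "\n*"), ("four", FOUR_STARS, "\n****")):
    elapsed = timeit.timeit(lambda: split_blocks(pattern, SAMPLE, sentinel), number=1000)
    print(f"{name}: {elapsed:.4f}s")
```

On this sample both patterns split at the same places; the four-asterisk one only differs when a content line happens to start with fewer than four asterisks.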

And we should document this in the docstring - sample output, then
explain the extra characters and the regex itself. We shouldn't forget
this in the other commit as well (show port stats).
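As a sketch of what such a docstring might cover, here is a hypothetical `split_port_info` helper (not DTS code) using the four-asterisk pattern. The Google-style docstring layout, the sample output and the extra leading newline (added so a banner on the very first line is still matched) are assumptions; in the real code each block would presumably be handed to `TestPmdPortInfo.parse`.

```python
import re

# Port blocks start with a line beginning with at least four asterisks.
_PORT_INFO_BLOCK = re.compile(r"\n\*{4}.+?(?=\n\*{4})", re.S)


def split_port_info(output: str) -> list[str]:
    r"""Split testpmd port info output into one string per port.

    Sample output (hypothetical, abbreviated)::

        ********************* Infos for port 0  *********************
        MAC address: 00:00:00:00:00:01

        ********************* Infos for port 1  *********************
        MAC address: 00:00:00:00:00:02

    Each block begins with a banner line of asterisks. The regex
    ``\n\*{4}.+?(?=\n\*{4})``, compiled with ``re.S`` so that ``.`` also
    matches newlines, lazily matches from one banner up to, but not
    including, the next one.

    Extra characters: a newline is prepended so a banner on the first line
    is matched too, and a sentinel ``"\n****"`` is appended so the last
    block is terminated like the others, avoiding a ``|$`` alternative.

    Args:
        output: The raw testpmd output.

    Returns:
        One string per port, stripped of surrounding whitespace.
    """
    text = "\n" + output + "\n****"
    return [m.group(0).strip() for m in _PORT_INFO_BLOCK.finditer(text)]
```

The same structure (sample output, extra characters, regex) would carry over to the show port stats parsing.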

> Let me know what you think!
>
> Best,
> Luca
