On 17/04/2024 15:25, Luca Vizzarro wrote:
On 17/04/2024 14:22, Juraj Linkeš wrote:
I'll
experiment with some look ahead constructs. The easiest solution is to
match everything that is not * ([^*]+) but can we be certain that there
won't be any asterisk in the actual information?

We can't. But we can be reasonably certain there won't be five
consecutive asterisks, so maybe we can work with that.

We can work with that by using look ahead constructs as mentioned, which can be quite intensive. For example:

   /(?<=\n\*).*?(?=\n\*|$)/gs

looks for the start delimiter and for the start of the next block or the end. This works perfectly! But it's performing 9576 steps (!) for just two ports. The current solution only takes 10 steps in total.
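
For reference, a minimal Python sketch of that lookaround pattern. The sample text is made up to roughly resemble the port info output (the header wording and fields are assumptions, not real testpmd output):

```python
import re

# Made-up sample in the rough shape of the port info output (an assumption).
SAMPLE = (
    "\n********** Infos for port 0 **********\n"
    "MAC address: 00:00:00:00:00:01\n"
    "Link status: up\n"
    "\n********** Infos for port 1 **********\n"
    "MAC address: 00:00:00:00:00:02\n"
    "Link status: down\n"
)

# The lookbehind requires a preceding "\n*", the lazy body grows one character
# at a time, and the lookahead stops at the next "\n*" or at the end of input.
# re.S lets "." match newlines, like the /s flag on regex101.
BLOCK = re.compile(r"(?<=\n\*).*?(?=\n\*|$)", re.S)

blocks = BLOCK.findall(SAMPLE)
for block in blocks:
    print(repr(block))
```

Since the lookbehind does not consume the delimiter, each block starts right after the first asterisk of its header line.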

Thinking about it... we are not really aiming for performance, so I guess if it simplifies the code and it's justifiable, then it's not a problem, especially since this command shouldn't be called continuously.

The roughly equivalent but slightly more optimised /\n\*.+?(?=\n\*|$)/gs (it consumes the leading \n* instead of using a lookbehind) takes approximately 3*input_length steps to run (according to regex101 at least). If that's reasonable enough, I can do this:

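  # re.finditer takes the pattern first, then the string; the raw string
  # keeps \* from being treated as an invalid string escape
  matches = re.finditer(r"\n\*.+?(?=\n\*|$)", input, re.S)
  return [TestPmdPortInfo.parse(match.group(0)) for match in matches]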

Another optimisation is to artificially append a `\n*` delimiter to the input before feeding it to the regex, which removes the alternative case (|$) and brings it down to roughly 2*input_length steps:

  input += "\n*"
  # pattern first, then the string; raw string for the regex escapes
  matches = re.finditer(r"\n\*.+?(?=\n\*)", input, re.S)
  return [TestPmdPortInfo.parse(match.group(0)) for match in matches]
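
To sanity-check that the two variants agree, here is a small self-contained sketch. The sample text and the helper names (split_with_alternation, split_with_sentinel) are hypothetical and only for illustration; TestPmdPortInfo.parse is left out so the snippet runs on its own:

```python
import re

# Made-up sample in the rough shape of the port info output (an assumption).
SAMPLE = (
    "\n********** Infos for port 0 **********\n"
    "MAC address: 00:00:00:00:00:01\n"
    "Link status: up\n"
    "\n********** Infos for port 1 **********\n"
    "MAC address: 00:00:00:00:00:02\n"
    "Link status: down\n"
)


def split_with_alternation(output: str) -> list:
    """Split into blocks, letting the lookahead stop at "\\n*" or at the end."""
    pattern = r"\n\*.+?(?=\n\*|$)"
    return [m.group(0) for m in re.finditer(pattern, output, re.S)]


def split_with_sentinel(output: str) -> list:
    """Append a trailing "\\n*" so the lookahead needs no end-of-input case."""
    pattern = r"\n\*.+?(?=\n\*)"
    return [m.group(0) for m in re.finditer(pattern, output + "\n*", re.S)]


with_alternation = split_with_alternation(SAMPLE)
with_sentinel = split_with_sentinel(SAMPLE)

# Compare after stripping: the blocks may differ in surrounding whitespace.
same = [b.strip() for b in with_alternation] == [b.strip() for b in with_sentinel]
print(len(with_alternation), len(with_sentinel), same)
```

On this sample the only difference is trailing whitespace on the last block, because $ can also match just before a final newline, so it may be worth stripping each block before parsing.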

Let me know what you think!

Best,
Luca
