On Sat, 29 Oct 2022 at 22:15, Greg Wooledge <g...@wooledge.org> wrote:
> On Sat, Oct 29, 2022 at 04:50:00PM +1100, Martin D Kealey wrote: > > This seems like a good reason to simply translate extglobs into regexes, > > which should run in linear time, rather than put effort into building and > > debugging a parallel implementation. > > This isn't straightforward, because of the !(list) feature of extglob. > There's no analogous construct for that in standard regexes. > That's true. and my "simply" is understating the complexity of the task, but it's not impossible. There was a recent discussion about this very point on irc://libera.chat#Bash where it was pointed out that supporting the !(list) style of inversions in the regex compiler is simply a matter of inverting the "success" or "failure" indications attached to the states when compiling a regex, which means that it works correctly when these constructs are nested. So the options would seem to be: (a) prohibit inversions (you get to pick EITHER extglob or rexglob, not both); (b) bypass convert-to-regex when inversions are present; (c) use PCRE or Vim RE, which already support negations (though not in the same form); note that these do not have linear ("regular") time performance in any but the trivial cases; (d) compute the "inverse" regex for a given negation (this may require Vim RE, see below); (e) post-process the compiled regex (this would be highly dependent on the specific RE implementation); (f) pick an existing regex engine, and add the necessary logic to handle negations and conjunctions (see below); A particular difficulty is handling !(!(X)|!(Y)) which by de Morgan's law translates into @(X&Y) - meaning a target needs to match BOTH X and Y. Only Vim's Regex engine has direct support for that construct, and it exhibits non-linear timing when using it. A naïve implementation could easily require state table space that's exponential in the breadth of the conjunctions, meaning that the user could very easily write a rexglob that cannot be compiled with available memory. So there needs to be a fallback position: either fail, telling the user we cannot honour the linear speed guarantee, or "try anyway" with a procedure that could exhibit truly horrible time complexity. -Martin PS: I say "we" and "us" advisedly; let's not leave everything to Chet.