Broadly I meant translating into a "preferably" Deterministic (stackless/non-backtracking) Finite State Automaton.
However I note that it's not always possible to make a Deterministic FSA when you have repeatable groups which themselves don't have fixed lengths (like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to start over and compile to a Non Deterministic (stacking) FSA, or just give up and go back to the recursive approach. Personally I would favour the addition of «shopt -s dfa_extglob» that would block the fall-back, causing extglobs that would need a stack to be treated as magic match-never tokens. I say "extglob", but this would also speed up silly ordinary globs like [a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z] -Martin On Tue, 11 Oct 2022 at 15:57, Phi Debian <phi.deb...@gmail.com> wrote: > > > On Tue, Oct 11, 2022 at 7:23 AM Martin D Kealey <mar...@kurahaupo.gen.nz> > wrote: > >> >> 4) compile globs into state machines, the same way that regexes get >> compiled, so that they can be matched without needing any recursion. >> >> -Martin >> > > Do you mean the NFA colud trade recursion for malloc()'d backtracking > point record? this would trade stacksize vs bss size. Or may be you mean > converting the NFA into DFA, that I could not handle myself :) > > That's why at first I thought limiting the input string length was may be > an option, but Chet demonstrated it was not. > > Then returning an error instead of core dumps on recursion too deep on the > NFA walk seems to be the thing to do, and then I thought a stack > availabitly check for a given recursive level could be done, this is pretty > easy to do and almost costless regarding perf, specially because bash (and > other shells) have access to alloca(), well we could admit that os/arch > sans alloca() could still core dump. > > A stack check with alloca() ressemble something along this lines > > int stack_too_small(size_t s) > { return alloca(s)!=NULL; > } > > This non inlinable function do allocate the stack provision, return true > if not possible, false if possible and return the provision to the stack, > the caller is then assured there is enough space in the stack for its > recursion frame. > > and the shell code would look like > ... recursive_func(...) > {... > if(stack_too_small(recursion_frame_size) > { handle nicely; // longjmp, return.... > } > > What Chet call the right size to define is possibly something like the > recursion frame size, that is the stack needed by the current recursion > level instance (its locals) + all the frame size sub function would need > until the next recurence, i.e if the call path look like > > a()-->b()-->c()-->a()... > The recursion frame size is sum(framesize(a)+framesize(b)+framesize(c)) > > Some empiric constant can also be used, for instance on linux a max > stacksize of 0x800000 is pretty common deciding we big constant like > 32K could be simple and good enough. > >