Took me a bit of looking, but the issue is in the details: the decision character for "stuff" is '-', since that is always the beginning of "ruler", the first regex under "control_2", which is the first regex following "stuff". Changing the decision point from '<' to '-' and re-ordering the alternatives in the stuff regex fixes the proof-of-concept, and speeds it up roughly 40% on my system.
regex stuff
{ [          # stuff is a group of either
    <-[-]>+: # a ratcheting string of non-decision points. Removing ratcheting makes it hang on Yary's system.
  ||         # or
    '-'      # a "dash" decision point
  ]*         # 0-many of those. Greedy or non-greedy both work, about the same speed.
}            # end regex

Will commit this soon.

(and thanks for making this runnable & pointing out that I can use your test files all in the repo)

-y

On Mon, Mar 29, 2021 at 8:07 PM Joseph Brenner <doom...@gmail.com> wrote:

> That's interesting thinking... I've been playing around with the idea
> over here, but haven't got it to work (it fails to parse):
>
> https://github.com/doomvox/raku-study/blob/main/bin/2021mar28/doomfiles_browse_sequence-iii.raku
>
> This version does work-- just using greedy matching on "stuff" makes
> it use orders-of-magnitude less resources:
>
> https://github.com/doomvox/raku-study/blob/main/bin/2021mar28/doomfiles_browse_sequence-ii.raku
>
> And yes, I would guess that the greedy matching probably works
> here because the following material is effectively pinned at the end
> of the document.
>
> Note that you should be able to run these scripts as written, provided
> you also pull copies of these source files:
>
> https://github.com/doomvox/raku-study/tree/main/dat/doomfiles
>
> On 3/29/21, yary <not....@gmail.com> wrote:
> > Hi Joe & other Raku study group attendees,
> >
> > At the time I left, we were looking at a grammar with a speed-memory issue
> > on large-ish files. I had a germ of an idea which I couldn't express, and
> > from the meeting notes I see you have a simple fix, "by changing stuff
> > regex (.*?) to greedy (.*)". I suspect the greedy optimization works
> > because the thing after the "stuff" regex is near the end of the file.
> > Thus if it were instead close to the beginning, greedy matching would
> > have a similar issue, and non-greedy would fix it.
> >
> > With a night to sleep on it, the thing I was thinking & trying to say is
> > that, in the specialized HTML-grammar you had, the decision points are all
> > at left-brackets. By re-writing "stuff" so that it will only backtrack when
> > it hits a bracket, I expect more speed-memory gains.
> >
> > How well does this perform vs the simple .* greedy fix?
> >
> > regex stuff
> > { (             # capture stuff (positional capture might not be needed)
> >   [             # Stuff is a group of either
> >     \<          # a left-bracket decision point
> >   ||            # or
> >     <-[ \< ]>+: # a ratcheting string of non-decision points
> >   ]*            # 0-many of those. Greedy or non-greedy both work?
> > ) }             # end capture, end regex
> >
> > This was harder to express verbally & in code than I expected!
> >
> > -y
> >
> >
> > On Sun, Mar 28, 2021 at 4:23 PM Joseph Brenner <doom...@gmail.com> wrote:
> >
> >> I did send this one out, but it doesn't seem that it went out exactly,
> >> so let's try this one more time. The Study Group is happening,
> >> already in progress, though we'll be taking a break next week and
> >> broadcasting a burning yule log with the soundtrack to Jesus Christ
> >> Superstar. (Just kidding)
> >>
> >>
> >> Flaming Carrot, "Night Patrol" (1986) by Bob Burden:
> >>
> >> I feel it rising now...
> >> ... like little bubbles...
> >> THE MOON IS FULL...
> >> ... in a full moon, your brain floats to top of your head...
> >> I feel it...
> >> beginning to boil...
> >> a lot will happen tonight.
> >>
> >> The Raku Study Group
> >>
> >> March 28, 2021 1pm in California, 8pm in the UK
> >>
> >> Zoom meeting link:
> >>
> >> https://us02web.zoom.us/j/81127128506?pwd=N0I5bkxUZTRLaWwxN2RJTGlsT254QT09
> >>
> >> Passcode: 4RakuRoll
> >>
> >> RSVPs are useful, though not needed:
> >> https://www.meetup.com/San-Francisco-Perl/events/277163968/
> >> _______________________________________________
> >> SanFrancisco-pm mailing list
> >> sanfrancisco...@pm.org
> >> https://mail.pm.org/mailman/listinfo/sanfrancisco-pm
> >>
> >
>
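For anyone who wants to poke at the decision-point idea without pulling the doomfiles data, here's a minimal sketch of it on a made-up toy grammar. Everything except the shape of `stuff` is hypothetical and invented for illustration: the grammar name `Toy`, its `TOP` and `ruler` rules, and the sample text are not from the real doomfiles grammar. Only the `stuff` regex mirrors the dash version at the top of this message.

```raku
# Hypothetical toy grammar, NOT the doomfiles grammar: free text,
# then a "ruler" line of three or more dashes, then an END line.
grammar Toy {
    # Declared with `regex` (not `token`) so TOP can backtrack into <stuff>.
    regex TOP { <stuff> <ruler> \n 'END' \n? }

    # Same shape as the dash version of "stuff": ratchet through runs
    # of non-dash characters, and only reconsider at a dash, the one
    # character that can begin a <ruler>.
    regex stuff {
        [
            <-[-]>+:   # a ratcheting run of non-decision characters
        ||             # or
            '-'        # a single dash: a decision point
        ]*             # zero or more of those, greedy
    }

    # Hypothetical stand-in for the real "ruler" rule.
    token ruler { '-' ** 3..* }
}

my $sample = "Some text - with a stray dash.\nMore text.\n---\nEND\n";
my $m = Toy.parse($sample);
say $m ?? "ruler: $m<ruler>" !! 'no match';
```

When `<ruler>` fails, TOP backtracks into `<stuff>`. Because each non-dash run is ratcheted, the `*` gives it up as one whole unit instead of one character at a time, so the only positions that get retried are the dashes.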