Hi Gary, Peter, * Gary V. Vaughan wrote on Mon, Oct 24, 2005 at 10:32:52AM CEST: > Peter O'Gorman wrote: > >Ralf Wildenhues wrote: > >| > >| So, let's do in some manual tail recursion elimination. > > Wow! Great work :-D
Thanks! > >| OK to apply the combined patch below? > > > >You know, I looked at sppeding up bootstrap for myself about a year ago, > >but couldn't figure out the cause. Don't be fooled into thinking that this patch needed less than a few months, on and off. I don't know how experts profile m4 scripts, but I first tried removing macros one by one (starting with a good guess) and ended up manually #define'ing DEBUG_SYM in m4-1.4.3 together with a form of manual statistical profiling: interrupt m4 in gdb every now and then, look where it's at. :) valgrind's massif for overall memory usage profiling showed me m4 uses little heap until it starts outputting; that's what finally got me to get to the point that we needed to make use of storage in order to gain on execution time. Some off-list discussion with Stepan (thanks, BTW!) got me on the right track. > >This is okay, even thouh I don't actually understand it :) Before this patch, we did this (informal notation): join(Separator, Arg1, Arg2, ...) { if Arg1 last arg produce Arg1 else if Arg1 empty call join(Separator, Arg2, Arg3, ...) else call join(Separator, Arg1Separator, Arg2, ...) end if } which ends up recursively at: -> ... join(Separator, Arg1SeparatorArg2Separator...ArgN-1, ArgN) -> Arg1SeparatorArg2Separator...ArgN Now we do roughly this: join(Separator, Arg1, Arg2, ...) { if Arg1 nonempty produce Arg1 for Arg in [Arg2, ..., ArgN]; do produce SeparatorArg2 end for else call join(Separator, Arg2, Arg3, ...) end if } Similarly with the other two functions, except that lt_combine actually did a two-staged recursion, and in that recursion called lt_join in each step. > No, not okay! > > I haven't tested the code yet, but I do remember that the areas you are > touching are exceedingly delicate. And even though on a cursory look, > your patch looks sane, this is something to apply after 2.0! Please rethink this judgement again. I deliberately tried for a minimal- invasive change; consistency-wise, it would probably be better to work with m4 lists all the time instead of converting to and from; but my patch actually keeps each of the three functions and their interfaces the same. Also, it might be much more efficient to change _some_ of the code not to use the dictionary lookup code: a list of tagged/non-tagged variables may easily be updated in _LT_DECL already; however, this is not a minimal change, either. > On the other hand, thankyou very much for taking the time to fix this > problem! I'm going to add your patch to my quilt series and enjoy the > speedup until 2.0 is done. It'll also mean that I'll have been able to > give it a good workout before approving. I can only rephrase: please rethink your post-2.0 judgement. Tell me what you need to be convinced (test suite tests for these functions, whatever). Please also remember that, if _you_ use this, the old version will not be tested any more. It might thus just as well break in between, unnoticed, or be broken before, but breakage only uncovered later. Surely I would like the patch to be dissected as much as possible before applying it; I'd also love to see it given a whirl in the next alpha release, though. Cheers, Ralf