Re: Post release texi2any performance regression

Patrice Dumas Wed, 25 Oct 2023 15:13:21 -0700

On Wed, Oct 25, 2023 at 10:12:48PM +0100, Gavin Smith wrote:
> On Tue, Oct 24, 2023 at 06:15:06PM +0200, Patrice Dumas wrote:
> There still seems to be lacking any way to turn off the new XS code that
> is being run, in order to judge the performance impact.  TEXINFO_XS_PARSER
> is now used for a lot of the new XS code, although it is not the parser.


The use of TEXINFO_XS_PARSER is not because it is the parser, it is
because if the XS parser is not used, there is no point in loading
and calling XS code.

> I tried commenting out all the new XS overrides (diff at the end of this
> mail) and this seemed to get the execution time back to what it was at
> the time of the last release, but I am not convinced it is functioning
> properly, as error messages are doubled.

I am surprised that messages could be doubled with everything commented
out.

> These latter timings match more closely the timings of texi2any 7.1.
> 

> If I comment out a further block of code, I get rid of the duplicate
> errors:

That makes sense, but I do not get at all why they are there in the
first place if the XS code is not run.  Also it would mean that perl and
XS codes are both run, which should not happen now since yesterday
commits.  Something strange is going on.

> (Simply commenting out
> 
> $document = Texinfo::Structuring::rebuild_document($document);
> 
> improved run times dramatically, but gave incorrect output without
> disabling the other XS code.)

Indeed, all the structuring/transformations are not taken into acocunt
if rebuild_document is not called.  Currently the tree is built twice,
one by the parser and once by rebuild_document.  An optimization would
be to remove the rebuilt by the parser, as it is not useful since the
perl codes that use those data are not run with XS.

> Can I please propose that it is made easy to disable all new XS code
> in texi2any, as I have done here, so we can avoid losing the performance
> of texi2any 7.1.  I don't really care how it's done, as long as TEXINFO_XS
> and TEXINFO_XS_PARSER work as before (so TEXINFO_XS_PARSER should be limited
> to turning off the XS parser).  It could be configured with a new variable
> TEXINFO_XS_MISC or TEXINFO_XS_NEW, or depend on TEXINFO_XS_CONVERT.
> 
> It worries me that the new code is entwined with the program and hard to
> disable, and could lead to permanent slowdowns if the work doesn't proceed
> as hoped.  It would seem much safer for the development of the program
> to separate it and be able to disable it cleanly.

It is probably not so difficult to do, but I think that we should not.
First, I can't see a concrete situation where the combination of
TEXINFO_XS_PARSER and not TEXINFO_XS_MISC should never be used by users,
it would only be possibly interesting for development or timing (the
combination of not TEXINFO_XS_PARSER and TEXINFO_XS_MISC is not
possible).  Second, it makes yet another combination to test during
development, which I think is too much work for little gain.  We already
do not test much some combinations, for instance I guess that
TEXINFO_XS_PARSER=0 is almost never tested.  There will already be more
combinations with TEXINFO_XS_CONVERT set or not, adding more seems
to me to be useless.  And lastly, the comparison of TEXINFO_XS_MISC and
not TEXINFO_XS_MISC does not seem to be that useful for development to
me, as we cannot gain much insight on what to do based on those
comparisons of perl code and C code, what is interesting would
be to improve each of these codes, be it for timing or anything else.

For timing optimization, it seems to me that we should have only one
target, the combination were everything is done in XS.  To me there is
little point in trying to optimize other combinations, because they are
very unlikely to be used.  The only other combination I can imagine to
be used is the perl only combination triggered by TEXINFO_XS=omit that
could be used to workaround a bug in the full XS combination at the cost
of performance.

-- 
Pat

Re: Post release texi2any performance regression

Reply via email to