Bug#738342: lintian: checks/cruft - GFDL check is slow

Niels Thykier Sun, 09 Feb 2014 04:55:08 -0800

Package: lintian
Version: 2.5.21
Severity: normal

A quick benchmark suggests that lintian spends nearly 2 minutes on the
Linux source package (I tested with linux/3.10~rc7-1~exp1).  Profiling
Lintian with perl -d:NYTProf suggests that the vast majority of the time
is spent in:

"""
            if ($cleanedblock =~ $gfdlpattern) {
"""

Where $gfdlpattern is one of:

"""
            # classical gfdl matching pattern
            my $normalgfdlpattern = qr/
                 (?'contextbefore'(?:
                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
                 gnu \s+ free \s+ documentation \s+ license
                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
                /xsmo;

            # for first block we get context from the beginning
            my $firstblockgfdlpattern = qr/
                 (?'rawcontextbefore'(?:
                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
                    \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
                  )
                 )
                 gnu \s+ free \s+ documentation \s+ license
                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
                 /xsmo;
"""


The profiler suggests that 60% of the runtime is spent in the
"CORE:match" operations inside "license_check" from checks/cruft.  The
regex appears to be hit "only" 2452 times, but each match takes 55.9ms
on average, totalling 137s.
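
To measure the per-match cost outside Lintian, something along the
lines of the sketch below might help.  It copies $normalgfdlpattern
from above, re-derives the 137s total from the profiler figures, and
times the pattern with Time::HiRes.  Both sample blocks ($gfdl_block
and $other_block) are made up for illustration and are not taken from
the Linux source, and the block sizes are assumptions, so the timing it
prints will differ from the profiler figure.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# $normalgfdlpattern as quoted in the report.
my $normalgfdlpattern = qr/
     (?'contextbefore'(?:
        (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
        (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
     gnu \s+ free \s+ documentation \s+ license
     (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
     a \s+ copy \s+ of \s+ the \s+ license \s+ is
    /xsmo;

# Sanity check of the profiler figures: 2452 calls at 55.9ms each.
my $reported_total = 2452 * 0.0559;    # seconds
printf "reported total: %.1fs\n", $reported_total;

# Hypothetical block: 1200 characters of filler followed by a GFDL notice.
my $gfdl_block = ('x ' x 600)
  . 'permission is granted to copy under the terms of the '
  . 'gnu free documentation license , version 1.2 ; '
  . 'with no invariant sections . a copy of the license is included';
my $matched_gfdl = $gfdl_block =~ $normalgfdlpattern ? 1 : 0;
my $sections = $matched_gfdl ? $+{rawgfdlsections} : '';

# Hypothetical non-GFDL block (a few kB of lowercase text; size is an assumption).
my $other_block = ('see the documentation directory ; this program is under '
      . 'the gnu general public license . ') x 40;

# Time a few failing match attempts against the non-GFDL block.
my $runs          = 3;
my $matched_other = 0;
my $start         = time();
for (1 .. $runs) {
    $matched_other++ if $other_block =~ $normalgfdlpattern;
}
my $elapsed = time() - $start;
printf "non-GFDL block: %.1fms per match attempt\n", 1000 * $elapsed / $runs;
```

The failing case is the one timed here, on the assumption that most
blocks in a source package do not contain a GFDL notice.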

Bastian, do you have any ideas for reducing the cost of the regex?

~Niels


