Jonas Smedegaard:
> [..]
>
> Have a look (if interested) at /usr/share/perl5/String/Copyright.pm and
> in particular the (huge when expanded) $signs_and_more_re at line 138.
>
> [..]

Thanks for the tips! I'm not sure if you got my other follow-ups to the bug report - I did in fact find String::Copyright, but I didn't know about its history or the plans for it, so thanks for filling me in on that.

At any rate, here is an updated version of my patch, along with some test cases for Sage's copyright notices.

I did try to think of a way to achieve the same logic *inside* the massive $re regexes. However, I don't think this is possible, at least with my current approach, which tries to be conservative in order to cope with humans being annoyingly inconsistent. It joins subsequent lines only when their indent is greater than that of the main line (the one with the "Copyright" part). This means I have to call length() in an expression replacement (s///e), which I don't think is possible inside a normal regex...

As for speed, the patch adds about 4 seconds (roughly 13%) to this run:

# with the patch
$ time debian/rules debian/licensecheck.copyright
licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

real    0m35.318s
user    0m35.204s
sys     0m0.056s

# without the patch
$ time debian/rules debian/licensecheck.copyright
licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

real    0m31.168s
user    0m31.040s
sys     0m0.076s

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

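For illustration only, here is a minimal standalone sketch of the joining rule described above: a line is appended to the preceding "Copyright" line only when it is indented more deeply than that line. It is not the patch itself. It walks the text line by line instead of repeating a substitution until nothing changes, it only looks for the literal word "Copyright" rather than the full $signs_and_more_re, and the helper name join_continuations and the sample names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Copyright: append a line to the
# preceding "Copyright" line when it is indented more deeply than that line.
sub join_continuations {
    my ($text) = @_;
    my @out;
    my $main_indent;    # indent width of the current "Copyright" line, if any

    for my $line ( split /\n/, $text ) {
        # An optional comment leader plus whitespace counts as the indent.
        my ( $indent, $rest ) = $line =~ /^((?:#|\/\/|\/\*)?\s*)(.*)$/;
        my $width = length $indent;

        if ( defined $main_indent && length $rest && $width > $main_indent ) {
            # Deeper indent: treat it as a continuation of the notice.
            # Insert a comma unless the notice already ends in "and" or ",".
            my $sep = $out[-1] =~ /(?:\band|,)$/ ? ' ' : ', ';
            $out[-1] .= $sep . $rest;
            next;
        }

        # Anything else ends the current notice and may start a new one.
        $main_indent = $line =~ /Copyright/i ? $width : undef;
        push @out, $line;
    }
    return join "\n", @out;
}

# Made-up sample input, not taken from Sage.
my $sample = <<'END';
# Copyright (C) 2005 Alice Example,
#                    2010 Bob Example
# Distributed under the terms of the GPL
END

print join_continuations($sample), "\n";
# prints:
# # Copyright (C) 2005 Alice Example, 2010 Bob Example
# # Distributed under the terms of the GPL
```

Comparing against the indent of the "Copyright" line, rather than joining every following line, is what keeps the rule conservative: a line at the same or a shallower indent is left alone.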
--- /usr/share/perl5/String/Copyright.pm.old 2016-11-30 20:08:44.000000000 +0100
+++ /usr/share/perl5/String/Copyright.pm 2017-07-05 21:02:01.060002642 +0200
@@ -104,7 +104,7 @@
my $comma_re
= qr/$blank_re*,$blank_or_break_re|$blank_or_break_re,?$blank_re*/;
my $dash_re
- = qr/$blank_re*[-˗‐‑‒–—―⁃−﹣－]$blank_or_break_re*/;
+ = qr/$blank_re*[-˗‐‑‒–—―⁃−﹣－]+$blank_or_break_re*/;
my $owner_intro_re = qr/\bby$blank_or_break_re/;
my $owner_prefix_re = qr/[(*<@[{]/;
my $owner_initial_re = qr/[^\s!\"#$%&'()*+,.\/:;<=>?@[\\\]^_`{|}~]/;
@@ -135,6 +135,8 @@
my $years_re = qr/$yearspan_re(?:$comma_re$yearspan_re)*/;
my $owners_re = qr/$owner_prefix_re*$owner_initial_re\S*(?:$blank_re*\S+)*/;
+my $line_preamble_re
+ = qr/(?:#|\/\/|\/\*)?\s*/;
my $signs_and_more_re
= qr/(?:$chatter_re.*|$signs_re(?::$blank_or_break_re|$comma_re)$broken_sign_re?($years_re?$comma_re?$owner_intro_re?$owners_re?)|(?:\n|\z))/;
@@ -155,6 +157,14 @@
# stringify objects
$copyright = "$copyright";
+ # concatenate multi-line notices together
+ my $old_copyright;
+ do {
+ $old_copyright = $copyright;
+ $copyright =~ s/((?:^|\n)$line_preamble_re)($signs_and_more_re,?)\n($line_preamble_re)/
+ (length $4 <= length $1)? "$1$2\n$4":
+ (sub{ shift =~ m{(?:\band|,)$}; })->($2)? "$1$2 ": "$1$2, "/eg;
+ } while ($copyright ne $old_copyright);
# TODO: also parse @_ - but each separately!
my @block;
copyright-test.sh
Description: application/shellscript

