On Fri, May 09, 2025 at 02:01:32PM +0100, Peter Maydell wrote:
> On Thu, 8 May 2025 at 18:01, Daniel P. Berrangé <berra...@redhat.com> wrote:
> >
> > The new attempt at detecting missing SPDX-License-Identifier in
> > new files is using the following logic
> >
> >  * When seeing a line starting 'diff --git ...' it indicates
> >    the start of a file in the patch. This must trigger reporting
> >    of violations in the previous file (if any).
> >
> >    It must reset the validation state, since this may now be a
> >    pre-existing file being changed. This will be resolved by
> >    the next rule.
> >
> >  * When seeing a line starting 'new file mode...' it indicates
> >    a newly created file and must enable SPDX validation.
> >
> >  * When seeing EOF, it must trigger reporting of violations in
> >    the last new file in the patch, if any.
> 
> > @@ -1442,6 +1457,8 @@ sub process {
> >         my $in_imported_file = 0;
> >         my $in_no_imported_file = 0;
> >         my $non_utf8_charset = 0;
> > +       my $expect_spdx = 0;
> > +       my $expect_spdx_file;
> >
> >         our @report = ();
> >         our $cnt_lines = 0;
> > @@ -1679,9 +1696,38 @@ sub process {
> >                         WARN("added, moved or deleted file(s), does 
> > MAINTAINERS need updating?\n" . $herecurr);
> >                 }
> >
> > +# All new files should have a SPDX-License-Identifier tag
> > +               if ($line =~ /^diff --git/) {
> > +                   # Start of file diff marker, report last file if it 
> > failed
> > +                   # SPDX validation
> > +                   if (defined $expect_spdx_file) {
> > +                       &check_spdx_present($expect_spdx_file);
> > +                   }
> > +
> > +                   # Reset state ready to find new file
> > +                   $expect_spdx = 0;
> > +                   $expect_spdx_file = undef;
> 
> 
> We already have a point in the code where we say "ah, this looks
> like the start of a new file" (under the comment "extract the
> filename as it passes"), and it looks for two possible regexes,
> not just "diff --git". Maybe we should combine these so that
> we have something like
> 
>                 if ($line =~ /^diff --git.*?(\S+)$/) {
>                         handle_end_of_file($realfile) if $realfile ne '';
>                         $realfile = $1;
>                         $realfile =~ s@^([^/]*)/@@ if (!$file);
>                         handle_start_of_file($realfile);
>                 } elsif ($line =~ /^\+\+\+\s+(\S+)/) {
>                         handle_end_of_file($realfile) if $realfile ne '';
>                         $realfile = $1;
>                         $realfile =~ s@^([^/]*)/@@ if (!$file);
>                         handle_start_of_file($realfile);
> 
>                         $p1_prefix = $1;
>                         if (!$file && $tree && $p1_prefix ne '' &&
>                             -e "$root/$p1_prefix") {
>                                 WARN("patch prefix '$p1_prefix'
> exists, appears to be a -p0 patch\n");
>                         }
> 
>                         next;
> 
> (side note: seems odd that we have 'next' here but not in the
> previous half of this if()...)
> 
>                 }
> 
> and move checkfilename() to inside handle_start_of_file(),
> and have the spdx check handling done inside
> handle_start_of_file() and handle_end_of_file() ?
> 
> > +       # End of diff, report last file block if it failed
> 
> and we would call
>           handle_end_of_file($realfile) if $realfile ne '';

Yeah, having standlone handle_start_of_file/handle_end_of_file
methods would make it easier to understand what's going on, as
this method with all the check rules is insanely long and hard
to follow.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to