Hi guys,

During the 1.4.1 vote, we identified some files without ASF headers,
more specifically in hidden directories (like .baseline). These files
have not been detected by dev/check-license script.

The reason is because the script uses apache-rat via java -jar (the
Apache RAT CLI), executing the RAT Report class ()

When providing a directory to scan, the Report class uses a
DirectoryWalker to traverse the directories, looking for files to
check.
Unfortunately, by default, the DirectoryWalker ignores the hidden
directory (all directories starting with .):

https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71

https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53

In our case, it means that it ignores .baseline, .github, .git,
.palantir directories. This is not good as .baseline, .github and
.palantir directories are included in our source distribution, so the
ASF headers should be clean here.

FYI, I will propose a change in rat to, at least, be able to use a
ReportConfiguration to define if we want to restrict directories or
not (be able to configure the Walkers basically). I will try to
include it in rat 0.17 (I discussed with Claude about that).
NB: rat gradle and maven plugins define their own Walkers/IReportable
to avoid this issue.

I propose three options to improve this:
1. We keep dev/check-license as it is today (using rat 0.15) and we
exclude .baseline, .github, .palantir directories from our source
distribution. It means that we will probably have issues while
building from source distribution. We can revisit dev/check-license
when we upgrade to rat 0.17

2. We keep dev/check-license but we create our own rat scanner with a
custom IReportable class considering all files/directories. Something
like:

    public static void main(String[] args) throws Exception {
        ReportConfiguration reportConfiguration = new ReportConfiguration();
        reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher());

        XmlWriter writer = new XmlWriter(new FileWriter("report.xml"));
        ClaimStatistic claimStatistic = new ClaimStatistic();
        RatReport ratReport = XmlReportFactory.createStandardReport(
                writer,
                claimStatistic,
                reportConfiguration
        );
        ratReport.startReport();
        IncludeHiddenDirectoryWalker walker = new
IncludeHiddenDirectoryWalker(new File("/path/to/iceberg"));
        walker.run(ratReport);
        ratReport.endReport();
        writer.closeDocument();
    }

public class IncludeHiddenDirectoryWalker implements IReportable {


    private File file;

    public IncludeHiddenDirectoryWalker(File file) {
        this.file = file;
    }

    @Override
    public void run(RatReport report) throws RatException {
        process(report, file);
    }

    public void process(RatReport report, File file) {
        final File[] files = file.listFiles();
        if (files != null) {
            for (File current : files) {
                if (current.isDirectory()) {
                    process(report, current);
                } else {
                    try {
                        Document document = new FileDocument(current);
                        report.report(document);
                    } catch (RatException e) {
                        System.err.println("Can't report file " +
current.getAbsolutePath() + ": " + e);
                    }
                }
            }
        }
    }

}

So, I can contribute this in dev/src for example.

3. Instead of using dev/check-license, we can use the rat gradle
plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it
and it works fine as it uses a custom IReportable like:

https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135

https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189

We can include this plugin the check gradle phase, meaning that we can
verify headers for each PR.

My preference would be for 3, mainly because:
1. it integrates smoothly in our gradle ecosystem, adding a new plugin
as we have gradle-baseline-java, gradle-errorprone-plugin,
spotless-plugin-gradle, etc
2. As we can hook rat gradle plugin in the gradle check task, it means
license check will be perform at build time, including check on PR by
GitHub Actions

If you are OK with 3, I will work on:
1. a PR to use it
2. a PR for website to update release check procedure

Thoughts ?

Regards
JB

Reply via email to