Hi guys, During the 1.4.1 vote, we identified some files without ASF headers, more specifically in hidden directories (like .baseline). These files have not been detected by dev/check-license script.
The reason is because the script uses apache-rat via java -jar (the Apache RAT CLI), executing the RAT Report class () When providing a directory to scan, the Report class uses a DirectoryWalker to traverse the directories, looking for files to check. Unfortunately, by default, the DirectoryWalker ignores the hidden directory (all directories starting with .): https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71 https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53 In our case, it means that it ignores .baseline, .github, .git, .palantir directories. This is not good as .baseline, .github and .palantir directories are included in our source distribution, so the ASF headers should be clean here. FYI, I will propose a change in rat to, at least, be able to use a ReportConfiguration to define if we want to restrict directories or not (be able to configure the Walkers basically). I will try to include it in rat 0.17 (I discussed with Claude about that). NB: rat gradle and maven plugins define their own Walkers/IReportable to avoid this issue. I propose three options to improve this: 1. We keep dev/check-license as it is today (using rat 0.15) and we exclude .baseline, .github, .palantir directories from our source distribution. It means that we will probably have issues while building from source distribution. We can revisit dev/check-license when we upgrade to rat 0.17 2. We keep dev/check-license but we create our own rat scanner with a custom IReportable class considering all files/directories. Something like: public static void main(String[] args) throws Exception { ReportConfiguration reportConfiguration = new ReportConfiguration(); reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher()); XmlWriter writer = new XmlWriter(new FileWriter("report.xml")); ClaimStatistic claimStatistic = new ClaimStatistic(); RatReport ratReport = XmlReportFactory.createStandardReport( writer, claimStatistic, reportConfiguration ); ratReport.startReport(); IncludeHiddenDirectoryWalker walker = new IncludeHiddenDirectoryWalker(new File("/path/to/iceberg")); walker.run(ratReport); ratReport.endReport(); writer.closeDocument(); } public class IncludeHiddenDirectoryWalker implements IReportable { private File file; public IncludeHiddenDirectoryWalker(File file) { this.file = file; } @Override public void run(RatReport report) throws RatException { process(report, file); } public void process(RatReport report, File file) { final File[] files = file.listFiles(); if (files != null) { for (File current : files) { if (current.isDirectory()) { process(report, current); } else { try { Document document = new FileDocument(current); report.report(document); } catch (RatException e) { System.err.println("Can't report file " + current.getAbsolutePath() + ": " + e); } } } } } } So, I can contribute this in dev/src for example. 3. Instead of using dev/check-license, we can use the rat gradle plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it and it works fine as it uses a custom IReportable like: https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135 https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189 We can include this plugin the check gradle phase, meaning that we can verify headers for each PR. My preference would be for 3, mainly because: 1. it integrates smoothly in our gradle ecosystem, adding a new plugin as we have gradle-baseline-java, gradle-errorprone-plugin, spotless-plugin-gradle, etc 2. As we can hook rat gradle plugin in the gradle check task, it means license check will be perform at build time, including check on PR by GitHub Actions If you are OK with 3, I will work on: 1. a PR to use it 2. a PR for website to update release check procedure Thoughts ? Regards JB