By the way, as dev/check-license is also used in iceberg-python and iceberg-go repositories (iceberg-rust doesn't have it), maybe I can move forward on new rat release with the fix on hidden directories and update there as well.
Regards JB On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > > Hi guys, > > During the 1.4.1 vote, we identified some files without ASF headers, > more specifically in hidden directories (like .baseline). These files > have not been detected by dev/check-license script. > > The reason is because the script uses apache-rat via java -jar (the > Apache RAT CLI), executing the RAT Report class () > > When providing a directory to scan, the Report class uses a > DirectoryWalker to traverse the directories, looking for files to > check. > Unfortunately, by default, the DirectoryWalker ignores the hidden > directory (all directories starting with .): > > https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71 > > https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53 > > In our case, it means that it ignores .baseline, .github, .git, > .palantir directories. This is not good as .baseline, .github and > .palantir directories are included in our source distribution, so the > ASF headers should be clean here. > > FYI, I will propose a change in rat to, at least, be able to use a > ReportConfiguration to define if we want to restrict directories or > not (be able to configure the Walkers basically). I will try to > include it in rat 0.17 (I discussed with Claude about that). > NB: rat gradle and maven plugins define their own Walkers/IReportable > to avoid this issue. > > I propose three options to improve this: > 1. We keep dev/check-license as it is today (using rat 0.15) and we > exclude .baseline, .github, .palantir directories from our source > distribution. It means that we will probably have issues while > building from source distribution. We can revisit dev/check-license > when we upgrade to rat 0.17 > > 2. We keep dev/check-license but we create our own rat scanner with a > custom IReportable class considering all files/directories. Something > like: > > public static void main(String[] args) throws Exception { > ReportConfiguration reportConfiguration = new ReportConfiguration(); > reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher()); > > XmlWriter writer = new XmlWriter(new FileWriter("report.xml")); > ClaimStatistic claimStatistic = new ClaimStatistic(); > RatReport ratReport = XmlReportFactory.createStandardReport( > writer, > claimStatistic, > reportConfiguration > ); > ratReport.startReport(); > IncludeHiddenDirectoryWalker walker = new > IncludeHiddenDirectoryWalker(new File("/path/to/iceberg")); > walker.run(ratReport); > ratReport.endReport(); > writer.closeDocument(); > } > > public class IncludeHiddenDirectoryWalker implements IReportable { > > > private File file; > > public IncludeHiddenDirectoryWalker(File file) { > this.file = file; > } > > @Override > public void run(RatReport report) throws RatException { > process(report, file); > } > > public void process(RatReport report, File file) { > final File[] files = file.listFiles(); > if (files != null) { > for (File current : files) { > if (current.isDirectory()) { > process(report, current); > } else { > try { > Document document = new FileDocument(current); > report.report(document); > } catch (RatException e) { > System.err.println("Can't report file " + > current.getAbsolutePath() + ": " + e); > } > } > } > } > } > > } > > So, I can contribute this in dev/src for example. > > 3. Instead of using dev/check-license, we can use the rat gradle > plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it > and it works fine as it uses a custom IReportable like: > > https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135 > > https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189 > > We can include this plugin the check gradle phase, meaning that we can > verify headers for each PR. > > My preference would be for 3, mainly because: > 1. it integrates smoothly in our gradle ecosystem, adding a new plugin > as we have gradle-baseline-java, gradle-errorprone-plugin, > spotless-plugin-gradle, etc > 2. As we can hook rat gradle plugin in the gradle check task, it means > license check will be perform at build time, including check on PR by > GitHub Actions > > If you are OK with 3, I will work on: > 1. a PR to use it > 2. a PR for website to update release check procedure > > Thoughts ? > > Regards > JB