By the way, as dev/check-license is also used in iceberg-python and
iceberg-go repositories (iceberg-rust doesn't have it), maybe I can
move forward on new rat release with the fix on hidden directories and
update there as well.

Regards
JB

On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi guys,
>
> During the 1.4.1 vote, we identified some files without ASF headers,
> more specifically in hidden directories (like .baseline). These files
> have not been detected by dev/check-license script.
>
> The reason is because the script uses apache-rat via java -jar (the
> Apache RAT CLI), executing the RAT Report class ()
>
> When providing a directory to scan, the Report class uses a
> DirectoryWalker to traverse the directories, looking for files to
> check.
> Unfortunately, by default, the DirectoryWalker ignores the hidden
> directory (all directories starting with .):
>
> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71
>
> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53
>
> In our case, it means that it ignores .baseline, .github, .git,
> .palantir directories. This is not good as .baseline, .github and
> .palantir directories are included in our source distribution, so the
> ASF headers should be clean here.
>
> FYI, I will propose a change in rat to, at least, be able to use a
> ReportConfiguration to define if we want to restrict directories or
> not (be able to configure the Walkers basically). I will try to
> include it in rat 0.17 (I discussed with Claude about that).
> NB: rat gradle and maven plugins define their own Walkers/IReportable
> to avoid this issue.
>
> I propose three options to improve this:
> 1. We keep dev/check-license as it is today (using rat 0.15) and we
> exclude .baseline, .github, .palantir directories from our source
> distribution. It means that we will probably have issues while
> building from source distribution. We can revisit dev/check-license
> when we upgrade to rat 0.17
>
> 2. We keep dev/check-license but we create our own rat scanner with a
> custom IReportable class considering all files/directories. Something
> like:
>
>     public static void main(String[] args) throws Exception {
>         ReportConfiguration reportConfiguration = new ReportConfiguration();
>         reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher());
>
>         XmlWriter writer = new XmlWriter(new FileWriter("report.xml"));
>         ClaimStatistic claimStatistic = new ClaimStatistic();
>         RatReport ratReport = XmlReportFactory.createStandardReport(
>                 writer,
>                 claimStatistic,
>                 reportConfiguration
>         );
>         ratReport.startReport();
>         IncludeHiddenDirectoryWalker walker = new
> IncludeHiddenDirectoryWalker(new File("/path/to/iceberg"));
>         walker.run(ratReport);
>         ratReport.endReport();
>         writer.closeDocument();
>     }
>
> public class IncludeHiddenDirectoryWalker implements IReportable {
>
>
>     private File file;
>
>     public IncludeHiddenDirectoryWalker(File file) {
>         this.file = file;
>     }
>
>     @Override
>     public void run(RatReport report) throws RatException {
>         process(report, file);
>     }
>
>     public void process(RatReport report, File file) {
>         final File[] files = file.listFiles();
>         if (files != null) {
>             for (File current : files) {
>                 if (current.isDirectory()) {
>                     process(report, current);
>                 } else {
>                     try {
>                         Document document = new FileDocument(current);
>                         report.report(document);
>                     } catch (RatException e) {
>                         System.err.println("Can't report file " +
> current.getAbsolutePath() + ": " + e);
>                     }
>                 }
>             }
>         }
>     }
>
> }
>
> So, I can contribute this in dev/src for example.
>
> 3. Instead of using dev/check-license, we can use the rat gradle
> plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it
> and it works fine as it uses a custom IReportable like:
>
> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135
>
> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189
>
> We can include this plugin the check gradle phase, meaning that we can
> verify headers for each PR.
>
> My preference would be for 3, mainly because:
> 1. it integrates smoothly in our gradle ecosystem, adding a new plugin
> as we have gradle-baseline-java, gradle-errorprone-plugin,
> spotless-plugin-gradle, etc
> 2. As we can hook rat gradle plugin in the gradle check task, it means
> license check will be perform at build time, including check on PR by
> GitHub Actions
>
> If you are OK with 3, I will work on:
> 1. a PR to use it
> 2. a PR for website to update release check procedure
>
> Thoughts ?
>
> Regards
> JB

Reply via email to