Thanks for the heads up Xuanwo.

It's the fourth option :) I will make a comparison with RAT.

Regards
JB

On Fri, Oct 27, 2023 at 12:15 PM Xuanwo <xua...@apache.org> wrote:
>
> iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now.
>
> BTW, we found skywalking-eyes works really well. It's fast, correct and 
> well-maintained.
>
> Maybe worth take a look.
>
> On Fri, Oct 27, 2023, at 17:48, Jean-Baptiste Onofré wrote:
> > By the way, as dev/check-license is also used in iceberg-python and
> > iceberg-go repositories (iceberg-rust doesn't have it), maybe I can
> > move forward on new rat release with the fix on hidden directories and
> > update there as well.
> >
> > Regards
> > JB
> >
> > On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
> > wrote:
> >>
> >> Hi guys,
> >>
> >> During the 1.4.1 vote, we identified some files without ASF headers,
> >> more specifically in hidden directories (like .baseline). These files
> >> have not been detected by dev/check-license script.
> >>
> >> The reason is because the script uses apache-rat via java -jar (the
> >> Apache RAT CLI), executing the RAT Report class ()
> >>
> >> When providing a directory to scan, the Report class uses a
> >> DirectoryWalker to traverse the directories, looking for files to
> >> check.
> >> Unfortunately, by default, the DirectoryWalker ignores the hidden
> >> directory (all directories starting with .):
> >>
> >> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71
> >>
> >> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53
> >>
> >> In our case, it means that it ignores .baseline, .github, .git,
> >> .palantir directories. This is not good as .baseline, .github and
> >> .palantir directories are included in our source distribution, so the
> >> ASF headers should be clean here.
> >>
> >> FYI, I will propose a change in rat to, at least, be able to use a
> >> ReportConfiguration to define if we want to restrict directories or
> >> not (be able to configure the Walkers basically). I will try to
> >> include it in rat 0.17 (I discussed with Claude about that).
> >> NB: rat gradle and maven plugins define their own Walkers/IReportable
> >> to avoid this issue.
> >>
> >> I propose three options to improve this:
> >> 1. We keep dev/check-license as it is today (using rat 0.15) and we
> >> exclude .baseline, .github, .palantir directories from our source
> >> distribution. It means that we will probably have issues while
> >> building from source distribution. We can revisit dev/check-license
> >> when we upgrade to rat 0.17
> >>
> >> 2. We keep dev/check-license but we create our own rat scanner with a
> >> custom IReportable class considering all files/directories. Something
> >> like:
> >>
> >>     public static void main(String[] args) throws Exception {
> >>         ReportConfiguration reportConfiguration = new 
> >> ReportConfiguration();
> >>         
> >> reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher());
> >>
> >>         XmlWriter writer = new XmlWriter(new FileWriter("report.xml"));
> >>         ClaimStatistic claimStatistic = new ClaimStatistic();
> >>         RatReport ratReport = XmlReportFactory.createStandardReport(
> >>                 writer,
> >>                 claimStatistic,
> >>                 reportConfiguration
> >>         );
> >>         ratReport.startReport();
> >>         IncludeHiddenDirectoryWalker walker = new
> >> IncludeHiddenDirectoryWalker(new File("/path/to/iceberg"));
> >>         walker.run(ratReport);
> >>         ratReport.endReport();
> >>         writer.closeDocument();
> >>     }
> >>
> >> public class IncludeHiddenDirectoryWalker implements IReportable {
> >>
> >>
> >>     private File file;
> >>
> >>     public IncludeHiddenDirectoryWalker(File file) {
> >>         this.file = file;
> >>     }
> >>
> >>     @Override
> >>     public void run(RatReport report) throws RatException {
> >>         process(report, file);
> >>     }
> >>
> >>     public void process(RatReport report, File file) {
> >>         final File[] files = file.listFiles();
> >>         if (files != null) {
> >>             for (File current : files) {
> >>                 if (current.isDirectory()) {
> >>                     process(report, current);
> >>                 } else {
> >>                     try {
> >>                         Document document = new FileDocument(current);
> >>                         report.report(document);
> >>                     } catch (RatException e) {
> >>                         System.err.println("Can't report file " +
> >> current.getAbsolutePath() + ": " + e);
> >>                     }
> >>                 }
> >>             }
> >>         }
> >>     }
> >>
> >> }
> >>
> >> So, I can contribute this in dev/src for example.
> >>
> >> 3. Instead of using dev/check-license, we can use the rat gradle
> >> plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it
> >> and it works fine as it uses a custom IReportable like:
> >>
> >> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135
> >>
> >> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189
> >>
> >> We can include this plugin the check gradle phase, meaning that we can
> >> verify headers for each PR.
> >>
> >> My preference would be for 3, mainly because:
> >> 1. it integrates smoothly in our gradle ecosystem, adding a new plugin
> >> as we have gradle-baseline-java, gradle-errorprone-plugin,
> >> spotless-plugin-gradle, etc
> >> 2. As we can hook rat gradle plugin in the gradle check task, it means
> >> license check will be perform at build time, including check on PR by
> >> GitHub Actions
> >>
> >> If you are OK with 3, I will work on:
> >> 1. a PR to use it
> >> 2. a PR for website to update release check procedure
> >>
> >> Thoughts ?
> >>
> >> Regards
> >> JB
>
> --
> Xuanwo

Reply via email to