Thanks for the heads up Xuanwo. It's the fourth option :) I will make a comparison with RAT.
Regards JB On Fri, Oct 27, 2023 at 12:15 PM Xuanwo <xua...@apache.org> wrote: > > iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now. > > BTW, we found skywalking-eyes works really well. It's fast, correct and > well-maintained. > > Maybe worth take a look. > > On Fri, Oct 27, 2023, at 17:48, Jean-Baptiste Onofré wrote: > > By the way, as dev/check-license is also used in iceberg-python and > > iceberg-go repositories (iceberg-rust doesn't have it), maybe I can > > move forward on new rat release with the fix on hidden directories and > > update there as well. > > > > Regards > > JB > > > > On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré <j...@nanthrax.net> > > wrote: > >> > >> Hi guys, > >> > >> During the 1.4.1 vote, we identified some files without ASF headers, > >> more specifically in hidden directories (like .baseline). These files > >> have not been detected by dev/check-license script. > >> > >> The reason is because the script uses apache-rat via java -jar (the > >> Apache RAT CLI), executing the RAT Report class () > >> > >> When providing a directory to scan, the Report class uses a > >> DirectoryWalker to traverse the directories, looking for files to > >> check. > >> Unfortunately, by default, the DirectoryWalker ignores the hidden > >> directory (all directories starting with .): > >> > >> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/DirectoryWalker.java#L71 > >> > >> https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java#L53 > >> > >> In our case, it means that it ignores .baseline, .github, .git, > >> .palantir directories. This is not good as .baseline, .github and > >> .palantir directories are included in our source distribution, so the > >> ASF headers should be clean here. > >> > >> FYI, I will propose a change in rat to, at least, be able to use a > >> ReportConfiguration to define if we want to restrict directories or > >> not (be able to configure the Walkers basically). I will try to > >> include it in rat 0.17 (I discussed with Claude about that). > >> NB: rat gradle and maven plugins define their own Walkers/IReportable > >> to avoid this issue. > >> > >> I propose three options to improve this: > >> 1. We keep dev/check-license as it is today (using rat 0.15) and we > >> exclude .baseline, .github, .palantir directories from our source > >> distribution. It means that we will probably have issues while > >> building from source distribution. We can revisit dev/check-license > >> when we upgrade to rat 0.17 > >> > >> 2. We keep dev/check-license but we create our own rat scanner with a > >> custom IReportable class considering all files/directories. Something > >> like: > >> > >> public static void main(String[] args) throws Exception { > >> ReportConfiguration reportConfiguration = new > >> ReportConfiguration(); > >> > >> reportConfiguration.setHeaderMatcher(Defaults.createDefaultMatcher()); > >> > >> XmlWriter writer = new XmlWriter(new FileWriter("report.xml")); > >> ClaimStatistic claimStatistic = new ClaimStatistic(); > >> RatReport ratReport = XmlReportFactory.createStandardReport( > >> writer, > >> claimStatistic, > >> reportConfiguration > >> ); > >> ratReport.startReport(); > >> IncludeHiddenDirectoryWalker walker = new > >> IncludeHiddenDirectoryWalker(new File("/path/to/iceberg")); > >> walker.run(ratReport); > >> ratReport.endReport(); > >> writer.closeDocument(); > >> } > >> > >> public class IncludeHiddenDirectoryWalker implements IReportable { > >> > >> > >> private File file; > >> > >> public IncludeHiddenDirectoryWalker(File file) { > >> this.file = file; > >> } > >> > >> @Override > >> public void run(RatReport report) throws RatException { > >> process(report, file); > >> } > >> > >> public void process(RatReport report, File file) { > >> final File[] files = file.listFiles(); > >> if (files != null) { > >> for (File current : files) { > >> if (current.isDirectory()) { > >> process(report, current); > >> } else { > >> try { > >> Document document = new FileDocument(current); > >> report.report(document); > >> } catch (RatException e) { > >> System.err.println("Can't report file " + > >> current.getAbsolutePath() + ": " + e); > >> } > >> } > >> } > >> } > >> } > >> > >> } > >> > >> So, I can contribute this in dev/src for example. > >> > >> 3. Instead of using dev/check-license, we can use the rat gradle > >> plugin (https://github.com/eskatos/creadur-rat-gradle/). I tested it > >> and it works fine as it uses a custom IReportable like: > >> > >> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L135 > >> > >> https://github.com/eskatos/creadur-rat-gradle/blob/master/src/main/kotlin/org/nosphere/apache/rat/RatWork.kt#L189 > >> > >> We can include this plugin the check gradle phase, meaning that we can > >> verify headers for each PR. > >> > >> My preference would be for 3, mainly because: > >> 1. it integrates smoothly in our gradle ecosystem, adding a new plugin > >> as we have gradle-baseline-java, gradle-errorprone-plugin, > >> spotless-plugin-gradle, etc > >> 2. As we can hook rat gradle plugin in the gradle check task, it means > >> license check will be perform at build time, including check on PR by > >> GitHub Actions > >> > >> If you are OK with 3, I will work on: > >> 1. a PR to use it > >> 2. a PR for website to update release check procedure > >> > >> Thoughts ? > >> > >> Regards > >> JB > > -- > Xuanwo