Sorry if I was not clear. Walking the file system is not the issue and I did not intend to discuss it; my question is only about matching and about reporting the matches. The example I gave was only meant to show why I believe I'd need a different API.

Using the Options field is good enough for the first example (that's how I used licensecheck at first). For the second example, however, Cover() does not report what I'd need.

As far as I've seen, func Cover(INPUT []byte, opts Options) (Coverage, bool) currently reports 100% MIT if INPUT matches the MIT text byte for byte. If INPUT has more text than the complete MIT license text, for example the MIT license is only at the beginning of INPUT and the rest of INPUT is Go code, then Coverage reports roughly len(MIT license)/len(INPUT), which is less than 100%. In this case the new API would report 100%, since INPUT contains the complete MIT license text (plus some program code, which is not relevant here).

If I understand correctly, the current API is for checking _already_ identified license files, which contain _only_ the license text. I believe that finding files that contain complete or possibly broken license references needs a different kind of matching.

On 11/14/19, Rob Pike <r...@golang.org> wrote:
> As I understand what you're trying to do, you just need to write a tree
> walker, perhaps using filepath.Walk, that opens each file and calls Cover
> on it. You can set the Options field to control the threshold for
> reporting, and use the result of that to choose which licenses to report.
>
> I don't believe an API change is called for.
>
> -rob
>
>
> On Thu, Nov 14, 2019 at 6:14 PM <fge...@gmail.com> wrote:
>
>> func Cover(input []byte, opts Options) (Coverage, bool) in
>> licensecheck currently reports len(input)/len(one of the licenses) for
>> each known license. I'd need for all known licenses len(known
>> license)/len(license reference in input).
>>
>> I'd like to scan >100000 files (possibly a lot more), where some of
>> them (<0.1%) contain full or partial known license texts.
>>
>> An example scenario for an example /src, containing >100000 files:
>> $ listlicenses /src # to get an overview of 100% matching license references
>> LGPL-2.1
>> MIT
>> $ listlicenses -details /src # same tree, more detailed output, to see the details
>> /src/license refers 100% MIT # the bytes in /src/license correspond one for one for the MIT license
>> /src/fonts/LICENSE refers 100% MIT # the bytes in /src/fonts/LICENSE correspond one for one for the MIT license
>> /src/a/Notice refers 100% LGPL-2.1 # same as above with LGPL-2.1
>> /src/a/b/whatever.go refers 94% GPL2 # most probably a broken license reference in whatever.go, maybe someone inadvertently deleted the last word from the lines containing the GPL2 license text. Needs human inspection to check what's the license situation with whatever.go
>> /src/c/ConfusingLicenseReferences.c refers 7% ZLIB # ConfusingLicenseReferences.c has most probably a false positive report for reference to ZLIB
>> /src/c/ConfusingLicenseReferences.c refers 65% MIT # ConfusingLicenseReferences.c has only 65% of MIT, the author intended to refer to MIT, but some inadvertent edit later broke the license reference in ConfusingLicenseReferences.c
>>
>> Command listlicenses iterates over all files in the subtree, gathering
>> all full or partial (broken) license references. Command listlicenses
>> uses the functionality similar to github.com/google/licensecheck to
>> check the files in the file system.
>>
>>
>>
>> thanks!
>>
>> On 11/13/19, Rob Pike <r...@golang.org> wrote:
>> > Can you please explain in more detail what you're asking for? I don't
>> > understand the problem you have or why the current package cannot
>> > handle it.
>> >
>> > -rob
>> >
>> >
>> > On Wed, Nov 13, 2019 at 7:05 PM <fge...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> "licensecheck classifies license files and heuristically determines
>> >> how well they correspond to known open source licenses."
>> >>
>> >> I'd like to identify license references in the file system. If I
>> >> understand correctly package licensecheck in its current form is not
>> >> useful to help with this.
>> >> If it's still possible, could you please share a hint how to do that?
>> >> (input: byte array, output: license references in the byte array)
>> >> If I understand correctly and I can't use licensecheck in its current
>> >> form, which one is preferred:
>> >> extend current api (maybe: func Refers(input []byte) (References,
>> >> bool)) or fork+rename the package? (References{...} being similar to
>> >> Coverage{...})
>> >>
>> >> thanks,
>> >> Gergely Födémesi
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google Groups
>> >> "golang-nuts" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send an
>> >> email to golang-nuts+unsubscr...@googlegroups.com.
>> >> To view this discussion on the web visit
>> >> https://groups.google.com/d/msgid/golang-nuts/CA%2BctqrqKKUPTHihMLhLTH5O-tBm1qENQV6y41Qwde4jHp1kNmA%40mail.gmail.com.
>> >>
>> >
>> >