Interesting but at first glance the data seems too unreliable to be of any use. I started checking the identified projects under the so-called Clear BSD license (the FSF-free, never-OSI-submitted BSD variant that explicitly excludes patent licenses) and the ones I looked at were all spurious matches.
Richard

On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:
> Yet another (inevitably flawed) data set:
> https://libraries.io/licenses
>
> On Tue, Jan 10, 2017, 11:07 AM Luis Villa <[email protected]> wrote:
>> [Apparently I got unsubscribed at some point, so if you've sent an
>> email here in recent months seeking my feedback, please resend.]
>>
>> Hey, all-
>> I promised some board members a summary of my investigation in
>> '12-'13 into updating, supplementing, or replacing the "popular
>> licenses" list. Here goes.
>>
>> *tl;dr*
>> I think OSI should have a data-driven short license list with a
>> replicable and transparent methodology, supplemented by a
>> new-and-good(?) list that captures licenses that aren't yet popular
>> but are high quality and have some substantial improvement that
>> advances the goals of OSI.
>>
>> *Purposes of non-comprehensive lists*
>> If you Google "open source licenses", OSI pages are the top two hits.
>> Historically, those pages were not very helpful unless you already
>> knew something about open source. Having a shorter "top" list can
>> help make the OSI website more useful to newcomers by suggesting a
>> starting place for their exploration and education about open source.
>>
>> In addition, third parties often look to OSI as a trusted (neutral?)
>> source for "top" or "best" licenses that they can incorporate into
>> products. (The full OSI-approved list is not practical for many
>> applications.) For example, if OSI had an up-to-date short list, it
>> might have been the basis for GitHub's license chooser.
>>
>> A list that is purely based on popularity would freeze open source in
>> a particular time, likely making it hard for new licenses with
>> important innovations to get adoption. However, a list based on more
>> subjective criteria is hard to create and update.
>>
>> *Past attempts*
>> The proliferation report attempted to address this problem by
>> categorizing existing licenses. These categories were,
>> intentionally or not, seen as the "popular or strong communities
>> list" and "everything else". Without a process or clear set of
>> criteria to update the "popular" list, however, it became frozen in
>> time. It is now difficult to credibly recommend the list to
>> newcomers or third parties (MPL 1.1 is deprecated; no mention of
>> Blackduck #4 GPL v3; etc.).
>>
>> There was also substantial work done towards a license "chooser" or
>> "wizard". However, this runs into some of the same problems - either
>> the chooser is opinionated (and so pisses off people, and potentially
>> locks the licenses in time) or is borderline-useless for newcomers
>> (because it still requires substantial additional research after
>> using it).
>>
>> *Data-driven "popular" list*
>> With all that in mind, I think that OSI needs a (mostly) data-driven
>> "popular" shortlist, based on a scan of public code + application of
>> (mostly?) objective rules to the outcome of that scan.
>>
>> To maintain OSI's reputation as being (reasonably) neutral and
>> independent, OSI should probably avoid basing this on third-party
>> license surveys (e.g., Black Duck[1]) unless their methodologies and
>> data sources are well-documented. Ideally someone will write code so
>> that the "survey" can be run by OSI and reproduced by others.
>>
>> Hard decisions on how to collect and "process" the data will include:
>> * *choice of data sources:* What data sources are drawn on? Key
>>   Linux distros? GitHub? per-language repos like maven, cpan, npm,
>>   etc?
>> * *what are you counting?* Projects? (May favor small, throwaway
>>   projects?) Lines of code? (May favor the largest, most complex
>>   projects?) ... ?
>> * *which license tools?* Some scanners are more aggressive in trying
>>   to identify *something*, while others prefer accuracy over
>>   comprehensiveness. In 2013 there was no good answer to this, but
>>   my understanding is that fossology now has three different
>>   scanners, so for OSI's purposes it may be sufficient to take those
>>   three and average.
>>   * Could throw in Black Duck or other non-transparent surveys as a
>>     fourth, fifth, etc.?
>> * *new versions?* If a new version exists but isn't widely adopted
>>   yet, how does the list reflect that? e.g., MPL 1.1 still shows up
>>   in Black Duck's survey; should OSI replace 1.1 with 2.0 in the
>>   "processed" list? What about GPL v2 v. v3? BSD/MIT v. UPL?
>> * *gaps/"mistakes":* What happens when the board thinks the data is
>>   incorrect? :) e.g., should ISC be listed?
>>
>> Part of why we didn't go very far in 2013 is because there are no
>> great answers for these - different answers will reflect different
>> values, and have different engineering impact. They're all hard
>> choices for the board, the developers, hopefully license-discuss, and
>> perhaps a broader community.
>>
>> Hat tip: Daniel German was invaluable to me in thinking through these
>> questions.
>>
>> *Supplementing with high-quality, value-adding options*
>> To encourage progress, while still avoiding proliferation, I'd
>> suggest a second list of licenses that are good but not (yet?)
>> popular. "Good" would be defined as something like:
>> 1. meets the OSD
>> 2. isn't on the data-driven popularity list
>> 3. drafted by an attorney (at minimum) or by a collaborative, public
>>    drafting process with clear support from a sponsoring-maintaining
>>    organization (ideal)
>> 4. has a new "feature" that is firmly in keeping with the overall
>>    goals of open source and can be concisely explained in a few
>>    sentences (e.g., for UPL, "GPL-compatible permissive license with
>>    explicit patent grant")
>>    1. but not "just for a particular community" - has to be at least
>>       plausibly applicable to most open source projects
>>    2. this is unavoidably subjective; suggest having it fall to the
>>       board with pre-discussion on license-review.
>>
>> #4 allows for some innovation (and OSI support of such innovation)
>> while #3 applies a quality filter. (Both #3 and #4 have
>> anti-proliferation effects.) Hopefully licenses that meet #3 and #4
>> would eventually move into #2, but you could imagine placing a time
>> limit on this list; if you're not in the top 10 most popular within
>> five years, then you get retired? But not sure that's a good idea at
>> all - just throwing it out as one option.
>>
>> If a new license meets #1, but not #3 and #4, then OSI's formal
>> policy should be to approve, but bury it in one of the other
>> proliferation list groups. (Those groups are actually quite good, and
>> should be fairly non-controversial - once you have a good policy for
>> what gets in the more "favored" groups.) I don't think a new
>> "deprecated" group is necessary - the proliferation categories are
>> basically a good list of that already.
>>
>> This is still a somewhat subjective process, and if it had been in
>> place in '99-'06, it would have been fairly fraught. However, I
>> think most of the "action" in open source organization has moved on
>> to other areas (e.g., foundation structure, CoCs, etc.), and the
>> field has matured in other ways, so I think this is now a
>> practicable approach in ways it would not have been a decade or even
>> five years ago.
>>
>> *Miscellaneous notes*
>> * I don't recommend merely updating the existing "popular and..."
>>   list through a subjective or one-time process. The politics of
>>   that will be messy, and without a documented, mostly-objective,
>>   data-driven method, it'll again become an outdated mess.
>> * The OSD should probably be updated. At the least this should be by
>>   addressing things like whether a formal patent grant is required
>>   of new licenses; more ambitiously it might follow Open Data
>>   Definition 2.x[2] by splitting out open licenses from open works.
>> * With SPDX and Fedora providing more comprehensive lists of FOSS
>>   licenses, it might make sense for OSI to link to those as
>>   "extended" resources, to reduce pressure from obscure license
>>   authors to get their license approved.
>> * The biggest pressure on this process will continue to be licenses
>>   that try to open up space for new commercial business models
>>   (e.g., Fair Source). The more OSI can write/document/buttress OSD
>>   #1, the better.
>> * I used to think a license wizard was a good idea, but I don't any
>>   more. I thought copyleft spectrum was really the only important
>>   decision-making factor, which made the idea plausible, but
>>   non-copyleft factors matter much more than I once thought, and make
>>   simplifying to a "wizard" too hard for OSI (though perhaps still
>>   plausible for a third party).
>> * Documentation of what the copyleft spectrum *is*, what the key
>>   licenses on it are, and what other factors might be relevant, is
>>   still a good idea, but is secondary to getting the basic lists
>>   right.
>>
>> HTH-
>> Luis
> --
> *Luis Villa: Open Law and Strategy[3]*
> *+1-415-938-4552*
> _________________________________________________
> License-discuss mailing list
> [email protected]
> https://lists.opensource.org/cgi-bin/mailman/listinfo/license-discuss

Links:
1. https://www.blackducksoftware.com/top-open-source-licenses
2. http://opendefinition.org/od/2.1/en/
3. http://lu.is
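
For what it's worth, the "take those three and average" idea in the quoted message could look something like the sketch below. This is only an illustration under my own assumptions: the scanner names and project counts are made up, the unit being counted is "projects" (one of several options the message raises), and a license that a scanner never reports is treated as a zero share for that scanner. None of this reflects how fossology or any real survey works.

```python
from collections import defaultdict


def average_license_shares(scanner_counts):
    """Average each license's share of scanned projects across scanners.

    scanner_counts maps a (hypothetical) scanner name to a dict of
    license identifier -> number of projects that scanner attributed to it.
    Returns a list of (license, mean_share) sorted by share, highest first.
    """
    totals = defaultdict(float)
    scanners_used = 0
    for counts in scanner_counts.values():
        n = sum(counts.values())
        if n == 0:
            continue  # scanner reported nothing; leave it out of the average
        scanners_used += 1
        for lic, c in counts.items():
            totals[lic] += c / n  # this scanner's share for the license
    if scanners_used == 0:
        return []
    # A license missing from a scanner contributes 0 for that scanner.
    averaged = {lic: s / scanners_used for lic, s in totals.items()}
    # Sort by descending share, then by name so ties are deterministic.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    sample = {
        "scanner_a": {"MIT": 50, "GPL-2.0": 30, "Apache-2.0": 20},
        "scanner_b": {"MIT": 40, "GPL-2.0": 40, "Apache-2.0": 20},
        "scanner_c": {"MIT": 60, "GPL-2.0": 20, "ISC": 20},
    }
    for lic, share in average_license_shares(sample):
        print(f"{lic}: {share:.3f}")
```

Averaging shares rather than raw counts keeps a scanner that happens to cover more projects from dominating the result. It does nothing about spurious matches, so the accuracy of each scanner still matters.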

