Sam Hartman <hartm...@debian.org> writes:

> Russ, I'm sure you are aware, but things get very interesting if the
> input to AI training is not fair use.
> In particular, if Github copilot is a derivative work of everything fed
> to it (including all the copylefted works), that gets kind of awkward
> for Microsoft.

> Perhaps the Github user agreement grants permission for every copyright
> holder who has a Github account.

> But for everyone else, things could be very interesting.

Yes.  I didn't express an opinion on what the correct outcome is because
it's not at all obvious to me and I'm not sure that I have an opinion.

As a general principle, as a free software advocate, I approve of an
expansive definition of fair use and believe that far more uses of
copyrighted material should be fair use than are normally considered fair
use today.  Expansive definitions of fair use are a key legal component
to enabling reverse engineering and compatible replacement of non-free
software with free software, for example.

I'm seeing some tendency for free software advocates who are disturbed by
the other social effects of large AI models (and there are quite a few
things to be disturbed about), and by the degree to which some of those
models are parasitic on free software and other free information
communities, to respond by advocating for a narrow definition of fair
use, at least in this specific area.  I'm worried that this is
counterproductive; I think we rely on fair use much more than incredibly
wealthy multinational software corporations do.

But the specific ramifications of an expansive fair use position for the
societal effect of AI models aren't clear to me, and to be honest I'm
dubious that they're clear to anyone at this point.  There are obviously
some significant risks, including the tendency of scale effects with
large models to further consolidate power into the hands of a small
number of very wealthy organizations.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>