"M. Zhou" <lu...@debian.org> writes:

> On Sun, 2025-01-12 at 16:56 +0000, Colin Watson wrote:
>>
>> (I have less fixed views on locally-trained models, but I see no very
>> compelling need to find more things to spend energy on even if the
>> costs are lower.)
>
> Locally-trained models are not practical in the current stage.
> State-of-the-art models can only be trained by the richest capitals
> who have GPU clusters. Training and deploying smaller models like
> 1 billion can lead to a very wrong impression and conclusion on those
> models.
Isn't the corollary of that statement that all the useful models available have been trained on material whose copyright/licensing status we know nothing about?

Even setting aside the case where the training data belongs to some litigious corporation, what concerns me is that these models have presumably sucked up every scrap of e.g. GPL code on the net. Having done that, they produce answers that are somehow informed by that data, without any indication of how they arrived at the answer, and certainly without any notice that the authors who produced the training data intended there to be restrictions on the use of their creativity (restrictions that an ethical person would want to honour).

I'd really like to know how one can use an LLM to make a contribution to a permissively licensed project (e.g. Expat) without in effect stealing the code from one's own tribe of Copyleft authors. Can one even play with an LLM without somehow contaminating one's brain?

Cheers, Phil.

P.S. AFAIK the likes of OpenAI declare that the output of the model belongs to the prompter, but that strikes me as self-serving nonsense that the courts will eventually rule on. I'd love to be proved wrong about that, though, because I'd quite like to play with LLMs, if only to do things like generating potentially lethal cooking recipes to try out ;-)

-- 
Philip Hands
https://hands.com/~phil