Here we take a different position: Enrico Bonadio, Giancarlo Frosio, Christophe Geiger, Andrés Guadamuz, Stavroula Karapapa and Irini A. Stamatoudi, 'Preserving Balance in the EU Digital Single Market: How Like Company Could Reframe Copyright and Innovation in the Generative AI Era', SSRN, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6326401
The Ahmed et al. study shows those results on a book that was "overfitted" thousands of times, and with adversarial prompting whose cost runs three to four times the value of the book itself. I will read the other study, which I see has Ginsburg as a co-author. In the meantime, this study takes a different position: Haviv, A. et al., 'We Should Separate Memorization from Copyright', arXiv:2602.08632v1 [cs.CY], 9 February 2026, https://doi.org/10.48550/arXiv.2602.08632. Niva Elkin-Koren is one of the co-authors of this second study. Ginsburg and Elkin-Koren both enjoy strong international reputations in copyright law; Ginsburg, however, is a copyright maximalist, while Elkin-Koren is a minimalist...

Giancarlo

On Fri, 27 Mar 2026 at 10:22, Enrico Nardelli via nexa <[email protected]> wrote:

> For anyone who missed them: a couple of papers that indicate fairly
> clearly how LLMs are indeed gigantic memories containing entire
> copyrighted works.
>
> Cheers, Enrico
>
> 1)
>
> Extracting books from production language models
> Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang
> https://arxiv.org/abs/2601.02671
>
> Many unresolved legal questions over LLMs and copyright center on
> memorization: whether specific training data have been encoded in the
> model's weights during training, and whether those memorized data can
> be extracted in the model's outputs. While many believe that LLMs do
> not memorize much of their training data, recent work shows that
> substantial amounts of copyrighted text can be extracted from
> open-weight models. However, it remains an open question if similar
> extraction is feasible for production LLMs, given the safety measures
> these systems implement. We investigate this question ... and we
> measure extraction success with a score computed from a block-based
> approximation of longest common substring (nv-recall). With different
> per-LLM experimental configurations, we were able to extract varying
> amounts of text. ... e.g., nv-recall of 76.8% and 70.3%, respectively,
> for Harry Potter and the Sorcerer's Stone ... Taken together, our work
> highlights that, even with model- and system-level safeguards,
> extraction of (in-copyright) training data remains a risk for
> production LLMs.
>
> ----------------
>
> 2)
>
> Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg, Tuhin Chakrabarty
> Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of
> Copyrighted Books in Large Language Models
> https://arxiv.org/abs/2603.20957
>
> Frontier LLM companies have repeatedly assured courts and regulators
> that their models do not store copies of training data. They further
> rely on safety alignment strategies via RLHF, system prompts, and
> output filters to block verbatim regurgitation of copyrighted works,
> and have cited the efficacy of these measures in their legal defenses
> against copyright infringement claims. We show that finetuning bypasses
> these protections: by training models to expand plot summaries into
> full text, a task naturally suited for commercial writing assistants,
> we cause GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to reproduce up to
> 85-90% of held-out copyrighted books, with single verbatim spans
> exceeding 460 words, using only semantic descriptions as prompts and
> no actual book text.
>
> ...
>
> Our findings offer compelling evidence that model weights store copies
> of copyrighted works and that the security failures that manifest
> after finetuning on individual authors' works undermine a key premise
> of recent fair use rulings, where courts have conditioned favorable
> outcomes on the adequacy of measures preventing reproduction of
> protected expression.
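The first abstract scores extraction with "nv-recall", described only as a block-based approximation of longest common substring. The quoted text does not give the exact definition, so what follows is a minimal sketch of one plausible reading: split the reference book into fixed-size word blocks and report the fraction that reappear verbatim in the extracted output. The block size, word-level tokenization, and exact-match rule are all illustrative assumptions, not the paper's actual metric.

```python
# Hypothetical sketch of a block-based recall score in the spirit of
# "nv-recall". The quoted abstract does not define the metric precisely;
# block size and the exact-match rule below are assumptions.

def block_recall(reference: str, extracted: str, block_words: int = 50) -> float:
    """Fraction of fixed-size word blocks of `reference` that occur
    verbatim in `extracted` -- a cheap approximation of how much of the
    book a longest-common-substring measure would credit."""
    ref_words = reference.split()
    blocks = [
        " ".join(ref_words[i:i + block_words])
        for i in range(0, len(ref_words) - block_words + 1, block_words)
    ]
    if not blocks:
        return 0.0
    hits = sum(1 for block in blocks if block in extracted)
    return hits / len(blocks)


if __name__ == "__main__":
    # Toy corpus: 400 distinct "words"; the model output reproduces the
    # first half verbatim, so recall should come out at 0.50.
    words = [f"w{i:03d}" for i in range(400)]
    reference = " ".join(words)
    extracted = " ".join(words[:200])
    print(f"block recall: {block_recall(reference, extracted, 20):.2f}")
```

Matching whole fixed blocks keeps the check far cheaper than an exact longest-common-substring computation over an entire novel, at the price of missing partial or shifted overlaps, which is presumably why the paper calls its score an approximation.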
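The second abstract reports "single verbatim spans exceeding 460 words". Assuming a span means the longest contiguous run of words shared verbatim between the model's completion and the source book (the quoted text does not say how the authors measure it), Python's standard difflib gives a direct way to compute that figure:

```python
# Hedged sketch: longest verbatim span, in words, shared by a model
# completion and the source text. Whether the second paper measures its
# "460 words" spans exactly this way is an assumption; SequenceMatcher's
# find_longest_match returns the longest common contiguous run.

from difflib import SequenceMatcher


def longest_verbatim_span(source: str, completion: str) -> int:
    """Length, in words, of the longest contiguous word run that the
    completion shares verbatim with the source."""
    src, out = source.split(), completion.split()
    matcher = SequenceMatcher(None, src, out, autojunk=False)
    match = matcher.find_longest_match(0, len(src), 0, len(out))
    return match.size


if __name__ == "__main__":
    source = "the boy who lived had never even heard of hogwarts"
    completion = "we recall that the boy who lived had never once heard"
    # Shared run: "the boy who lived had never" -> 6 words.
    print(longest_verbatim_span(source, completion))
```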
