Dear Stefano,

your reasoning is correct. I would add that removing DRM is already actionable as copyright infringement, not only in EU countries but also in the US. If the work is not protected by DRM, things are a little more complicated: in the EU, the licence agreement cannot exclude certain permitted uses, in particular text and data mining for non-commercial purposes. It can, however, exclude the same use when it is for commercial purposes and the use is expressly reserved. In the US there are no precise rules, but freedom of contract usually tends to prevail over the availability of exceptions (fair use). It is no coincidence that in the class action against GitHub / Copilot the claims rest entirely on breach of the (open source) licence agreements and on the removal of DRM, rather than on infringement of copyright in the software used to train the algorithm.

Best regards,
Maurizio
On Fri, 29 Sep 2023 at 15:21, Stefano Quintarelli <stef...@quintarelli.it> wrote:
> I have a question for the lawyers (more than one, actually).
>
> To train a model, I need a file containing the digital version of a text.
> (I am of course considering texts that are not PD, CC0, etc.)
>
> I can obtain the digital version of a text from an ebook (already digital) by removing the DRM it probably carries.
> But an ebook is not a good; it is a service subject to a licence. So if the licence does not grant the right to extract the digital text in order to train a model on it, it seems to me that there is already a breach of the licence, and therefore, I believe, the text cannot be used as the basis for training, all the more so if the purpose of that training is commercial (if I sell a service based on that model).
>
> If that is so, then to train my model I must obtain the digital text by scanning and OCR-ing a paper copy.
> But that is possible, if I am not mistaken, only for personal, non-commercial use.
>
> If this is correct, I do not see any way to obtain a digital text without breaching a licence or copyright.
>
> Where is the fallacy in this reasoning?
>
> Thanks, s.
>
> On 29/09/23 15:00, Stefano Borroni Barale wrote:
> > Good morning, list,
> >
> >> The idea that training a model on copyrighted texts is a violation of that copyright is highly debatable
> >
> > Up to this point, I have the impression that all the lawyers on the list will agree.
> >
> >> The reasoning is actually quite simple: if learning from a text violated its copyright, we would all be criminals.
> >
> > But since we are human, and what we produce is not, politicians' speeches aside(*), ontologically identical to the output of non-living technical beings, logic dictates that what applies to us cannot apply to an LLM, just as copyright law does not apply mechanically to the use of human texts to build language models.
> >
> > This is why all attempts to "protect via copyright" the output of generative software have failed miserably, with the reasons set out in court rulings, which I believe carry far more weight in law than the CC website.
> >
> > My impression is that the question will keep lawyers, computer scientists, philosophers and society busy for a very looooong time yet.
> > SBB
> >
> > (*) As children of the '80s who played with this hilarious toy know well: https://www.enricodalbosco.it/giochi/tubolario/
> >
> >> Of those texts there is physically no trace inside the models; nothing is copied. The models are a transformative work of those texts, not a derivative one.
> >>
> >> Creative Commons argues this very well:
> >> https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/
> >>
> >> That said, I quote the words of another author, Jeff Jarvis:
> >> https://www.facebook.com/jeff.jarvis/posts/pfbid0LMFeqdTYoxnGHQAZwp5HMmeeVqgMSjL2dkcwMcBojkb2cinBpgYTHyc7Fhq1B9NPl
> >>
> >> «I, for one, am not complaining about my books being in large language model training sets. I write to enter ideas into public discourse. I prefer informed over ignorant AI. I believe it is fair use for anyone to read & use books for transformative work. In fact, I'd probably feel snubbed if my books were not there. I'm happy when they are in libraries.
> >> I'm fine that they're here.»
> >>
> >> Fabio
> >>
> >> On Fri, 29 Sep 2023 at 07:52, Alberto Cammozzo via nexa nexa@server-nexa.polito.it wrote:
> >>
> >>> https://www.theguardian.com/australia-news/2023/sep/28/australian-books-training-ai-books3-stolen-pirated
> >>>
> >>> Thousands of books from some of Australia’s most celebrated authors have potentially been caught up in what Booker prize-winning novelist Richard Flanagan has called “the biggest act of copyright theft in history”.
> >>>
> >>> The works have allegedly been pirated by the US-based Books3 dataset and used to train generative AI for corporations such as Meta and Bloomberg.
> >>>
> >>> Flanagan, who found 10 of his works, including the multi-international award-winning 2013 novel The Narrow Road to the Deep North, on the Books3 dataset, told Guardian Australia he was deeply shocked by the discovery made several days ago.
> >>>
> >>> “I felt as if my soul had been strip mined and I was powerless to stop it,” he said in a statement.
> >>>
> >>> “This is the biggest act of copyright theft in history.”
> >>>
> >>> The Australian Publishers Association confirmed to Guardian Australia on Wednesday that as many as 18,000 fiction and nonfiction titles with Australian ISBNs (unique international standard book numbers) appeared to be affected by the copyright infringement, although it is not yet clear what proportion of these are Australian editions of internationally authored books.
> >>>
> >>> “We’re still working through [the data] to work out the impact in terms of Australian authors,” APA spokesperson Stuart Glover said.
> >>>
> >>> “This is a massive legal and ethical challenge for the publishing industry and for authors globally.”
> >>>
> >>> A search tool published on Monday by US media platform The Atlantic and uploaded by the US Authors Guild on Wednesday revealed the works of Peter Carey, Helen Garner, Kate Grenville, Anna Funder, Christos Tsiolkas and Thomas Keneally, as well as Flanagan and dozens of other high-profile Australian authors, were included in the pirated dataset containing more than 180,000 titles.
> >>>
> >>> On Thursday, the Australian Society of Authors issued a statement saying it was “horrified” to learn that the works of Australian writers were being used to train artificial intelligence without permission from the authors.
> >>>
> >>> ASA chief executive, Olivia Lanchester, described the Books3 dataset as piracy on an industrial scale.
> >>>
> >>> “Authors appropriately feel outraged,” Lanchester said. “The fact is this technology relies upon books, journals, essays written by authors, yet permission was not sought nor compensation granted.”
> >>>
> >>> Lanchester said the Australian literary industry, while not objecting per se to emerging technologies such as AI, was deeply concerned about the lack of transparency evident in the development and monetisation of AI by global tech companies.
> >>>
> >>> “Turning a blind eye to the legitimate rights of copyright owners threatens to diminish already precarious creative careers,” she said.
> >>>
> >>> “The enrichment of a few powerful companies is at the cost of thousands of individual creators. This is not how a fair market functions.”
> >>>
> >>> Josephine Johnston, chief executive of Australia’s Copyright Agency, described the Books3 development as “a free kick to big tech” at the expense of Australia’s creative and cultural life.
> >>>
> >>> “We’re going to need greater transparency – how these tools have been developed, trained, how they operate – before people can truly understand what their legal rights might be,” she said.
> >>>
> >>> “We seem to be in this terrible position now where content owners – remembering that the vast majority of them will be individual authors – may actually have to take out court cases to enforce their rights.”
> >>>
> >>> Australian copyright law protects creators of original content from data scraping.
> >>>
> >>> Litigation in the US against ChatGPT creator OpenAI over use of allegedly pirated book datasets, Books1 and Books2 (which do not appear to be affiliated with Books3) has already commenced.
> >>>
> >>> In July, North American horror/fantasy writers Mona Awad (author of Bunny) and Paul Tremblay (author of The Cabin at the End of the World) filed a lawsuit in a San Francisco federal court, alleging ChatGPT unlawfully digested their books as part of its AI training data.
> >>>
> >>> On 28 August, OpenAI filed a motion to dismiss the lawsuit, arguing that the authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence”.
> >>>
> >>> On 19 September the Writers Guild and 17 of its members, including bestselling novelists John Grisham, George RR Martin and Jodi Picoult, filed a complaint in a New York district court against OpenAI, seeking redress for “flagrant and harmful infringements” of guild members’ registered copyrights.
> >>>
> >>> In a statement on its website, the guild says while it is aware that companies such as Meta and Bloomberg have used the Books3 dataset to train their LLMs, it is not yet clear whether OpenAI is using Books3 to train its ChatGPT models GPT 3.5 or GPT 4.
> >>>
> >>> Guardian Australia has sought comment from OpenAI, which has yet to officially respond to the guild’s complaint, and Meta.
> >>>
> >>> On 4 September, US technology magazine Wired reported that a Danish anti-piracy group called Rights Alliance had been told by Bloomberg that the company did not plan to train future versions of its BloombergGPT using Books3.
> >>>
> >>> Bloomberg declined to respond to the Guardian’s queries.
> >>>
> >>> The APA said the global nature of the issue would present significant challenges in enforcement and prosecution, and has joined the authors’ society in calling for AI technologies to be regulated.
> >>>
> >>> Consultation closed last month for a Department of Industry, Science and Resources discussion paper on supporting responsible AI.
> >>>
> >>> A parliamentary inquiry is under way examining the use of generative artificial intelligence in the Australian education system.
> >>>
> >>> Flanagan said it was up to the Australian government to act to protect Australia’s writers.
> >>>
> >>> “It has power and we do not,” he said.
> >>>
> >>> “If it cares for our culture it must now stand up and fight for it.”
> >>>
> >>> _______________________________________________
> >>> nexa mailing list
> >>> nexa@server-nexa.polito.it
> >>> https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa