Dear Stefano, your reasoning is correct. I would add that removing DRM is
already actionable as copyright infringement, not only in EU countries but
also in the US.
If the work is not protected by DRM, the matter is a bit more complicated:
in the EU, a licence agreement cannot exclude certain permitted uses, in
particular text and data mining for non-commercial purposes. It can,
however, exclude the same use where it is for commercial purposes and the
use has been expressly reserved. In the US there are no precise rules, but
freedom of contract usually tends to prevail over the availability of
exceptions (fair use). It is no coincidence that in the class action against
GitHub / Copilot the claims are based entirely on breach of the (open
source) licence agreements and on the removal of DRM, rather than on
infringement of the copyright in the software used to train the algorithm.
Best regards,
Maurizio



On Fri, 29 Sep 2023 at 15:21, Stefano Quintarelli <stef...@quintarelli.it>
wrote:

> I have a question for the lawyers (more than one, actually).
>
> To train a model, I need a file containing the digital version of a text.
> (I am obviously considering texts that are not PD, CC0, etc.)
>
> I can obtain the digital version of a text from an ebook (already
> digital) by removing the DRM it probably carries.
> But an ebook is not a good; it is a service subject to a licence. So if
> the licence does not grant the right to extract the digital text in order
> to train a model on it, it seems to me there is already a breach of the
> licence, and so, I believe, the text cannot be used as the basis for
> training, all the more so if the purpose of that training is commercial
> (if I sell a service based on that model).
>
> If so, to train my model I must instead obtain the digital text by
> scanning/OCRing a printed copy.
> But that is permitted, if I am not mistaken, only for personal,
> non-commercial use.
>
> If this is correct, I see no way to obtain a digital text without
> breaching a licence or infringing copyright.
>
> Where is the flaw in the reasoning?
>
> Thanks, s.
>
> On 29/09/23 15:00, Stefano Borroni Barale wrote:
> > Good morning, list,
> >
> >> The idea that training a model on copyrighted texts is an
> >> infringement of that copyright is highly debatable
> >
> > So far, I have the impression that all the lawyers on the list will agree.
> >
> >> The reasoning is actually quite simple: if learning from a
> >> text infringed its copyright, we would all be criminals.
> >
> > But since we are human and what we produce is not (politicians'
> > speeches aside(*)) ontologically identical to the output of non-living
> > technical beings, logic dictates that what applies to us cannot apply to
> > an LLM, just as copyright law does not apply mechanically to the use of
> > human texts to create language models.
> >
> > This is why all attempts to "protect by copyright" the output of
> > generative software have failed miserably, with the reasoning set out in
> > court rulings, which I believe carry far more weight in law than the CC
> > website.
> >
> > My impression is that the matter will keep lawyers, computer
> > scientists, philosophers and society busy for a veeeery long time yet.
> > SBB
> >
> > (*) As children of the '80s who played with this amusing toy know well:
> > https://www.enricodalbosco.it/giochi/tubolario/
> >
> >
> >> There is physically no trace of those texts inside the models;
> >> nothing is copied. The models are a transformative work of those
> >> texts, not a derivative one.
> >>
> >> Creative Commons argues this very well:
> >> https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/
> >>
> >> That said, I quote the words of another author, Jeff Jarvis:
> >> https://www.facebook.com/jeff.jarvis/posts/pfbid0LMFeqdTYoxnGHQAZwp5HMmeeVqgMSjL2dkcwMcBojkb2cinBpgYTHyc7Fhq1B9NPl
> >>
> >> «I, for one, am not complaining about my books being in in large
> >> language model training sets. I write to enter ideas into public
> >> discourse. I prefer informed over ignorant AI. I believe it is fair
> >> use for anyone to read & use books for transformative work. In fact,
> >> I'd probably feel snubbed if my books were not there. I'm happy when
> >> they are in libraries. I'm fine that they're here.»
> >>
> >> Fabio
> >>
> >> On Fri, 29 Sep 2023 at 07:52, Alberto Cammozzo via nexa
> >> nexa@server-nexa.polito.it wrote:
> >>
> >>> https://www.theguardian.com/australia-news/2023/sep/28/australian-books-training-ai-books3-stolen-pirated
> >>>
> >>> Thousands of books from some of Australia’s most celebrated authors
> >>> have potentially been caught up in what Booker prize-winning novelist
> >>> Richard Flanagan has called “the biggest act of copyright theft in history”.
> >>>
> >>> The works have allegedly been pirated by the US-based Books3 dataset
> >>> and used to train generative AI for corporations such as Meta and Bloomberg.
> >>>
> >>> Flanagan, who found 10 of his works, including the multi-international
> >>> award-winning 2013 novel The Narrow Road to the Deep North, on the Books3
> >>> dataset, told Guardian Australia he was deeply shocked by the discovery
> >>> made several days ago.
> >>>
> >>> “I felt as if my soul had been strip mined and I was powerless to stop
> >>> it,” he said in a statement.
> >>>
> >>> “This is the biggest act of copyright theft in history.”
> >>>
> >>> The Australian Publishers Association confirmed to Guardian Australia
> >>> on Wednesday that as many as 18,000 fiction and nonfiction titles with
> >>> Australian ISBNs (unique international standard book numbers) appeared to
> >>> be affected by the copyright infringement, although it is not yet clear
> >>> what proportion of these are Australian editions of internationally
> >>> authored books.
> >>>
> >>> “We’re still working through [the data] to work out the impact in
> >>> terms of Australian authors,” APA spokesperson Stuart Glover said.
> >>>
> >>> “This is a massive legal and ethical challenge for the publishing
> >>> industry and for authors globally.”
> >>>
> >>> A search tool published on Monday by US media platform The Atlantic
> >>> and uploaded by the US Authors Guild on Wednesday revealed the works of
> >>> Peter Carey, Helen Garner, Kate Grenville, Anna Funder, Christos Tsiolkas
> >>> and Thomas Keneally, as well as Flanagan and dozens of other high-profile
> >>> Australian authors, were included in the pirated dataset containing more
> >>> than 180,000 titles.
> >>>
> >>> On Thursday, the Australian Society of Authors issued a statement
> >>> saying it was “horrified” to learn that the works of Australian writers
> >>> were being used to train artificial intelligence without permission from
> >>> the authors.
> >>>
> >>> ASA chief executive, Olivia Lanchester, described the Books3 dataset
> >>> as piracy on an industrial scale.
> >>>
> >>> “Authors appropriately feel outraged,” Lanchester said. “The fact is
> >>> this technology relies upon books, journals, essays written by authors, yet
> >>> permission was not sought nor compensation granted.”
> >>>
> >>> Lanchester said the Australian literary industry, while not objecting
> >>> per se to emerging technologies such as AI, was deeply concerned about the
> >>> lack of transparency evident in the development and monetisation of AI by
> >>> global tech companies.
> >>>
> >>> “Turning a blind eye to the legitimate rights of copyright owners
> >>> threatens to diminish already precarious creative careers,” she said.
> >>>
> >>> “The enrichment of a few powerful companies is at the cost of
> >>> thousands of individual creators. This is not how a fair market functions.”
> >>>
> >>> Josephine Johnston, chief executive of Australia’s Copyright Agency,
> >>> described the Books3 development as “a free kick to big tech” at the
> >>> expense of Australia’s creative and cultural life.
> >>>
> >>> “We’re going to need greater transparency – how these tools have been
> >>> developed, trained, how they operate – before people can truly understand
> >>> what their legal rights might be,” she said.
> >>>
> >>> “We seem to be in this terrible position now where content owners –
> >>> remembering that the vast majority of them will be individual authors – may
> >>> actually have to take out court cases to enforce their rights.”
> >>>
> >>> Australian copyright law protects creators of original content from
> >>> data scraping.
> >>>
> >>> Litigation in the US against ChatGPT creator OpenAI over use of
> >>> allegedly pirated book datasets, Books1 and Books2 (which do not appear to
> >>> be affiliated with Books3) has already commenced.
> >>>
> >>> In July, North American horror/fantasy writers Mona Awad (author of
> >>> Bunny) and Paul Tremblay (author of The Cabin at the End of the World)
> >>> filed a lawsuit in a San Francisco federal court, alleging ChatGPT
> >>> unlawfully digested their books as part of its AI training data.
> >>>
> >>> On 28 August, OpenAI filed a motion to dismiss the lawsuit, arguing
> >>> that the authors “misconceive the scope of copyright, failing to take into
> >>> account the limitations and exceptions (including fair use) that properly
> >>> leave room for innovations like the large language models now at the
> >>> forefront of artificial intelligence”.
> >>>
> >>> On 19 September the Authors Guild and 17 of its members, including
> >>> bestselling novelists John Grisham, George RR Martin and Jodi Picoult,
> >>> filed a complaint in a New York district court against OpenAI, seeking
> >>> redress for “flagrant and harmful infringements” of guild members’
> >>> registered copyrights.
> >>>
> >>> In a statement on its website, the guild says while it is aware that
> >>> companies such as Meta and Bloomberg have used the Books3 dataset to train
> >>> their LLMs, it is not yet clear whether OpenAI is using Books3 to train its
> >>> ChatGPT models GPT 3.5 or GPT 4.
> >>>
> >>> Guardian Australia has sought comment from OpenAI, which has yet to
> >>> officially respond to the guild’s complaint, and Meta.
> >>>
> >>> On 4 September, US technology magazine Wired reported that a Danish
> >>> anti-piracy group called Rights Alliance had been told by Bloomberg that
> >>> the company did not plan to train future versions of its BloombergGPT using
> >>> Books3.
> >>>
> >>> Bloomberg declined to respond to the Guardian’s queries.
> >>>
> >>> The APA said the global nature of the issue would present significant
> >>> challenges in enforcement and prosecution, and has joined the authors’
> >>> society in calling for AI technologies to be regulated.
> >>>
> >>> Consultation closed last month for a Department of Industry, Science
> >>> and Resources discussion paper on supporting responsible AI.
> >>>
> >>> A parliamentary inquiry is under way examining the use of generative
> >>> artificial intelligence in the Australian education system.
> >>>
> >>> Flanagan said it was up to the Australian government to act to protect
> >>> Australia’s writers.
> >>>
> >>> “It has power and we do not,” he said.
> >>>
> >>> “If it cares for our culture it must now stand up and fight for it.”
> >>>
> >>> _______________________________________________
> >>> nexa mailing list
> >>> nexa@server-nexa.polito.it
> >>> https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa