Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread Daniele "Mte90" Scasciafratte
I also experienced something similar to Copilot with the Mozilla Italia DeepSpeech Italian model (https://github.com/MozillaItalia/DeepSpeech-Italian-Model). When I studied how to deal with various audio+text/text-only Italian datasets I talked a bit with other people of the Machine Learning com

Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread marc
Hi, me again. So I am going to respond to multiple comments in one go: I had a look at Julia Reda's post, and as far as I can make out, she only focuses on the fact that individual snippets are very short - but doesn't make any mention that inserting *lots* of snippets algorithmically is *all* that

Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread Michael Pöhn
Maybe future versions of the GPL could cover this with an extended copyleft clause. I think it would be justified that software like Copilot (including its datasets) also gets GPL'd if it builds on top of GPL source code. br. Michael On 10.07.21 10:58, marc wrote: Hi The way I understand this

Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread jahoti
Hi, you've certainly raised some interesting and important questions, as well as some deep philosophical comments. Unfortunately they're (obviously) not all things I can help with; however, some clarifications might be useful: * The corpus of software is not part of the copilot "software" as

Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread Paul Boddie
On Monday, 12 July 2021 23:16:22 CEST marc wrote:
> Hi, me again
>
> So I am going to respond to multiple comments in one go:
>
> I had a look at Julia Reda's post, and as far as I can
> make out, she only focuses on the fact that individual
> snippets are very short - but doesn't make any mentio

Re: GPLed code on github (given the copilot controversy)

2021-07-13 Thread Daniele "Mte90" Scasciafratte
I think the main point is to what extent it is a copyright issue to train a machine learning model that can recreate the original with a specific percentage of similarity by chunks (and to define what a chunk means for code). For example, if the GPL license says that you can use 30% of the code lines fo