Hi MSavoritias, all,

Let me provide more context.
The concern started a couple of months ago, to my knowledge, and the discussion is still ongoing. So I think it is incorrect to say “any result for over 6 months”.

Moreover, I feel you have a misunderstanding about the HuggingFace and SWH partnership. From my reading of public information, HuggingFace and BigCode train on a subset of the SWH source code archive. I mean, it is a snapshot and, to my knowledge, they provided the list of source code that had been used for training. Not to avoid the question, but from a pragmatic point of view, one might ask whether the source code you write, and do not want included in the training dataset, is concretely part of that training dataset.

HuggingFace is not training continuously with source code from SWH. And technically, SWH is an archive, i.e., the code is not stored hot. I do not know, and I have not read, all the details HuggingFace published about their method, i.e., which kind of data they process (independent unique files, complete repositories, etc.). What I know is that the component used for fetching from SWH is named SWH Vault; it needs to “cook” and prepare all the files, which takes time, from minutes to days.

All that to say, two key points:

1. People behind SWH are well aware of the various sides of the concerns. As said, they are long-time free software supporters. Be sure they have heard the community's concerns. Some discussions are still pending because, as explained, all sides of the ethical questions need to be handled cautiously. Please do not think it is ignored.

2. FWIW, I am in touch with SWH people, as are other members of the Guix community. For instance, in order to feed the discussion, Roberto from SWH pointed me to this blog post by Bruce Perens:

   https://perens.com/2019/10/12/invasion-of-the-ethical-licenses/

Well, I do not know if the outcome will be aligned with your current opinion, but be sure that your concerns, like the others raised by Guix community members, are being taken into account.

Cheers,
simon