Hi Andy,

Thanks for your comments.
On 2019-05-23 09:28, Andy Simpkins wrote:
> Your wording "The model /should/ be reproducible with a fixed random seed."
> feels correct but wonder if guidance notes along the following lines should
> be added?
>
> *unless* we can reproduce the same results, from the same training data,
> you cannot classify as group 1, "Free Model", because verification that
> training has been carried out on the dataset explicitly licensed under a
> free software license can not be achieved. This should be treated as a
> severe bug and the entire suite should be classified as group 2,
> "ToxicCandy Model", until such time that verification is possible.

Ummm... this is actually a bit cruel to upstream, and I think there is
still some misunderstanding. I've updated the document to make the
following points clear:

- "Numerically Reproducible" is the default reproducibility definition in
  this context:
  https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility
  (see the seed-fixing sketch at the end of this mail for what that means
  in practice)

- A Free Model should be Numerically Reproducible, or at least a locally
  trained model should reach performance (e.g. accuracy) similar to the
  original one. Similar results are acceptable; the bar of "identical" is
  not always reachable.

- The datasets used to train a "ToxicCandy" model may be private or
  non-free, so not everybody can access them. (This is more likely the
  result of problematic upstream licensing, but it does happen.)

One got a free model from the internet. That little candy tastes sweet.
One wanted to make the same candy at home with the provided recipe, but
surprisingly found out that non-free ingredients are inevitable.
-- ToxicCandy

Is the updated document clearer?
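
P.S. For concreteness, here is a minimal sketch of what "reproducible with
a fixed random seed" can look like in a training script. It assumes a
PyTorch-based setup; the helper name set_seed and the seed value 42 are
only illustrative and not part of the policy document.

    import random

    import numpy as np
    import torch


    def set_seed(seed: int) -> None:
        """Fix every relevant RNG so repeated training runs are comparable."""
        random.seed(seed)                 # Python's built-in RNG
        np.random.seed(seed)              # NumPy RNG (shuffling, augmentation)
        torch.manual_seed(seed)           # PyTorch CPU (and current GPU) RNG
        torch.cuda.manual_seed_all(seed)  # all CUDA devices
        # Prefer deterministic cuDNN kernels; this can be slower, and a few
        # operations still have no deterministic implementation, which is one
        # reason "identical" results are not always reachable.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


    if __name__ == "__main__":
        set_seed(42)
        # ... build the model and train here.  With the seeds fixed above,
        # two runs on the same data, software stack, and hardware should
        # give the same, or at least very close, numbers.

Even with all seeds fixed, differences in hardware, drivers, or library
versions can still cause small numerical drift, which is why the policy
accepts similar performance rather than bit-identical weights.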