Hi,

On Thu, 6 Apr 2023 at 15:41, Kyle <k...@posteo.net> wrote:
> I have only seen situations where the optimization is "too entailed with
> randomness" when models are trained on proprietary GPUs with specific
> settings. Otherwise, pseudo-random seeds are perfectly sufficient to remove
> the indeterminism.

Feel free to pick a real-world model with 15 billion parameters and train it again. And if you succeed, feel free to train it once more to check bit-to-bit reproducibility. The cost (CPU or GPU power, and in the end electricity, so real money) would not be negligible, and I am far from convinced that paying this bill is worthwhile, reproducibility-wise.

> => https://discourse.julialang.org/t/flux-reproducibility-of-gpu-experiments/62092

Ahah! I have to laugh, because Julia itself is already not reproducible:

https://issues.guix.gnu.org/22304
https://issues.guix.gnu.org/47354

And upstream does not care much, as you can see:

https://github.com/JuliaLang/julia/issues/25900
https://github.com/JuliaLang/julia/issues/34753

Well, years ago Nicolô made a patch to improve the situation, but it has not been merged yet. For instance, some people are trying to have "reproducible" benchmarks of machine learning, https://benchopt.github.io/ and last time I checked, they were having a good time and a lot of fun. ;-)

Well, I would be less confident than "pseudo-random seeds are perfectly sufficient to remove the indeterminism". :-)

> Many people think that "ultimate" reproducibility is not practical either.
> It's always going to be easier in the short term to take shortcuts which make
> conclusions dependent on secret sauce which few can understand.
>
> => https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/

Depending on the size of the model, training it again is not practical. Similarly, the computation behind a weather forecast is not practically reproducible, and no one is ready to put the amount of money on the table to redo it. Instead, people exchange datasets of pressure maps.
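To make concrete why fixed seeds are not the whole story: floating-point addition is not associative, so a GPU reduction whose kernel scheduling varies between runs can produce different bits even with every seed pinned. A stdlib-only sketch of the underlying effect (the values are contrived for illustration):

```python
# Floating-point addition is not associative: summing the same numbers in a
# different order can give a different result. Nondeterministic reduction
# order on GPUs exploits exactly this, independently of any random seed.
vals = [1e16, 1.0, -1e16, 1.0]

sequential = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # left-to-right
pairwise   = (vals[0] + vals[2]) + (vals[1] + vals[3])   # tree-shaped

print(sequential)  # 1.0  (the first 1.0 is absorbed by 1e16)
print(pairwise)    # 2.0  (both 1.0 terms survive)
```

Same inputs, same operation, two different answers: this is why "set the seed" is necessary but not sufficient for bit-to-bit identical training runs.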
Bit-to-bit reproducibility is a means for verifying the correspondence between some claim and what has concretely been done. But it is not the only means. From the point of view of the scientific method, it is false to think that everything can be reproduced. Consider a theoretical-physics experiment at the LHC; in that case, confidence in the result does not come from independent bit-to-bit reproduction but from as much transparency as possible at every stage.

Moreover, what Ludo wrote in this blog post is his own point of view, and for example I do not share all of it. Anyway. :-)

For sure, bit-to-bit reproducibility is not an end for trusting a result, but one means among many others. It is possible to have bit-to-bit reproducible results that are wrong, and other results, impossible to reproduce bit-to-bit, that are correct. Well, back to Julia: since part of Julia is not bit-to-bit reproducible, does it mean that the scientific outputs generated using Julia are not trustworthy?

All that said, if the re-computation of the weights is affordable because the size of the model is affordable, then yes, for sure, we could try. But from my point of view, the re-computation of the weights should not be blocking for inclusion. What should be blocking is the license of this data (the weights).

> > From my point of view, pre-trained
> > weights should be considered as the output of a (numerical) experiment,
> > similarly as we include other experimental data (from genome to
> > astronomy dataset).
>
> I think it's a stretch to consider a data compression as an experiment. In
> experiments I am always finding mistakes which confuse the interpretation,
> hidden by prematurely compressing data, e.g. by taking inappropriate
> averages. Don't confuse the actual experimental results with dubious data
> processing steps.

I do not see where I spoke about data compression. Anyway. :-) Well, I claim that data processing is an experiment.
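To make the "means of verification" point concrete, here is a minimal stdlib-only sketch (the weight values are hypothetical) of what bit-to-bit verification amounts to: comparing checksums of serialized weights, where a single ulp of difference is enough to break the match:

```python
import hashlib
import struct

def digest(weights):
    """SHA-256 checksum of a weight vector serialized as IEEE 754 doubles."""
    packed = struct.pack(f"{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()

run_a = [0.1, 0.2, 0.3]            # weights from a first training run
run_b = [0.1, 0.2, 0.1 + 0.2]      # 0.1 + 0.2 == 0.30000000000000004

print(digest(run_a) == digest(run_a))  # True: identical bits verify
print(digest(run_a) == digest(run_b))  # False: one ulp apart, checksum differs
```

The checksum tells you whether two runs produced the same bits; it tells you nothing about whether those bits are correct, which is the point above.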
There is no "actual experiment" versus "data processing". It is a continuum. Today, any instrument generating data does numerical processing internally. In other words, what you consider your raw inputs, someone else considers outputs; so, following the recursion, the true original raw material is the physical samples, and that is what we should package, i.e., we should send these physical samples by post and then reproduce everything. Here, I am stretching. ;-)

The genomic references that we have already packaged are also the result of "data processing" that no one is redoing. I do not see any difference between the weights of machine-learning models and these genomic references; they are both generated data resulting from an experiment (in the broad sense).

Cheers,
simon