Hi,

On Thu, 6 Apr 2023 at 15:41, Kyle <k...@posteo.net> wrote:
> I have only seen situations where the optimization is "too entailed with
> randomness" when models are trained on proprietary GPUs with specific
> settings. Otherwise, pseudo-random seeds are perfectly sufficient to remove
> the indeterminism.

Feel free to pick a real-world model with 15 billion parameters and train it again. And if you succeed, feel free to train it once more to check bit-to-bit reproducibility. The cost (CPU or GPU power, and in the end electricity, so real money) would not be negligible, and I am far from convinced that paying this bill is worthwhile, reproducibility-wise.

> => https://discourse.julialang.org/t/flux-reproducibility-of-gpu-experiments/62092

Ahah! I have to laugh, because Julia itself is already not reproducible:

https://issues.guix.gnu.org/22304
https://issues.guix.gnu.org/47354

And upstream does not care much, as you can see:

https://github.com/JuliaLang/julia/issues/25900
https://github.com/JuliaLang/julia/issues/34753

Well, years ago Nicolô made a patch to improve the situation, but it has not been merged yet. For instance, some people are trying to have "reproducible" benchmarks of machine learning, https://benchopt.github.io/ and last time I checked, they were having a good time and a lot of fun. ;-)

Well, I would be less confident than "pseudo-random seeds are perfectly sufficient to remove the indeterminism". :-)

> Many people think that "ultimate" reproducibility is not practical either.
> It's always going to be easier in the short term to take shortcuts which make
> conclusions dependent on secret sauce which few can understand.
>
> => https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/

Depending on the size of the model, training it again is not practical. Similarly, the computation behind a weather forecast is not practically reproducible, and no one is ready to put the amount of money on the table to redo it. Instead, people exchange datasets of pressure maps.
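To make concrete why fixed seeds are not the whole story: floating-point addition is not associative, so a GPU reduction whose kernel scheduling varies between runs can produce different bits even with every seed pinned. A stdlib-only sketch of the underlying effect (the values are contrived for illustration):

```python
# Floating-point addition is not associative: summing the same numbers in a
# different order can give a different result. Nondeterministic reduction
# order on GPUs exploits exactly this, independently of any random seed.
vals = [1e16, 1.0, -1e16, 1.0]

sequential = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # left-to-right
pairwise   = (vals[0] + vals[2]) + (vals[1] + vals[3])   # tree-shaped

print(sequential)  # 1.0  (the first 1.0 is absorbed by 1e16)
print(pairwise)    # 2.0  (both 1.0 terms survive)
```

Same inputs, same operation, two different answers: this is why "set the seed" is necessary but not sufficient for bit-to-bit identical training runs.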
Bit-to-bit reproducibility is a means for verifying the correspondence between some claim and what has concretely been done. But it is not the only means. From the point of view of the scientific method, it is false to think that everything can be reproduced. Consider a theoretical-physics experiment at the LHC; in that case, confidence in the result does not come from independent bit-to-bit reproduction but from as much transparency as possible at every stage.

Moreover, what Ludo wrote in this blog post is his own point of view, and for example I do not share all of it. Anyway. :-)

For sure, bit-to-bit reproducibility is not an end for trusting a result, but one means among many others. It is possible to have bit-to-bit reproducible results that are wrong, and other results, impossible to reproduce bit-to-bit, that are correct. Well, back to Julia: since part of Julia is not bit-to-bit reproducible, does it mean that the scientific outputs generated using Julia are not trustworthy?

All that said, if the re-computation of the weights is affordable because the size of the model is affordable, then yes, for sure, we could try. But from my point of view, the re-computation of the weights should not be blocking for inclusion. What should be blocking is the license of this data (the weights).

> > From my point of view, pre-trained
> > weights should be considered as the output of a (numerical) experiment,
> > similarly as we include other experimental data (from genome to
> > astronomy dataset).
>
> I think it's a stretch to consider a data compression as an experiment. In
> experiments I am always finding mistakes which confuse the interpretation,
> hidden by prematurely compressing data, e.g. by taking inappropriate
> averages. Don't confuse the actual experimental results with dubious data
> processing steps.

I do not see where I spoke about data compression. Anyway. :-) Well, I claim that data processing is an experiment.
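To make the "means of verification" point concrete, here is a minimal stdlib-only sketch (the weight values are hypothetical) of what bit-to-bit verification amounts to: comparing checksums of serialized weights, where a single ulp of difference is enough to break the match:

```python
import hashlib
import struct

def digest(weights):
    """SHA-256 checksum of a weight vector serialized as IEEE 754 doubles."""
    packed = struct.pack(f"{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()

run_a = [0.1, 0.2, 0.3]            # weights from a first training run
run_b = [0.1, 0.2, 0.1 + 0.2]      # 0.1 + 0.2 == 0.30000000000000004

print(digest(run_a) == digest(run_a))  # True: identical bits verify
print(digest(run_a) == digest(run_b))  # False: one ulp apart, checksum differs
```

The checksum tells you whether two runs produced the same bits; it tells you nothing about whether those bits are correct, which is the point above.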
There is no "actual experiment" versus "data processing". It is a continuum. Today, any instrument generating data does numerical processing internally. In other words, what you consider your raw inputs, someone else considers outputs; so, following the recursion, the true original raw material is the physical samples, and that is what we should package, i.e., we should send these physical samples by post and then reproduce everything. Here, I am stretching. ;-)

The genomic references that we have already packaged are also the result of "data processing" that no one is redoing. I do not see any difference between the weights of machine-learning models and these genomic references; they are both generated data resulting from an experiment (in the broad sense).

Cheers,
simon