Nathan Dehnel <ncdeh...@gmail.com> writes:
>> a) Bit-identical re-training of ML models is similar to #2; in other
>> words, bit-identical re-training of ML model weights does not protect
>> much against biased training. The only protection against biased
>> training is human expertise.
>
> Yeah, I didn't mean to give the impression that I thought
> bit-reproducibility was the silver bullet for AI backdoors with that
> analogy. I guess my argument is this: if they release the training
> info, either 1) it does not produce the bias/backdoor of the trained
> model, so there's no problem, or 2) it does, in which case an expert
> will be able to look at it and go "wait, that's not right", raise the
> alarm, and it will go public. The expert does not need to be
> affiliated with Guix, but Guix will eventually hear about it. Similar
> to how a normal security vulnerability works.
>
>> b) The resources (human, financial, hardware, etc.) for re-training
>> are, in most cases, not affordable. Not because it would be difficult
>> or because the task is complex; that is covered by point a). It is
>> because the requirements in terms of resources are just too high.
>
> Maybe distributed substitutes could change that equation?

Probably not: that would require distributed *builds*, not distributed
substitutes. Right now Guix can't even use distcc, so it definitely
can't use remote GPUs.
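
To be concrete about the distinction (just a sketch; the host name,
user, and key below are placeholders, not a real setup): Guix can
already offload *whole* derivations to other machines via `guix
offload', configured in /etc/guix/machines.scm along these lines:

  ;; /etc/guix/machines.scm -- example offload configuration
  (list (build-machine
          (name "gpu-box.example.org")          ; placeholder host
          (systems (list "x86_64-linux"))       ; architectures it builds
          (user "builder")                      ; SSH user on that host
          (host-key "ssh-ed25519 AAAA...")      ; placeholder host key
          (speed 2.)))                          ; relative scheduling weight

But offloading only ships an entire build to a single remote machine.
Nothing splits one build across machines the way distcc splits a
compile, which is what re-training a large model on a pool of remote
GPUs would actually require.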