Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)

Simon Tournier Fri, 07 Apr 2023 03:54:01 -0700

Hi,

On Fri., 07 April 2023 at 00:50, Nathan Dehnel <ncdeh...@gmail.com> wrote:

> I am uncomfortable with including ML models without their training
> data available. It is possible to hide backdoors in them.
> https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302/

Thanks for pointing out this article! Some non-mathematical parts of the
original article [1] are also worth a look. :-)

First, please note that we are somewhat in “The Open Box” case, IMHO:

But what if a company knows exactly what kind of model it wants,
and simply lacks the computational resources to train it? Such a
company would specify what network architecture and training
procedure to use, and it would examine the trained model
closely.

And yeah there is nothing new ;-) when one says that the result could be
biased by the person that produced the data. Yeah, we have to trust the
trainer as we are trusting the people who generated “biased” (*) genomic
references.

Well, it is very interesting – and scary – to see how one can theoretically
exploit “misclassify adversarial examples” as described e.g. by [2].

This raises questions about “Verifiable Delegation of Learning”.

From my point of view, the way to tackle such biased weights is not
re-learning: it is unclear how to draw the line between biased weights,
mistakes on their side, mistakes on our side, etc., and a full
re-learning requires a high level of expertise. Instead, the answer
should come from the ML community, which should standardize formal
methods for verifying that the training has not been biased, IMHO.

2: https://arxiv.org/abs/1412.6572

(*) biased genomic references, for one example among many others:

Relatedly, reports have persisted of major artifacts that arise
when identifying variants relative to GRCh38, such as an
apparent imbalance between insertions and deletions (indels)
arising from systematic mis-assemblies in GRCh38
[15–17]. Overall, these errors and omissions in GRCh38 introduce
biases in genomic analyses, particularly in centromeres,
satellites, and other complex regions.

https://doi.org/10.1101/2021.07.12.452063

Cheers,
simon
