Hi Ilias,

On Thu, 9 Dec 2021 at 08:37, Ilias Apalodimas <ilias.apalodi...@linaro.org> wrote:
>
> Hi Etienne,
>
> > [...]
> > > > > +
> > > > > +	/* And now the replica */
> > > > > +	ret = gpt_write_metadata_partition(desc, metadata, secondary_mpart);
> > > > > +	if (ret < 0) {
> > > > > +		log_err("Updating secondary metadata partition failed\n");
> > > > > +		return ret;
> > > > > +	}
> > > >
> > > > So shouldn't we do something about this case? The first partition was
> > > > correctly written and the second failed. Now if the primary GPT somehow
> > > > gets corrupted the device is now unusable.
> >
> > The replica is not there to overcome bitflips of the storage media.
> > It's here to allow updates while reliable info a still available in
> > the counter part.
>
> Sure but with this piece of code this assumption is broken. At the
> point the secondary partition fails to write, you loose that
> reliability. When the next update happens you are left with one valid
> and one invalid partition of metadata.
>
> > The scheme could be to rely on only 1 instance of the fwu-metadata
> > (sorry Simon) image is valid.
> > A first load: load 1st instance, crap the second.
> > At update: find the crapped one: write it with new data. Upon success
> > crapped the alternate one.
> > This is a suggestion. There are many ways to handle that.
>
> We could change to something like that, however this is not what's
> currently happening. gpt_check_metadata_validity() is trying to check
> and make sure both of the partitions are sane. If they aren't it
> tries to recover those looking at a sane partition. So the question
> for really is, should we do something *here* or rely on the fact that
> the next update will try to fix the broken metadata.
>
> Cheers
> /Ilias

I think the right sequence would be to check whether one of the two
metadata partitions is broken, update that one first and return an error
on failure, then update the sane one and emit a warning on failure.

Cheers,
etienne
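
To illustrate the ordering I have in mind, here is a rough standalone
sketch in C. It is not the patch code: struct mdata_part, write_part()
and update_metadata() are hypothetical names made up for this mail, the
"valid" flag stands in for whatever validity check the real code runs,
and the write_fails field exists only to simulate a failing write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one metadata partition (not a U-Boot type). */
struct mdata_part {
	const char *name;
	bool valid;       /* outcome of a validity check on this copy */
	bool write_fails; /* simulation hook: force the next write to fail */
	int version;      /* stand-in for the metadata contents */
};

/* Simulated write: assume a failed write leaves the copy unusable. */
static int write_part(struct mdata_part *p, int version)
{
	if (p->write_fails) {
		p->valid = false;
		return -1;
	}
	p->version = version;
	p->valid = true;
	return 0;
}

/*
 * Write the broken copy first, so that a sane copy is still on the
 * media if that write fails. Only then touch the sane copy.
 */
static int update_metadata(struct mdata_part *primary,
			   struct mdata_part *secondary, int version)
{
	struct mdata_part *first = primary;
	struct mdata_part *second = secondary;
	int ret;

	assert(primary && secondary);

	/* Only the secondary is broken: it goes first. */
	if (primary->valid && !secondary->valid) {
		first = secondary;
		second = primary;
	}

	ret = write_part(first, version);
	if (ret < 0) {
		/* The other copy was not touched, so report and stop. */
		fprintf(stderr, "Updating %s metadata partition failed\n",
			first->name);
		return ret;
	}

	ret = write_part(second, version);
	if (ret < 0)
		/* One up-to-date copy exists, so this is only a warning. */
		fprintf(stderr, "Warning: updating %s metadata partition failed\n",
			second->name);

	return 0;
}
```

With this ordering a failure of the first write returns an error while
the other copy still holds the old data, and a failure of the second
write leaves one up-to-date copy, which a later validity check could use
to repair the other one.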