On 10.04.19 15:59, Heiko Schocher wrote: > Hello Frieder, > > Am 10.04.2019 um 15:31 schrieb Schrempf Frieder: >> Hi Heiko, >> >> On 10.04.19 14:44, Heiko Schocher wrote: >>> Hello Frieder, >>> >>> Am 10.04.2019 um 12:49 schrieb Schrempf Frieder: >>>> Hi, >>>> >>>> I have a customer who has a NAND device with two MTD partitions and >>>> each > of the partitions contains one UBI volume with a UBIFS >>>> filesystem. >>> >>> Bad idea ... why? >>> >>> You may loose lifetime of the board, as UBI cannot use PEBs between the >>> 2 MTD partitions on the nand ... better you would have one big MTD >>> Partition >>> with n ubi volumes in it ... >>> >>> But ... okay... this must work also. >> >> Yeah, I only recently learned about the disadvantages of this setup. >> Maybe we can change it in the future, but for now they are using >> separate partitions. > > Ok. > >>>> Now U-Boot can mount the UBIFS from the first partition just fine, but >>>> if the UBIFS from the second partition is mounted afterwards this fails >>>> in some cases. >>> >>> :-( >>> >>>> I can reproduce the error and tracked it down to uboot_ubifs_mount() in >>>> fs/ubifs/super.c. If this function is run for the second mount, the >>>> struct ubifs_fs_type is reused and it contains a list fs_supers, that >>>> still holds one entry for the first mount. >>> >>> Sure? >>> >>> fs_supers in struct file_system_type seems used only in none >>> U-Boot code... >> >> Right, I had a closer look and fs_supers seems to be unused indeed, but >> somehow it causes corruption in my case. When I apply 5a08cfee3967 >> (ubifs: remove useless code) the problem disappears. > > Ah, 5a08cfee3967 is in mainline ... good ;-) > >> Without this patch there still is hlist_add_head(&s->s_instances, >> &type->fs_supers) and this line somehow seems to cause the error. >> >> I applied this debug diff: >> >> --- a/fs/ubifs/super.c >> +++ b/fs/ubifs/super.c >> @@ -2428,7 +2428,15 @@ retry: >> #else >> strncpy(s->s_id, type->name, sizeof(s->s_id)); >> #endif >> + printf("%s:%d: ubi_num: %d, vol_id: %d\n", >> + __func__, __LINE__, >> + ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num, >> + ((struct ubifs_info *)s->s_fs_info)->vi.vol_id); >> hlist_add_head(&s->s_instances, &type->fs_supers); >> + printf("%s:%d: ubi_num: %d, vol_id: %d\n", >> + __func__, __LINE__, >> + ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num, >> + ((struct ubifs_info *)s->s_fs_info)->vi.vol_id); >> #ifndef __UBOOT__ >> spin_unlock(&sb_lock); >> get_filesystem(type); >> >> And I'm getting this for the first mount: >> >> sget:2431: ubi_num: 0, vol_id: 0 >> sget:2433: ubi_num: 0, vol_id: 0 >> >> And this for the second mount: >> >> sget:2431: ubi_num: 0, vol_id: 0 >> sget:2433: ubi_num: -1678121552, vol_id: -1678120656 > > Hmm... > >>>> I guess, that if the second mount would happen on a volume that is on >>>> the same MTD partition as the first volume, than this will work. The >>>> second entry is added to ubifs_fs_type.fs_supers. >>> >>> I cannot see this from looking into code ... so hard to say, but >>> I only looked into mainline code ... >> >> Yeah, I was probably wrong with these first wild guesses... > > ;-) > >>>> In my case however, the second entry being added to >>>> ubifs_fs_type.fs_supers is invalid and causes the mount error. >>>> >>>> Reinitializing the list in uboot_ubifs_mount() before each mount, >>>> solves >>>> the problem, but I guess that it will cause failures in other setups, >>>> where there are actually multiple volumes on one MTD device. >>>> >>>> So how can I solve this properly? Do we need one instance of struct >>>> ubifs_fs_type for each MTD device? >>> >>> Hmm.. without digging into it, it is difficult to say... >>> >>>> I tested this on an old version (2017.03), but looking at the current >>>> code, it looks like the same problem applies to current mainline. >>> >>> Is there any chance to try it with current mainline ? >> >> The problem is a bit strange and this what I'm actually worried about. >> It is persistent in a certain environment: U-Boot loaded from SPI NOR, >> environment set to certain values, data written to UBIFS partition in >> Linux and then power-cut. >> >> If one of these conditions changes, the error usually disappears, for >> example if I use the exact same setup, but load the Bootloader from MMC >> or RAM. Or if no write access with power-cut happens. >> >> So I wonder if there's some memory corruption somewhere else. Though, >> the error happens always at the same place. Debug prints or other code >> changes have no influence. >> >> I really would like to understand what's going on so I can make sure >> that 5a08cfee3967 actually solves the real issue or just hides it. > > Please keep me informed ... I try to reproduce it here on a hw I > have access to, but need some "free" time ...
So I got down to the roots of the problem. On the second mount operation, the fs_supers list contains a reference to the super_block of the previous mount, that has local scope and is not valid anymore. Upon calling hlist_add_head(&s->s_instances, &type->fs_supers), the new super_block is added to the list fs_supers as a second entry. As this is a double linked list, the first entries pprev pointer is updated, what causes a corruption. So 5a08cfee3967 is not just a simple cleanup change, as noted in the commit message. It also fixes a real bug by removing the update of the unused list. Regards, Frieder _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot