Hello,
After updating one of our OpenVZ VPS hosting nodes at the end of last
week, we've started to have issues with corruption apparently occurring
inside containers. Issues of this nature have never affected the node
previously, and there do not appear to be any hardware issues that could
explain this.
Specifically, a few hours after updating, we began to see containers
experiencing errors such as this in the logs:
[90471.678994] EXT4-fs (ploop35454p1): error count since last fsck: 25
[90471.679022] EXT4-fs (ploop35454p1): initial error at time 1593205255: ext4_ext_find_extent:904: inode 136399
[90471.679030] EXT4-fs (ploop35454p1): last error at time 1593232922: ext4_ext_find_extent:904: inode 136399
[95189.954569] EXT4-fs (ploop42983p1): error count since last fsck: 67
[95189.954582] EXT4-fs (ploop42983p1): initial error at time 1593210174: htree_dirblock_to_tree:918: inode 926441: block 3683060
[95189.954589] EXT4-fs (ploop42983p1): last error at time 1593276902: ext4_iget:4435: inode 1849777
[95714.207432] EXT4-fs (ploop60706p1): error count since last fsck: 42
[95714.207447] EXT4-fs (ploop60706p1): initial error at time 1593210489: ext4_ext_find_extent:904: inode 136272
[95714.207452] EXT4-fs (ploop60706p1): last error at time 1593231063: ext4_ext_find_extent:904: inode 136272
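The "initial/last error at time" values in these ext4 summaries are Unix epoch seconds, so converting them is a quick way to line the first errors up against the update and the vzctl.log entries (e.g. 1593205255 is 2020-06-26 21:00:55 UTC). A minimal Python sketch, assuming each summary message sits on a single line as the kernel emits it; the function and field names are illustrative only:

```python
import re
from datetime import datetime, timezone

# Matches ext4 error-summary lines such as:
# "EXT4-fs (ploop35454p1): initial error at time 1593205255: ext4_ext_find_extent:904: inode 136399"
SUMMARY_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<which>initial|last) error at time "
    r"(?P<ts>\d+): (?P<func>\w+):(?P<line>\d+)(?:: inode (?P<inode>\d+))?"
)


def decode_summary(line):
    """Return the device, error kind, UTC time, function and inode, or None."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    # The timestamp is seconds since the Unix epoch; render it in UTC.
    when = datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc)
    return {
        "device": m["dev"],
        "which": m["which"],
        "utc": when.isoformat(),
        "function": m["func"],
        "inode": int(m["inode"]) if m["inode"] else None,
    }
```

Feeding it a dmesg dump line by line and skipping the None results gives a per-device timeline of first and most recent errors.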
Shutting the containers down and manually mounting and e2fsck'ing their
filesystems did clear these errors, but each of the containers (which were
mostly used for running Plesk) had widespread issues with corrupt or
missing files after the fscks completed, so they all had to be
restored from backup.
Concurrently, we also began to see messages like this appearing in
/var/log/vzctl.log, which again have never appeared at any point prior to
this update being installed:
/var/log/vzctl.log:2020-06-26T21:05:19+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288448/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:09:41+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288450/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:16:22+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288451/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:19:57+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288452/root.hdd/root.hds' is sparse
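To check which root.hds images on a node are actually sparse, independently of vzctl, one option is to compare each file's allocated size (st_blocks * 512 on Linux) with its apparent size, and to list the holes with lseek(SEEK_DATA/SEEK_HOLE). A minimal Python sketch, assuming a Linux host whose /vz filesystem supports SEEK_DATA/SEEK_HOLE (on filesystems that do not, find_holes simply reports no holes); the glob pattern is taken from the log lines above and the function names are hypothetical:

```python
import errno
import glob
import os


def is_sparse(path):
    """Heuristic: fewer bytes allocated on disk than the file's apparent size."""
    st = os.stat(path)
    # On Linux st_blocks is counted in 512-byte units.
    return st.st_blocks * 512 < st.st_size


def find_holes(path):
    """Return (start, end) byte ranges of holes, using SEEK_DATA/SEEK_HOLE."""
    holes = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    # No data after offset: the rest of the file is one hole.
                    holes.append((offset, size))
                    break
                raise
            if data_start > offset:
                holes.append((offset, data_start))
            # Skip past this data extent to the start of the next hole (or EOF).
            offset = os.lseek(fd, data_start, os.SEEK_HOLE)
    return holes


if __name__ == "__main__":
    # Assumed layout: the default /vz/private/<CTID>/root.hdd/root.hds paths.
    for image in sorted(glob.glob("/vz/private/*/root.hdd/root.hds")):
        if is_sparse(image):
            print(f"{image}: sparse, {len(find_holes(image))} hole(s)")
```

The st_blocks comparison is only a heuristic (filesystem compression or preallocation can skew it), which is why the hole list is worth having alongside it.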
The basic procedure we follow when updating our nodes is as follows:
1. Update the standby node we keep spare for this process
2. vzmigrate all containers from the live node being updated to the
standby node
3. Update the live node
4. Reboot the live node
5. vzmigrate the containers from the standby node back to the live node
they originally came from
The only tool that has touched these containers is therefore 'vzmigrate'
itself, so I'm at something of a loss to explain how the root.hdd images
for these containers came to contain sparse gaps. We have never created
sparse images ourselves, as we have always been aware that OpenVZ does not
support them for a container's hard drive image. The fact that these
images suddenly became sparse at the same time as they started to exhibit
filesystem corruption is therefore somewhat concerning.
We can restore all affected containers from backups, but I wanted to get
in touch with the list to see if anyone else at any other site has
experienced these or similar issues after applying the 7.0.14 (136)
update.
Thank you,
Kevin Drysdale.
_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users