Hello,

Thanks to all who have replied to this thread so far - my apologies for taking so long to get back to you all.

In terms of where I'm seeing the EXT4 errors, they are showing up in the kernel log on the node itself, so the output of 'dmesg' regularly contains entries such as these:

[375095.199203] EXT4-fs (ploop43209p1): Remounting filesystem read-only
[375095.199267] EXT4-fs error (device ploop43209p1) in ext4_ext_remove_space:3073: IO failure
[375095.199400] EXT4-fs error (device ploop43209p1) in ext4_ext_truncate:4692: IO failure
[375095.199517] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[375095.199637] EXT4-fs error (device ploop43209p1) in ext4_truncate:4145: Journal has aborted
[375095.199779] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[375095.199957] EXT4-fs error (device ploop43209p1) in ext4_orphan_del:2731: Journal has aborted
[375095.200138] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[461642.709690] EXT4-fs (ploop43209p1): error count since last fsck: 8
[461642.709702] EXT4-fs (ploop43209p1): initial error at time 1593576601: ext4_ext_remove_space:3000: inode 136354
[461642.709708] EXT4-fs (ploop43209p1): last error at time 1593576601: ext4_reserve_inode_write:5358: inode 136354

Inside the container itself, not much is being logged, since the affected container in this particular instance is indeed (as per the errors above) mounted read-only due to the errors its root.hdd filesystem is experiencing.
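
For anyone wanting to double-check the same thing on their own node, the rough sequence I used to confirm which container a given ploop device belongs to (and that its filesystem really has gone read-only) was something along these lines, using the device from the logs above:

# ploop list | grep ploop43209
# grep ploop43209p1 /proc/mounts

If I remember the output correctly, 'ploop list' shows the backing image path (and hence the /vz/private/<CTID> directory), and /proc/mounts shows the 'ro' flag after the remount.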

Having dug a bit more into what happened here, I suspect that this corruption may have come about while the containers were being migrated between the live node and the standby node, but I can't be 100% sure of that.

The picture is further muddied by the fact that the standby node (the one we used for evacuating containers from the node being updated) was itself initially updated to 7.0.14 (135). The live node, which was updated a short time after the standby node, appears to have received 7.0.14 (136). So I don't know whether the issue lies with 7.0.14 (135) on the standby node (where the containers were moved to and then back from), or with 7.0.14 (136) on the live node. Were there any known issues with 7.0.14 (135) that might correlate with what I'm seeing above?
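
For what it's worth, the way I've been trying to confirm which build each node actually ended up on is nothing more sophisticated than checking the release file and the package/update history, roughly:

# cat /etc/virtuozzo-release
# rpm -qa | grep -i -e vzkernel -e ploop
# yum history list all | head

If there is a more authoritative way to tie a node back to a specific (135) vs (136) build, I'd be glad to hear it.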

Anyway, once again, thanks to everyone who has replied so far. If anyone has any further questions or would like any further information, please let me know and I will be happy to assist.

Thank you,
Kevin Drysdale.


On Thu, 2 Jul 2020, Jehan PROCACCIA wrote:

Yes, you are right, I do get the same virtuozzo-release as mentioned in the initial subject, sorry for the noise.

# cat /etc/virtuozzo-release
OpenVZ release 7.0.14 (136)

But anyway, I don't see any ploop / fsck errors, either in the host's /var/log/vzctl.log or inside the CT. Where did you see those errors?

Jehan.

_____________________________________________________________________________________________________________________________________________________
From: "jjs - mainphrame" <j...@mainphrame.com>
To: "OpenVZ users" <users@openvz.org>
Sent: Thursday, 2 July 2020 19:33:23
Subject: Re: [Users] Issues after updating to 7.0.14 (136)

Thanks for that sanity check; the conundrum is resolved. vzlinux-release and virtuozzo-release are indeed different things.
Jake

On Thu, Jul 2, 2020 at 10:27 AM Jonathan Wright <jonat...@knownhost.com> wrote:

      /etc/redhat-release and /etc/virtuozzo-release are two different things.
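
      If it helps avoid the confusion, a quick way to compare them side by side (assuming all three files exist on your install, which they should on a stock OpenVZ 7 / VzLinux box) is:

      # cat /etc/redhat-release /etc/vzlinux-release /etc/virtuozzo-release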

      On 7/2/20 12:16 PM, jjs - mainphrame wrote:
      Jehan - 

      I get the same output here -

      [root@annie ~]# yum repolist  |grep virt
      virtuozzolinux-base    VirtuozzoLinux Base                        15,415+189
      virtuozzolinux-updates VirtuozzoLinux Updates                              0

      I'm baffled as to how you're on 7.8.0 while I'm at 7.0.15, even though I'm fully up to date.

      # uname -a
      Linux annie.ufcfan.org 3.10.0-1127.8.2.vz7.151.10 #1 SMP Mon Jun 1 19:05:52 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

Jake

On Thu, Jul 2, 2020 at 10:08 AM Jehan PROCACCIA <jehan.procac...@imtbs-tsp.eu> wrote:
      No factory, just the virtuozzolinux-base and openvz-os repos

# yum repolist  |grep virt
virtuozzolinux-base    VirtuozzoLinux Base                            15 415+189
virtuozzolinux-updates VirtuozzoLinux Updates                                  0

Jehan.

_____________________________________________________________________________________________________________________________________________________
From: "jjs - mainphrame" <j...@mainphrame.com>
To: "OpenVZ users" <users@openvz.org>
Cc: "Kevin Drysdale" <kevin.drysd...@iomart.com>
Sent: Thursday, 2 July 2020 18:22:33
Subject: Re: [Users] Issues after updating to 7.0.14 (136)

Jehan, are you running factory?

My ovz hosts are up to date, and I see:

[root@annie ~]# cat /etc/virtuozzo-release
OpenVZ release 7.0.15 (222)

Jake


On Thu, Jul 2, 2020 at 9:08 AM Jehan Procaccia IMT <jehan.procac...@imtbs-tsp.eu> wrote:
      "updating to 7.0.14 (136)" !?

I did an update yesterday; I am far behind that version.

# cat /etc/vzlinux-release
Virtuozzo Linux release 7.8.0 (609)

# uname -a
Linux localhost 3.10.0-1127.8.2.vz7.151.14 #1 SMP Tue Jun 9 12:58:54 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

Why don't you try to update to the latest version?
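
(On my hosts, an update is nothing more elaborate than something along these lines, adjusted to your own maintenance window:)

# yum clean all && yum update
# reboot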


On 29/06/2020 at 12:30, Kevin Drysdale wrote:
      Hello,

      After updating one of our OpenVZ VPS hosting nodes at the end of last week, we've started to have issues with corruption apparently occurring inside containers. Issues of this nature have never affected the node previously, and there do not appear to be any hardware issues that could explain this.

      Specifically, a few hours after updating, we began to see containers experiencing errors such as this in the logs:

      [90471.678994] EXT4-fs (ploop35454p1): error count since last fsck: 25
      [90471.679022] EXT4-fs (ploop35454p1): initial error at time 1593205255: ext4_ext_find_extent:904: inode 136399
      [90471.679030] EXT4-fs (ploop35454p1): last error at time 1593232922: ext4_ext_find_extent:904: inode 136399
      [95189.954569] EXT4-fs (ploop42983p1): error count since last fsck: 67
      [95189.954582] EXT4-fs (ploop42983p1): initial error at time 1593210174: htree_dirblock_to_tree:918: inode 926441: block 3683060
      [95189.954589] EXT4-fs (ploop42983p1): last error at time 1593276902: ext4_iget:4435: inode 1849777
      [95714.207432] EXT4-fs (ploop60706p1): error count since last fsck: 42
      [95714.207447] EXT4-fs (ploop60706p1): initial error at time 1593210489: ext4_ext_find_extent:904: inode 136272
      [95714.207452] EXT4-fs (ploop60706p1): last error at time 1593231063: ext4_ext_find_extent:904: inode 136272

      Shutting the containers down and manually mounting and e2fsck'ing their filesystems did clear these errors, but each of the containers (which were mostly used for running Plesk) had widespread issues with corrupt or missing files after the fsck's completed, necessitating their being restored from backup.
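
      For reference, the per-container repair sequence we used was roughly the following (the CT ID is one of the affected containers; the ploop device name is illustrative, since the real one is printed when the image is attached):

      # vzctl stop 8288448
      # ploop mount /vz/private/8288448/root.hdd/DiskDescriptor.xml
      # e2fsck -f /dev/ploop12345p1
      # ploop umount /vz/private/8288448/root.hdd/DiskDescriptor.xml
      # vzctl start 8288448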

      Concurrently, we also began to see messages like this appearing in /var/log/vzctl.log, which again have never appeared at any point prior to this update being installed:

      /var/log/vzctl.log:2020-06-26T21:05:19+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288448/root.hdd/root.hds' is sparse
      /var/log/vzctl.log:2020-06-26T21:09:41+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288450/root.hdd/root.hds' is sparse
      /var/log/vzctl.log:2020-06-26T21:16:22+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288451/root.hdd/root.hds' is sparse
      /var/log/vzctl.log:2020-06-26T21:19:57+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288452/root.hdd/root.hds' is sparse
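
      A quick way to confirm that an image really has become sparse is to compare its allocated size against its apparent size, for example:

      # ls -ls /vz/private/8288448/root.hdd/root.hds
      # du -h /vz/private/8288448/root.hdd/root.hds
      # du -h --apparent-size /vz/private/8288448/root.hdd/root.hds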

      The basic procedure we follow when updating our nodes is as follows (a representative vzmigrate invocation is sketched after the list):

      1. Update the standby node we keep spare for this process
      2. vzmigrate all containers from the live node being updated to the standby node
      3. Update the live node
      4. Reboot the live node
      5. vzmigrate the containers from the standby node back to the live node they originally came from
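
      (For steps 2 and 5, the migrations are plain per-container vzmigrate runs, roughly as below, with our real host names and CT IDs in place of these placeholders:)

      # vzmigrate standby-node.example.com 8288448
      # vzmigrate live-node.example.com 8288448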

      The only tool which has been used to affect these containers is 'vzmigrate' itself, so I'm at something of a loss as to how to explain the root.hdd images for these containers containing sparse gaps. This is something we have never done ourselves, as we have always been aware that OpenVZ does not support the use of sparse files for a container's hard drive image. And the fact that these images have suddenly become sparse at the same time they have started to exhibit filesystem corruption is somewhat concerning.

      We can restore all affected containers from backups, but I wanted to get in touch with the list to see if anyone else at any other site has experienced these or similar issues after applying the 7.0.14 (136) update.

      Thank you,
      Kevin Drysdale.




--
Jonathan Wright
KnownHost, LLC
https://www.knownhost.com

_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users
