On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <[email protected]> wrote:
>Hello,
>having deployed oVirt 4.3.9 single-host HCI with Gluster, I sometimes see a
>VM going into a paused state with the error above, needing to be manually
>resumed (sometimes this resume operation fails).
>So far it has only happened with an empty (thin-provisioned) disk under
>sudden high I/O during the initial phase of the OS install; it didn't
>happen during normal operation (even with 600 MB/s of throughput).
>I suspect something related to metadata extension not being able to keep
>pace with the growth of the physical disk... similar to what happens for
>block-based storage domains, where the LVM layer has to extend the logical
>volume representing the virtual disk.
>
>My real-world reproduction of the error is during the install of an OCP
>4.3.8 master node, when Red Hat CoreOS boots from the network, wipes the
>disk, and I think then transfers an image, so doing high immediate I/O.
>The VM used as the master node was created with a 120 GB thin-provisioned
>disk (virtio-scsi type) and starts with the disk just initialized and
>empty, going through a PXE install.
>I get this line in the events for the VM:
>
>Mar 27, 2020, 12:35:23 AM VM master01 has been paused due to unknown
>storage error.
>
>Here are logs around the time frame above:
>
>- engine.log
>https://drive.google.com/file/d/1zpNo5IgFVTAlKXHiAMTL-uvaoXSNMVRO/view?usp=sharing
>
>- vdsm.log
>https://drive.google.com/file/d/1v8kR0N6PdHBJ5hYzEYKl4-m7v1Lb_cYX/view?usp=sharing
>
>Any suggestions?
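[The thin-provisioning behavior described above can be illustrated outside oVirt with a sparse file: the apparent size is the full virtual size, while blocks are only allocated as data is written, which is why a sudden write burst forces rapid on-the-fly allocation. A minimal sketch; the path and sizes are arbitrary, not taken from the thread:]

```shell
# Create a sparse ("thin") 1 GiB file: the apparent size is 1 GiB,
# but almost no blocks are allocated until data is actually written.
truncate -s 1G /tmp/thin.img

# Apparent size: the full 1 GiB the guest would see.
du --apparent-size --block-size=1 /tmp/thin.img

# Allocated size: close to 0 before any writes land.
du --block-size=1 /tmp/thin.img

# Write 100 MiB of real data; the allocated size grows accordingly,
# analogous to the disk image growing under install-time I/O.
dd if=/dev/zero of=/tmp/thin.img bs=1M count=100 conv=notrunc
du --block-size=1 /tmp/thin.img
```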
>
>The disk of the VM is on the vmstore storage domain, and its Gluster
>volume settings are:
>
>[root@ovirt tmp]# gluster volume info vmstore
>
>Volume Name: vmstore
>Type: Distribute
>Volume ID: a6203d77-3b9d-49f9-94c5-9e30562959c4
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 1
>Transport-type: tcp
>Bricks:
>Brick1: ovirtst.mydomain.storage:/gluster_bricks/vmstore/vmstore
>Options Reconfigured:
>performance.low-prio-threads: 32
>storage.owner-gid: 36
>performance.read-ahead: off
>user.cifs: off
>storage.owner-uid: 36
>performance.io-cache: off
>performance.quick-read: off
>network.ping-timeout: 30
>features.shard: on
>network.remote-dio: off
>cluster.eager-lock: enable
>performance.strict-o-direct: on
>transport.address-family: inet
>nfs.disable: on
>[root@ovirt tmp]#
>
>What about the config above: are there any optimizations to be made,
>given that this is a single host?
>And comparing with the virt group of options:
>
>[root@ovirt tmp]# cat /var/lib/glusterd/groups/virt
>performance.quick-read=off
>performance.read-ahead=off
>performance.io-cache=off
>performance.low-prio-threads=32
>network.remote-dio=enable
>cluster.eager-lock=enable
>cluster.quorum-type=auto
>cluster.server-quorum-type=server
>cluster.data-self-heal-algorithm=full
>cluster.locking-scheme=granular
>cluster.shd-max-threads=8
>cluster.shd-wait-qlength=10000
>features.shard=on
>user.cifs=off
>cluster.choose-local=off
>client.event-threads=4
>server.event-threads=4
>performance.client-io-threads=on
>[root@ovirt tmp]#
>
>?
>
>Thanks,
>Gianluca
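[For aligning a volume with the virt profile, Gluster supports applying a whole option group in one step. A hedged sketch using the volume name from the thread; note that it would change options that currently differ between the volume and the group file (e.g. network.remote-dio), so each delta should be reviewed before applying. Not testable here without a running glusterd:]

```shell
# Show the options currently set on the volume (as in the output above).
gluster volume info vmstore

# Apply the whole "virt" option group defined in
# /var/lib/glusterd/groups/virt in a single command; among other
# changes this would flip network.remote-dio from off to enable.
gluster volume set vmstore group virt

# Verify the options that differed from the virt group file.
gluster volume get vmstore all | grep -E 'remote-dio|strict-o-direct|choose-local'
```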
Hi Gianluca,

Is it happening to machines with preallocated disks or to machines with thin disks?

Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/LHLQUMBQSY6NPC6LBWJ2TLRNP3M7LZWZ/

