Description of problem:

Intermittent VM pauses and qcow2 image corruption after adding new bricks.

I have been hit by an image-corruption issue on oVirt 4.3 with the default 
oVirt gluster profile applied, along with intermittent VM pauses. The problem 
looks similar to glusterfs issues #2246 and #2254 and to the VM-pause reports 
in the oVirt users list. The gluster volume had no pending heal entries, the 
volume appeared to be in good shape, XFS was healthy, and there was no hardware 
issue. Sadly, a few VMs ended up with mysterious corruption after the new 
bricks were added.
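
For reference, these are roughly the checks I used to confirm the volume 
looked healthy (the volume name is a placeholder):

  gluster volume heal <volname> info
  gluster volume status <volname>
  gluster volume rebalance <volname> status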

Afterwards I tried to reproduce the problem a few times, with and without 
"cluster.lookup-optimize off", but it is not 100% reproducible with 
lookup-optimize on; I managed to reproduce it in 1 of 3 attempts. It really 
depends on the workload and cache state at that moment, and also on the number 
of objects after the rebalance.
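
For clarity, the toggle I was testing between runs is just this volume option 
(the volume name is a placeholder):

  gluster volume set <volname> cluster.lookup-optimize off
  gluster volume get <volname> cluster.lookup-optimize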

I also tried disabling all sharding features: the volume ran very solidly, 
write performance improved by far, and there was no corruption and no VM pause 
while gluster was under stress.
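
A caution for anyone repeating the no-shard test: features.shard should only 
be left off on a freshly created volume; switching it off on a volume that 
already holds sharded images makes the existing data unreadable. Verifying the 
option is just (the volume name is a placeholder):

  gluster volume get <volname> features.shard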

So here is the decision question: to shard or not to shard.

IMO, even though the recommendation documents say sharding breaks a large file 
into smaller chunks, which lets heals complete faster and lets a large file 
spread over multiple bricks, in this case it exposed issues that a full large 
file does not. I'd like to dig deeper into why sharding is recommended as the 
default for oVirt. From a reliability and performance perspective in 
particular, sharding seems to lose on both counts for oVirt/KVM workloads. 
Would it be more appropriate to simply tell oVirt users to make sure each 
underlying brick is large enough to hold the largest disk image instead? 
Besides, is there anything I have overlooked in the shard settings? After this 
disaster I really hesitate to enable sharding on the volume.
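
For context on what the default profile actually turns on, the shard-related 
options can be inspected like this (the volume name is a placeholder; on my 
hosts the group file ships at /var/lib/glusterd/groups/virt, and as far as I 
can tell oVirt applies it with "gluster volume set <volname> group virt"):

  gluster volume get <volname> features.shard
  gluster volume get <volname> features.shard-block-size
  cat /var/lib/glusterd/groups/virt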