Hi All, I'm experiencing huge issues when working with big VMs on Gluster volumes. Taking a snapshot or removing a big disk leads to the SPM node becoming non-responsive. Fencing then kicks in and takes the node down with a hard reset/reboot.
My setup has three nodes with 10 Gbit/s NICs for the Gluster network. The bricks are on RAID-6 with 1 GB of cache on the RAID controller, and the volume is set up as follows:

Volume Name: data
Type: Replicate
Volume ID: c734d678-91e3-449c-8a24-d26b73bef965
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ovirt-node01-gfs.storage.lan:/gluster/brick2/data
Brick2: ovirt-node02-gfs.storage.lan:/gluster/brick2/data
Brick3: ovirt-node03-gfs.storage.lan:/gluster/brick2/data
Options Reconfigured:
features.barrier: disable
cluster.granular-entry-heal: enable
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
server.event-threads: 4
client.event-threads: 4

It feels like the system locks up during a snapshot or while removing a big disk, and this delay triggers things to go wrong. Is there anything that is not set up right in my Gluster configuration, or is this behavior normal with bigger disks (50 GB+)? Is there a reliable option for caching with SSDs?

Thank you,
Sven
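
P.S. Regarding the SSD caching question: what I had in mind is lvmcache on the brick LVs, roughly like the sketch below. The VG/LV names and the SSD device are placeholders for my actual layout, and I went with writethrough so the RAID-6 copy stays authoritative if the SSD fails - is this considered a reliable setup for Gluster bricks?

# Sketch only - assumes the brick lives on LV vg_bricks/brick2 and the SSD is /dev/nvme0n1
pvcreate /dev/nvme0n1
vgextend vg_bricks /dev/nvme0n1

# Create a cache pool on the SSD and attach it to the brick LV in writethrough mode
lvcreate --type cache-pool -L 200G -n brick2_cache vg_bricks /dev/nvme0n1
lvconvert --type cache --cachepool vg_bricks/brick2_cache \
          --cachemode writethrough vg_bricks/brick2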

