I hadn’t revisited it yet, but it is possible to use cgroups to limit glusterfsd’s CPU usage; that might help you out. Andrew Klau has a blog post about it: http://www.andrewklau.com/controlling-glusterfsd-cpu-outbreaks-with-cgroups/

Be careful about how far you throttle it down, though: if it’s your VMs’ disks it is rebuilding, I’d expect you’ll pause them anyway.
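The general idea is to put the glusterfsd brick processes into a CPU-limited cgroup. A minimal sketch, assuming libcgroup is installed and the cgroup v1 "cpu" controller is mounted; the group name and quota values below are purely illustrative:

    # Sketch only -- cap the brick daemons at roughly two cores of CPU.
    cgcreate -g cpu:glusterfsd
    cgset -r cpu.cfs_period_us=100000 glusterfsd   # 100 ms accounting period
    cgset -r cpu.cfs_quota_us=200000 glusterfsd    # 200 ms of CPU time per period
    # Move the currently running brick processes into the group.
    cgclassify -g cpu:glusterfsd $(pgrep glusterfsd)

Brick daemons started later won’t land in the group by themselves, so you would re-run the cgclassify after a brick restart or add a cgrules.conf entry for them.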
> On Apr 4, 2015, at 8:57 AM, Jorick Astrego <j.astr...@netbulae.eu> wrote:
>
> On 04/03/2015 10:04 PM, Alastair Neil wrote:
>> Any follow up on this?
>>
>> Are there known issues using a replica 3 gluster datastore with lvm thin provisioned bricks?
>>
>> On 20 March 2015 at 15:22, Alastair Neil <ajneil.t...@gmail.com> wrote:
>> CentOS 6.6
>>
>> vdsm-4.16.10-8.gitc937927.el6
>> glusterfs-3.6.2-1.el6
>> 2.6.32-504.8.1.el6.x86_64
>>
>> I moved to 3.6 specifically to get the snapshotting feature, hence my desire to migrate to thinly provisioned lvm bricks.
>
> Well, on the glusterfs mailing list there have been discussions:
>
>> 3.6.2 is a major release and introduces some new features in cluster wide concept. Additionally it is not stable yet.
>
>> On 20 March 2015 at 14:57, Darrell Budic <bu...@onholyground.com> wrote:
>> What version of gluster are you running on these?
>>
>> I’ve seen high load during heals bounce my hosted engine around due to overall system load, but never pause anything else. CentOS 7 combo storage/host systems, gluster 3.5.2.
>>
>>> On Mar 20, 2015, at 9:57 AM, Alastair Neil <ajneil.t...@gmail.com> wrote:
>>>
>>> Pranith,
>>>
>>> I have run a pretty straightforward test. I created a two-brick 50 GB replica volume with normal lvm bricks and installed two servers, one CentOS 6.6 and one CentOS 7.0. I kicked off bonnie++ on both to generate some file system activity and then made the volume replica 3. I saw no issues on the servers.
>>>
>>> It is not clear whether this is a sufficiently rigorous test; the volume I have had issues on is a 3 TB volume with about 2 TB used.
>>>
>>> -Alastair
>>>
>>> On 19 March 2015 at 12:30, Alastair Neil <ajneil.t...@gmail.com> wrote:
>>> I don't think I have the resources to test it meaningfully. I have about 50 vms on my primary storage domain. I might be able to set up a small 50 GB volume and provision 2 or 3 vms running test loads, but I'm not sure it would be comparable. I'll give it a try and let you know if I see similar behaviour.
>>>
>>> On 19 March 2015 at 11:34, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>>> Without thinly provisioned lvm.
>>>
>>> Pranith
>>>
>>> On 03/19/2015 08:01 PM, Alastair Neil wrote:
>>>> Do you mean raw partitions as bricks, or simply without thin provisioned lvm?
>>>>
>>>> On 19 March 2015 at 00:32, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>>>> Could you let me know if you see this problem without lvm as well?
>>>>
>>>> Pranith
>>>>
>>>> On 03/18/2015 08:25 PM, Alastair Neil wrote:
>>>>> I am in the process of replacing the bricks with thinly provisioned lvs, yes.
>>>>>
>>>>> On 18 March 2015 at 09:35, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>>>>> hi,
>>>>> Are you using a thin-lvm based backend on which the bricks are created?
>>>>>
>>>>> Pranith
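For context, a brick on a thinly provisioned LV (which is what the gluster 3.6 volume-snapshot feature expects) would typically be set up along these lines; the VG/pool/LV names and sizes here are illustrative only:

    # Sketch only -- vg_gluster, brickpool, brick1 and the sizes are placeholders.
    lvcreate -L 1T --thinpool brickpool vg_gluster      # thin pool inside the VG
    lvcreate -V 3T -T vg_gluster/brickpool -n brick1    # thin LV carved from the pool
    mkfs.xfs -i size=512 /dev/vg_gluster/brick1         # XFS with 512-byte inodes
    mkdir -p /export/brick1
    mount /dev/vg_gluster/brick1 /export/brick1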
>>>>> On 03/18/2015 02:05 AM, Alastair Neil wrote:
>>>>>> I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There are two virtualisation clusters, one with two Nehalem nodes and one with four Sandy Bridge nodes. My master storage domain is a GlusterFS domain backed by a replica 3 gluster volume from 3 of the gluster nodes. The engine is a hosted engine 3.5.1 on 3 of the Sandy Bridge nodes, with storage provided by NFS from a different gluster volume. All the hosts are CentOS 6.6.
>>>>>>
>>>>>> vdsm-4.16.10-8.gitc937927.el6
>>>>>> glusterfs-3.6.2-1.el6
>>>>>> 2.6.32-504.8.1.el6.x86_64
>>>>>>
>>>>>> Problems happen when I try to add a new brick or replace a brick: eventually the self-heal will kill the VMs. In the VMs' logs I see kernel hung task messages.
>>>>>>
>>>>>> Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
>>>>>> Mar 12 23:05:16 static1 kernel: Not tainted 2.6.32-504.3.3.el6.x86_64 #1
>>>>>> Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> Mar 12 23:05:16 static1 kernel: nginx D 0000000000000001 0 1736 1735 0x00000080
>>>>>> Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
>>>>>> Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
>>>>>> Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
>>>>>> Mar 12 23:05:16 static1 kernel: Call Trace:
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
>>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>>>
>>>>>> I am wondering if my volume settings are causing this. Can anyone with more knowledge take a look and let me know:
>>>>>>
>>>>>> network.remote-dio: on
>>>>>> performance.stat-prefetch: off
>>>>>> performance.io-cache: off
>>>>>> performance.read-ahead: off
>>>>>> performance.quick-read: off
>>>>>> nfs.export-volumes: on
>>>>>> network.ping-timeout: 20
>>>>>> cluster.self-heal-readdir-size: 64KB
>>>>>> cluster.quorum-type: auto
>>>>>> cluster.data-self-heal-algorithm: diff
>>>>>> cluster.self-heal-window-size: 8
>>>>>> cluster.heal-timeout: 500
>>>>>> cluster.self-heal-daemon: on
>>>>>> cluster.entry-self-heal: on
>>>>>> cluster.data-self-heal: on
>>>>>> cluster.metadata-self-heal: on
>>>>>> cluster.readdir-optimize: on
>>>>>> cluster.background-self-heal-count: 20
>>>>>> cluster.rebalance-stats: on
>>>>>> cluster.min-free-disk: 5%
>>>>>> cluster.eager-lock: enable
>>>>>> storage.owner-uid: 36
>>>>>> storage.owner-gid: 36
>>>>>> auth.allow: *
>>>>>> user.cifs: disable
>>>>>> cluster.server-quorum-ratio: 51%
>>>>>>
>>>>>> Many Thanks, Alastair
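For reference, the heal pressure is governed by the cluster.* options in the list above, and it can be inspected and dialled back from the command line. A rough sketch, where "data" stands in for the volume name and the values are only a starting point, not a recommendation:

    # Sketch only -- check what is still pending heal, then reduce heal aggressiveness.
    gluster volume heal data info
    gluster volume set data cluster.background-self-heal-count 8
    gluster volume set data cluster.self-heal-window-size 2
    gluster volume set data cluster.data-self-heal-algorithm diff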
> Met vriendelijke groet, With kind regards,
>
> Jorick Astrego
> Netbulae Virtualization Experts
>
> Tel: 053 20 30 270    i...@netbulae.eu    Staalsteden 4-3A    KvK 08198180
> Fax: 053 20 30 271    www.netbulae.eu     7547 TA Enschede    BTW NL821234584B01
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users