Hello Jan,

I want to test the pincpus scripts I got from GitHub.
I have a dual-socket system (2x Xeon X5550, 4 cores per CPU, 16 threads in
total) with four OSDs (4x WD1003FBYX) and an SSD (SHFS37A) journal. I have
three nodes like that.

I am not sure how to configure prz-pincpus.conf:

# prz-pincpus.conf
https://paste.debian.net/plainh/70d11f19

I do not have a /cgroup/cpuset/libvirt/cpuset.cpus file or an
/etc/cgconfig.conf file.

# mount | grep cg
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k)
cgmfs on /run/cgmanager/fs type tmpfs (rw,relatime,size=100k,mode=755)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/x86_64-linux-gnu/systemd-shim-cgroup-release-agent,name=systemd)

And some unrelated benchmarks, because I know you like them as well:
https://paste.debian.net/plainh/0b5c159f

# noop scheduler on all disks and the SSD
# write_cache = off on the SSD
# add_random = 0 on the SSD
# performance governor

Any more ideas or tips?

Kind regards,
Jelle

On 27/07/15 14:21, Jan Schermer wrote:
> Hi!
> The /cgroup/* mount point is probably a RHEL 6 thing; recent distributions
> seem to use /sys/fs/cgroup like in your case (maybe because of systemd?).
> On RHEL 6 the mount points are configured in /etc/cgconfig.conf and /cgroup
> is the default.
>
> I also saw the pull request from you on GitHub and I don't think I'll merge
> it, because creating the directory when the parent does not exist could mask
> the non-existence of cgroups or a different mountpoint, so I think it's
> better to fail and leave it up to the admin to modify the script.
> A more mature solution would probably be some sort of OS-specific
> integration (automatic cgclassify rules, init-script-based cgroup creation
> and such). Once this support is in place, maintainers only need to integrate
> it. In newer distros a newer kernel (scheduler) with more NUMA awareness and
> other autotuning could do a better job than this script by default.
>
> And if any Ceph devs are listening: I saw an issue on the Ceph tracker for
> cgroup classification (http://tracker.ceph.com/issues/12424) and I humbly
> advise you not to do that - it will either turn into something
> distro-specific or it will create an Inner Platform Effect on all distros,
> which downstream maintainers will need to replace with their own anyway.
> Of course, since Inktank is now somewhat part of Red Hat, it makes sense to
> integrate it into the RHOS, RHEV and Ceph packages for RHEL and make a
> profile for "tuned" or whatever does the tuning magic.
>
> Btw, has anybody else tried it? What are your results? We still use it and
> it makes a big difference on NUMA systems, and an even bigger difference
> when mixed with KVM guests on the same hardware.
>
> Thanks
> Jan
>
>
>> On 27 Jul 2015, at 13:23, Saverio Proto <ziopr...@gmail.com> wrote:
>>
>> Hello Jan,
>>
>> I am testing your scripts, because we also want to run OSDs and VMs on the
>> same server.
>>
>> I am new to cgroups, so this might be a very newbie question.
>> In your script you always reference the file
>> /cgroup/cpuset/libvirt/cpuset.cpus
>>
>> but I have the file at /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus
>>
>> I am working on Ubuntu 14.04.
>>
>> Does this difference come from something special in your setup, or is it
>> because we are working on different Linux distributions?
>>
>> Thanks for the clarification.
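
For reference, a minimal sketch of the manual equivalent on a box that mounts
the controllers under /sys/fs/cgroup - the cgroup name "osd0", the core and
memory-node numbers and the pgrep lookup are only examples, not necessarily
what pincpus itself does:

# the cpuset hierarchy does not show up in the mount output above,
# so it probably has to be mounted first
mkdir -p /sys/fs/cgroup/cpuset
mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset

# create a cgroup for osd.0 and confine it to the cores and memory of NUMA node 0
mkdir /sys/fs/cgroup/cpuset/osd0
echo 0-3 > /sys/fs/cgroup/cpuset/osd0/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/osd0/cpuset.mems

# writing a PID to cgroup.procs moves the whole process (all its threads)
echo $(pgrep -f "ceph-osd -i 0") > /sys/fs/cgroup/cpuset/osd0/cgroup.procs

# the libcgroup equivalent (cgroup-tools on Ubuntu; on RHEL 6 this is the
# package that ships /etc/cgconfig.conf)
cgcreate -g cpuset:/osd0
cgset -r cpuset.cpus=0-3 -r cpuset.mems=0 osd0
cgclassify -g cpuset:/osd0 $(pgrep -f "ceph-osd -i 0")

As Jan describes further down in the thread, pincpus automates roughly this
per OSD, picking the "emptiest" cgroup assigned to a NUMA node.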
>>
>> Saverio
>>
>>
>>
>> 2015-06-30 17:50 GMT+02:00 Jan Schermer <j...@schermer.cz>:
>>> Hi all,
>>> our script is available on GitHub
>>>
>>> https://github.com/prozeta/pincpus
>>>
>>> I haven't had much time to do a proper README, but I hope the
>>> configuration is self-explanatory enough for now.
>>> What it does is pin each OSD into the most "empty" cgroup assigned to a
>>> NUMA node.
>>>
>>> Let me know how it works for you!
>>>
>>> Jan
>>>
>>>
>>> On 30 Jun 2015, at 10:50, Huang Zhiteng <winsto...@gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <j...@schermer.cz> wrote:
>>>>
>>>> Not having OSDs and KVMs compete against each other is one thing, but
>>>> there are more reasons to do this:
>>>>
>>>> 1) not moving the processes and threads between cores that much (better
>>>> cache utilization)
>>>> 2) aligning the processes with memory on NUMA systems (that means all
>>>> modern dual-socket systems) - you don't want your OSD running on CPU1
>>>> with memory allocated to CPU2
>>>> 3) the same goes for other resources like NICs or storage controllers -
>>>> but that's less important and not always practical to do
>>>> 4) you can limit the scheduling domain on Linux if you limit the cpuset
>>>> for your OSDs (I'm not sure how important this is, just best practice)
>>>> 5) you can easily limit memory or CPU usage and set priority, with much
>>>> greater granularity than without cgroups
>>>> 6) if you have HyperThreading enabled you get the most gain when the
>>>> workloads on the two threads are dissimilar - so to get the highest
>>>> throughput you have to pin the OSD to thread1 and KVM to thread2 on the
>>>> same core. We're not doing that, because the latency and performance of
>>>> the core can vary depending on what the other thread is doing. But it
>>>> might be useful to someone.
>>>>
>>>> Some workloads exhibit a >100% performance gain when everything aligns
>>>> in a NUMA system, compared to SMP mode on the same hardware. You likely
>>>> won't notice it on light workloads, as the interconnects (QPI) are very
>>>> fast and there's a lot of bandwidth, but for stuff like big OLAP
>>>> databases or other data-manipulation workloads there's a huge
>>>> difference. And with Ceph being CPU hungry and memory intensive, we're
>>>> seeing some big gains here just by co-locating the memory with the
>>>> processes...
>>>
>>> Could you elaborate a bit on this? I'm interested to learn in what
>>> situations memory locality helps Ceph, and to what extent.
>>>>
>>>>
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>>> On 30 Jun 2015, at 08:12, Ray Sun <xiaoq...@gmail.com> wrote:
>>>>
>>>> Sounds great, please let me know of any updates.
>>>>
>>>> Best Regards
>>>> -- Ray
>>>>
>>>> On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <j...@schermer.cz> wrote:
>>>>>
>>>>> I promised you all our scripts for automatic cgroup assignment - they
>>>>> are already in production here and I just need to put them on GitHub;
>>>>> stay tuned tomorrow :-)
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>> On 29 Jun 2015, at 19:41, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>>
>>>>> Presently, you have to do it with a tool like 'taskset' or 'numactl'...
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>>> Of Ray Sun
>>>>> Sent: Monday, June 29, 2015 9:19 AM
>>>>> To: ceph-users@lists.ceph.com
>>>>> Subject: [ceph-users] How to use cgroup to bind ceph-osd to a specific
>>>>> cpu core?
>>>>>
>>>>> Cephers,
>>>>> I want to bind each of my ceph-osd processes to a specific CPU core,
>>>>> but I didn't find any document explaining how to do that. Could anyone
>>>>> provide me with some detailed information? Thanks.
>>>>>
>>>>> Currently, my Ceph processes are running like this:
>>>>>
>>>>> root 28692 1 0 Jun23 ? 00:37:26 /usr/bin/ceph-mon -i seed.econe.com --pid-file /var/run/ceph/mon.seed.econe.com.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 40063 1 1 Jun23 ? 02:13:31 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 42096 1 0 Jun23 ? 01:33:42 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 43263 1 0 Jun23 ? 01:22:59 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 44527 1 0 Jun23 ? 01:16:53 /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 45863 1 0 Jun23 ? 01:25:18 /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root 47462 1 0 Jun23 ? 01:20:36 /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>
>>>>> Best Regards
>>>>> -- Ray
>>>
>>> --
>>> Regards
>>> Huang Zhiteng
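
For a quick test without cgroups, Somnath's taskset/numactl suggestion boils
down to something like the following against a process list like Ray's (the
PID, core list and NUMA node below are examples only):

# change the CPU affinity of a running ceph-osd (here osd.0, PID 40063)
taskset -pc 0-3 40063

# numactl cannot retarget a running process; it has to wrap the daemon at
# start time, e.g.
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 ...

And one way to set the tuning knobs listed near the top of this mail (noop
scheduler, write cache off, add_random = 0, performance governor; the device
names are examples):

echo noop > /sys/block/sdb/queue/scheduler    # noop elevator per disk/SSD
echo 0 > /sys/block/sdb/queue/add_random      # add_random = 0 on the SSD
hdparm -W0 /dev/sdb                           # drive write cache off on the SSD
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > $g                     # performance governor on all cores
done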