Re: [ceph-users] Ceph with Clos IP fabric
Hi,

sorry for the delay, but in the meantime we were able to find a workaround. Inspired by this:

> Side note: Configuring the loopback IP on the physical interfaces is
> workable if you set it on **all** parallel links. Example with server1:
>
> “iface enp3s0f0 inet static
>   address 10.10.100.21/32
> iface enp3s0f1 inet static
>   address 10.10.100.21/32
> iface enp4s0f0 inet static
>   address 10.10.100.21/32
> iface enp4s0f1 inet static
>   address 10.10.100.21/32”
>
> This should guarantee that the loopback IP is advertised if one of the 4
> links to switch1 and switch2 is up, but I am not sure if that’s workable
> for ceph’s listening address.

We added the loopback IP on both lo and dummy0. This solves the issue for us and the Ceph cluster works as intended.

Regards
Jan

--
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
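For reference, a minimal runtime sketch of that workaround (the dummy0 creation and the 10.10.100.21/32 address are taken from the example above and will differ per host; persist it in /etc/network/interfaces or whatever network tooling you use):

modprobe dummy                    # make sure the dummy interface driver is loaded
ip link add dummy0 type dummy     # create dummy0 if it does not exist yet
ip link set dummy0 up

# same /32 loopback address on both lo and dummy0, so the routing daemon
# keeps advertising it and ceph has a stable address to bind to
ip addr add 10.10.100.21/32 dev lo
ip addr add 10.10.100.21/32 dev dummy0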
Re: [ceph-users] corrupted rbd filesystems since jewel
You are saying that you had v2 RBD images created against Hammer OSDs and client libraries where exclusive lock, object map, etc were never enabled. You then upgraded the OSDs and clients to Jewel and at some point enabled exclusive lock (and I'd assume object map) on these images -- or were the exclusive lock and object map features already enabled under Hammer? The fact that you encountered an object map error on an export operation is surprising to me. Does that error re-occur if you perform the export again? If you can repeat it, it would be very helpful if you could run the export with "--debug-rbd=20" and capture the generated logs. On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG wrote: > Hi, > > also i'm getting these errors only for pre jewel images: > > 2017-05-06 03:20:50.830626 7f7876a64700 -1 > librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating > object map in-memory > 2017-05-06 03:20:50.830634 7f7876a64700 -1 > librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating > object map on-disk > 2017-05-06 03:20:50.831250 7f7877265700 -1 > librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0 > > while running export-diff. > > Stefan > > Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG: >> Hello Json, >> >> while doing further testing it happens only with images created with >> hammer and that got upgraded to jewel AND got enabled exclusive lock. >> >> Greets, >> Stefan >> >> Am 04.05.2017 um 14:20 schrieb Jason Dillaman: >>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the >>> command and post the resulting log to a new ticket at [1]? I'd also be >>> interested if you could re-create that >>> "librbd::object_map::InvalidateRequest" issue repeatably. >>>n >>> [1] http://tracker.ceph.com/projects/rbd/issues >>> >>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG >>> wrote: Example: # rbd rm cephstor2/vm-136-disk-1 Removing image: 99% complete... Stuck at 99% and never completes. This is an image which got corrupted for an unknown reason. Greets, Stefan Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG: > I'm not sure whether this is related but our backup system uses rbd > snapshots and reports sometimes messages like these: > 2017-05-04 02:42:47.661263 7f3316ffd700 -1 > librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0 > > Stefan > > > Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG: >> Hello, >> >> since we've upgraded from hammer to jewel 10.2.7 and enabled >> exclusive-lock,object-map,fast-diff we've problems with corrupting VM >> filesystems. >> >> Sometimes the VMs are just crashing with FS errors and a restart can >> solve the problem. Sometimes the whole VM is not even bootable and we >> need to import a backup. >> >> All of them have the same problem that you can't revert to an older >> snapshot. The rbd command just hangs at 99% forever. >> >> Is this a known issue - anythink we can check? >> >> Greets, >> Stefan >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> >>> -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
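For reference, re-running the export with debug logging would look roughly like this (pool/image and paths are placeholders; --log-file is just the generic client-side log option):

rbd export cephstor2/vm-136-disk-1 /tmp/vm-136-disk-1.img \
    --debug-rbd=20 --log-file=/tmp/rbd-export-debug.log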
Re: [ceph-users] corrupted rbd filesystems since jewel
Hi, Am 08.05.2017 um 14:40 schrieb Jason Dillaman: > You are saying that you had v2 RBD images created against Hammer OSDs > and client libraries where exclusive lock, object map, etc were never > enabled. You then upgraded the OSDs and clients to Jewel and at some > point enabled exclusive lock (and I'd assume object map) on these > images Yes i did: for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); do rbd -p cephstor5 feature enable $img exclusive-lock,object-map,fast-diff || echo $img; done > -- or were the exclusive lock and object map features already > enabled under Hammer? No as they were not the rbd defaults. > The fact that you encountered an object map error on an export > operation is surprising to me. Does that error re-occur if you > perform the export again? If you can repeat it, it would be very > helpful if you could run the export with "--debug-rbd=20" and capture > the generated logs. No i can't repeat it. It happens every night but for different images. But i never saw it for a vm twice. If i do he export again it works fine. I'm doing an rbd export or an rbd export-diff --from-snap it depends on the VM and day since the last snapshot. Greets, Stefan > > On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG > wrote: >> Hi, >> >> also i'm getting these errors only for pre jewel images: >> >> 2017-05-06 03:20:50.830626 7f7876a64700 -1 >> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating >> object map in-memory >> 2017-05-06 03:20:50.830634 7f7876a64700 -1 >> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating >> object map on-disk >> 2017-05-06 03:20:50.831250 7f7877265700 -1 >> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0 >> >> while running export-diff. >> >> Stefan >> >> Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG: >>> Hello Json, >>> >>> while doing further testing it happens only with images created with >>> hammer and that got upgraded to jewel AND got enabled exclusive lock. >>> >>> Greets, >>> Stefan >>> >>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman: Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the command and post the resulting log to a new ticket at [1]? I'd also be interested if you could re-create that "librbd::object_map::InvalidateRequest" issue repeatably. n [1] http://tracker.ceph.com/projects/rbd/issues On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG wrote: > Example: > # rbd rm cephstor2/vm-136-disk-1 > Removing image: 99% complete... > > Stuck at 99% and never completes. This is an image which got corrupted > for an unknown reason. > > Greets, > Stefan > > Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG: >> I'm not sure whether this is related but our backup system uses rbd >> snapshots and reports sometimes messages like these: >> 2017-05-04 02:42:47.661263 7f3316ffd700 -1 >> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: >> r=0 >> >> Stefan >> >> >> Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG: >>> Hello, >>> >>> since we've upgraded from hammer to jewel 10.2.7 and enabled >>> exclusive-lock,object-map,fast-diff we've problems with corrupting VM >>> filesystems. >>> >>> Sometimes the VMs are just crashing with FS errors and a restart can >>> solve the problem. Sometimes the whole VM is not even bootable and we >>> need to import a backup. >>> >>> All of them have the same problem that you can't revert to an older >>> snapshot. 
The rbd command just hangs at 99% forever. >>> >>> Is this a known issue - anythink we can check? >>> >>> Greets, >>> Stefan >>> > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph memory overhead when used with KVM
Thanks. One more question: was the image a clone or a stand-alone image? On Fri, May 5, 2017 at 2:42 AM, nick wrote: > Hi, > I used one of the fio example files and changed it a bit: > > """ > # This job file tries to mimic the Intel IOMeter File Server Access Pattern > [global] > description=Emulation of Intel IOmeter File Server Access Pattern > randrepeat=0 > filename=/root/test.dat > # IOMeter defines the server loads as the following: > # iodepth=1 Linear > # iodepth=4 Very Light > # iodepth=8 Light > # iodepth=64Moderate > # iodepth=256 Heavy > iodepth=8 > size=80g > direct=0 > ioengine=libaio > > [iometer] > stonewall > bs=4M > rw=randrw > > [iometer_just_write] > stonewall > bs=4M > rw=write > > [iometer_just_read] > stonewall > bs=4M > rw=read > """ > > Then let it run: > $> while true; do fio stress.fio; rm /root/test.dat; done > > I had this running over a weekend. > > Cheers > Sebastian > > On Tuesday, May 02, 2017 02:51:06 PM Jason Dillaman wrote: >> Can you share the fio job file that you utilized so I can attempt to >> repeat locally? >> >> On Tue, May 2, 2017 at 2:51 AM, nick wrote: >> > Hi Jason, >> > thanks for your feedback. I did now some tests over the weekend to verify >> > the memory overhead. >> > I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd >> > 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the >> > cause of the overhead so I just generated a lot of IO with the help of >> > fio in the VMs (with a datasize of 80GB) . All VMs had 3GB of memory. I >> > had to run fio multiple times, before reaching high RSS values. >> > I also noticed that when using larger blocksizes during writes (like 4M) >> > the memory overhead in the KVM process increased faster. >> > I ran several fio tests (one after another) and the results are: >> > >> > KVM with writeback RBD cache: max. 85% memory overhead (2.5 GB overhead) >> > KVM with writethrough RBD cache: max. 50% memory overhead >> > KVM without RBD caching: less than 10% overhead all the time >> > KVM with local storage (logical volume used): 8% overhead all the time >> > >> > I did not reach those >200% memory overhead results that we see on our >> > live >> > cluster, but those virtual machines have a way longer uptime as well. >> > >> > I also tried to reduce the RSS memory value with cache dropping on the >> > physical host and in the VM. Both did not lead to any change. A reboot of >> > the VM also does not change anything (reboot in the VM, not a new KVM >> > process). The only way to reduce the RSS memory value is a live migration >> > so far. Might this be a bug? The memory overhead sounds a bit too much >> > for me. >> > >> > Best Regards >> > Sebastian >> > >> > On Thursday, April 27, 2017 10:08:36 AM you wrote: >> >> I know we noticed high memory usage due to librados in the Ceph >> >> multipathd checker [1] -- the order of hundreds of megabytes. That >> >> client was probably nearly as trivial as an application can get and I >> >> just assumed it was due to large monitor maps being sent to the client >> >> for whatever reason. Since we changed course on our RBD iSCSI >> >> implementation, unfortunately the investigation into this high memory >> >> usage fell by the wayside. >> >> >> >> [1] >> >> http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmult >> >> ip >> >> ath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD >> >> >> >> On Thu, Apr 27, 2017 at 5:26 AM, nick wrote: >> >> > Hi Christian, >> >> > thanks for your answer. 
>> >> > The highest value I can see for a local storage VM in our >> >> > infrastructure >> >> > is a memory overhead of 39%. This is big, but the majority (>90%) of >> >> > our >> >> > local storage VMs are using less than 10% memory overhead. >> >> > For ceph storage based VMs this looks quite different. The highest >> >> > value I >> >> > can see currently is 244% memory overhead. So that specific allocated >> >> > 3GB >> >> > memory VM is using now 10.3 GB RSS memory on the physical host. This is >> >> > a >> >> > really huge value. In general I can see that the majority of the ceph >> >> > based VMs has more than 60% memory overhead. >> >> > >> >> > Maybe this is also a bug related to qemu+librbd. It would be just nice >> >> > to >> >> > know if other people are seeing those high values as well. >> >> > >> >> > Cheers >> >> > Sebastian >> >> > >> >> > On Thursday, April 27, 2017 06:10:48 PM you wrote: >> >> >> Hello, >> >> >> >> >> >> Definitely seeing about 20% overhead with Hammer as well, so not >> >> >> version >> >> >> specific from where I'm standing. >> >> >> >> >> >> While non-RBD storage VMs by and large tend to be closer the specified >> >> >> size, I've seen them exceed things by few % at times, too. >> >> >> For example a 4317968KB RSS one that ought to be 4GB. >> >> >> >> >> >> Regards, >> >> >> >> >> >> Christian >> >> >> >> >> >> On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote: >> >> >> > Hi, >>
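As an aside, one way to track the RSS of the guest processes while such a test runs is a loop like the one below (the process name is qemu-system-x86_64 on Ubuntu; it may be qemu-kvm on other distros):

# print PID, resident set size (KB) and uptime of every QEMU process once a minute
while true; do
    ps -C qemu-system-x86_64 -o pid=,rss=,etime=
    sleep 60
done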
Re: [ceph-users] corrupted rbd filesystems since jewel
On 05/08/17 14:50, Stefan Priebe - Profihost AG wrote:
> Hi,
> Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
>> You are saying that you had v2 RBD images created against Hammer OSDs
>> and client libraries where exclusive lock, object map, etc were never
>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>> point enabled exclusive lock (and I'd assume object map) on these
>> images
> Yes i did:
> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
> do rbd -p cephstor5 feature enable $img
> exclusive-lock,object-map,fast-diff || echo $img; done

If that's all you did, and not also

    rbd object-map rebuild $img

then it won't have a proper object map, which is probably more like having the feature disabled than an actual bug. And in my testing, the rebuild randomly doesn't work, often saying the object map is invalid again after some point (in the rbd info output), unless you do something... maybe disconnect all clients, then rebuild it, then let the clients connect again, or just repeat it a few times until it stays. I haven't tested that well.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
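For completeness, a whole-pool rebuild pass mirroring the feature-enable loop above would be roughly (same placeholder pool name; ideally run while the images are quiesced):

for img in $(rbd -p cephstor5 ls); do
    rbd -p cephstor5 object-map rebuild $img || echo "rebuild failed: $img"
done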
Re: [ceph-users] corrupted rbd filesystems since jewel
Hi Peter, Am 08.05.2017 um 15:23 schrieb Peter Maloney: > On 05/08/17 14:50, Stefan Priebe - Profihost AG wrote: >> Hi, >> Am 08.05.2017 um 14:40 schrieb Jason Dillaman: >>> You are saying that you had v2 RBD images created against Hammer OSDs >>> and client libraries where exclusive lock, object map, etc were never >>> enabled. You then upgraded the OSDs and clients to Jewel and at some >>> point enabled exclusive lock (and I'd assume object map) on these >>> images >> Yes i did: >> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }'); >> do rbd -p cephstor5 feature enable $img >> exclusive-lock,object-map,fast-diff || echo $img; done > If that's all you did, and not also > rbd object-map rebuild $img sorry we did a rbd object-map rebuild as well in a 2nd run. I just missed this command. Sorry. > And in my testing, this randomly doesn't work, often saying the object > map is invalid after some point (in rbd info output), unless you do > something... maybe disconnect all clients, then rebuild it, then clients > can connect again, or just repeat a few times until it stays. I haven't > tested that well. no i've not disconnected the clients. I've seen no information that the clients need to do a reconnect. I also have no idea how to force a reconnect while running qemu. Greets, Stefan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] EXT: Re: Intel power tuning - 30% throughput performance increase
We also noticed a tremendous gain in latency performance by setting cstates to processor.max_cstate=1 intel_idle.max_cstate=0. We went from being over 1ms latency for 4KB writes to well under (.7ms? going off mem). I will note that we did not have as much of a problem on Intel v3 procs, but on v4 procs, our low QD, single threaded write perf dropped tremendously. I don’t recall now, but it was much worse than just a 30% loss in perf compared to a v3 proc that had default C states set. We only saw a small bump in power usage as well. Bumping the CPU frequency up also offered a small performance change as well. Warren Wang Walmart ✻ On 5/3/17, 3:43 AM, "ceph-users on behalf of Dan van der Ster" wrote: Hi Blair, We use cpu_dma_latency=1, because it was in the latency-performance profile. And indeed by setting cpu_dma_latency=0 on one of our OSD servers, powertop now shows the package as 100% in turbo mode. So I suppose we'll pay for this performance boost in energy. But more importantly, can the CPU survive being in turbo 100% of the time? -- Dan On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite wrote: > Hi all, > > We recently noticed that despite having BIOS power profiles set to > performance on our RHEL7 Dell R720 Ceph OSD nodes, that CPU frequencies > never seemed to be getting into the top of the range, and in fact spent a > lot of time in low C-states despite that BIOS option supposedly disabling > C-states. > > After some investigation this C-state issue seems to be relatively common, > apparently the BIOS setting is more of a config option that the OS can > choose to ignore. You can check this by examining > /sys/module/intel_idle/parameters/max_cstate - if this is >1 and you *think* > C-states are disabled then your system is messing with you. > > Because the contemporary Intel power management driver > (https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt) now > limits the proliferation of OS level CPU power profiles/governors, the only > way to force top frequencies is to either set kernel boot command line > options or use the /dev/cpu_dma_latency, aka pmqos, interface. > > We did the latter using the pmqos_static.py, which was previously part of > the RHEL6 tuned latency-performance profile, but seems to have been dropped > in RHEL7 (don't yet know why), and in any case the default tuned profile is > throughput-performance (which does not change cpu_dma_latency). You can find > the pmqos-static.py script here > https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py. > > After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD nodes we > saw a conservative 30% increase in backfill and recovery throughput - now > when our main RBD pool of 900+ OSDs is backfilling we expect to see ~22GB/s, > previously that was ~15GB/s. > > We have just got around to opening a case with Red Hat regarding this as at > minimum Ceph should probably be actively using the pmqos interface and tuned > should be setting this with recommendations for the latency-performance > profile in the RHCS install guide. We have done no characterisation of it on > Ubuntu yet, however anecdotally it looks like it has similar issues on the > same hardware. > > Merry xmas. 
> > Cheers, > Blair > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
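For the kernel command line route mentioned above, a sketch for a stock RHEL/CentOS 7 box would be (paths and the grub2-mkconfig target differ on EFI systems, so double-check for your setup):

# /etc/default/grub: append the C-state limits to the existing options
GRUB_CMDLINE_LINUX="... processor.max_cstate=1 intel_idle.max_cstate=0"

# regenerate the grub config and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg

# after the reboot, confirm the parameters are active
cat /proc/cmdline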
Re: [ceph-users] Reg: Ceph-deploy install - failing
Hello, I don't use Cent, but I've seen the same thing with Ubuntu, I'm going to assume it's the same problem. The repo url should be download.ceph.com, instead of just ceph.com, which is uses when it add's it to the repo. My solution usually is, correct the repo URL to point to download.ceph.com and in the ceph-deploy add the tag --no-adjust-repo. Cheers On Sun, May 7, 2017 at 3:51 PM, psuresh wrote: > Hi, > > When i run "ceph-deploy install lceph-mon2" from admin node i'm getting > following error. Any clue! > > [cide-lceph-mon2][DEBUG ] connected to host: cide-lceph-mon2 > [cide-lceph-mon2][DEBUG ] detect platform information from remote host > [cide-lceph-mon2][DEBUG ] detect machine type > [ceph_deploy.install][INFO ] Distro info: CentOS Linux 7.3.1611 Core > [cide-lceph-mon2][INFO ] installing ceph on cide-lceph-mon2 > [cide-lceph-mon2][INFO ] Running command: yum clean all > [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, > priorities > [cide-lceph-mon2][DEBUG ] Cleaning repos: Ceph Ceph-noarch base > ceph-source epel extras updates > [cide-lceph-mon2][DEBUG ] Cleaning up everything > [cide-lceph-mon2][INFO ] adding EPEL repository > [cide-lceph-mon2][INFO ] Running command: yum -y install epel-release > [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, > priorities > [cide-lceph-mon2][DEBUG ] Determining fastest mirrors > [cide-lceph-mon2][DEBUG ] * base: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] * epel: ftp.cuhk.edu.hk > [cide-lceph-mon2][DEBUG ] * extras: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] * updates: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] Package epel-release-7-9.noarch already > installed and latest version > [cide-lceph-mon2][DEBUG ] Nothing to do > [cide-lceph-mon2][INFO ] Running command: yum -y install yum-priorities > [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, > priorities > [cide-lceph-mon2][DEBUG ] Loading mirror speeds from cached hostfile > [cide-lceph-mon2][DEBUG ] * base: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] * epel: ftp.cuhk.edu.hk > [cide-lceph-mon2][DEBUG ] * extras: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] * updates: centos.excellmedia.net > [cide-lceph-mon2][DEBUG ] Package yum-plugin-priorities-1.1.31-40.el7.noarch > already installed and latest version > [cide-lceph-mon2][DEBUG ] Nothing to do > [cide-lceph-mon2][DEBUG ] Configure Yum priorities to include obsoletes > [cide-lceph-mon2][WARNIN] check_obsoletes has been enabled for Yum > priorities plugin > [cide-lceph-mon2][INFO ] Running command: rpm --import > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc > [cide-lceph-mon2][INFO ] Running command: rpm -Uvh --replacepkgs > http://ceph.com/rpm-hammer/el7/noarch/ceph-release-1-0.el7.noarch.rpm > [cide-lceph-mon2][WARNIN] error: open of failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: open of Index failed: No > such file or directory > [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/ > failed: No such file or directory > [cide-lceph-mon2][WARNIN] error: open of directory > [cide-lceph-mon2][WARNIN] error: open of bgcolor=white> failed: No such > file or directory > [cide-lceph-mon2][WARNIN] error: open of Index failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/ failed: No such file or directory > 
[cide-lceph-mon2][WARNIN] error: open of href=../>../ failed: No such > file or directory > [cide-lceph-mon2][WARNIN] error: open of directory > [cide-lceph-mon2][WARNIN] error: open of href=el6/>el6/ failed: No > such file or directory > [cide-lceph-mon2][WARNIN] error: open of 24-Apr-2016 failed: No such file > or directory > [cide-lceph-mon2][WARNIN] error: open of 00:05 failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package > manifest): > [cide-lceph-mon2][WARNIN] error: open of directory > [cide-lceph-mon2][WARNIN] error: open of href=el7/>el7/ failed: No > such file or directory > [cide-lceph-mon2][WARNIN] error: open of 29-Aug-2016 failed: No such file > or directory > [cide-lceph-mon2][WARNIN] error: open of 11:53 failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package > manifest): > [cide-lceph-mon2][WARNIN] error: open of directory > [cide-lceph-mon2][WARNIN] error: open of href=fc20/>fc20/ failed: No > such file or directory > [cide-lceph-mon2][WARNIN] error: open of 07-Apr-2015 failed: No such file > or directory > [cide-lceph-mon2][WARNIN] error: open of 19:21 failed: No such file or > directory > [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package > manifest): > [cide-lceph-mon2][WARNIN] error: open of directory > [cide-lceph-mon2][WARNIN] error: open of href=rhel6/
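A sketch of that fix on the affected node (the repo file name is what ceph-deploy/ceph-release normally writes; in the ceph-deploy versions I've used the flag is spelled --no-adjust-repos):

# point the already-written repo at download.ceph.com instead of ceph.com
sed -i 's#http://ceph.com/#http://download.ceph.com/#g' /etc/yum.repos.d/ceph.repo
yum clean all

# then, from the admin node, keep ceph-deploy from rewriting the repo again
ceph-deploy install --no-adjust-repos cide-lceph-mon2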
[ceph-users] Read from Replica Osds?
Hi,

I thought that clients also read from Ceph replicas. But sometimes I read on the web that reads only happen from the primary PG, the same way Ceph handles writes... so which is true?

Greetz
Mehmet
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Read from Replica Osds?
Reads will always happen on the Primary OSD for the PG. Writes are initially written to the primary OSD, but the write is not ack'd until the write is completed on ALL secondaries. I make that distinction because if you have size 3 and min_size 2, the write will not come back unless all 3 OSDs have written the data. The min_size is there for security so that writes do not happen in your cluster if you do not have at least min_size OSDs active in your PG. It is not a setting to dictate how many copies need to be written before a write is successful. On Mon, May 8, 2017 at 12:04 PM Mehmet wrote: > Hi, > > I thought that Clients do also reads from ceph replicas. Sometimes i Read > in the web that this does only happens from the primary pg like how ceph > handle writes... so what is True? > > Greetz > Mehmet___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
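You can check both values for a given pool with something like (pool name is a placeholder):

ceph osd pool get rbd size
ceph osd pool get rbd min_size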
Re: [ceph-users] Read from Replica Osds?
librbd can optionally read from replicas for snapshots and parent images (i.e. known read-only data). This is controlled via the following configuration options: rbd_balance_snap_reads rbd_localize_snap_reads rbd_balance_parent_reads rbd_localize_parent_reads Direct users of the librados API can also utilize the LIBRADOS_OPERATION_BALANCE_READS and LIBRADOS_OPERATION_LOCALIZE_READS flags to control this behavior. On Mon, May 8, 2017 at 12:04 PM, Mehmet wrote: > Hi, > > I thought that Clients do also reads from ceph replicas. Sometimes i Read in > the web that this does only happens from the primary pg like how ceph handle > writes... so what is True? > > Greetz > Mehmet > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
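For anyone who wants to try them, these are ordinary client-side options, so a minimal ceph.conf sketch would be (pick either the balance or the localize variant per case, not both):

[client]
    # spread snapshot reads across replicas
    rbd balance snap reads = true
    # or prefer the closest replica based on CRUSH location:
    # rbd localize snap reads = true

    # same choice for reads that hit the parent image of a clone
    rbd balance parent reads = true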
[ceph-users] Performance after adding a node
Our ceph system performs very poorly or not even at all while the remapping procedure is underway. We are using replica 2 and the following ceph tweaks while it is in process: 1013 ceph tell osd.* injectargs '--osd-recovery-max-active 20' 1014 ceph tell osd.* injectargs '--osd-recovery-threads 20' 1015 ceph tell osd.* injectargs '--osd-max-backfills 20' 1016 ceph -w 1017 ceph osd set noscrub 1018 ceph osd set nodeep-scrub After the remapping finishes, we set these back to default. Are any of these causing our problems or is there another way to limit the impact of the remapping so that users do not think the system is down while we add more storage? thanks, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Performance after adding a node
WOW!!! Those are some awfully high backfilling settings you have there. They are 100% the reason that your customers think your system is down. You're telling each OSD to be able to have 20 backfill operations running at the exact same time. I bet if you were watching iostat -x 1 on one of your nodes before and after you inject those settings, the disk usage will go from a decent 40-70% and jump all the way up to 100% as soon as those settings are injected.

When you are backfilling, you are copying data from one drive to another. Each increment of osd-max-backfills is another file the OSD tries to copy at the same time. These can be receiving data (writing to the disk) or moving data off (reading from the disk followed by a delete). So by having 20 backfills happening at a time, you are telling each disk to allow 20 files to be written and/or read from it at the same time. What happens to a disk when you are copying 20 large files to it at a time? All of them move slower (a lot of it is disk thrashing from 20 threads all reading and writing to different parts of the disk).

What you want to find is the point where your disks are usually around 80-90% utilized while backfilling, but not consistently 100%. The easy way to do that is to increase osd-max-backfills by 1 or 2 at a time until you see it go too high, and then back off. I don't know many people that go above 5 max backfills in a production cluster on spinning disks. Usually the ones that do, do it temporarily while they know their cluster isn't being utilized by customers much.

Personally I haven't used osd-recovery-threads or osd-recovery-max-active; I've been able to tune my clusters using only osd-max-backfills. The lower you leave these, the longer the backfill will take, but the less impact your customers will notice. I've found 3 to be a generally safe number if customer IO is your priority, and 5 works well if your customers can be ok with it being slow (but still usable)... but all of this depends on your hardware and software use-cases. Test it while watching your disk utilizations and test your application while finding the right number for your environment.

Good Luck :)

On Mon, May 8, 2017 at 5:43 PM Daniel Davidson wrote:
> Our ceph system performs very poorly or not even at all while the
> remapping procedure is underway. We are using replica 2 and the
> following ceph tweaks while it is in process:
>
> 1013 ceph tell osd.* injectargs '--osd-recovery-max-active 20'
> 1014 ceph tell osd.* injectargs '--osd-recovery-threads 20'
> 1015 ceph tell osd.* injectargs '--osd-max-backfills 20'
> 1016 ceph -w
> 1017 ceph osd set noscrub
> 1018 ceph osd set nodeep-scrub
>
> After the remapping finishes, we set these back to default.
>
> Are any of these causing our problems or is there another way to limit
> the impact of the remapping so that users do not think the system is
> down while we add more storage?
>
> thanks,
>
> Dan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
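In other words, something far more conservative than the values quoted above, for example (raise one step at a time while watching iostat):

ceph tell osd.* injectargs '--osd-max-backfills 3'
ceph tell osd.* injectargs '--osd-recovery-max-active 3'

# on an OSD node, watch per-disk utilization; back off if %util sits at 100
iostat -x 1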
Re: [ceph-users] Reg: Ceph-deploy install - failing
the latest stable version is jewel(lts) and kraken, http://docs.ceph.com/docs/master/releases/ If you want to install stable version use --stable=jewel flag with ceph-deploy install command and it will get the packages from download.ceph.com, It is well tested on latest CentOS and Ubuntu. On Mon, May 8, 2017 at 8:14 AM, Curt wrote: > Hello, > > I don't use Cent, but I've seen the same thing with Ubuntu, I'm going to > assume it's the same problem. The repo url should be download.ceph.com, > instead of just ceph.com, which is uses when it add's it to the repo. My > solution usually is, correct the repo URL to point to download.ceph.com > and in the ceph-deploy add the tag --no-adjust-repo. > > Cheers > > On Sun, May 7, 2017 at 3:51 PM, psuresh wrote: > >> Hi, >> >> When i run "ceph-deploy install lceph-mon2" from admin node i'm getting >> following error. Any clue! >> >> [cide-lceph-mon2][DEBUG ] connected to host: cide-lceph-mon2 >> [cide-lceph-mon2][DEBUG ] detect platform information from remote host >> [cide-lceph-mon2][DEBUG ] detect machine type >> [ceph_deploy.install][INFO ] Distro info: CentOS Linux 7.3.1611 Core >> [cide-lceph-mon2][INFO ] installing ceph on cide-lceph-mon2 >> [cide-lceph-mon2][INFO ] Running command: yum clean all >> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, >> priorities >> [cide-lceph-mon2][DEBUG ] Cleaning repos: Ceph Ceph-noarch base >> ceph-source epel extras updates >> [cide-lceph-mon2][DEBUG ] Cleaning up everything >> [cide-lceph-mon2][INFO ] adding EPEL repository >> [cide-lceph-mon2][INFO ] Running command: yum -y install epel-release >> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, >> priorities >> [cide-lceph-mon2][DEBUG ] Determining fastest mirrors >> [cide-lceph-mon2][DEBUG ] * base: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] * epel: ftp.cuhk.edu.hk >> [cide-lceph-mon2][DEBUG ] * extras: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] * updates: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] Package epel-release-7-9.noarch already >> installed and latest version >> [cide-lceph-mon2][DEBUG ] Nothing to do >> [cide-lceph-mon2][INFO ] Running command: yum -y install yum-priorities >> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks, >> priorities >> [cide-lceph-mon2][DEBUG ] Loading mirror speeds from cached hostfile >> [cide-lceph-mon2][DEBUG ] * base: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] * epel: ftp.cuhk.edu.hk >> [cide-lceph-mon2][DEBUG ] * extras: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] * updates: centos.excellmedia.net >> [cide-lceph-mon2][DEBUG ] Package yum-plugin-priorities-1.1.31-40.el7.noarch >> already installed and latest version >> [cide-lceph-mon2][DEBUG ] Nothing to do >> [cide-lceph-mon2][DEBUG ] Configure Yum priorities to include obsoletes >> [cide-lceph-mon2][WARNIN] check_obsoletes has been enabled for Yum >> priorities plugin >> [cide-lceph-mon2][INFO ] Running command: rpm --import >> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc >> [cide-lceph-mon2][INFO ] Running command: rpm -Uvh --replacepkgs >> http://ceph.com/rpm-hammer/el7/noarch/ceph-release-1-0.el7.noarch.rpm >> [cide-lceph-mon2][WARNIN] error: open of failed: No such file or >> directory >> [cide-lceph-mon2][WARNIN] error: open of Index failed: No >> such file or directory >> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or >> directory >> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/ >> failed: No such file or 
directory >> [cide-lceph-mon2][WARNIN] error: open of > directory >> [cide-lceph-mon2][WARNIN] error: open of bgcolor=white> failed: No such >> file or directory >> [cide-lceph-mon2][WARNIN] error: open of Index failed: No such file >> or directory >> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or >> directory >> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/> failed: No such file or directory >> [cide-lceph-mon2][WARNIN] error: open of href=../>../ failed: No such >> file or directory >> [cide-lceph-mon2][WARNIN] error: open of > directory >> [cide-lceph-mon2][WARNIN] error: open of href=el6/>el6/ failed: No >> such file or directory >> [cide-lceph-mon2][WARNIN] error: open of 24-Apr-2016 failed: No such file >> or directory >> [cide-lceph-mon2][WARNIN] error: open of 00:05 failed: No such file or >> directory >> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package >> manifest): >> [cide-lceph-mon2][WARNIN] error: open of > directory >> [cide-lceph-mon2][WARNIN] error: open of href=el7/>el7/ failed: No >> such file or directory >> [cide-lceph-mon2][WARNIN] error: open of 29-Aug-2016 failed: No such file >> or directory >> [cide-lceph-mon2][WARNIN] error: open of 11:53 failed: No such file or >> directory >> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package >> manifest): >> [cide-lceph-mon2][WARNIN] error: open of > directory >> [cide-lceph-m
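So the install step would look roughly like this (host name taken from the log above; --stable is the flag named in this thread, newer ceph-deploy releases call it --release if I remember correctly):

ceph-deploy install --stable=jewel cide-lceph-mon2
# if the repo file was already corrected by hand, add --no-adjust-repos so it is left alone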
Re: [ceph-users] Read from Replica Osds?
We write many millions of keys into RGW which will never be changed (until they are deleted) -- it would be interesting if we could somehow indicate this to RGW and enable reading those from the replicas as well. -Ben On Mon, May 8, 2017 at 10:18 AM, Jason Dillaman wrote: > librbd can optionally read from replicas for snapshots and parent > images (i.e. known read-only data). This is controlled via the > following configuration options: > > rbd_balance_snap_reads > rbd_localize_snap_reads > rbd_balance_parent_reads > rbd_localize_parent_reads > > Direct users of the librados API can also utilize the > LIBRADOS_OPERATION_BALANCE_READS and LIBRADOS_OPERATION_LOCALIZE_READS > flags to control this behavior. > > On Mon, May 8, 2017 at 12:04 PM, Mehmet wrote: > > Hi, > > > > I thought that Clients do also reads from ceph replicas. Sometimes i > Read in > > the web that this does only happens from the primary pg like how ceph > handle > > writes... so what is True? > > > > Greetz > > Mehmet > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > -- > Jason > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph memory overhead when used with KVM
Hi, I was using a standalone rbd image. Cheers Nick On Monday, May 08, 2017 08:55:55 AM Jason Dillaman wrote: > Thanks. One more question: was the image a clone or a stand-alone image? > > On Fri, May 5, 2017 at 2:42 AM, nick wrote: > > Hi, > > I used one of the fio example files and changed it a bit: > > > > """ > > # This job file tries to mimic the Intel IOMeter File Server Access > > Pattern > > [global] > > description=Emulation of Intel IOmeter File Server Access Pattern > > randrepeat=0 > > filename=/root/test.dat > > # IOMeter defines the server loads as the following: > > # iodepth=1 Linear > > # iodepth=4 Very Light > > # iodepth=8 Light > > # iodepth=64Moderate > > # iodepth=256 Heavy > > iodepth=8 > > size=80g > > direct=0 > > ioengine=libaio > > > > [iometer] > > stonewall > > bs=4M > > rw=randrw > > > > [iometer_just_write] > > stonewall > > bs=4M > > rw=write > > > > [iometer_just_read] > > stonewall > > bs=4M > > rw=read > > """ > > > > Then let it run: > > $> while true; do fio stress.fio; rm /root/test.dat; done > > > > I had this running over a weekend. > > > > Cheers > > Sebastian > > > > On Tuesday, May 02, 2017 02:51:06 PM Jason Dillaman wrote: > >> Can you share the fio job file that you utilized so I can attempt to > >> repeat locally? > >> > >> On Tue, May 2, 2017 at 2:51 AM, nick wrote: > >> > Hi Jason, > >> > thanks for your feedback. I did now some tests over the weekend to > >> > verify > >> > the memory overhead. > >> > I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd > >> > 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the > >> > cause of the overhead so I just generated a lot of IO with the help of > >> > fio in the VMs (with a datasize of 80GB) . All VMs had 3GB of memory. I > >> > had to run fio multiple times, before reaching high RSS values. > >> > I also noticed that when using larger blocksizes during writes (like > >> > 4M) > >> > the memory overhead in the KVM process increased faster. > >> > I ran several fio tests (one after another) and the results are: > >> > > >> > KVM with writeback RBD cache: max. 85% memory overhead (2.5 GB > >> > overhead) > >> > KVM with writethrough RBD cache: max. 50% memory overhead > >> > KVM without RBD caching: less than 10% overhead all the time > >> > KVM with local storage (logical volume used): 8% overhead all the time > >> > > >> > I did not reach those >200% memory overhead results that we see on our > >> > live > >> > cluster, but those virtual machines have a way longer uptime as well. > >> > > >> > I also tried to reduce the RSS memory value with cache dropping on the > >> > physical host and in the VM. Both did not lead to any change. A reboot > >> > of > >> > the VM also does not change anything (reboot in the VM, not a new KVM > >> > process). The only way to reduce the RSS memory value is a live > >> > migration > >> > so far. Might this be a bug? The memory overhead sounds a bit too much > >> > for me. > >> > > >> > Best Regards > >> > Sebastian > >> > > >> > On Thursday, April 27, 2017 10:08:36 AM you wrote: > >> >> I know we noticed high memory usage due to librados in the Ceph > >> >> multipathd checker [1] -- the order of hundreds of megabytes. That > >> >> client was probably nearly as trivial as an application can get and I > >> >> just assumed it was due to large monitor maps being sent to the client > >> >> for whatever reason. 
Since we changed course on our RBD iSCSI > >> >> implementation, unfortunately the investigation into this high memory > >> >> usage fell by the wayside. > >> >> > >> >> [1] > >> >> http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libm > >> >> ult > >> >> ip > >> >> ath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD > >> >> > >> >> On Thu, Apr 27, 2017 at 5:26 AM, nick wrote: > >> >> > Hi Christian, > >> >> > thanks for your answer. > >> >> > The highest value I can see for a local storage VM in our > >> >> > infrastructure > >> >> > is a memory overhead of 39%. This is big, but the majority (>90%) of > >> >> > our > >> >> > local storage VMs are using less than 10% memory overhead. > >> >> > For ceph storage based VMs this looks quite different. The highest > >> >> > value I > >> >> > can see currently is 244% memory overhead. So that specific > >> >> > allocated > >> >> > 3GB > >> >> > memory VM is using now 10.3 GB RSS memory on the physical host. This > >> >> > is > >> >> > a > >> >> > really huge value. In general I can see that the majority of the > >> >> > ceph > >> >> > based VMs has more than 60% memory overhead. > >> >> > > >> >> > Maybe this is also a bug related to qemu+librbd. It would be just > >> >> > nice > >> >> > to > >> >> > know if other people are seeing those high values as well. > >> >> > > >> >> > Cheers > >> >> > Sebastian > >> >> > > >> >> > On Thursday, April 27, 2017 06:10:48 PM you wrote: > >> >> >> Hello, > >> >> >> > >> >> >> Definitely s