I find graphs really help here. One screen that shows disk I/O and latency for every OSD makes it easy to pinpoint the bottleneck.
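Even without a graphing stack, a quick loop over 'ceph osd perf' (rough sketch below; the sort column may need adjusting to your version's output) usually makes a laggard OSD stand out:

    # refresh every 5s; worst apply latencies end up at the bottom
    watch -n 5 "ceph osd perf | sort -n -k 3 | tail"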
If you don't have that, I'd go low tech: watch the blinky lights. It's really easy to see which disk is the hotspot.

On Thu, Aug 14, 2014 at 6:56 AM, Mariusz Gronczewski <mariusz.gronczew...@efigence.com> wrote:
> The actual OSD logs (/var/log/ceph/ceph-osd.$id) would be more useful.
>
> A few ideas:
>
> * run 'ceph health detail' to see exactly which OSDs are stalling
> * run 'ceph osd perf' to see the latency of each OSD
> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops'
>   shows recent slow ops
>
> I actually have a very similar problem: the cluster runs at full speed
> (sometimes even for hours) and then suddenly everything stops for a
> minute or five -- no disk IO, no IO wait (so the disks are fine), no IO
> errors in the kernel log, and the OSDs only complain that another OSD's
> subop is slow (but everything looks fine on that OSD too).
>
> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
> <gand...@despegar.com> wrote:
>
>> Also, even an "ls -ltr" run inside the mount point of the RBD freezes
>> the prompt. Any ideas? I've attached some syslogs from one of the OSD
>> servers and also from the client. Both are running Ubuntu 14.04 LTS
>> with kernel 3.15.8.
>> The cluster is not usable at this point, since I can't even run an
>> "ls" on the RBD.
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> German Anders
>>
>> > --- Original message ---
>> > Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>> > 10MB/s
>> > From: German Anders <gand...@despegar.com>
>> > To: Mark Nelson <mark.nel...@inktank.com>
>> > Cc: <ceph-users@lists.ceph.com>
>> > Date: Wednesday, 13/08/2014 11:09
>> >
>> > It's actually very strange: if I run the fio test on the client and
>> > in parallel run iostat on all the OSD servers, I don't see any
>> > workload hitting the disks at all, I mean... nothing! 0.00... and
>> > the fio job on the client is also behaving very oddly:
>> >
>> > $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m
>> > --size=10G --iodepth=16 --ioengine=libaio --runtime=60
>> > --group_reporting --name=file99
>> > file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio,
>> > iodepth=16
>> > fio-2.1.3
>> > Starting 1 process
>> > Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> > 01h:26m:43s]
>> >
>> > It seems like it's doing nothing...
>> >
>> > German Anders
>> >
>> >> --- Original message ---
>> >> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>> >> 10MB/s
>> >> From: Mark Nelson <mark.nel...@inktank.com>
>> >> To: <ceph-users@lists.ceph.com>
>> >> Date: Wednesday, 13/08/2014 11:00
>> >>
>> >> On 08/13/2014 08:19 AM, German Anders wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> I'm seeing some odd behavior on a new Ceph cluster. I mapped an
>> >>> RBD to a client and ran some performance tests with fio, and at
>> >>> that point everything went just fine (the results too :) ). But
>> >>> when I then run a new test on a new RBD on the same client, the
>> >>> performance suddenly drops below 10MB/s and it takes almost 10
>> >>> minutes to complete a 10G file test. If I run *ceph -w* I don't
>> >>> see anything suspicious. Any idea what could be happening here?
>> >>
>> >> When things are going fast, are your disks actually writing data
>> >> out as fast as your client IO would indicate? (Don't forget to
>> >> count replication!)
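>> >> For instance (just a rough sketch), running this on each OSD host
>> >> while the fio test is going and watching the wMB/s column for the
>> >> OSD data disks will tell you:
>> >>
>> >>   iostat -xm 5
>> >>
>> >> Summing that across the four hosts and comparing it against the
>> >> client throughput times your replica count (2 here) shows whether
>> >> the data is really reaching the disks.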
>> >> It may be that the great speed is just writing data into the tmpfs
>> >> journals (if the test is only 10GB and spread across 36 OSDs, it
>> >> could finish pretty quickly writing to tmpfs!). FWIW, tmpfs
>> >> journals aren't very safe. It's not something you want to use
>> >> outside of testing except in unusual circumstances.
>> >>
>> >> In your tests, when things are bad, it's generally worth checking
>> >> whether any one disk/OSD is backed up relative to the others.
>> >> There are a couple of ways to accomplish this: the Ceph admin
>> >> socket can tell you information about each OSD, i.e. how many
>> >> outstanding IOs it has and a history of slow ops. You can also
>> >> look at per-disk statistics with something like iostat or collectl.
>> >>
>> >> Hope this helps!
>> >>
>> >>> The cluster is made of:
>> >>>
>> >>> 3 x MON servers
>> >>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for
>> >>>     the journals -> on each server there's one 36GB tmpfs shared
>> >>>     by the 9 OSD daemons)
>> >>> 2 x network switches (cluster and public)
>> >>> 10GbE on both networks
>> >>>
>> >>> The ceph.conf file is the following:
>> >>>
>> >>> [global]
>> >>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>> >>> mon_initial_members = cephmon01, cephmon02, cephmon03
>> >>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>> >>> auth_client_required = cephx
>> >>> auth_cluster_required = cephx
>> >>> auth_service_required = cephx
>> >>> filestore_xattr_use_omap = true
>> >>> public_network = 10.97.0.0/16
>> >>> cluster_network = 192.168.10.0/24
>> >>> osd_pool_default_size = 2
>> >>> glance_api_version = 2
>> >>>
>> >>> [mon]
>> >>> debug_optracker = 0
>> >>>
>> >>> [mon.cephmon01]
>> >>> host = cephmon01
>> >>> mon_addr = 10.97.10.1:6789
>> >>>
>> >>> [mon.cephmon02]
>> >>> host = cephmon02
>> >>> mon_addr = 10.97.10.2:6789
>> >>>
>> >>> [mon.cephmon03]
>> >>> host = cephmon03
>> >>> mon_addr = 10.97.10.3:6789
>> >>>
>> >>> [osd]
>> >>> journal_dio = false
>> >>> osd_journal_size = 4096
>> >>> fstype = btrfs
>> >>> debug_optracker = 0
>> >>>
>> >>> [osd.0]
>> >>> host = cephosd01
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.1]
>> >>> host = cephosd01
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.2]
>> >>> host = cephosd01
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.3]
>> >>> host = cephosd01
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.4]
>> >>> host = cephosd01
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.5]
>> >>> host = cephosd01
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.6]
>> >>> host = cephosd01
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.7]
>> >>> host = cephosd01
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.8]
>> >>> host = cephosd01
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.9]
>> >>> host = cephosd02
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.10]
>> >>> host = cephosd02
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.11]
>> >>> host = cephosd02
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.12]
>> >>> host = cephosd02
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.13]
>> >>> host = cephosd02
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.14]
>> >>> host = cephosd02
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.15]
>> >>> host = cephosd02
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.16]
>> >>> host = cephosd02
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.17]
>> >>> host = cephosd02
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.18]
>> >>> host = cephosd03
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.19]
>> >>> host = cephosd03
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.20]
>> >>> host = cephosd03
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.21]
>> >>> host = cephosd03
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.22]
>> >>> host = cephosd03
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.23]
>> >>> host = cephosd03
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.24]
>> >>> host = cephosd03
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.25]
>> >>> host = cephosd03
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.26]
>> >>> host = cephosd03
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.27]
>> >>> host = cephosd04
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.28]
>> >>> host = cephosd04
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.29]
>> >>> host = cephosd04
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.30]
>> >>> host = cephosd04
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.31]
>> >>> host = cephosd04
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.32]
>> >>> host = cephosd04
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.33]
>> >>> host = cephosd04
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.34]
>> >>> host = cephosd04
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.35]
>> >>> host = cephosd04
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [client.volumes]
>> >>> keyring = /etc/ceph/ceph.client.volumes.keyring
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> Best regards,
>> >>>
>> >>> German Anders
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczew...@efigence.com
> <mailto:mariusz.gronczew...@efigence.com>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com