I find graphs really help here. One screen that shows disk I/O and latency for every OSD makes it easy to pinpoint the bottleneck.
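Even without a graphing stack, a quick loop over 'ceph osd perf' (rough sketch below; the sort column may need adjusting to your version's output) usually makes a laggard OSD stand out:

    # refresh every 5s; worst apply latencies end up at the bottom
    watch -n 5 "ceph osd perf | sort -n -k 3 | tail"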
If you don't have that, I'd go low tech: watch the blinky lights. It's really easy to see which disk is the hotspot.

On Thu, Aug 14, 2014 at 6:56 AM, Mariusz Gronczewski <mariusz.gronczew...@efigence.com> wrote:
> The actual OSD logs (/var/log/ceph/ceph-osd.$id) would be more useful.
>
> A few ideas:
>
> * run 'ceph health detail' to see exactly which OSDs are stalling
> * run 'ceph osd perf' to see the latency of each OSD
> * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops'
>   shows recent slow ops
>
> I actually have a very similar problem: the cluster runs at full speed
> (sometimes even for hours) and then suddenly everything stops for a
> minute or five -- no disk IO, no IO wait (so the disks are fine), no IO
> errors in the kernel log, and the OSDs only complain that another OSD's
> subop is slow (but everything looks fine on that OSD too).
>
> On Wed, 13 Aug 2014 16:04:30 -0400, German Anders
> <gand...@despegar.com> wrote:
>
>> Also, even an "ls -ltr" run inside the mount point of the RBD freezes
>> the prompt. Any ideas? I've attached some syslogs from one of the OSD
>> servers and also from the client. Both are running Ubuntu 14.04 LTS
>> with kernel 3.15.8.
>> The cluster is not usable at this point, since I can't even run an
>> "ls" on the RBD.
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> German Anders
>>
>> > --- Original message ---
>> > Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>> > 10MB/s
>> > From: German Anders <gand...@despegar.com>
>> > To: Mark Nelson <mark.nel...@inktank.com>
>> > Cc: <ceph-users@lists.ceph.com>
>> > Date: Wednesday, 13/08/2014 11:09
>> >
>> > It's actually very strange: if I run the fio test on the client and
>> > in parallel run iostat on all the OSD servers, I don't see any
>> > workload hitting the disks at all, I mean... nothing! 0.00... and
>> > the fio job on the client is also behaving very oddly:
>> >
>> > $ sudo fio --filename=/dev/rbd1 --direct=1 --rw=write --bs=4m
>> > --size=10G --iodepth=16 --ioengine=libaio --runtime=60
>> > --group_reporting --name=file99
>> > file99: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio,
>> > iodepth=16
>> > fio-2.1.3
>> > Starting 1 process
>> > Jobs: 1 (f=1): [W] [2.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> > 01h:26m:43s]
>> >
>> > It seems like it's doing nothing...
>> >
>> > German Anders
>> >
>> >> --- Original message ---
>> >> Subject: Re: [ceph-users] Performance really drops from 700MB/s to
>> >> 10MB/s
>> >> From: Mark Nelson <mark.nel...@inktank.com>
>> >> To: <ceph-users@lists.ceph.com>
>> >> Date: Wednesday, 13/08/2014 11:00
>> >>
>> >> On 08/13/2014 08:19 AM, German Anders wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> I'm seeing some odd behavior on a new Ceph cluster. I mapped an
>> >>> RBD to a client and ran some performance tests with fio, and at
>> >>> that point everything went just fine (the results too :) ). But
>> >>> when I then run a new test on a new RBD on the same client, the
>> >>> performance suddenly drops below 10MB/s and it takes almost 10
>> >>> minutes to complete a 10G file test. If I run *ceph -w* I don't
>> >>> see anything suspicious. Any idea what could be happening here?
>> >>
>> >> When things are going fast, are your disks actually writing data
>> >> out as fast as your client IO would indicate? (Don't forget to
>> >> count replication!)
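>> >> For instance (just a rough sketch), running this on each OSD host
>> >> while the fio test is going and watching the wMB/s column for the
>> >> OSD data disks will tell you:
>> >>
>> >>   iostat -xm 5
>> >>
>> >> Summing that across the four hosts and comparing it against the
>> >> client throughput times your replica count (2 here) shows whether
>> >> the data is really reaching the disks.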
>> >> It may be that the great speed is just writing data into the tmpfs
>> >> journals (if the test is only 10GB and spread across 36 OSDs, it
>> >> could finish pretty quickly writing to tmpfs!). FWIW, tmpfs
>> >> journals aren't very safe. It's not something you want to use
>> >> outside of testing except in unusual circumstances.
>> >>
>> >> In your tests, when things are bad, it's generally worth checking
>> >> whether any one disk/OSD is backed up relative to the others.
>> >> There are a couple of ways to accomplish this: the Ceph admin
>> >> socket can tell you information about each OSD, i.e. how many
>> >> outstanding IOs it has and a history of slow ops. You can also
>> >> look at per-disk statistics with something like iostat or collectl.
>> >>
>> >> Hope this helps!
>> >>
>> >>> The cluster is made of:
>> >>>
>> >>> 3 x MON servers
>> >>> 4 x OSD servers (3TB SAS 6G disks for the OSD daemons & tmpfs for
>> >>>     the journals -> on each server there's one 36GB tmpfs shared
>> >>>     by the 9 OSD daemons)
>> >>> 2 x network switches (cluster and public)
>> >>> 10GbE on both networks
>> >>>
>> >>> The ceph.conf file is the following:
>> >>>
>> >>> [global]
>> >>> fsid = 56e56e4c-ea59-4157-8b98-acae109bebe1
>> >>> mon_initial_members = cephmon01, cephmon02, cephmon03
>> >>> mon_host = 10.97.10.1,10.97.10.2,10.97.10.3
>> >>> auth_client_required = cephx
>> >>> auth_cluster_required = cephx
>> >>> auth_service_required = cephx
>> >>> filestore_xattr_use_omap = true
>> >>> public_network = 10.97.0.0/16
>> >>> cluster_network = 192.168.10.0/24
>> >>> osd_pool_default_size = 2
>> >>> glance_api_version = 2
>> >>>
>> >>> [mon]
>> >>> debug_optracker = 0
>> >>>
>> >>> [mon.cephmon01]
>> >>> host = cephmon01
>> >>> mon_addr = 10.97.10.1:6789
>> >>>
>> >>> [mon.cephmon02]
>> >>> host = cephmon02
>> >>> mon_addr = 10.97.10.2:6789
>> >>>
>> >>> [mon.cephmon03]
>> >>> host = cephmon03
>> >>> mon_addr = 10.97.10.3:6789
>> >>>
>> >>> [osd]
>> >>> journal_dio = false
>> >>> osd_journal_size = 4096
>> >>> fstype = btrfs
>> >>> debug_optracker = 0
>> >>>
>> >>> [osd.0]
>> >>> host = cephosd01
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.1]
>> >>> host = cephosd01
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.2]
>> >>> host = cephosd01
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.3]
>> >>> host = cephosd01
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.4]
>> >>> host = cephosd01
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.5]
>> >>> host = cephosd01
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.6]
>> >>> host = cephosd01
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.7]
>> >>> host = cephosd01
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.8]
>> >>> host = cephosd01
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.9]
>> >>> host = cephosd02
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.10]
>> >>> host = cephosd02
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.11]
>> >>> host = cephosd02
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.12]
>> >>> host = cephosd02
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.13]
>> >>> host = cephosd02
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.14]
>> >>> host = cephosd02
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.15]
>> >>> host = cephosd02
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.16]
>> >>> host = cephosd02
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.17]
>> >>> host = cephosd02
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.18]
>> >>> host = cephosd03
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.19]
>> >>> host = cephosd03
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.20]
>> >>> host = cephosd03
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.21]
>> >>> host = cephosd03
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.22]
>> >>> host = cephosd03
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.23]
>> >>> host = cephosd03
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.24]
>> >>> host = cephosd03
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.25]
>> >>> host = cephosd03
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.26]
>> >>> host = cephosd03
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.27]
>> >>> host = cephosd04
>> >>> devs = /dev/sdc1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.28]
>> >>> host = cephosd04
>> >>> devs = /dev/sdd1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.29]
>> >>> host = cephosd04
>> >>> devs = /dev/sdf1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.30]
>> >>> host = cephosd04
>> >>> devs = /dev/sdg1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.31]
>> >>> host = cephosd04
>> >>> devs = /dev/sdi1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.32]
>> >>> host = cephosd04
>> >>> devs = /dev/sdj1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.33]
>> >>> host = cephosd04
>> >>> devs = /dev/sdl1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.34]
>> >>> host = cephosd04
>> >>> devs = /dev/sdm1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [osd.35]
>> >>> host = cephosd04
>> >>> devs = /dev/sdn1
>> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
>> >>>
>> >>> [client.volumes]
>> >>> keyring = /etc/ceph/ceph.client.volumes.keyring
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> Best regards,
>> >>>
>> >>> German Anders
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczew...@efigence.com
> <mailto:mariusz.gronczew...@efigence.com>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com