Mine is currently at 1000 due to the high number of PGs we had coming from
Jewel. I do find it odd that only the BlueStore OSDs have this issue;
FileStore OSDs seem to be unaffected.
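
For reference, a rough sketch of how one might check and raise that value at
runtime (assuming a Luminous-era cluster with admin-socket access on the mon
host; the mon id matching the short hostname is an assumption):

# current value on a local mon daemon (assumes mon id == short hostname)
ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd

# raise it at runtime on all mons and osds
ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'
ceph tell osd.* injectargs '--mon_max_pg_per_osd 1000'

# to persist it across restarts, add to the [global] section of ceph.conf:
#   mon_max_pg_per_osd = 1000

Injected values only last until the daemons restart, so the ceph.conf entry is
what makes the change stick.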

On Wed, Sep 5, 2018, 3:43 PM Samuel Taylor Liston <sam.lis...@utah.edu>
wrote:

> Just a thought - have you looked at increasing "mon_max_pg_per_osd"
> on both the mons and OSDs?  I was having a similar issue while trying to
> add more OSDs to my cluster (12.2.27, CentOS 7.5,
> 3.10.0-862.9.1.el7.x86_64).  I increased mine to 300 temporarily while
> adding OSDs and stopped having blocked requests.
> --
> Sam Liston (sam.lis...@utah.edu)
> ========================================
> Center for High Performance Computing
> 155 S. 1452 E. Rm 405
> Salt Lake City, Utah 84112 (801)232-6932
> ========================================
>
>
>
>
> On Sep 5, 2018, at 12:46 PM, Daniel Pryor <dpr...@parchment.com> wrote:
>
> I've experienced the same thing during scrubbing and/or any kind of
> expansion activity.
>
> *Daniel Pryor*
>
> On Mon, Sep 3, 2018 at 2:13 AM Marc Schöchlin <m...@256bit.org> wrote:
>
>> Hi,
>>
>> We have also been experiencing this type of behavior for some weeks on our
>> less performance-critical HDD pools.
>> We haven't spent much time on this problem yet, because there are
>> currently more important tasks - but here are a few details:
>>
>> Running the following loop produces output like this:
>>
>> while true; do ceph health|grep -q HEALTH_OK || (date;  ceph health
>> detail); sleep 2; done
>>
>> Sun Sep  2 20:59:47 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep  2 20:59:50 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep  2 20:59:52 CEST 2018
>> HEALTH_OK
>> Sun Sep  2 21:00:28 CEST 2018
>> HEALTH_WARN 1 slow requests are blocked > 32 sec
>> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>>     1 ops are blocked > 32.768 sec
>>     osd.41 has blocked requests > 32.768 sec
>> Sun Sep  2 21:00:31 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,41 have blocked requests > 32.768 sec
>> Sun Sep  2 21:00:33 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>> Sun Sep  2 21:00:35 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>>
>> Our details:
>>
>>   * system details:
>>     * Ubuntu 16.04
>>     * Kernel 4.13.0-39
>>     * 30 x 8 TB disks (SEAGATE ST8000NM0075)
>>     * 3 x Dell PowerEdge R730xd (firmware 2.50.50.50)
>>       * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>       * 2 x 10 Gbit/s SFP+ network adapters
>>       * 192 GB RAM
>>     * Pools use replication factor 3, 2 MB object size,
>>       85% write load, 1700 write IOPS
>>       (ops mainly between 4 KB and 16 KB), 300 read IOPS
>>   * we have the impression that this appears during deep-scrub/scrub activity
>>   * Ceph 12.2.5; we already played with the following OSD settings
>>     (our assumption was that the problem is related to RocksDB compaction;
>>     see the inspection sketch after this list):
>>     bluestore cache kv max = 2147483648
>>     bluestore cache kv ratio = 0.9
>>     bluestore cache meta ratio = 0.1
>>     bluestore cache size hdd = 10737418240
>>   * this type of problem only appears on hdd/bluestore OSDs; ssd/bluestore
>>     OSDs have never experienced it
>>   * the system is healthy, no swapping, no high load, no errors in dmesg
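>>
>> A sketch of one way to dig into the compaction suspicion (assuming
>> admin-socket access on the OSD host; osd.35 is simply the OSD from the
>> attached log excerpt):
>>
>> # in-flight and recent slow ops on the suspect OSD
>> ceph daemon osd.35 dump_ops_in_flight
>> ceph daemon osd.35 dump_historic_ops
>>
>> # bluestore/rocksdb perf counters - watch commit and compaction latencies
>> ceph daemon osd.35 perf dump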
>>
>> I attached a log excerpt of osd.35 - it is probably useful for
>> investigating the problem if someone has deeper bluestore knowledge.
>> (slow requests appeared on Sun Sep  2 21:00:35)
>>
>> Regards
>> Marc
>>
>>
>> On 02.09.2018 at 15:50, Brett Chancellor wrote:
>> > The warnings look like this.
>> >
>> > 6 ops are blocked > 32.768 sec on osd.219
>> > 1 osds have slow requests
>> >
>> > On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <ad...@redhat.com> wrote:
>> >
>> >     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>> >     <bchancel...@salesforce.com> wrote:
>> >     > Hi Cephers,
>> >     >   I am in the process of upgrading a cluster from Filestore to
>> >     > bluestore, but I'm concerned about frequent warnings popping up
>> >     > against the new bluestore devices. I'm frequently seeing messages
>> >     > like this; although the specific OSD changes, it's always on one
>> >     > of the few hosts I've converted to bluestore.
>> >     >
>> >     > 6 ops are blocked > 32.768 sec on osd.219
>> >     > 1 osds have slow requests
>> >     >
>> >     > I'm running 12.2.4; have any of you seen similar issues? It
>> >     > seems as though these messages pop up more frequently when one of
>> >     > the bluestore PGs is involved in a scrub.  I'll include my
>> >     > bluestore creation process below, in case that might cause an
>> >     > issue. (sdb, sdc, sdd are SATA; sde and sdf are SSD)
>> >
>> >     It would be useful to include what those warnings say. The
>> >     ceph-volume commands look OK to me.
>> >
>> >     >
>> >     >
>> >     > ## Process used to create osds
>> >     > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde /dev/sdf
>> >     > sudo ceph-volume lvm zap /dev/sdb
>> >     > sudo ceph-volume lvm zap /dev/sdc
>> >     > sudo ceph-volume lvm zap /dev/sdd
>> >     > sudo ceph-volume lvm zap /dev/sde
>> >     > sudo ceph-volume lvm zap /dev/sdf
>> >     > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb" /dev/sdf
>> >     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc" /dev/sdf
>> >     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd" /dev/sdf
>> >     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde" /dev/sdf
>> >     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdb --block.db /dev/sdf1
>> >     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdc --block.db /dev/sdf2
>> >     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdd --block.db /dev/sdf3
>> >     >
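>> >     > A quick sanity check of the resulting layout might look like this
>> >     > (a sketch, assuming the ceph-volume/LVM defaults used above):
>> >     >
>> >     > # show which LVs back each OSD and where each block.db points
>> >     > sudo ceph-volume lvm list
>> >     >
>> >     > # confirm the new OSDs landed in the hdd device class
>> >     > ceph osd df tree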
>> >     >
>> >
>> >
>> >
>>
>>
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
