[ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread mart.v
Hello everyone, I'm experiencing a strange behaviour. My cluster is relatively small (43 OSDs, 11 nodes), running Ceph 12.2.10 (and Proxmox 5). Nodes are connected via 10 Gbit network (Nexus 6000). Cluster is mixed (SSD and HDD), but with different pools. Described error is only on the SSD par

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Yep... these settings are already in place. We also followed all recommendations to get performance, but an OSD going down still has an impact... even though we have 2000+ OSDs. And we are using 3 pools with different HW nodes for each pool. One pool's OSDs going down also impacts the other pools' performance... which is not expected with Ceph

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Not seeing the CPU limitation, because we are using 4 cores per OSD daemon. But still using "ms_crc_data = true and ms_crc_header = true". Will disable these and try the performance. And using filestore + LevelDB only. filestore_op_threads = 2. Rest of recovery and backfill settings done mini
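For reference, a minimal ceph.conf sketch of the settings being discussed (a hedged illustration only, values are the ones mentioned in the thread, not a recommendation):
  [osd]
  # disable messenger CRC checks to save CPU, at the cost of on-the-wire corruption detection
  ms_crc_data = false
  ms_crc_header = false
  # filestore worker threads, as mentioned above
  filestore_op_threads = 2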

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Debug settings are at their defaults... like 1/5 and 0/5 for almost all of them. Shall I try with 0 for all debug settings? On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius wrote: > > Hello, > > > Check your CPU usage when you are doing those kind of operations. We > had a similar issue where our CPU monitori
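A hedged sketch of dropping the debug levels at runtime (standard injectargs mechanism; the subsystem list below is illustrative, not exhaustive, and reverts on daemon restart unless also put in ceph.conf):
  ceph tell osd.* injectargs '--debug_osd=0/0 --debug_ms=0/0 --debug_filestore=0/0 --debug_journal=0/0'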

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread David Turner
What about the system stats on your mons during recovery? If they are having a hard time keeping up with requests during a recovery, I could see that impacting client io. What disks are they running on? CPU? Etc. On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy wrote: > Debug setting defaults a

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread David Turner
Can you correlate the times to scheduled tasks inside of any VMs? For instance, if you have several Linux VMs with the updatedb command installed, by default they will all be scanning their disks at the same time each day to see where files are. Other common culprits could be scheduled backups,
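If updatedb inside the guests turns out to be the culprit, a hedged mitigation (assuming the stock mlocate layout inside the VMs) is to check when the daily job fires and prune the heavy paths:
  # inside each VM
  cat /etc/cron.daily/mlocate        # when does the scan run?
  grep PRUNE /etc/updatedb.conf      # PRUNEFS / PRUNEPATHS exclusions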

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread Nick Fisk
>On 2/16/19 12:33 AM, David Turner wrote: >> The answer is probably going to be in how big your DB partition is vs >> how big your HDD disk is. From your output it looks like you have a >> 6TB HDD with a 28GB Blocks.DB partition. Even though the DB used >> size isn't currently full, I would gu
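To check how much of the DB actually lives on the fast device versus spilling over to the slow (HDD) device, a sketch via the OSD admin socket (counter names as I recall them from Luminous BlueStore; run on the OSD host):
  ceph daemon osd.<id> perf dump bluefs | grep -E 'db_(total|used)_bytes|slow_(total|used)_bytes'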

Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-22 Thread Nick Fisk
>Yes and no... bluestore seems to not work really optimally. For example, >it has no filestore-like journal waterlining and flushes the deferred >write queue just every 32 writes (deferred_batch_ops). And when it does >that it's basically waiting for the HDD to commit and slowing down all >further wr

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Ceph mons look fine during the recovery. Using HDD with SSD journals, with recommended CPU and RAM numbers. On Fri, Feb 22, 2019 at 4:40 PM David Turner wrote: > > What about the system stats on your mons during recovery? If they are having > a hard time keeping up with requests during a recov

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread David Turner
Mon disks don't have journals, they're just a folder on a filesystem on a disk. On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy wrote: > ceph mons look fine during the recovery. Using HDD with SSD > journals, with recommended CPU and RAM numbers. > > On Fri, Feb 22, 2019 at 4:40 PM David Tur

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
The ceph-mon disk is 500G HDD (not journals/SSDs). Yes, the mon uses a folder on a FS on a disk. On Fri, Feb 22, 2019 at 5:13 PM David Turner wrote: > > Mon disks don't have journals, they're just a folder on a filesystem on a > disk. > > On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy > wrote: >>

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Darius Kasparavičius
If you're using HDDs for the monitor servers, check their load. It might be the issue there. On Fri, Feb 22, 2019 at 1:50 PM M Ranga Swami Reddy wrote: > > ceph-mon disk with 500G with HDD (not journals/SSDs). Yes, mon use > folder on FS on a disk > > On Fri, Feb 22, 2019 at 5:13 PM David Turner wrote
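A quick way to check whether the mon's HDD is the bottleneck during recovery (plain sysstat, nothing Ceph-specific):
  # on each mon host: watch %util and await for the disk holding the mon store
  iostat -x 5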

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Janne Johansson
Den fre 22 feb. 2019 kl 12:35 skrev M Ranga Swami Reddy < swamire...@gmail.com>: > No seen the CPU limitation because we are using the 4 cores per osd daemon. > But still using "ms_crc_data = true and ms_crc_header = true". Will > disable these and try the performance. > I am a bit sceptical to c

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
But the Ceph recommendation is to use a VM (a HW node is not even recommended). Will try to change the mon disk to SSD and a HW node. On Fri, Feb 22, 2019 at 5:25 PM Darius Kasparavičius wrote: > > If your using hdd for monitor servers. Check their load. It might be > the issue there. > > On Fri, Feb 22,

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Oops... does this really have an impact? Will right away change this and test it. On Fri, Feb 22, 2019 at 5:29 PM Janne Johansson wrote: > > Den fre 22 feb. 2019 kl 12:35 skrev M Ranga Swami Reddy > : >> >> No seen the CPU limitation because we are using the 4 cores per osd daemon. >> But still using "ms_c

Re: [ceph-users] RBD image format v1 EOL ...

2019-02-22 Thread koukou73gr
On 2019-02-20 17:38, Mykola Golub wrote: Note, even if rbd supported live (without any downtime) migration, you would still need to restart the client after the upgrade to a new librbd with migration support. You could probably get away with executing the client with a new librbd version by li

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread Serkan Çoban
>>These sizes are roughly 3GB,30GB,300GB. Anything in-between those sizes are >>pointless. Only ~3GB of SSD will ever be used out of a 28GB partition. Likewise a 240GB partition is also pointless as only ~30GB will be used. Where did you get those numbers? I would like to read more if you can poi

[ceph-users] thread bstore_kv_sync - high disk utilization

2019-02-22 Thread Benjamin Zapiec
Hello, I couldn't find anything satisfying that clearly describes what this thread does, or whether the average IO wait for the block device (ca. 60%) is normal on an SSD device, even when there is no / not much client workload. Output from iotop: --- 9890 be/4 ceph 0.00 B/s 817.1
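For following just that thread over time, an iotop invocation along these lines may help (standard iotop flags; a sketch, adjust to taste):
  # batch mode, only tasks doing IO, timestamped, 5s interval
  iotop -botq -d 5 | grep bstore_kv_sync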

[ceph-users] Experiences with the Samsung SM/PM883 disk?

2019-02-22 Thread Paul Emmerich
Hi, it looks like the beloved Samsung SM/PM863a is no longer available and the replacement is the new SM/PM883. We got a 960GB PM883 (MZ7LH960HAJR-5) here and I ran the usual fio benchmark... and got horrible results :( fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjob
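The preview cuts the command off; the "usual" single-job sync write test people run against journal/DB SSDs looks roughly like this (everything after --bs=4k is my assumption of the rest of the command, not taken from the message):
  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test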

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread Massimo Sgaravatto
A couple of hints to debug the issue (since I recently had to debug a problem with the same symptoms): - As far as I understand, the reported 'implicated osds' are only the primary ones. In the logs of the OSDs you should also find the relevant pg number, and with this information you can get all th
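Once the pg number is extracted from the OSD log, the full set of OSDs serving it can be listed with the standard CLI (<pgid> is a placeholder):
  ceph pg map <pgid>     # up set and acting set for that pg
  ceph pg <pgid> query   # more detail, including recovery/peering state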

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread Paul Emmerich
Bad SSDs can also cause this. Which SSD are you using? Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Feb 22, 2019 at 2:53 PM Massimo Sgaravatto wrote: > > A

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread Serkan Çoban
>Where did you get those numbers? I would like to read more if you can point to a link. Just found the link: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction On Fri, Feb 22, 2019 at 4:22 PM Serkan Çoban wrote: > > >>These sizes are roughly 3GB,30GB,300GB. Anything in-between those siz

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread Konstantin Shalygin
Bluestore/RocksDB will only put the next level up size of DB on flash if the whole size will fit. These sizes are roughly 3GB, 30GB, 300GB. Anything in-between those sizes is pointless. Only ~3GB of SSD will ever be used out of a 28GB partition. Likewise a 240GB partition is also pointless as onl
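Roughly where those numbers come from, assuming BlueStore's default RocksDB tuning (max_bytes_for_level_base ~ 256 MB with a level multiplier of 10; my reading of the defaults, see the leveled-compaction link Serkan posted in this thread):
  L1 ~ 256 MB
  L2 ~ 2.56 GB   -> L0+L1+L2 fit in roughly 3 GB of flash
  L3 ~ 25.6 GB   -> roughly 30 GB needed to also hold L3
  L4 ~ 256 GB    -> roughly 300 GB needed to also hold L4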

Re: [ceph-users] OSD after OS reinstallation.

2019-02-22 Thread Marco Gaiarin
Mandi! Alfredo Deza In chel di` si favelave... > The problem is that if there is no PARTUUID ceph-volume can't ensure > what device is the one actually pointing to data/journal. Being 'GPT' > alone will not be enough here :( Ok. Is there some way to 'force' a PARTUUID in a GPT or non-GPT partit
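On a GPT disk the unique partition GUID (what udev exposes as PARTUUID) can be rewritten with sgdisk; a hedged sketch ('R' asks sgdisk for a random GUID, 1 is the partition number here). Plain MBR has no real per-partition GUID, so this only helps for GPT:
  sgdisk --partition-guid=1:R /dev/sdX
  blkid /dev/sdX1            # verify the new PARTUUID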

Re: [ceph-users] RBD image format v1 EOL ...

2019-02-22 Thread Mykola Golub
On Fri, Feb 22, 2019 at 02:43:36PM +0200, koukou73gr wrote: > On 2019-02-20 17:38, Mykola Golub wrote: > > > Note, even if rbd supported live (without any downtime) migration you > > would still need to restart the client after the upgrade to a new > > librbd with migration support. > > > > > Yo

Re: [ceph-users] Prevent rebalancing in the same host?

2019-02-22 Thread Marco Gaiarin
Mandi! Christian Balzer In chel di` si favelave... > You pretty much answered your question, as in a limit of "osd" would do > the trick, though not just for intra-host. Oh, the documentation does not list the possible values... good to know. > But of course everybody will (rightly) tell you that

Re: [ceph-users] OSD after OS reinstallation.

2019-02-22 Thread Alfredo Deza
On Fri, Feb 22, 2019 at 9:38 AM Marco Gaiarin wrote: > > Mandi! Alfredo Deza > In chel di` si favelave... > > > The problem is that if there is no PARTUUID ceph-volume can't ensure > > what device is the one actually pointing to data/journal. Being 'GPT' > > alone will not be enough here :( > >

Re: [ceph-users] Experiences with the Samsung SM/PM883 disk?

2019-02-22 Thread Oliver Schmitz
Hi Paul, I don't have any of those disks, but maybe the SM/PM863 has its write cache enabled by default and the PM883 doesn't. You should check that... I have seen horrible IOPS results on lots of MLC/V-NAND SSDs as soon as they don't use their DWC. Did anything else change in your testing environme
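A quick hedged check of the drive's volatile write cache state (hdparm on SATA; the -W0/-W1 forms change it, shown only for completeness, with obvious implications for sync write behaviour):
  hdparm -W /dev/sdX       # query write cache state
  hdparm -W0 /dev/sdX      # disable write cache
  hdparm -W1 /dev/sdX      # enable write cache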

Re: [ceph-users] Experiences with the Samsung SM/PM883 disk?

2019-02-22 Thread Jacob DeGlopper
What are you connecting it to?  We just got the exact same drive for testing, and I'm seeing much higher performance, connected to a motherboard 6 Gb SATA port on a Supermicro X9 board. [root@centos7 jacob]# smartctl -a /dev/sda Device Model: Samsung SSD 883 DCT 960GB Firmware Version: HXT

[ceph-users] debian packages on download.ceph.com

2019-02-22 Thread Ronny Aasen
On 2019-02-12 Debian Buster went into soft freeze: https://release.debian.org/buster/freeze_policy.html So all the Debian developers are hard at work getting Buster ready for release. It would be really awesome if we could get Debian Buster packages built on http://download.ceph.com/ both for

[ceph-users] redirect log to syslog and disable log to stderr

2019-02-22 Thread Alex Litvak
Hello everyone, I am running a Mimic 13.2.2 cluster in containers and noticed that docker logs ate all of my local disk space after a while. So I changed the debug levels of rocksdb, leveldb, and memdb to 1/5 (default 4/5) and changed mon logging like so: ceph tell mon.* injectargs --log-to-syslog=
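The command in the preview is cut off; a hedged sketch of what the full invocation presumably looks like (these option names exist, but the exact values are my guess at what was intended):
  ceph tell mon.* injectargs '--log-to-syslog=true --err-to-syslog=true --log-to-stderr=false --err-to-stderr=false'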

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread solarflow99
Aren't you undersized at only 30GB? I thought you should have 4% of your OSDs On Fri, Feb 22, 2019 at 3:10 PM Nick Fisk wrote: > >On 2/16/19 12:33 AM, David Turner wrote: > >> The answer is probably going to be in how big your DB partition is vs > >> how big your HDD disk is. From your output
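For scale: with the 6TB HDD mentioned earlier in this thread, the often-quoted 4% rule works out to 0.04 x 6000 GB = 240 GB of DB space, versus the ~28-30 GB partitions under discussion; the RocksDB level-size posts elsewhere in the thread explain why only the ~3/30/300 GB steps are actually useful rather than a flat percentage.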

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Anthony D'Atri
? Did we start recommending that production mons run on a VM? I'd be very hesitant to do that, though probably some folks do. I can say for sure that in the past (Firefly) I experienced outages related to mons running on HDDs. That was a cluster of 450 HDD OSDs with colo journals and hundred