Hello everyone,
I'm experiencing a strange behaviour. My cluster is relatively small (43
OSDs, 11 nodes), running Ceph 12.2.10 (and Proxmox 5). Nodes are connected
via 10 Gbit network (Nexus 6000). Cluster is mixed (SSD and HDD), but with
different pools. Described error is only on the SSD part
Yep... these settings are already in place. We also followed all the
recommendations to get performance, but OSD down events still cause an
impact, even though we have 2000+ OSDs.
And we are using 3 pools, with different HW nodes for each pool. When one
pool's OSDs go down, it also impacts the other pools' performance...
which is not expected with Ceph.
We have not seen a CPU limitation, because we are using 4 cores per OSD daemon.
But we are still using "ms_crc_data = true and ms_crc_header = true". Will
disable these and test the performance.
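For example (assuming injectargs can apply these at runtime; the messenger
settings may need an OSD restart to take full effect):

  ceph tell osd.* injectargs '--ms_crc_data=false --ms_crc_header=false'

or persistently in ceph.conf:

  [global]
  ms_crc_data = false
  ms_crc_header = false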
And we are using filestore + LevelDB only, with filestore_op_threads = 2.
The rest of the recovery and backfill settings are tuned to minimal values.
Debug settings are at their defaults, like 1/5 and 0/5 for almost all subsystems.
Shall I try 0 for all debug settings?
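If I do, it would be something along these lines (only a few common
subsystems shown; these are all runtime-injectable):

  ceph tell osd.* injectargs '--debug_osd=0/0 --debug_ms=0/0 --debug_filestore=0/0'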
On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius wrote:
>
> Hello,
>
>
> Check your CPU usage when you are doing those kinds of operations. We
> had a similar issue where our CPU monitori
What about the system stats on your mons during recovery? If they are
having a hard time keeping up with requests during a recovery, I could see
that impacting client io. What disks are they running on? CPU? Etc.
On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy
wrote:
> Debug setting defaults a
Can you correlate the times to scheduled tasks inside any of the VMs? For
instance, if you have several Linux VMs with the updatedb command installed,
by default they will all scan their disks at the same time each day to see
where files are. Other common culprits could be scheduled
backups,
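To check for the updatedb case inside a VM, something like this (Debian/RHEL-style
layout assumed; the exact job file name varies by distro):

  ls /etc/cron.daily/ | grep -i locate    # find the daily updatedb job
  chmod -x /etc/cron.daily/mlocate        # run-parts skips non-executable jobs

Alternatively, add the heavy paths to PRUNEPATHS in /etc/updatedb.conf.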
>On 2/16/19 12:33 AM, David Turner wrote:
>> The answer is probably going to be in how big your DB partition is vs
>> how big your HDD disk is. From your output it looks like you have a
>> 6TB HDD with a 28GB Blocks.DB partition. Even though the DB used
>> size isn't currently full, I would gu
>Yes and no... bluestore seems to not work really optimally. For example,
>it has no filestore-like journal watermarking, and it flushes the deferred
>write queue only every 32 writes (deferred_batch_ops). And when it does
>that, it basically waits for the HDD to commit, slowing down all
>further writes.
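>For anyone who wants to experiment, that knob can be set in ceph.conf
>(option name as of Luminous; the value below is only illustrative):
>
>  [osd]
>  bluestore_deferred_batch_ops = 8    # flush the deferred queue more often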
The ceph mons look fine during the recovery. We are using HDDs with SSD
journals, with the recommended CPU and RAM numbers.
On Fri, Feb 22, 2019 at 4:40 PM David Turner wrote:
>
> What about the system stats on your mons during recovery? If they are having
> a hard time keeping up with requests during a recov
Mon disks don't have journals, they're just a folder on a filesystem on a
disk.
On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy
wrote:
> The ceph mons look fine during the recovery. We are using HDDs with SSD
> journals, with the recommended CPU and RAM numbers.
>
> On Fri, Feb 22, 2019 at 4:40 PM David Tur
The ceph-mon disk is a 500G HDD (no journals/SSDs). Yes, the mon uses a
folder on a filesystem on a disk.
On Fri, Feb 22, 2019 at 5:13 PM David Turner wrote:
>
> Mon disks don't have journals, they're just a folder on a filesystem on a
> disk.
>
> On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy
> wrote:
>>
If you're using HDDs for the monitor servers, check their load. The issue
might be there.
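A quick way to check is to watch the mon's disk with iostat during recovery
(the device name is a placeholder):

  iostat -x 5 /dev/sdX    # high %util and await here point at the mon disk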
On Fri, Feb 22, 2019 at 1:50 PM M Ranga Swami Reddy
wrote:
>
> The ceph-mon disk is a 500G HDD (no journals/SSDs). Yes, the mon uses a
> folder on a filesystem on a disk.
>
> On Fri, Feb 22, 2019 at 5:13 PM David Turner wrote
On Fri, Feb 22, 2019 at 12:35, M Ranga Swami Reddy <
swamire...@gmail.com> wrote:
> We have not seen a CPU limitation, because we are using 4 cores per OSD daemon.
> But we are still using "ms_crc_data = true and ms_crc_header = true". Will
> disable these and test the performance.
>
I am a bit sceptical to c
But the Ceph recommendation is to use a VM for the mon (a dedicated HW node is
not even recommended). We will try changing the mon disk to an SSD and a HW node.
On Fri, Feb 22, 2019 at 5:25 PM Darius Kasparavičius wrote:
>
> If your using hdd for monitor servers. Check their load. It might be
> the issue there.
>
> On Fri, Feb 22,
Oops... does this really have an impact? We will change this right away and test it.
On Fri, Feb 22, 2019 at 5:29 PM Janne Johansson wrote:
>
> On Fri, Feb 22, 2019 at 12:35, M Ranga Swami Reddy
> wrote:
>>
>> We have not seen a CPU limitation, because we are using 4 cores per OSD daemon.
>> But still using "ms_c
On 2019-02-20 17:38, Mykola Golub wrote:
Note, even if rbd supported live (without any downtime) migration, you
would still need to restart the client after the upgrade to a new
librbd with migration support.
You could probably get away with executing the client with a new librbd
version by li
>> These sizes are roughly 3GB, 30GB, 300GB. Anything in between those sizes is
>> pointless. Only ~3GB of SSD will ever be used out of a 28GB partition.
>> Likewise, a 240GB partition is also pointless, as only ~30GB will be used.
Where did you get those numbers? I would like to read more if you can
point to a link.
Hello,
I couldn't find anything satisfying that clearly describes what this
thread does, or whether the average IO wait for the block device
(ca. 60%) is normal on an SSD device, even when there is no / not much
client workload.
Output from iotop:
---
9890 be/4 ceph    0.00 B/s  817.1
Hi,
it looks like the beloved Samsung SM/PM863a is no longer available and
the replacement is the new SM/PM883.
We got an 960GB PM883 (MZ7LH960HAJR-5) here and I ran the usual
fio benchmark... and got horrible results :(
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k
--numjob
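For reference, this is presumably the usual full form of that test (the
flags after --bs are an assumption about what was cut off):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test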
A couple of hints to debug the issue (since I had to recently debug a
problem with the same symptoms):
- As far as I understand, the reported 'implicated osds' are only the
primary ones. In the OSD logs you should also find the relevant pg
number, and with this information you can get all th
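Something like this works for that (the pg id below is just an example):

  grep 'slow request' /var/log/ceph/ceph-osd.*.log   # the log lines include the pg id, e.g. 7.1ab
  ceph pg map 7.1ab                                  # shows the up/acting OSD sets for that pg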
Bad SSDs can also cause this. Which SSD are you using?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Fri, Feb 22, 2019 at 2:53 PM Massimo Sgaravatto
wrote:
>
> A
> Where did you get those numbers? I would like to read more if you can
> point to a link.
Just found the link:
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
On Fri, Feb 22, 2019 at 4:22 PM Serkan Çoban wrote:
>
> >>These sizes are roughly 3GB,30GB,300GB. Anything in-between those siz
Bluestore/RocksDB will only put the next level of the DB up on flash if the
whole level will fit. These sizes are roughly 3GB, 30GB, 300GB. Anything in
between those sizes is pointless: only ~3GB of SSD will ever be used out of a
28GB partition. Likewise, a 240GB partition is also pointless, as only ~30GB
will be used.
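The arithmetic behind those numbers, assuming the leveled-compaction
defaults Ceph ships for RocksDB (256MB base size, x10 multiplier):

  L0+L1 ~ 256 MB     (max_bytes_for_level_base)
  L2    ~ 2.56 GB    (x10 per level, max_bytes_for_level_multiplier)
  L3    ~ 25.6 GB
  L4    ~ 256 GB

A level is only placed on the fast device if it fits entirely, so the useful
partition sizes cluster around ~3GB, ~30GB and ~300GB (level plus WAL and
overhead).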
Hello, Alfredo Deza!
On that day, you wrote...
> The problem is that if there is no PARTUUID ceph-volume can't ensure
> what device is the one actually pointing to data/journal. Being 'GPT'
> alone will not be enough here :(
Ok. Is there some way to 'force' a PARTUUID onto a GPT or non-GPT
partition?
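For GPT, at least, something like this should work (sgdisk syntax from
memory; plain MBR partitions don't carry a per-partition UUID at all):

  sgdisk --partition-guid=1:$(uuidgen) /dev/sdX   # set a fresh PARTUUID on partition 1
  blkid /dev/sdX1                                 # verify the new PARTUUID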
On Fri, Feb 22, 2019 at 02:43:36PM +0200, koukou73gr wrote:
> On 2019-02-20 17:38, Mykola Golub wrote:
>
> > Note, even if rbd supported live (without any downtime) migration, you
> > would still need to restart the client after the upgrade to a new
> > librbd with migration support.
> >
> >
> Yo
Hello, Christian Balzer!
On that day, you wrote...
> You pretty much answered your question, as in a limit of "osd' would do
> the trick, though not just for intra-host.
Oh, the documentation does not list the possible values... good to know.
> But of course everybody will (rightly) tell you that
On Fri, Feb 22, 2019 at 9:38 AM Marco Gaiarin wrote:
>
> Hello, Alfredo Deza!
> On that day, you wrote...
>
> > The problem is that if there is no PARTUUID ceph-volume can't ensure
> > what device is the one actually pointing to data/journal. Being 'GPT'
> > alone will not be enough here :(
>
>
Hi Paul,
I don't have any of those disks, but maybe the SM/PM863 has its write
cache enabled by default and the PM883 doesn't. You should check that...
I have seen horrible IOPS results on lots of MLC/V-NAND SSDs as soon
as they don't use their drive write cache (DWC).
Did anything else change in your testing environme
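Checking and toggling the cache is straightforward (the device name is a
placeholder):

  hdparm -W /dev/sdX     # query the current write cache state
  hdparm -W1 /dev/sdX    # enable it; -W0 disables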
What are you connecting it to? We just got the exact same drive for
testing, and I'm seeing much higher performance, connected to a
motherboard 6 Gb SATA port on a Supermicro X9 board.
[root@centos7 jacob]# smartctl -a /dev/sda
Device Model: Samsung SSD 883 DCT 960GB
Firmware Version: HXT
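One line worth comparing in the smartctl output is the negotiated link
speed, which looks roughly like this:

  SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)

If 'current' reports 3.0 Gb/s or less, the drive is sitting behind a slower
port or expander, which could explain part of the gap.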
On 2019-02-12, Debian Buster went into soft freeze:
https://release.debian.org/buster/freeze_policy.html
So all the debian developers are hard at work getting buster ready for
release.
It would be really awesome if we could get debian buster packages built
on http://download.ceph.com/ both for
Hello everyone,
I am running a Mimic 13.2.2 cluster in containers and noticed that Docker
logs ate all of my local disk space after a while. So I changed the
debug levels of rocksdb, leveldb, and memdb to 1/5 (default 4/5) and changed
mon logging as such:
ceph tell mon.* injectargs --log-to-syslog=
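The debug changes were presumably along these lines (subsystem names for
the rocksdb/leveldb/memdb knobs):

  ceph tell mon.* injectargs '--debug_rocksdb=1/5 --debug_leveldb=1/5 --debug_memdb=1/5'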
Aren't you undersized at only 30GB? I thought you should have 4% of your
OSDs
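By that 4% rule of thumb, a 6TB OSD would want roughly 0.04 x 6000GB = 240GB
of block.db, far more than the 28-30GB partitions discussed above.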
On Fri, Feb 22, 2019 at 3:10 PM Nick Fisk wrote:
> >On 2/16/19 12:33 AM, David Turner wrote:
> >> The answer is probably going to be in how big your DB partition is vs
> >> how big your HDD disk is. From your output
? Did we start recommending that production mons run on a VM? I'd be very
hesitant to do that, though probably some folks do.
I can say for sure that in the past (Firefly) I experienced outages related to
mons running on HDDs. That was a cluster of 450 HDD OSDs with colo journals
and hundred