Dear Michael,
I don't have an explanation for your problem, unfortunately, but I was just
surprised that you experience a drop in performance that this SSD
shouldn't have. Your SSD drives (Samsung 870 EVO) should not get slower
on large writes. You can verify this in the post you've attached [1].
Hi Ken,
thank you for your hint - any input is appreciated. Please note that Ceph does
highly random IO (especially with small object sizes). AnandTech also
states:
"Some of our other tests have shown a few signs that the 870 EVO's write
performance can drop when the SLC cache runs out,
Michael Wodniok (wodniok) writes:
> Hi all,
>
> digging around while debugging why our (small: 10 hosts / ~60 OSDs) cluster
> is so slow even while recovering, I found out that one of our key issues is
> some SSDs with SLC cache (in our case Samsung SSD 870 EVO) - which we just
> recycled from other use cases.
What fio test would indicate this behaviour up front? I guess something like
this, but with a duration long enough to outlast this disk's cache?
[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
; sync after every write, similar to Ceph's small-object write pattern
fsync=1
; bypass the page cache so the drive itself is measured
direct=1
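For example, a run might look like this (a sketch; device path, runtime and
job file name are placeholders, and note this overwrites the device):
fio --filename=/dev/sdX --time_based --runtime=30m randwrite-4k.fio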
> thank you for your hint - any input is appreciated. Please note that
> Ceph does highly random IO (especially with small object sizes)
On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li wrote:
>
>
> On 20/02/2023 22:28, Kuhring, Mathias wrote:
> > Hey Dan, hey Ilya
> >
> > I know this issue is two years old already, but we are having similar
> > issues.
> >
> > Do you know if the fixes were ever backported to RHEL kernels?
>
> It's already
Hi Marc,
I would try something near your example, yes. You could add
"size=" so the test runs until the whole disk has been written once (by IO
volume, not by touching all available cells), independent of time.
If the results show either high deviation or numbers far from the
specification, it's very likely there is an issue like the SLC cache
running out.
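A minimal sketch of that, assuming the same job as above plus a size cap:
[randwrite-whole-disk]
bs=4k
rw=randwrite
fsync=1
direct=1
; stop after one full device's worth of written IO, however long it takes
size=100%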
Hi, did you ever resolve that? I'm stuck with the same "deleting"
service in 'ceph orch ls' and found your thread.
Thanks,
Eugen
Hi,
I'm just writing to share some old knowledge, which is:
never, ever use consumer SSDs for Ceph!
see e.g.
https://old.reddit.com/r/Proxmox/comments/izg6e5/questions_on_running_ceph_using_consumer_ssds/g6it9uv/
--
With kind regards / Regards
Sven Kieske
Systems Developer / Systems Engineer
Hi,
today I wanted to increase the PGs from 2k -> 4k, and random OSDs went
offline in the cluster.
After some investigation we saw that the OSDs got OOM-killed (I've seen a
host go from 90 GB used memory to 190 GB before the OOM kills happened).
We have around 24 SSD OSDs per host and 128GB/190GB/2
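For context, such an increase is typically issued as something like this
(the pool name is a placeholder; since Nautilus, pgp_num follows
automatically):
ceph osd pool set <pool> pg_num 4096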
Hey
I have seen that kind of behavior in the past, and we managed to flash the
firmware to increase the cache size. That will kill the drive a little bit
faster, so do it only in lab environments. I'm unaware whether the Samsung
Magician software can do that.
By the way, a few EVO 9xx models are listed by Samsung to
Hi Boris,
This sounds a bit like https://tracker.ceph.com/issues/53729.
https://tracker.ceph.com/issues/53729#note-65 might help you diagnose
whether this is the case.
Josh
On Tue, Feb 21, 2023 at 9:29 AM Boris Behrens wrote:
>
> Hi,
> today I wanted to increase the PGs from 2k -> 4k, and random OSDs went
> offline in the cluster.
Thanks a lot Josh. That really seems like my problem.
That does not look healthy in the cluster. oof.
~# ceph tell osd.* perf dump |grep 'osd_pglog\|^osd\.[0-9]'
osd.0: {
"osd_pglog_bytes": 459617868,
"osd_pglog_items": 2955043,
osd.1: {
"osd_pglog_bytes": 598414548,
When the admin runs “bi purge” they have the option of supplying a bucket_id
with the “--bucket-id” command-line argument. This was useful back when
resharding did not automatically remove the older bucket index shards (which it
now does), since those shards had a different bucket_id from the current bucket.
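For illustration, such an invocation looks like this (bucket name and id are
placeholders):
radosgw-admin bi purge --bucket=<bucket-name> --bucket-id=<old-bucket-id>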
Hi Robert,
A colleague and I ran into this a few weeks ago. The way we managed to
get access back to delete the bucket properly (using radosgw-admin
bucket rm) was to reshard the bucket.
This created a new bucket index and therefore it was then possible to delete it.
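As a sketch (bucket name and shard count are placeholders):
radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=<new-count>
radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects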
If you are looking to get access
Hey Li,
thank you for the quick reply.
So the kernel on the cluster nodes might be the issue here?
I thought the client kernel was the only relevant one (since we use cephadm).
Anyhow, we plan to upgrade the cluster nodes to Rocky 8 soon.
We'll see if this helps with the issue.
Best,
Mathias
On 2/2
Hi all,
we encountered some strange behavior when using storage classes with the S3
protocol. Some objects end up in a different pool than we would expect.
Below is a list of commands used to create an account with a replicated
storage class, upload some files to the bucket, and check which pool they
ended up in.
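As a rough sketch of such a sequence (placement id, pools, bucket and file
names are illustrative assumptions, not necessarily the original commands):
radosgw-admin zonegroup placement add --rgw-zonegroup=default \
    --placement-id=default-placement --storage-class=REPLICATED
radosgw-admin zone placement add --rgw-zone=default \
    --placement-id=default-placement --storage-class=REPLICATED \
    --data-pool=default.rgw.replicated.data
s3cmd put file.txt s3://test-bucket --storage-class=REPLICATED
rados -p default.rgw.replicated.data ls | grep file.txt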