Dear Michael,
I don't have an explanation for your problem, unfortunately, but I was just
surprised that you experience a drop in performance that this SSD
shouldn't have. Your SSD drives (Samsung 870 EVO) should not get slower
on large writes. You can verify this in the post you've attached [1].
Hi Ken,
thank you for your hint - any input is appreciated. Please note that Ceph does
highly random IO (especially with small object sizes). AnandTech also
states:
"Some of our other tests have shown a few signs that the 870 EVO's write
performance can drop when the SLC cache runs out,
Michael Wodniok (wodniok) writes:
> Hi all,
>
> digging around while debugging why our (small: 10 hosts / ~60 OSDs) cluster
> is so slow even while recovering, I found out that one of our key issues is
> some SSDs with SLC cache (in our case Samsung SSD 870 EVO) - which we just
> recycled from other use cases.
What fio test would indicate this behaviour up front? I guess something like
this, but with a duration long enough to outlast this disk's cache?
[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
; sync after every write, similar to Ceph's small-object write pattern
fsync=1
; bypass the page cache so the drive itself is measured
direct=1
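For example, a run might look like this (a sketch; device path, runtime and
job file name are placeholders, and note this overwrites the device):
fio --filename=/dev/sdX --time_based --runtime=30m randwrite-4k.fio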
> thank you for your hint - any input is appreciated. Please note that
> Ceph does highly random IO (especially with small object sizes)
On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li wrote:
>
>
> On 20/02/2023 22:28, Kuhring, Mathias wrote:
> > Hey Dan, hey Ilya
> >
> > I know this issue is two years old already, but we are having similar
> > issues.
> >
> > Do you know if the fixes were ever backported to RHEL kernels?
>
> It's already
Hi Marc,
I would try something near your example, yes. You could add
"size=" so the test runs until the whole disk has been written once (by IO
volume, not by touching all available cells), independent of time.
If the results show either high deviation or numbers far from the
specification, it's very likely there is an issue like the SLC cache
running out.
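A minimal sketch of that, assuming the same job as above plus a size cap:
[randwrite-whole-disk]
bs=4k
rw=randwrite
fsync=1
direct=1
; stop after one full device's worth of written IO, however long it takes
size=100%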
Hi, did you ever resolve that? I'm stuck with the same "deleting"
service in 'ceph orch ls' and found your thread.
Thanks,
Eugen
Hi,
I'm just writing to share some old knowledge, which is:
never, ever use consumer SSDs for Ceph!
see e.g.
https://old.reddit.com/r/Proxmox/comments/izg6e5/questions_on_running_ceph_using_consumer_ssds/g6it9uv/
--
With kind regards / Regards
Sven Kieske
Systems Developer / Systems Engineer
Hi,
today I wanted to increase the PGs from 2k -> 4k, and random OSDs went
offline in the cluster.
After some investigation we saw that the OSDs got OOM-killed (I've seen a
host go from 90 GB used memory to 190 GB before the OOM kills happened).
We have around 24 SSD OSDs per host and 128GB/190GB/2
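For context, such an increase is typically issued as something like this
(the pool name is a placeholder; since Nautilus, pgp_num follows
automatically):
ceph osd pool set <pool> pg_num 4096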
Hey
I have seen that kind of behavior in the past, and we managed to flash the
firmware to increase the cache size. That will kill the drive a little bit
faster, so do it only in lab environments. I'm unaware whether the Samsung
Magician software can do that.
By the way, a few EVO 9xx models are listed by Samsung to
Hi Boris,
This sounds a bit like https://tracker.ceph.com/issues/53729.
https://tracker.ceph.com/issues/53729#note-65 might help you diagnose
whether this is the case.
Josh
On Tue, Feb 21, 2023 at 9:29 AM Boris Behrens wrote:
>
> Hi,
> today I wanted to increase the PGs from 2k -> 4k, and random OSDs went
> offline in the cluster.
Thanks a lot Josh. That really seems like my problem.
That does not look healthy in the cluster. oof.
~# ceph tell osd.* perf dump |grep 'osd_pglog\|^osd\.[0-9]'
osd.0: {
"osd_pglog_bytes": 459617868,
"osd_pglog_items": 2955043,
osd.1: {
"osd_pglog_bytes": 598414548,
When the admin runs “bi purge” they have the option of supplying a bucket_id
with the “--bucket-id” command-line argument. This was useful back when
resharding did not automatically remove the older bucket index shards (which it
now does), since those shards had a different bucket_id from the current bucket.
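For illustration, such an invocation looks like this (bucket name and id are
placeholders):
radosgw-admin bi purge --bucket=<bucket-name> --bucket-id=<old-bucket-id>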
Hi Robert,
A colleague and I ran into this a few weeks ago. The way we managed to
get access back to delete the bucket properly (using radosgw-admin
bucket rm) was to reshard the bucket.
This created a new bucket index and therefore it was then possible to delete it.
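As a sketch (bucket name and shard count are placeholders):
radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=<new-count>
radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects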
If you are looking to get access
Hey Li,
thank you for the quick reply.
So the kernel on the cluster nodes might be the issue here?
I thought the client kernel was the only relevant one (since we use cephadm).
Anyhow, we plan to upgrade the cluster nodes to Rocky 8 soon.
We'll see if this helps with the issue.
Best,
Mathias
On 2/2
Hi all,
we encountered some strange behavior when using storage classes with the S3
protocol. Some objects end up in a different pool than we would expect.
Below is a list of commands used to create an account with a replicated
storage class, upload some files to the bucket, and check which pool they
ended up in.
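As a rough sketch of such a sequence (placement id, pools, bucket and file
names are illustrative assumptions, not necessarily the original commands):
radosgw-admin zonegroup placement add --rgw-zonegroup=default \
    --placement-id=default-placement --storage-class=REPLICATED
radosgw-admin zone placement add --rgw-zone=default \
    --placement-id=default-placement --storage-class=REPLICATED \
    --data-pool=default.rgw.replicated.data
s3cmd put file.txt s3://test-bucket --storage-class=REPLICATED
rados -p default.rgw.replicated.data ls | grep file.txt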