[ceph-users] help me understand ceph snapshot sizes

2024-02-22 Thread garcetto
Good morning, I am trying to understand Ceph snapshot sizing. For example, if I have a 2.7 GB volume and I create a snap on it, the sizing says: (BEFORE SNAP) rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 NAME PROVISIONED USED volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 10 GiB 2.7 Gi
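
A minimal sketch of the experiment being described, reusing the image name from the post (exact output columns may differ by Ceph release):

    rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080        # before the snapshot
    rbd snap create volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080@snap1
    rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080        # after the snapshot
    # After the snap, "rbd du" lists the snapshot and the head image as separate rows.
    # The snapshot consumes essentially no extra space at first; its USED value only
    # grows as blocks in the head image are overwritten (copy-on-write).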

[ceph-users] Size return by df

2024-02-22 Thread Albert Shih
Hi, I have one CephFS with one volume and subvolumes using erasure coding. If I don't set any quota, when I run df on the client I get: 0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 5,8P 78T 5,8P 2% /vo. The 78T seems to be the size used by Ceph on disk (on the hardware, I mean). And I find th
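
A hedged sketch of one way to make df report a per-directory limit instead of the raw cluster capacity: set a CephFS quota on the mounted path (the mount point /vo is taken from the df output above; the 100 GiB value is a placeholder):

    # set a 100 GiB quota on the client-side mount point
    setfattr -n ceph.quota.max_bytes -v 107374182400 /vo
    df -h /vo    # the client should now report the quota as the filesystem size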

[ceph-users] [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
We have 6 nodes (3 OSD nodes and 3 service nodes); 2 of the 3 OSD nodes were powered off and we got a big problem. Please check the ceph -s result below. Now we cannot start the mds service (we tried to start it, but it stopped after 2 minutes), and my application cannot access the NFS-exported folder. What should we do? [roo

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
What does the MDS log when it crashes? Quoting nguyenvand...@baoviet.com.vn: We have 6 nodes (3 OSD nodes and 3 service nodes); 2 of the 3 OSD nodes were powered off and we got a big problem. Please check the ceph -s result below. Now we cannot start the mds service (we tried to start it, but it stopped after 2 min

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
How can we get the MDS log? Please guide me T_T

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
There are a couple of ways. Find your MDS daemon with: ceph fs status -> should show you the to-be-active MDS. On that host run: cephadm logs --name mds.{MDS} or alternatively: cephadm ls --no-detail | grep mds; journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log Quoting nguyenvand...@b
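
The same commands from the reply, laid out as they would be typed ({MDS} and {FSID} are placeholders for your daemon name and cluster FSID):

    ceph fs status                                   # shows which MDS is (supposed to be) active
    cephadm logs --name mds.{MDS}                    # run on the host where that daemon lives
    # alternatively:
    cephadm ls --no-detail | grep mds
    journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log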

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
Hm, I wonder if setting (and unsetting after a while) noscrub and nodeep-scrub has any effect. Have you tried that? Quoting Cedric: Update: we have run fsck and re-shard on all bluestore volumes; it seems sharding was not applied. Unfortunately scrubs and deep-scrubs are still stuck on PGs
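
A quick sketch of the suggestion, assuming cluster-wide flags are acceptable for the test:

    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # wait a while, then re-enable scrubbing
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub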

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
The log is very long; could you please guide me on how to grep/filter the important things in the logs?

[ceph-users] Re: Some questions about cephadm

2024-02-22 Thread Eugen Block
Hi, just responding to the last questions: - After the bootstrap, the web interface was accessible: - How can I access the wizard page again? If I don't use it the first time, I could not find another way to get to it. I don't know how to recall the wizard, but you should be able

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Feb 22 13:39:43 cephgw02 conmon[1340927]: log_file /var/lib/ceph/crash/2024-02-22T06:39:43.618845Z_78ee38bc-9115-4bc6-8c3a-4bf42284c970/log Feb 22 13:39:43 cephgw02 conmon[1340927]: --- end dump of recent events --- Feb 22 13:39:45 cephgw02 systemd[1]: ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
If it crashes after two minutes, you have your time window to look for. Restart the MDS daemon and capture everything after that until the crash. Quoting nguyenvand...@baoviet.com.vn: The log is very long; could you please guide me how to grep/filter the important things in the logs?
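
A hedged sketch of one way to capture just that window and pull out the crash lines ({FSID} and {MDS} are placeholders; the grep patterns are common markers in Ceph crash dumps, not guaranteed to match every crash):

    journalctl -u ceph-{FSID}@mds.{MDS} --since "10 minutes ago" --no-pager > mds-crash.log
    grep -nE "FAILED ceph_assert|Caught signal|begin dump of recent events" mds-crash.log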

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
Thanks Eugen for the suggestion. Yes, we have tried that, and also re-peering the concerned PGs; still the same issue. Looking at the code, it seems the split-mode message is triggered when the PG has "stats_invalid": true. Here is the result of a query: "stats_invalid": true, "dirty_stats_inva

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
I found a config option to force scrubbing of invalid PGs; what is your current setting for it? ceph config get osd osd_scrub_invalid_stats -> true. The config reference states: "Forces extra scrub to fix stats marked as invalid." But the default seems to be true, so I'd expect it's true in your case as we
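
A hedged sketch of checking the option and manually requesting a deep-scrub of one of the stuck PGs (the pgid 2.1f is a placeholder):

    ceph config get osd osd_scrub_invalid_stats    # from the reply; defaults to true
    ceph pg deep-scrub 2.1f                        # manually request a deep-scrub of a stuck PG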

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
Yes, osd_scrub_invalid_stats is set to true. We are thinking about using the "ceph pg_mark_unfound_lost revert" action, but we wonder if there is a risk of data loss. On Thu, Feb 22, 2024 at 11:50 AM Eugen Block wrote: > > I found a config to force scrub invalid PGs, what is your current > s

[ceph-users] Sharing our "Containerized Ceph and Radosgw Playground"

2024-02-22 Thread Ansgar Jazdzewski
Hi Folks, We are excited to announce plans for building a larger Ceph-S3 setup. To ensure its success, extensive testing is needed in advance. Some of these tests don't need a full-blown Ceph cluster on hardware but still require meeting specific logical requirements, such as a multi-site S3 setu

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
"We are thinking about using the 'ceph pg_mark_unfound_lost revert' action, but we wonder if there is a risk of data loss." You don't seem to have unfound objects, so I don't think that command would make sense. You haven't told us yet whether you changed hit_set_count to 0. Have you already tried
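
A hedged way to confirm whether a PG actually has unfound objects before considering mark_unfound_lost (the pgid is a placeholder):

    ceph health detail | grep -i unfound
    ceph pg 2.1f list_unfound        # lists unfound objects for that PG, if any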

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
On Thu, Feb 22, 2024 at 12:37 PM Eugen Block wrote: > You haven't told yet if you changed the hit_set_count to 0. Not yet, we will give it a try ASAP > Have you already tried to set the primary PG out and wait for the > backfill to finish? No, we will try also > And another question, are all s
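
A hedged sketch of the two follow-ups being discussed (the pool name "cachepool" and OSD id 12 are placeholders):

    ceph osd pool set cachepool hit_set_count 0     # drop hit-set tracking on the cache-tier pool
    ceph osd out 12                                 # take the PG's primary OSD out and wait for backfill
    # once backfill finishes and the PG has been re-checked, bring it back in:
    ceph osd in 12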

[ceph-users] Cannot start ceph after maintenence

2024-02-22 Thread Schweiss, Chip
I had to temporarily disconnect the network on my entire Ceph cluster, so I prepared the cluster by following what appears to be some incomplete advice. I did the following before disconnecting the network: #ceph osd set noout #ceph osd set norecover #ceph osd set norebalance #ceph osd set nobackf
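
For reference, the usual counterpart once the network is back and the mons have quorum again is to clear the same flags (assuming the truncated fourth flag above was nobackfill):

    ceph osd unset noout
    ceph osd unset norecover
    ceph osd unset norebalance
    ceph osd unset nobackfill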

[ceph-users] Re: Cannot start ceph after maintenence

2024-02-22 Thread Stephan Hohn
Hi Chip, it looks like not all mons are up, or they couldn't reach each other over the network to form quorum. Make sure all nodes can reach each other and check the mon logs. Furthermore, some info from pvecm status, pveceph status, or just ceph status would be helpful. Cheers, Stephan. On Thu., 22 F
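
A couple of quick quorum checks (a hedged sketch; output format varies by release):

    ceph status                                   # overall health, including how many mons are in quorum
    ceph quorum_status --format json-pretty       # which monitors have actually joined quorum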

[ceph-users] Re: Size return by df

2024-02-22 Thread Konstantin Shalygin
Hi, yes you can; this is controlled by the option client quota df = false. k Sent from my iPhone > On Feb 22, 2024, at 11:17, Albert Shih wrote: > > Is there any way to keep the first answer?
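
A hedged sketch of setting that option centrally instead of editing ceph.conf on every client (clients typically need to remount for it to take effect):

    ceph config set client client_quota_df false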

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
Hi, Have you already tried to set the primary PG out and wait for the backfill to finish? Of course I meant the primary OSD for that PG, I hope that was clear. ;-) We are thinking about the use of "ceph pg_mark_unfound_lost revert" I'm not a developer, but how I read the code [2] is that s

[ceph-users] Re: Cannot start ceph after maintenence

2024-02-22 Thread Schweiss, Chip
The problem turned out to be burning the candle at both ends. I had been checking network communication for the past few hours and hadn't realized I was using my 1 Gb IPs, not the 100 Gb IPs. The 100 Gb links got connected to the wrong ports during the cable move. Thanks for the attempted assists. Focusi

[ceph-users] High IO utilization for bstore_kv_sync

2024-02-22 Thread Work Ceph
Hello guys, we are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of IO utilization for the bstore_kv_sync thread during processes such as adding a new pool and increasing/reducing the number of PGs in a pool. It is funny, though, that the IO utilization (reported with iotop) is 99.99%

[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
Most likely you are seeing time spent waiting on fdatasync in bstore_kv_sync if the drives you are using don't have power loss protection and can't perform flushes quickly. Some consumer-grade drives are actually slower at this than HDDs. Mark On 2/22/24 11:04, Work Ceph wrote: Hello guys,
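
A hedged way to check whether a given drive is the culprit: measure its synchronous small-write performance, which is roughly the pattern the RocksDB WAL generates (destructive if pointed at a raw device; /dev/sdX is a placeholder, use a scratch disk or a test file):

    fio --name=synctest --filename=/dev/sdX --rw=write --bs=4k --iodepth=1 \
        --direct=1 --fsync=1 --runtime=30 --time_based
    # very low IOPS here is a sign the drive flushes to NAND on every fsync,
    # i.e. it has no power loss protection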

[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Work Ceph
Thanks for the prompt response! I see, and indeed some of them are consumer SSDs. Is there any parameter that we can change/tune to better handle the fdatasync calls? Maybe using NVMes for the RocksDB? On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson wrote: > Most likely you are seeing time sp

[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
The biggest improvement would be to put all of the OSDs on SSDs with PLP.  Next would be to put the WAL/DB on drives with PLP.  If price is a concern,  you can sometimes find really good older drives like Intel P4510s on ebay for reasonable prices.  Just watch out for how much write wear they h
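
If the WAL/DB route is taken, a hedged sketch of creating a new OSD with its DB on a separate, PLP-backed NVMe device (device paths are placeholders; existing OSDs would need their DB migrated rather than being recreated this way):

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1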

[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Anthony D'Atri
> you can sometimes find really good older drives like Intel P4510s on ebay > for reasonable prices. Just watch out for how much write wear they have on > them. Also be sure to update to the latest firmware before use, then issue a Secure Erase. > _

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Could you please help me understand the volume status "recovering"? What is it, and do we need to wait for the volume recovery to finish?

[ceph-users] Re: FS down - mds degraded

2024-02-22 Thread nguyenvandiep
Hi Mr Patrick, we are in the same situation as Sake: now my MDS has crashed, the NFS service is down, and CephFS is not responding. From my "ceph -s" result: health: HEALTH_WARN 3 failed cephadm daemon(s) 1 filesystem is degraded insufficient standby MDS daemons availab