[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-25 Thread Dhairya Parmar
Hi Ivan,

This looks similar to the issue [0] that we're already addressing at
[1]. Basically, some out-of-sync event led the client to make use of inodes
that the MDS wasn't aware of/isn't tracking, hence the crash. It would be
really helpful if you could provide us with more logs.

CC @Rishabh Dave  @Venky Shankar
 @Patrick
Donnelly  @Xiubo Li 

[0] https://tracker.ceph.com/issues/61009
[1] https://tracker.ceph.com/issues/66251
--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.

On Mon, Jun 24, 2024 at 8:54 PM Ivan Clayson  wrote:

> Hello,
>
> We have been experiencing a serious issue with our CephFS backup cluster
> running quincy (version 17.2.7) on a RHEL8-derivative Linux kernel
> (Alma8.9, 4.18.0-513.9.1 kernel) where our MDSes for our filesystem are
> constantly in a "replay" or "replay(laggy)" state and keep crashing.
>
> We have a single MDS filesystem called "ceph_backup" with 2 standby
> MDSes along with a 2nd unused filesystem "ceph_archive" (this holds
> little to no data) where we are using our "ceph_backup" filesystem to
> backup our data and this is the one which is currently broken. The Ceph
> health outputs currently are:
>
> root@pebbles-s1 14:05 [~]: ceph -s
>cluster:
>  id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
>  health: HEALTH_WARN
>  1 filesystem is degraded
>  insufficient standby MDS daemons available
>  1319 pgs not deep-scrubbed in time
>  1054 pgs not scrubbed in time
>
>services:
>  mon: 4 daemons, quorum
> pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 (age 36m)
>  mgr: pebbles-s2(active, since 36m), standbys: pebbles-s4,
> pebbles-s3, pebbles-s1
>  mds: 2/2 daemons up
>  osd: 1380 osds: 1380 up (since 29m), 1379 in (since 3d); 37
> remapped pgs
>
>data:
>  volumes: 1/2 healthy, 1 recovering
>  pools:   7 pools, 2177 pgs
>  objects: 3.55G objects, 7.0 PiB
>  usage:   8.9 PiB used, 14 PiB / 23 PiB avail
>  pgs: 83133528/30006841533 objects misplaced (0.277%)
>   2090 active+clean
>   47   active+clean+scrubbing+deep
>   29   active+remapped+backfilling
>   8active+remapped+backfill_wait
>   2active+clean+scrubbing
>   1active+clean+snaptrim
>
>io:
>  recovery: 1.9 GiB/s, 719 objects/s
>
> root@pebbles-s1 14:09 [~]: ceph fs status
> ceph_backup - 0 clients
> ===
> RANK  STATE MDS  ACTIVITY   DNSINOS   DIRS CAPS
>   0replay(laggy)  pebbles-s3   0  0 0  0
>  POOLTYPE USED  AVAIL
> mds_backup_fs  metadata  1255G  2780G
> ec82_primary_fs_datadata   0   2780G
>ec82pool  data8442T  3044T
> ceph_archive - 2 clients
> 
> RANK  STATE  MDS ACTIVITY DNSINOS   DIRS CAPS
>   0active  pebbles-s2  Reqs:0 /s  13.4k  7105118 2
>  POOLTYPE USED  AVAIL
> mds_archive_fs metadata  5184M  2780G
> ec83_primary_fs_datadata   0   2780G
>ec83pool  data 138T  2767T
> MDS version: ceph version 17.2.7
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
> root@pebbles-s1 14:09 [~]: ceph health detail | head
> HEALTH_WARN 1 filesystem is degraded; insufficient standby MDS
> daemons available; 1319 pgs not deep-scrubbed in time; 1054 pgs not
> scrubbed in time
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>  fs ceph_backup is degraded
> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons
> available
>  have 0; want 1 more
>
> When our cluster first ran after a reboot, Ceph ran through the 2
> standby MDSes, crashing them all, until it reached the final MDS and is
> now stuck in this "replay(laggy)" state. Putting our MDSes into
> debugging mode, we can see that this MDS crashed when replaying the
> journal for a particular inode (this is the same for all the MDSes and
> they all crash on the same object):
>
> ...
> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal
> EMetaBlob.replay for [521,head] had [inode 0x1005ba89481
> [...539,head]
>
> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/
> auth fragtree_t(*^2 00*^3 0*^
> 4 1*^3 00010*^4 00011*^4 

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-25 Thread Dhairya Parmar
On Tue, Jun 25, 2024 at 6:38 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> Thank you for your rapid reply. I tried recovering the dentries for the
> file just before the crash I mentioned before and then splicing the
> transactions from the journal which seemed to remove that issue for that
> inode but resulted in the MDS crashing on the next inode in the journal
> when performing replay.
>
The MDS delegates a range of preallocated inodes (in the form of a set,
interval_set<inodeno_t> preallocated_inos) to the clients, so the untracked
entries can be a single inode, some inodes from the range, or in the worst
case all of them. This is something that even `cephfs-journal-tool` cannot
tell you, since we're talking about MDS internals which aren't exposed to
such tools. That is why you see the "MDS crashing on the next inode in the
journal when performing replay".

An option could be to expose the inode set through some tool or asok command
to identify such inode ranges, but that needs to be discussed. For now, we're
trying to address this in [0]; you can follow the discussion there.

[0] https://tracker.ceph.com/issues/66251
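
If it helps while you collect logs, here is a minimal sketch of how the
journal events touching a given inode can be listed with cephfs-journal-tool
(assuming rank 0 of your ceph_backup filesystem; the inode number is a
placeholder for the one shown in your replay log):

$ cephfs-journal-tool --rank=ceph_backup:0 event get --inode=<decimal inode number> summary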

> Removing all the transactions involving the directory housing the files
> that seemed to cause these crashes from the journal only caused the MDS to
> fail to even start replay.
>
I've rolled back our journal to our original version when the crash first
> happened and the entire MDS log for the crash can be found here:
> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.flush_journal.log-25-06-24
>
Awesome, this would help us a ton. Apart from this, would it be possible to
send us client logs?

> Please let us know if you would like any other logs file as we can easily
> induce this crash.
>
Since you can easily induce the crash, could you please share the reproducer,
i.e. the exact steps you take in order to hit it?

> Kindest regards,
>
> Ivan
> On 25/06/2024 09:58, Dhairya Parmar wrote:
>
>
>
> --
> Hi Ivan,
>
> This looks to be similar to the issue [0] that we're already addressing at
> [1]. So basically there is some out-of-sync event that led the client to
> make use of the inodes that MDS wasn't aware of/isn't tracking and hence
> the crash. It'd be really helpful if you can provide us more logs.
>
> CC @Rishabh Dave  @Venky Shankar  
> @Patrick
> Donnelly  @Xiubo Li 
>
> [0] https://tracker.ceph.com/issues/61009
> [1] https://tracker.ceph.com/issues/66251
> --
> *Dhairya Parmar*
>
> Associate Software Engineer, CephFS
>
> IBM, Inc.
>
> On Mon, Jun 24, 2024 at 8:54 PM Ivan Clayson 
> wrote:
>
>> Hello,
>>
>> We have been experiencing a serious issue with our CephFS backup cluster
>> running quincy (version 17.2.7) on a RHEL8-derivative Linux kernel
>> (Alma8.9, 4.18.0-513.9.1 kernel) where our MDSes for our filesystem are
>> constantly in a "replay" or "replay(laggy)" state and keep crashing.
>>
>> We have a single MDS filesystem called "ceph_backup" with 2 standby
>> MDSes along with a 2nd unused filesystem "ceph_archive" (this holds
>> little to no data) where we are using our "ceph_backup" filesystem to
>> backup our data and this is the one which is currently broken. The Ceph
>> health outputs currently are:
>>
>> root@pebbles-s1 14:05 [~]: ceph -s
>>cluster:
>>  id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
>>  health: HEALTH_WARN
>>  1 filesystem is degraded
>>  insufficient standby MDS daemons available
>>  1319 pgs not deep-scrubbed in time
>>  1054 pgs not scrubbed in time
>>
>>services:
>>  mon: 4 daemons, quorum
>> pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 (age 36m)
>>  mgr: pebbles-s2(active, since 36m), standbys: pebbles-s4,
>> pebbles-s3, pebbles-s1
>>  mds: 2/2 daemons up
>>  osd: 1380 osds: 1380 up (since 29m), 1379 in (since 3d); 37
>> remapped pgs
>>
>>data:
>>  volumes: 1/2 healthy, 1 recovering
>>  pools:   7 pools, 2177 pgs
>>  objects: 3.55G objects, 7.0 PiB
>>  usage:   8.9 PiB used, 14 PiB / 23 PiB avail
>>  pgs: 83133528/30006841533 objects misplaced (0.2

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-27 Thread Dhairya Parmar
otal 588368, rss 308304, heap 207132, baseline 182556, 0 / 15149
> inodes have caps, 0 caps, 0 caps per inode
> -1> 2024-06-22T05:41:44.642+0100 7f1846675700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h:
> In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f1846675700 time
> 2024-06-22T05:41:44.643146+0100
>
>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x135) [0x7f18568b64a3]
>  2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669]
>  3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t,
> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885]
>  4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4377)
> [0x5592e532c7b7]
>  5: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1]
>  6: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b]
>  7: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041]
>  8: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca]
>  9: clone()
>
>  0> 2024-06-22T05:41:44.643+0100 7f1846675700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f1846675700 thread_name:md_log_replay
>
>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>  1: /lib64/libpthread.so.0(+0x12cf0) [0x7f18558aecf0]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x18f) [0x7f18568b64fd]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669]
>  6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t,
> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885]
>  7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4377)
> [0x5592e532c7b7]
>  8: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1]
>  9: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b]
>  10: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041]
>  11: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca]
>  12: clone()
>
> We have a relatively low debug setting normally so I don't think many
> details of the initial crash were captured unfortunately and the MDS logs
> before the above (i.e. "-60" and older) are just beacon messages and
> _check_auth_rotating checks.
>
> I was wondering whether you have any recommendations in terms of what
> actions we could take to bring our filesystem back into a working state
> short of rebuilding the entire metadata pool? We are quite keen to bring
> our backup back into service urgently as we currently do not have any
> accessible backups for our Ceph clusters.
>
> Kindest regards,
>
> Ivan
> On 25/06/2024 19:18, Dhairya Parmar wrote:
>
>
>
> --
>
>
> On Tue, Jun 25, 2024 at 6:38 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> Thank you for your rapid reply. I tried recovering the dentries for the
>> file just before the crash I mentioned before and then splicing the
>> transactions from the journal which seemed to remove that issue for that
>> inode but resulted in the MDS crashing on the next inode in the journal
>> when performing replay.
>>
> The MDS delegates a range of preallocated inodes (in the form of a set,
> interval_set<inodeno_t> preallocated_inos) to the clients, so the untracked
> entries can be a single inode, some inodes from the range, or in the worst
> case all of them. This is something that even `cephfs-journal-tool` cannot
> tell you, since we're talking about MDS internals which aren't exposed to
> such tools. That is why you see the "MDS crashing on the next inode in the
> journal when performing replay".
>
> An option could be to expose the inode set through some tool or asok command
> to identify such inode ranges, but that needs to be discussed. For now, we're
> trying to address this in [0]; you can follow the discussion there.
>
> [0] https://tracker.ceph.com/issues/66251
>
>> Removing all the transactions involving the directory housing the files
>> that seemed to cause these crashes from the journal only caused the MDS to
>> fail to even start replay.
>>
> I've rolled back our journal to our original version when the crash first
>> happened and the entire MDS log for the crash can be found here:
>>

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-27 Thread Dhairya Parmar
Ivan, before resetting the journal, could you take the backup of your
journal using `cephfs-journal-tool export` [0] and send it to us through
`ceph-post-file` [1] or any other means you're comfortable with?

[0]
https://docs.ceph.com/en/latest/cephfs/cephfs-journal-tool/#example-journal-import-export
[1] https://docs.ceph.com/en/latest/man/8/ceph-post-file
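
A rough sketch of those two steps (assuming rank 0 of the ceph_backup
filesystem; the description string is only an example):

$ cephfs-journal-tool --rank=ceph_backup:0 journal export backup.bin
$ ceph-post-file -d "ceph_backup MDS journal, tracker 66251" backup.bin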

On Thu, Jun 27, 2024 at 5:09 PM Dhairya Parmar  wrote:

> Hi Ivan,
>
> The solution (which has been successful for us in the past) is to reset
> the journal. This would bring the fs back online and return the MDSes to a
> stable state, but some data would be lost—the data in the journal that
> hasn't been flushed to the backing store would be gone. Therefore, you
> should try to flush out as much journal data as possible before resetting
> the journal.
>
> Here are the steps for this entire process:
>
> 1) Bring the FS offline
> $ ceph fs fail <fs_name>
>
> 2) Recover dentries from the journal (run it for every MDS rank)
> $ cephfs-journal-tool --rank=<fs_name>:<rank> event recover_dentries summary
>
> 3) Reset the journal (again for every MDS rank)
> $ cephfs-journal-tool --rank=<fs_name>:<rank> journal reset
>
> 4) Bring the FS online
> $ ceph fs set <fs_name> joinable true
>
> 5) Restart the MDSes
>
> 6) Perform a scrub to ensure consistency of the fs
> $ ceph tell mds.<fs_name>:0 scrub start <path> [scrubopts] [tag]
> # you could try a recursive scrub, e.g. `ceph tell mds.<fs_name>:0 scrub start / recursive`
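>
> For example, a minimal sketch assuming the affected filesystem is named
> ceph_backup with a single active rank 0 (adjust names and ranks to your
> setup):
>
> $ ceph fs fail ceph_backup
> $ cephfs-journal-tool --rank=ceph_backup:0 event recover_dentries summary
> $ cephfs-journal-tool --rank=ceph_backup:0 journal reset
> $ ceph fs set ceph_backup joinable true
> $ ceph tell mds.ceph_backup:0 scrub start / recursive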
>
> Some important notes to keep in mind:
> * Recovering dentries will take time (generally, rank 0 is the most
> time-consuming, but the rest should be quick).
> * cephfs-journal-tool and the metadata OSDs are bound to use a significant
> CPU percentage. This is because cephfs-journal-tool has to read through the
> journal data and flush it out to the backing store, which also triggers a
> burst of metadata operations, so the OSDs end up consuming a significant
> share of CPU.
>
> Do let me know how this goes.
>
> On Thu, Jun 27, 2024 at 3:44 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> We can induce the crash by simply restarting the MDS and the crash seems
>> to happen when an MDS goes from up:standby to up:replay. The MDS works
>> through a few files in the log before eventually crashing where I've
>> included the logs for this here (this is after I imported the backed up
>> journal which I hope was successful but please let me know if you suspect
>> it wasn't!):
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.mds_restart_crash.log
>>
>> With respect to the client logs, are you referring to the clients who are
>> writing to the filesystem? We don't typically run them in any sort of debug
>> mode and we have quite a few machines running our backup system but we can
>> look an hour or so before the first MDS crash (though I don't know if this
>> is when the de-sync occurred). Here are some MDS logs with regards to the
>> initial crash on Saturday morning though which may be helpful:
>>
>>-59> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient: tick
>>-58> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient:
>> _check_auth_rotating have uptodate secrets (they expire after
>> 2024-06-22T05:41:13.091556+0100)
>>-57> 2024-06-22T05:41:43.208+0100 7f184de84700  1 mds.pebbles-s2
>> Updating MDS map to version 2529650 from mon.3
>>-56> 2024-06-22T05:41:43.208+0100 7f184de84700  4 mds.0.purge_queue
>> operator():  data pool 6 not found in OSDMap
>>-55> 2024-06-22T05:41:43.208+0100 7f184de84700  4 mds.0.purge_queue
>> operator():  data pool 3 not found in OSDMap
>>-54> 2024-06-22T05:41:43.209+0100 7f184de84700  5 asok(0x5592e7968000)
>> register_command objecter_requests hook 0x5592e78f8800
>>-53> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient:
>> _renew_subs
>>-52> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient:
>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0
>>-51> 2024-06-22T05:41:43.209+0100 7f184de84700 10 log_channel(cluster)
>> update_config to_monitors: true to_syslog: false syslog_facility:  prio:
>> info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
>>-50> 2024-06-22T05:41:43.209+0100 7f184de84700  4 mds.0.purge_queue
>> operator():  data pool 6 not found in OSDMap
>>-49> 2024-06-22T05:41:43.209+0100 7f184de84700  4 mds.0.purge_queue
>> operator():  data pool 3 not found in OSDMap
>>-48> 2024-06-22T05:41:43.209+0100 7f184de84700  4 mds.0.0
>> apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 0)
>>-47> 2024-06-22

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-28 Thread Dhairya Parmar
On Fri, Jun 28, 2024 at 6:02 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> I would be more than happy to share our corrupted journal. Has the host
> key changed for drop.ceph.com? The fingerprint I'm being sent is
> 7T6dSMcUUa5refV147WEZR99UgW8Y1qYEXZr8ppvog4 which is different to the one
> in our /usr/share/ceph/known_hosts_drop.ceph.com.
>
Ah, strange. Let me get in touch with the folks who might know about this and
get back to you ASAP.

> Thank you for your advice as well. We've reset our MDS' journal and are
> currently in the process of a full filesystem scrub which understandably is
> taking quite a bit of time but seems to be progressing through the objects
> fine.
>
YAY!

> Thank you ever so much for all your help and please do feel free to follow
> up with us if you would like any further details about our crash!
>
Glad to hear it went well. This bug is being worked on with high priority,
and once the patch is ready it will be backported.

The root cause of this issue is `nowsync` (async dirops) being enabled by
default in the kernel client [0]. This feature allows asynchronous creation
and deletion of files, optimizing performance by avoiding the round-trip
latency of these system calls. However, in very rare cases (like yours :D),
it can affect the system's consistency and stability. Hence, if this kind of
optimization is not a priority for your workload, I recommend turning it off
by switching the mount points to `wsync` and also setting the MDS config
`mds_client_delegate_inos_pct` to `0`, so that you don't end up in this
situation again (until the bug fix arrives :)).

[0]
https://github.com/ceph/ceph-client/commit/f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902
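
As a minimal sketch (assuming a kernel fstab mount and a cephx user named
"backup"; adjust paths, names and other options to your environment):

# add wsync to the kernel mount options in /etc/fstab to force synchronous dirops
:/  /mnt/ceph_backup  ceph  name=backup,mds_namespace=ceph_backup,wsync,_netdev  0 0

# and stop the MDS from delegating preallocated inode ranges to clients
$ ceph config set mds mds_client_delegate_inos_pct 0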

> Kindest regards,
>
> Ivan
> On 27/06/2024 12:39, Dhairya Parmar wrote:
>
>
>
> --
> Hi Ivan,
>
> The solution (which has been successful for us in the past) is to reset
> the journal. This would bring the fs back online and return the MDSes to a
> stable state, but some data would be lost—the data in the journal that
> hasn't been flushed to the backing store would be gone. Therefore, you
> should try to flush out as much journal data as possible before resetting
> the journal.
>
> Here are the steps for this entire process:
>
> 1) Bring the FS offline
> $ ceph fs fail <fs_name>
>
> 2) Recover dentries from the journal (run it for every MDS rank)
> $ cephfs-journal-tool --rank=<fs_name>:<rank> event recover_dentries summary
>
> 3) Reset the journal (again for every MDS rank)
> $ cephfs-journal-tool --rank=<fs_name>:<rank> journal reset
>
> 4) Bring the FS online
> $ ceph fs set <fs_name> joinable true
>
> 5) Restart the MDSes
>
> 6) Perform a scrub to ensure consistency of the fs
> $ ceph tell mds.<fs_name>:0 scrub start <path> [scrubopts] [tag]
> # you could try a recursive scrub, e.g. `ceph tell mds.<fs_name>:0 scrub start / recursive`
>
> Some important notes to keep in mind:
> * Recovering dentries will take time (generally, rank 0 is the most
> time-consuming, but the rest should be quick).
> * cephfs-journal-tool and metadata OSDs are bound to use a significant CPU
> percentage. This is because cephfs-journal-tool has to swig the journal
> data and flush it out to the backing store, which also makes the metadata
> operations go rampant, resulting in OSDs taking a significant percentage of
> CPU.
>
> Do let me know how this goes.
>
> On Thu, Jun 27, 2024 at 3:44 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> We can induce the crash by simply restarting the MDS and the crash seems
>> to happen when an MDS goes from up:standby to up:replay. The MDS works
>> through a few files in the log before eventually crashing where I've
>> included the logs for this here (this is after I imported the backed up
>> journal which I hope was successful but please let me know if you suspect
>> it wasn't!):
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.mds_restart_crash.log
>>
>> With respect to the client logs, are you referring to the clients who are
>> writing to the filesystem? We don't typically run them in any sort of debug
>> mode and we have quite a few machines running our backup system but we can
>> look an hour or so before the first MDS crash (though I don't know if this
>> is when the de-sync occurred). Here are some MDS logs with regards to the
>> initial crash on Saturday morning though which may be helpful:
>>
>>-59> 2024-0

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Dhairya Parmar
on_autoclose    300
> max_file_size    10993418240
> required_client_features    {}
> last_failure    0
> last_failure_osd_epoch    494515
> compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds    1
> in    0
> up    {}
> failed
> damaged    0
> stopped
> data_pools    [6,3]
> metadata_pool    2
> inline_data    disabled
> balancer
> standby_count_wanted    1
>
>
> Kindest regards,
>
> Ivan
> On 28/06/2024 15:17, Dhairya Parmar wrote:
>
>
>
> --
>
>
> On Fri, Jun 28, 2024 at 6:02 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> I would be more than happy to share our corrupted journal. Has the host
>> key changed for drop.ceph.com? The fingerprint I'm being sent is
>> 7T6dSMcUUa5refV147WEZR99UgW8Y1qYEXZr8ppvog4 which is different to the one
>> in our /usr/share/ceph/known_hosts_drop.ceph.com.
>>
> Ah, strange. Let me get in touch with folks who might know about this,
> will revert back to you ASAP
>
>> Thank you for your advice as well. We've reset our MDS' journal and are
>> currently in the process of a full filesystem scrub which understandably is
>> taking quite a bit of time but seems to be progressing through the objects
>> fine.
>>
> YAY!
>
>> Thank you ever so much for all your help and please do feel free to
>> follow up with us if you would like any further details about our crash!
>>
> Glad to hear it went well, this bug is being worked on with high priority
> and once the patch is ready, it will be backported.
>
> The root cause of this issue is the `nowsync` (async dirops) being enabled
> by default with kclient [0]. This feature allows asynchronous creation and
> deletion of files, optimizing performance by avoiding round-trip latency
> for these system calls. However, in very rare cases (like yours :D), it can
> affect the system's consistency and stability hence if this kind of
> optimization is not a priority for your workload, I recommend turning it
> off by switching the mount points to `wsync` and also set the MDS config
> `mds_client_delegate_inos_pct` to `0` so that you don't end up in this
> situation again (until the bug fix arrives :)).
>
> [0]
> https://github.com/ceph/ceph-client/commit/f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902
>
>> Kindest regards,
>>
>> Ivan
>> On 27/06/2024 12:39, Dhairya Parmar wrote:
>>
>>
>>
>> --
>> Hi Ivan,
>>
>> The solution (which has been successful for us in the past) is to reset
>> the journal. This would bring the fs back online and return the MDSes to a
>> stable state, but some data would be lost—the data in the journal that
>> hasn't been flushed to the backing store would be gone. Therefore, you
>> should try to flush out as much journal data as possible before resetting
>> the journal.
>>
>> Here are the steps for this entire process:
>>
>> 1) Bring the FS offline
>> $ ceph fs fail <fs_name>
>>
>> 2) Recover dentries from the journal (run it for every MDS rank)
>> $ cephfs-journal-tool --rank=<fs_name>:<rank> event recover_dentries summary
>>
>> 3) Reset the journal (again for every MDS rank)
>> $ cephfs-journal-tool --rank=<fs_name>:<rank> journal reset
>>
>> 4) Bring the FS online
>> $ ceph fs set <fs_name> joinable true
>>
>> 5) Restart the MDSes
>>
>> 6) Perform a scrub to ensure consistency of the fs
>> $ ceph tell mds.<fs_name>:0 scrub start <path> [scrubopts] [tag]
>> # you could try a recursive scrub, e.g. `ceph tell mds.<fs_name>:0 scrub start / recursive`
>>
>> Some important notes to keep in mind:
>> * Recovering dentries will take time (generally, rank 0 is the most
>> time-consuming, but the rest should be quick).
>> * cephfs-journal-tool and metadata OSDs are bound to use a significant
>> CPU percentage. This is because

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Dhairya Parmar
Hey Ivan,

This is a relatively new MDS crash, so it will require some investigation,
but in the meantime I've been advised to recommend the disaster-recovery
steps [0] (except the session reset) to get the FS up again.

This crash is being discussed on the upstream CephFS Slack channel [1] with @Venky
Shankar and other CephFS devs. I'd encourage you to join the conversation;
we can discuss this in detail and maybe go through the incident step by step,
which should help analyse the crash better.

[0]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519

On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> Thank you ever so much for having another look at this so quickly. I don't
> think I have any logs similar to the ones you referenced this time as my
> MDSs don't seem to enter the replay stage when they crash (or at least
> don't now after I've thrown the logs away) but those errors do crop up in
> the prior logs I shared when the system first crashed.
>
> Kindest regards,
>
> Ivan
> On 08/07/2024 14:08, Dhairya Parmar wrote:
>
>
>
> --
> Ugh, something went horribly wrong. I've downloaded the MDS logs that
> contain the assertion failure, and it looks related to this [0]. Do you have
> client logs for this?
>
> The other log that you shared is being downloaded right now; once that's
> done and I've gone through it, I'll update you.
>
> [0] https://tracker.ceph.com/issues/54546
>
> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> Sorry to resurrect this thread again, but we still unfortunately have an
>> issue with our filesystem after we attempted to write new backups to it.
>>
>> We finished the scrub of the filesystem on Friday and ran a repair scrub
>> on the 1 directory which had metadata damage. After doing so and rebooting,
>> the cluster reported no issues and data was accessible again.
>>
>> We re-started the backups to run over the weekend and unfortunately the
>> filesystem crashed again where the log of the failure is here:
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.
>> We ran the backups on kernel mounts of the filesystem without the nowsync
>> option this time to avoid the out-of-sync write problems..
>>
>> I've tried resetting the journal again after recovering the dentries but
>> unfortunately the filesystem is still in a failed state despite setting
>> joinable to true. The log of this crash is here:
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708
>> .
>>
>> I'm not sure how to proceed as I can't seem to get any MDS to take over
>> the first rank. I would like to do a scrub of the filesystem and preferably
>> overwrite the troublesome files with the originals on the live filesystem.
>> Do you have any advice on how to make the filesystem leave its failed
>> state? I have a backup of the journal before I reset it so I can roll back
>> if necessary.
>>
>> Here are some details about the filesystem at present:
>>
>> root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status
>>   cluster:
>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
>> health: HEALTH_ERR
>> 1 filesystem is degraded
>> 1 large omap objects
>> 1 filesystem is offline
>> 1 mds daemon damaged
>>
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim flag(s) set
>> 1750 pgs not deep-scrubbed in time
>> 1612 pgs not scrubbed in time
>>
>>   services:
>> mon: 4 daemons, quorum pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4
>> (age 50m)
>> mgr: pebbles-s2(active, since 77m), standbys: pebbles-s1, pebbles-s3,
>> pebbles-s4
>> mds: 1/2 daemons up, 3 standby
>> osd: 1380 osds: 1380 up (since 76m), 1379 in (since 10d); 10 remapped
>> pgs
>>  flags
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim
>>
>>   data:
>> volumes: 1/2 healthy, 1 recovering; 1 damaged
>> pools:   7 pools, 2177 pgs
>> objects: 3.24G objects, 6.7 PiB
>> usage:   8.6 PiB used, 14 PiB / 23 PiB avail
>>

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Dhairya Parmar
On Tue, Jul 9, 2024 at 3:46 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> I would be more than happy to try and give as many details as possible but
> the slack channel is private and requires my email to have an account/
> access to it.
>
You're right that you need an account to use Slack, but the channel isn't
private at all; it's the open upstream CephFS Slack channel :D. You just need
an email address to sign up, and of course joining is entirely your choice,
not mandatory. I'd ask @Venky Shankar @Patrick Donnelly to add their input
here, since they've been working on similar issues and can provide better
insights.

> Wouldn't taking the discussion about this error to a private channel also
> stop other users who experience this error from learning about how and why
> this happened as  well as possibly not be able to view the solution? Would
> it not be possible to discuss this more publicly for the benefit of the
> other users on the mailing list?
>
Kindest regards,
>
> Ivan
> On 09/07/2024 10:44, Dhairya Parmar wrote:
>
>
>
> --
> Hey Ivan,
>
> This is a relatively new MDS crash, so this would require some
> investigation but I was instructed to recommend disaster-recovery steps [0]
> (except session reset) to you to get the FS up again.
>
> This crash is being discussed on upstream CephFS slack channel [1] with @Venky
> Shankar  and other CephFS devs. I'd encourage you to
> join the conversation, we can discuss this in detail and maybe go through
> the incident step by step which should help analyse the crash better.
>
> [0]
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> [1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519
>
> On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> Thank you ever so much for having another look at this so quickly. I
>> don't think I have any logs similar to the ones you referenced this time as
>> my MDSs don't seem to enter the replay stage when they crash (or at least
>> don't now after I've thrown the logs away) but those errors do crop up in
>> the prior logs I shared when the system first crashed.
>>
>> Kindest regards,
>>
>> Ivan
>> On 08/07/2024 14:08, Dhairya Parmar wrote:
>>
>>
>>
>> --
>> Ugh, something went horribly wrong. I've downloaded the MDS logs that
>> contain assertion failure and it looks relevant to this [0]. Do you have
>> client logs for this?
>>
>> The other log that you shared is being downloaded right now, once that's
>> done and I'm done going through it, I'll update you.
>>
>> [0] https://tracker.ceph.com/issues/54546
>>
>> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson 
>> wrote:
>>
>>> Hi Dhairya,
>>>
>>> Sorry to resurrect this thread again, but we still unfortunately have an
>>> issue with our filesystem after we attempted to write new backups to it.
>>>
>>> We finished the scrub of the filesystem on Friday and ran a repair scrub
>>> on the 1 directory which had metadata damage. After doing so and rebooting,
>>> the cluster reported no issues and data was accessible again.
>>>
>>> We re-started the backups to run over the weekend and unfortunately the
>>> filesystem crashed again where the log of the failure is here:
>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.
>>> We ran the backups on kernel mounts of the filesystem without the nowsync
>>> option this time to avoid the out-of-sync write problems..
>>>
>>> I've tried resetting the journal again after recovering the dentries but
>>> unfortunately the filesystem is still in a failed state despite setting
>>> joinable to true. The log of this crash is here:
>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708
>>> .
>>>
>&

[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-02 Thread Dhairya Parmar
Hi Justin,

You should be able to delete the inodes under the lost+found directory simply
with `sudo rm -rf lost+found/`.

What do you get when you try to delete them? Do you get `EROFS`?

On Fri, Aug 2, 2024 at 8:42 AM Justin Lee  wrote:

> After we updated our ceph cluster from 17.2.7 to 18.2.0 the MDS kept being
> marked as damaged and stuck in up:standby with these errors in the log.
>
> debug-12> 2024-07-14T21:22:19.962+ 7f020cf3a700  1
> mds.0.cache.den(0x4 1000b3bcfea) loaded already corrupt dentry:
> [dentry #0x1/lost+found/1000b3bcfea [head,head] rep@0.0 NULL (dversion
> lock) pv=0 v=2 ino=(nil) state=0 0x558ca63b6500]
> debug-11> 2024-07-14T21:22:19.962+ 7f020cf3a700 10
> mds.0.cache.dir(0x4) go_bad_dentry 1000b3bcfea
>
> these log lines are repeated a bunch of times in our MDS logs, all on
> dentries that are within the lost+found directory. After reading this
> mailing
> list post <https://www.spinics.net/lists/ceph-users/msg77325.html>, we
> tried setting ceph config set mds mds_go_bad_corrupt_dentry false. This
> seemed to successfully circumvent the issue, however, after a few seconds
> our MDS crashes. Our 3 MDS are now stuck in a cycle of active -> crash ->
> standby -> back to active. Because of this our actual ceph fs is extremely
> laggy.
>
> We read here <https://docs.ceph.com/en/latest/releases/reef/#cephfs> that
> reef now makes it possible to delete the lost+found directory, which might
> solve our problem, but it is inaccessible, to cd, ls, rm, etc.
>
> Has anyone seen this type of issue or know how to solve it? Thanks!


[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-02 Thread Dhairya Parmar
So the mount hung? Can you see anything suspicious in the logs?
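
If it helps, a quick sketch of where I'd look on the client side (assuming a
kernel client and that debugfs is mounted; paths can differ per distro):

# kernel client messages around the hung rm
$ dmesg | grep -i ceph
# MDS/OSD requests currently stuck in flight
$ cat /sys/kernel/debug/ceph/*/mdsc
$ cat /sys/kernel/debug/ceph/*/osdc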

On Fri, Aug 2, 2024 at 7:17 PM Justin Lee  wrote:

> Hi Dhairya,
>
> Thanks for the response! We tried removing it as you suggested with `rm
> -rf` but the command just hangs indefinitely with no output. We are also
> unable to `ls lost_found`, or otherwise interact with the directory's
> contents.
>
> Best,
> Justin lee
>
> On Fri, Aug 2, 2024 at 8:24 AM Dhairya Parmar  wrote:
>
>> Hi Justin,
>>
>> You should able to delete inodes from the lost+found dirs just by simply
>> `sudo rm -rf lost+found/`
>>
>> What do you get when you try to delete? Do you get `EROFS`?
>>
>> On Fri, Aug 2, 2024 at 8:42 AM Justin Lee 
>> wrote:
>>
>>> After we updated our ceph cluster from 17.2.7 to 18.2.0 the MDS kept
>>> being
>>> marked as damaged and stuck in up:standby with these errors in the log.
>>>
>>> debug-12> 2024-07-14T21:22:19.962+ 7f020cf3a700  1
>>> mds.0.cache.den(0x4 1000b3bcfea) loaded already corrupt dentry:
>>> [dentry #0x1/lost+found/1000b3bcfea [head,head] rep@0.0 NULL (dversion
>>> lock) pv=0 v=2 ino=(nil) state=0 0x558ca63b6500]
>>> debug-11> 2024-07-14T21:22:19.962+ 7f020cf3a700 10
>>> mds.0.cache.dir(0x4) go_bad_dentry 1000b3bcfea
>>>
>>> these log lines are repeated a bunch of times in our MDS logs, all on
>>> dentries that are within the lost+found directory. After reading this
>>> mailing
>>> list post <https://www.spinics.net/lists/ceph-users/msg77325.html>, we
>>> tried setting ceph config set mds mds_go_bad_corrupt_dentry false. This
>>> seemed to successfully circumvent the issue, however, after a few seconds
>>> our MDS crashes. Our 3 MDS are now stuck in a cycle of active -> crash ->
>>> standby -> back to active. Because of this our actual ceph fs is
>>> extremely
>>> laggy.
>>>
>>> We read here <https://docs.ceph.com/en/latest/releases/reef/#cephfs>
>>> that
>>> reef now makes it possible to delete the lost+found directory, which
>>> might
>>> solve our problem, but it is inaccessible, to cd, ls, rm, etc.
>>>
>>> Has anyone seen this type of issue or know how to solve it? Thanks!


[ceph-users] Re: Ceph MDS randomly hangs when pg nums reduced

2024-02-25 Thread Dhairya Parmar
Hi,

The first thing that comes to mind looking at the I/O section is increased
metadata load causing bottlenecks on the MDSs. You do have 2 active ranks,
which could avert the problem, but it is possible that certain files/dirs
generate an unreasonably high amount of metadata I/O, leading to an uneven
distribution of workload among the active MDSs and therefore making one of
them hang at times. If the MDS is hanging, yours looks like a
metadata-intensive environment, and reducing PGs might not be a good idea
here. You could also share the MDS logs so we can see what's going on exactly
and whether there is something that needs attention.
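
If you want to pause the PG reduction while this is investigated, a rough
sketch using the pool names from your autoscale-status output would be:

$ ceph osd pool set cephfs.myfs.data pg_autoscale_mode off
$ ceph osd pool set cephfs.myfs.meta pg_autoscale_mode off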

--
*Dhairya Parmar*

Associate Software Engineer, CephFS
IBM, Inc.


On Fri, Feb 23, 2024 at 8:27 PM  wrote:

> Hi,
>
> I have a CephFS cluster
> ```
> > ceph -s
>
>   cluster:
> id: e78987f2-ef1c-11ed-897d-cf8c255417f0
> health: HEALTH_WARN
> 85 pgs not deep-scrubbed in time
> 85 pgs not scrubbed in time
>
>   services:
> mon: 5 daemons, quorum
> datastone05,datastone06,datastone07,datastone10,datastone09 (age 2w)
> mgr: datastone05.iitngk(active, since 2w), standbys: datastone06.wjppdy
> mds: 2/2 daemons up, 1 hot standby
> osd: 22 osds: 22 up (since 3d), 22 in (since 4w); 8 remapped pgs
>
>   data:
> volumes: 1/1 healthy
> pools:   4 pools, 115 pgs
> objects: 49.08M objects, 16 TiB
> usage:   35 TiB used, 2.0 PiB / 2.1 PiB avail
> pgs: 3807933/98160678 objects misplaced (3.879%)
>  107 active+clean
>  8   active+remapped+backfilling
>
>   io:
> client:   224 MiB/s rd, 79 MiB/s wr, 844 op/s rd, 33 op/s wr
> recovery: 8.8 MiB/s, 24 objects/s
> ```
>
> The pool and pg status
>
> ```
> > ceph osd pool autoscale-status
>
> POOLSIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET
> RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> cephfs.myfs.meta  28802M2.0 2119T  0.
> 4.0  16  on False
> cephfs.myfs.data  16743G2.0 2119T  0.0154
> 1.0  32  on False
> rbd  19 2.0 2119T  0.
> 1.0  32  on False
> .mgr   3840k2.0 2119T  0.
> 1.0   1  on False
> ```
>
> The pool detail
>
> ```
> > ceph osd pool ls detail
>
> pool 1 'cephfs.myfs.meta' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change
> 3639 lfor 0/3639/3637 flags hashpspool stripe_width 0 pg_autoscale_bias 4
> pg_num_min 16 recovery_priority 5 application cephfs
> pool 2 'cephfs.myfs.data' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 66 pgp_num 58 pg_num_target 32 pgp_num_target
> 32 autoscale_mode on last_change 5670 lfor 0/5661/5659 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
> pool 3 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 486 lfor
> 0/486/478 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 4 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 39 flags
> hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> ```
>
> When pg numbers reduce, the mds server would have a chance to hang.


[ceph-users] Re: ambigous mds behind on trimming and slowops (ceph 17.2.5 and rook operator 1.10.8)

2024-02-26 Thread Dhairya Parmar
Hi,

May I know which version is being used in the cluster?

It was started after 2 hours of one of the active mds was crashed

Do we know the reason for the crash?

Please share more info; the `ceph -s` output and the MDS logs should reveal more insights.
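
For example, something along these lines (a rough sketch, adjust names as needed):

$ ceph -s
$ ceph versions
$ ceph fs status
$ ceph health detail | grep -iE 'trim|slow'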

--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.



On Fri, Feb 23, 2024 at 8:13 PM  wrote:

> Team,
>
> Guys,
>
> We were facing cephFs volume mount issue and ceph status it was showing
>  mds slow requests
>  Mds behind on trimming
>
> After restarting mds pods it was resolved
> But wanted to know Root caus of this
> It was started after 2 hours of one of the active mds was crashed
> So does that an active mds crash can cause this issue ?
>
>
> Please provide your inputs anyone


[ceph-users] Re: Clients failing to advance oldest client?

2024-03-25 Thread Dhairya Parmar
I think this bug has already been worked on in
https://tracker.ceph.com/issues/63364; can you tell me which version you're on?
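
In the meantime, a small sketch of how the offending sessions are usually
identified (the fs name is a placeholder, and this assumes a single active
rank 0):

# health detail lists the client IDs failing to advance the oldest client/flush tid
$ ceph health detail
# look those client IDs up in the session list
$ ceph tell mds.<fs_name>:0 session ls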

--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.


On Tue, Mar 26, 2024 at 2:32 AM Erich Weiler  wrote:

> Hi Y'all,
>
> I'm seeing this warning via 'ceph -s' (this is on Reef):
>
> # ceph -s
>cluster:
>  id: 58bde08a-d7ed-11ee-9098-506b4b4da440
>  health: HEALTH_WARN
>  3 clients failing to advance oldest client/flush tid
>  1 MDSs report slow requests
>  1 MDSs behind on trimming
>
>services:
>  mon: 5 daemons, quorum
> pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 3d)
>  mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
>  mds: 1/1 daemons up, 1 standby
>  osd: 46 osds: 46 up (since 3d), 46 in (since 2w)
>
>data:
>  volumes: 1/1 healthy
>  pools:   4 pools, 1313 pgs
>  objects: 258.13M objects, 454 TiB
>  usage:   688 TiB used, 441 TiB / 1.1 PiB avail
>  pgs: 1303 active+clean
>   8active+clean+scrubbing
>   2active+clean+scrubbing+deep
>
>io:
>  client:   131 MiB/s rd, 111 MiB/s wr, 41 op/s rd, 613 op/s wr
>
> I googled around and looked at the docs and it seems like this isn't a
> critical problem, but I couldn't find a clear path to resolution.  Does
> anyone have any advice on what I can do to resolve the health issues up
> top?
>
> My CephFS filesystem is incredibly busy so I have a feeling that has
> some impact here, but not 100% sure...
>
> Thanks as always for the help!
>
> cheers,
> erich


[ceph-users] Re: Linux Laptop Losing CephFS mounts on Sleep/Hibernate

2024-03-28 Thread Dhairya Parmar
So the client session was dropped when the laptop went into sleep mode. What
could have happened is that, since the client went silent, it failed to renew
its caps in time, hit `session_autoclose` (defaults to 300 secs), and thus
got evicted. As Kotresh mentioned, the client logs would reveal better
insights.
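
A quick sketch of how to confirm that theory after the laptop wakes up (the
fs name is a placeholder):

# was the client evicted/blocklisted by the MDS?
$ ceph tell mds.<fs_name>:0 session ls
$ ceph osd blocklist ls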


*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.


On Thu, Mar 28, 2024 at 4:42 PM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> I think the client should reconnect when it's out of sleep. Could you
> please share the client logs to check what's happening?
>
>
>
> On Tue, Mar 26, 2024 at 4:16 AM  wrote:
>
> > Hi All,
> >
> > So I've got a Ceph Reef Cluster (latest version) with a CephFS system set
> > up with a number of directories on it.
> >
> > On a Laptop (running Rocky Linux (latest version)) I've used fstab to
> > mount a number of those directories - all good, everything works, happy
> > happy joy joy! :-)
> >
> > However, when the laptop goes into sleep or hibernate mode (ie when I
> > close the lid) and then bring it back out of sleep/hibernate (ie open the
> > lid) the CephFS mounts are "not present". The only way to get them back
> is
> > to run `mount -a` as either root or as sudo. This, as I'm sure you'll
> > agree, is less than ideal - especially as this is a pilot project for
> > non-admin users (ie they won't have access to the root account or sudo on
> > their own (corporate) laptops).
> >
> > So, my question to the combined wisdom of the Community is what's the
> best
> > way to resolve this issue?
> >
> > I've looked at autofs, and even tried (half-heartedly - it was late, and
> I
> > wanted to go home  :-) ) to get this running, but I'm note sure if this
> is
> > the best way to resolve things.
> >
> > All help and advice on this greatly appreciated - thank in advance
> >
> > Cheers
> >
> > Dulux-Oz


[ceph-users] Re: How to recover from an MDs rank in state 'failed'

2024-05-30 Thread Dhairya Parmar
Hi Noe,

If the rank has failed and you're sure that there are no pending tasks or
sessions associated with the failed MDS, you can try to make use of
`ceph mds rmfailed`. But beware: only do this if that MDS really is doing
nothing and isn't linked to any file system, otherwise things can go wrong
and lead to an inaccessible file system. More info regarding the command can
be found at [0] and [1].

[0] https://docs.ceph.com/en/quincy/man/8/ceph/
[1] https://docs.ceph.com/en/latest/cephfs/administration/#advanced
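
For illustration only, based on the `ceph fs status` output below (rank 1 of
fs_cluster), the invocation would look something like this; please
double-check the caveats in [0] and [1] first:

$ ceph mds rmfailed fs_cluster:1 --yes-i-really-mean-it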
--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.

On Wed, May 29, 2024 at 4:24 PM Noe P.  wrote:

> Hi,
>
> after our desaster yesterday, it seems that we got our MONs back.
> One of the filesystems, however, seems in a strange state:
>
>   % ceph fs status
>
>   
>   fs_cluster - 782 clients
>   ==
>   RANK  STATE MDSACTIVITY DNSINOS   DIRS   CAPS
>0active  cephmd6a  Reqs:5 /s  13.2M  13.2M  1425k  51.4k
>1failed
> POOL TYPE USED  AVAIL
>   fs_cluster_meta  metadata  3594G  53.5T
>   fs_cluster_datadata 421T  53.5T
>   
>   STANDBY MDS
> cephmd6b
> cephmd4b
>   MDS version: ceph version 17.2.7
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>
>
>   % ceph fs dump
>   
>   Filesystem 'fs_cluster' (3)
>   fs_name fs_cluster
>   epoch   3068261
>   flags   12 joinable allow_snaps allow_multimds_snaps
>   created 2022-08-26T15:55:07.186477+0200
>   modified2024-05-29T12:43:30.606431+0200
>   tableserver 0
>   root0
>   session_timeout 60
>   session_autoclose   300
>   max_file_size   4398046511104
>   required_client_features{}
>   last_failure0
>   last_failure_osd_epoch  1777109
>   compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=no anchor table,9=file layout v2,10=snaprealm v2}
>   max_mds 2
>   in  0,1
>   up  {0=911794623}
>   failed
>   damaged
>   stopped 2,3
>   data_pools  [32]
>   metadata_pool   33
>   inline_data disabled
>   balancer
>   standby_count_wanted1
>   [mds.cephmd6a{0:911794623} state up:active seq 44701 addr [v2:
> 10.13.5.6:6800/189084355,v1:10.13.5.6:6801/189084355] compat
> {c=[1],r=[1],i=[7ff]}]
>
>
> We would like to get rid of the failed rank 1 (without crashing the MONs)
> and have a 2nd MD from the standbys step in .
>
> Anyone have an idea how to do this ?
> I'm a bit reluctant to try 'ceph mds rmfailed', as this seems to have
> triggered the MONs to crash.
>
> Regards,
>   Noe


[ceph-users] Re: cephfs: num_stray growing without bounds (octopus)

2022-08-05 Thread Dhairya Parmar
On Fri, Aug 5, 2022 at 9:12 PM Frank Schilder  wrote:

> Hi Dhairya,
>
> thanks to pointing me to this tracker. I can try an MDS fail to see if it
> clears the stray buckets or if there are still left-overs. Before doing so:
>
> > Thanks for the logs though. It will help me while writing the patch.
>
> I couldn't see if you were asking for logs. Do you want me to collect
> something or do you mean the session logs included in my e-mail. Also, is
> it on purpose to leave out the ceph-user list in CC (e-mail address)?


Nah, the session logs included are good enough. I missed CCing ceph-users.
Done now.

For my urgent needs, failing the MDS periodically during the benchmark
> might be an interesting addition any ways - if this helps with the stray
> count.
>

Yeah it might be helpful for now. Do let me know if that works for you.


> Thanks for your fast reply and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dhairya Parmar 
> Sent: 05 August 2022 16:10
> To: Frank Schilder
> Subject: Re: [ceph-users] cephfs: num_stray growing without bounds
> (octopus)
>
> Hi Frank,
>
> This seems to be related to a tracker<
> https://tracker.ceph.com/issues/53724> that I'm working on. I've got some
> rough ideas in my mind, a simple solution would be to run a single thread
> that would regularly evaluate strays (maybe every 1 or 2 minutes?) or a
> much better approach would be to evaluate strays whenever snapshot removal
> takes place but it's not that easy as it looks, therefore I'm currently
> going through the code to understand it's whole process(snapshot removal),
> I'll try my best to come up with something as soon as possible. Thanks for
> the logs though. It will help me while writing the patch.
>
> Regards,
> Dhairya
>
> On Fri, Aug 5, 2022 at 6:55 PM Frank Schilder  fr...@dtu.dk>> wrote:
> Dear Gregory, Dan and Patrick,
>
> this is a reply to an older thread about num_stray growing without limits
> (thread
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2NT55RUMD33KLGQCDZ74WINPPQ6WN6CW,
> message
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/FYEN2W4HGMC6CGOCS2BS4PQDRPGUSNOO/).
> I'm opening a new thread for a better matching subject line.
>
> I now started testing octopus and am afraid I came across a very serious
> issue with unlimited growth of stray buckets. I'm running a test that puts
> constant load on a file system by adding a blob of data, creating a
> snapshot, deleting a blob of data and deleting a snapshot in a cyclic
> process. A blob of data contains about 330K hard links to make it more
> interesting.
>
> The benchmark crashed after half a day in rm with "no space left on
> device", which was due to the stray buckets being too full (old thread).
> OK, so I increased mds_bal_fragment_size_max and cleaned out all data to
> start fresh. However, this happened:
>
> [root@rit-tceph ~]# df -h /mnt/adm/cephfs
> Filesystem Size  Used Avail Use% Mounted on
> 10.41.24.13,10.41.24.14,10.41.24.15:/  2.5T   35G  2.5T   2%
> /mnt/adm/cephfs
>
> [root@rit-tceph ~]# find /mnt/adm/cephfs/
> /mnt/adm/cephfs/
> /mnt/adm/cephfs/data
> /mnt/adm/cephfs/data/blobs
>
> [root@rit-tceph ~]# find /mnt/adm/cephfs/.snap
> /mnt/adm/cephfs/.snap
>
> [root@rit-tceph ~]# find /mnt/adm/cephfs/data/.snap
> /mnt/adm/cephfs/data/.snap
>
> [root@rit-tceph ~]# find /mnt/adm/cephfs/data/blobs/.snap
> /mnt/adm/cephfs/data/blobs/.snap
>
> All snapshots were taken in /mnt/adm/cephfs/.snap. Snaptrimming finished a
> long time ago. Now look at this:
>
> [root@rit-tceph ~]# ssh "tceph-03" "ceph daemon mds.tceph-03 perf dump |
> jq .mds_cache.num_strays"
> 962562
>
> What?
>
> There is data left over in the fs pools and the stray buckets are cloaked
> up.
>
> [root@rit-tceph ~]# ceph df
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> hdd2.4 TiB  2.4 TiB  1.4 GiB35 GiB   1.38
> TOTAL  2.4 TiB  2.4 TiB  1.4 GiB35 GiB   1.38
>
> --- POOLS ---
> POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
> device_health_metrics   11  170 KiB9  509 KiB  0781 GiB
> fs-meta12   64  2.2 GiB  160.25k  6.5 GiB   0.28781 GiB
> fs-meta23  128  0 B  802.40k  0 B  0781 GiB
> fs-data 4  128  0 B  802.40k  0 B  01.5 TiB
>
> There is either a very serious bug with cleaning up stray entries when
> their last snap

[ceph-users] Re: cephfs: num_stray growing without bounds (octopus)

2022-08-08 Thread Dhairya Parmar
On Sun, Aug 7, 2022 at 6:45 PM Frank Schilder  wrote:

> Hi Dhairya,
>
> I have some new results (below) and also some wishes as an operator that
> might even help with the decision you mentioned in your e-mails:
>
> - Please implement both ways, a possibility to trigger an evaluation
> manually via a "ceph tell|daemon" command and a periodic evaluation.
> - For the periodic evaluation, please introduce a tuning parameter, for
> example, mds_gc_interval (in seconds). If set to 0, disable periodic
> evaluation.
>
Actually these are pretty good ideas! It will definitely be better to have
it both ways. I'll bring this up in our next meeting.

> Reasons:
>
> - On most production systems, doing this once per 24 hours seems enough
> (my benchmark is very special, it needs to delete aggressively). The
> default for mds_gc_interval could therefore be 86400 (24h).
>
I was thinking about a much more aggressive number of a minute or two, but if
your tests say that 86400 is a workable value then it might be very good
performance-wise as well. I have discussed this with Greg before and have
personally been brainstorming about a number to settle on, and this might
actually be it (or close to it). Either way, it will certainly help.
Thanks.

> - On my production system I would probably disable periodic evaluation and
> rather do a single shot manual evaluation some time after snapshot removal
> but before users start working to synchronise with snapshot removal (where
> the "lost" entries are created).
>
I was also thinking about a solution where we evaluate strays as soon as we
delete a snap. What do you think about this on production clusters?

> This follows a general software design principle: Whenever there is a
> choice like this to take, it is best to try to implement an API that can
> support all use cases and to leave the choice of what fits best for their
> workloads to the operators. Try not to restrict operators by hard-coding
> decisions. Rather pick reasonable defaults but also empower operators to
> tune things to special needs. One-size-fits-all never works.
>
+1

> Now to the results: Indeed, a restart triggers complete removal of all
> orphaned stray entries:
>
> [root@rit-tceph bench]# ./mds-stray-num
> 962562
> [root@rit-tceph bench]# ceph mds fail 0
> failed mds gid 371425
> [root@rit-tceph bench]# ./mds-stray-num
> 767329
> [root@rit-tceph bench]# ./mds-stray-num
> 766777
> [root@rit-tceph bench]# ./mds-stray-num
> 572430
> [root@rit-tceph bench]# ./mds-stray-num
> 199172
> [root@rit-tceph bench]# ./mds-stray-num
> 0
>
Awesome, so far it looks like this might be helpful until we come up with a
robust solution.
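
For what it's worth, a helper like your mds-stray-num could be as simple as
the sketch below - I'm only assuming it wraps the perf dump query you posted
earlier, with the host and MDS name hardcoded for your test cluster:

#!/bin/bash
# print the number of stray dentries currently tracked by the MDS cache
ssh tceph-03 "ceph daemon mds.tceph-03 perf dump" | jq .mds_cache.num_strays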

> # ceph df
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> hdd2.4 TiB  2.4 TiB  896 MiB25 GiB   0.99
> TOTAL  2.4 TiB  2.4 TiB  896 MiB25 GiB   0.99
>
> --- POOLS ---
> POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
> device_health_metrics   11  205 KiB9  616 KiB  0785 GiB
> fs-meta12   64  684 MiB   44  2.0 GiB   0.09785 GiB
> fs-meta23  128  0 B0  0 B  0785 GiB
> fs-data 4  128  0 B0  0 B  01.5 TiB
>
> Good to see that the bookkeeping didn't lose track of anything. I will
> add a periodic mds fail to my benchmark and report back how all of this
> works under heavy load.
>
Good to hear it keeps track. Yeah, that report will be very helpful.
Thanks in advance!

>
> Best regards and thanks for your help!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dhairya Parmar 
> Sent: 05 August 2022 22:53:09
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] cephfs: num_stray growing without bounds
> (octopus)
>
> On Fri, Aug 5, 2022 at 9:12 PM Frank Schilder  fr...@dtu.dk>> wrote:
> Hi Dhairya,
>
> thanks for pointing me to this tracker. I can try an MDS fail to see if it
> clears the stray buckets or if there are still left-overs. Before doing so:
>
> > Thanks for the logs though. It will help me while writing the patch.
>
> I couldn't see if you were asking for logs. Do you want me to collect
> something or do you mean the session logs included in my e-mail. Also, is
> it on purpose to leave out the ceph-user list in CC (e-mail address)?
>
> Nah, the session logs included are good enough. I missed CCing ceph-users.
> Done now.
>
> For my urgent needs, failing the MDS periodically during the benchmark
> might be an interesting addition any ways - if this helps w

[ceph-users] Re: Multi-active MDS cache pressure

2022-08-10 Thread Dhairya Parmar
;>>> "entity": {
> >>>>   "name": {
> >>>> "type": "client",
> >>>> "num": 2728101146
> >>>>   },
> >>>> [...]
> >>>> "nonce": 1105499797
> >>>>   }
> >>>> },
> >>>> "state": "open",
> >>>> "num_leases": 0,
> >>>> "num_caps": 16158,
> >>>> "request_load_avg": 0,
> >>>> "uptime": 1118066.210318422,
> >>>> "requests_in_flight": 0,
> >>>> "completed_requests": [],
> >>>> "reconnecting": false,
> >>>> "recall_caps": {
> >>>>   "value": 788916.8276369586,
> >>>>   "halflife": 60
> >>>> },
> >>>> "release_caps": {
> >>>>   "value": 8.814981576458962,
> >>>>   "halflife": 60
> >>>> },
> >>>> "recall_caps_throttle": {
> >>>>   "value": 27379.27162576508,
> >>>>   "halflife": 1.5
> >>>> },
> >>>> "recall_caps_throttle2o": {
> >>>>   "value": 5382.261925615086,
> >>>>   "halflife": 0.5
> >>>> },
> >>>> "session_cache_liveness": {
> >>>>   "value": 12.91841737465921,
> >>>>   "halflife": 300
> >>>> },
> >>>> "cap_acquisition": {
> >>>>   "value": 0,
> >>>>   "halflife": 10
> >>>> },
> >>>> [...]
> >>>> "used_inos": [],
> >>>> "client_metadata": {
> >>>>   "features": "0x3bff",
> >>>>   "entity_id": "cephfs_client",
> >>>>
> >>>>
> >>>> # ceph fs status
> >>>>
> >>>> cephfs - 25 clients
> >>>> ==
> >>>> +--+++---+---+---+
> >>>> | Rank | State  |  MDS   |Activity   |  dns  |  inos |
> >>>> +--+++---+---+---+
> >>>> |  0   | active | stmailmds01d-3 | Reqs:   89 /s |  375k |  371k |
> >>>> |  1   | active | stmailmds01d-4 | Reqs:   64 /s |  386k |  383k |
> >>>> |  2   | active | stmailmds01a-3 | Reqs:9 /s |  403k |  399k |
> >>>> |  3   | active | stmailmds01a-8 | Reqs:   23 /s |  393k |  390k |
> >>>> |  4   | active | stmailmds01a-2 | Reqs:   36 /s |  391k |  387k |
> >>>> |  5   | active | stmailmds01a-4 | Reqs:   57 /s |  394k |  390k |
> >>>> |  6   | active | stmailmds01a-6 | Reqs:   50 /s |  395k |  391k |
> >>>> |  7   | active | stmailmds01d-5 | Reqs:   37 /s |  384k |  380k |
> >>>> |  8   | active | stmailmds01a-5 | Reqs:   39 /s |  397k |  394k |
> >>>> |  9   | active |  stmailmds01a  | Reqs:   23 /s |  400k |  396k |
> >>>> |  10  | active | stmailmds01d-8 | Reqs:   74 /s |  402k |  399k |
> >>>> |  11  | active | stmailmds01d-6 | Reqs:   37 /s |  399k |  395k |
> >>>> |  12  | active |  stmailmds01d  | Reqs:   36 /s |  394k |  390k |
> >>>> |  13  | active | stmailmds01d-7 | Reqs:   80 /s |  397k |  393k |
> >>>> |  14  | active | stmailmds01d-2 | Reqs:   56 /s |  414k |  410k |
> >>>> |  15  | active | stmailmds01a-7 | Reqs:   25 /s |  390k |  387k |
> >>>> +--+++---+---+---+
> >>>> +-+--+---+---+
> >>>> |   Pool  |   type   |  used | avail |
> >>>> +-+--+---+---+
> >>>> | cephfs_metadata | metadata | 25.4G | 16.1T |
> >>>> |   cephfs_data   |   data   | 2078G | 16.1T |
> >>>> +-+--+---+---+
> >>>> ++
> >>>> |  Standby MDS   |
> >>>> ++
> >>>> | stmailmds01b-5 |
> >>>> | stmailmds01b-2 |
> >>>> | stmailmds01b-3 |
> >>>> |  stmailmds01b  |
> >>>> | stmailmds01b-7 |
> >>>> | stmailmds01b-8 |
> >>>> | stmailmds01b-6 |
> >>>> | stmailmds01b-4 |
> >>>> ++
> >>>> MDS version: ceph version 14.2.22-404-gf74e15c2e55
> >>>> (f74e15c2e552b3359f5a51482dfd8b049e262743) nautilus (stable)
> >>>> ---snip---
> >>>
> >>>
> >>>
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes after evicting client session

2022-09-22 Thread Dhairya Parmar
- What operation was being carried out which led to client eviction?
- Can you share MDS side logs when that event was being carried out?

On Thu, Sep 22, 2022 at 5:12 PM E Taka <0eta...@gmail.com> wrote:

> Ceph 17.2.3 (dockerized in Ubuntu 20.04)
>
> The subject says it. The MDS process always crashes after evicting. ceph -w
> shows:
>
> 2022-09-22T13:26:23.305527+0200 mds.ksz-cephfs2.ceph00.kqjdwe [INF]
> Evicting (and blocklisting) client session 5181680 (
> 10.149.12.21:0/3369570791)
> 2022-09-22T13:26:35.729317+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph03.vsyrbk restarted
> 2022-09-22T13:26:36.039678+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph01.xybiqv restarted
> 2022-09-22T13:29:21.000392+0200 mds.ksz-cephfs2.ceph04.ekmqio [INF]
> Evicting (and blocklisting) client session 5249349 (
> 10.149.12.22:0/2459302619)
> 2022-09-22T13:29:32.069656+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph01.xybiqv restarted
> 2022-09-22T13:30:00.000101+0200 mon.ceph00 [INF] overall HEALTH_OK
> 2022-09-22T13:30:20.710271+0200 mon.ceph00 [WRN] Health check failed: 1
> daemons have recently crashed (RECENT_CRASH)
>
> The crash info of the crashed MDS is:
> # ceph crash info
> 2022-09-22T11:26:24.013274Z_b005f3fc-7704-4cfc-96c5-f2a9c993f166
> {
>"assert_condition": "!mds->is_any_replay()",
>"assert_file":
>
> "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc",
>
>"assert_func": "void MDLog::_submit_entry(LogEvent*,
> MDSLogContextBase*)",
>"assert_line": 283,
>"assert_msg":
>
> "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc:
> In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)'
> thread 7f76fa8f6700 time
>
> 2022-09-22T11:26:23.992050+\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc:
> 283: FAILED ceph_assert(!mds->is_any_replay())\n",
>"assert_thread_name": "ms_dispatch",
>"backtrace": [
>"/lib64/libpthread.so.0(+0x12ce0) [0x7f770231bce0]",
>"gsignal()",
>"abort()",
>"(ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1b0) [0x7f770333bcd2]",
>"/usr/lib64/ceph/libceph-common.so.2(+0x283e95) [0x7f770333be95]",
>"(MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x3f)
> [0x55991905efdf]",
>"(Server::journal_close_session(Session*, int, Context*)+0x78c)
> [0x559918d7d63c]",
>"(Server::kill_session(Session*, Context*)+0x212) [0x559918d7dd92]",
>"(Server::apply_blocklist()+0x10d) [0x559918d7e04d]",
>"(MDSRank::apply_blocklist(std::set std::less, std::allocator > const&, unsigned
> int)+0x34) [0x559918d39d74]",
>"(MDSRankDispatcher::handle_osd_map()+0xf6) [0x559918d3a0b6]",
>"(MDSDaemon::handle_core_message(boost::intrusive_ptr
> const&)+0x39b) [0x559918d2330b]",
>"(MDSDaemon::ms_dispatch2(boost::intrusive_ptr
> const&)+0xc3) [0x559918d23cc3]",
>"(DispatchQueue::entry()+0x14fa) [0x7f77035c240a]",
>"(DispatchQueue::DispatchThread::entry()+0x11) [0x7f7703679481]",
>"/lib64/libpthread.so.0(+0x81ca) [0x7f77023111ca]",
>"clone()"
>],
>"ceph_version": "17.2.3",
>"crash_id":
> "2022-09-22T11:26:24.013274Z_b005f3fc-7704-4cfc-96c5-f2a9c993f166",
>"entity_name": "mds.ksz-cephfs2.ceph03.vsyrbk",
>"os_id": "centos",
>"os_name": "CentOS Stream",
>"os_version": "8",
>"os_version_id": "8",
>"process_name": "ceph-mds",
>"stack_sig":
> "b75e46941b5f6b7c05a037f9af5d42bb19d82ab7fc6a3c168533fc31a42b4de8",
>"timestamp": "2022-09-22T11:26:24.013274Z",
>"utsname_hostname": "ceph03",
>"utsname_machine": "x86_64",
>"utsname_release": "5.4.0-125-generic",
>"utsname_sysname": "Linux",
>"utsname_version": "#141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022"
> }
>
> (Don't be confused by the time information, "ceph -w" is UTC+2, "crash
> info" is UTC)
>
> Should I report this a bug or did I miss something which caused the error?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HA cluster

2022-09-26 Thread Dhairya Parmar
You should give this doc
https://docs.ceph.com/en/quincy/rados/configuration/mon-config-ref/#monitor-quorum
a read. It will help you understand and set up an HA cluster much better.
Long story short, you need at least 3 MONs to achieve HA because of
the monitor quorum.
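
Once the third MON is deployed, a quick way to confirm they are all in quorum
looks like this (the hostnames in the placement are placeholders for your
nodes):

# show the monitors and which of them are currently in quorum
ceph mon stat
ceph quorum_status -f json-pretty | jq .quorum_names
# with cephadm, place monitors on three hosts
ceph orch apply mon --placement="host1,host2,host3"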

On Sun, Sep 25, 2022 at 7:51 PM Murilo Morais  wrote:

> Hello guys.
>
> I have a question regarding HA.
>
> I set up two hosts with cephadm, created the pools and set up an NFS,
> everything working so far. I turned off the second Host and the first one
> continued to work without problems, but if I turn off the first, the second
> is totally unresponsive. What could be causing this?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes after evicting client session

2022-09-26 Thread Dhairya Parmar
Patch for this has already been merged and backported to quincy as well. It
will be there in the next Quincy release.

On Thu, Sep 22, 2022 at 5:12 PM E Taka <0eta...@gmail.com> wrote:

> Ceph 17.2.3 (dockerized in Ubuntu 20.04)
>
> The subject says it. The MDS process always crashes after evicting. ceph -w
> shows:
>
> 2022-09-22T13:26:23.305527+0200 mds.ksz-cephfs2.ceph00.kqjdwe [INF]
> Evicting (and blocklisting) client session 5181680 (
> 10.149.12.21:0/3369570791)
> 2022-09-22T13:26:35.729317+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph03.vsyrbk restarted
> 2022-09-22T13:26:36.039678+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph01.xybiqv restarted
> 2022-09-22T13:29:21.000392+0200 mds.ksz-cephfs2.ceph04.ekmqio [INF]
> Evicting (and blocklisting) client session 5249349 (
> 10.149.12.22:0/2459302619)
> 2022-09-22T13:29:32.069656+0200 mon.ceph00 [INF] daemon
> mds.ksz-cephfs2.ceph01.xybiqv restarted
> 2022-09-22T13:30:00.000101+0200 mon.ceph00 [INF] overall HEALTH_OK
> 2022-09-22T13:30:20.710271+0200 mon.ceph00 [WRN] Health check failed: 1
> daemons have recently crashed (RECENT_CRASH)
>
> The crash info of the crashed MDS is:
> # ceph crash info
> 2022-09-22T11:26:24.013274Z_b005f3fc-7704-4cfc-96c5-f2a9c993f166
> {
>"assert_condition": "!mds->is_any_replay()",
>"assert_file":
>
> "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc",
>
>"assert_func": "void MDLog::_submit_entry(LogEvent*,
> MDSLogContextBase*)",
>"assert_line": 283,
>"assert_msg":
>
> "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc:
> In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)'
> thread 7f76fa8f6700 time
>
> 2022-09-22T11:26:23.992050+\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.cc:
> 283: FAILED ceph_assert(!mds->is_any_replay())\n",
>"assert_thread_name": "ms_dispatch",
>"backtrace": [
>"/lib64/libpthread.so.0(+0x12ce0) [0x7f770231bce0]",
>"gsignal()",
>"abort()",
>"(ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1b0) [0x7f770333bcd2]",
>"/usr/lib64/ceph/libceph-common.so.2(+0x283e95) [0x7f770333be95]",
>"(MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x3f)
> [0x55991905efdf]",
>"(Server::journal_close_session(Session*, int, Context*)+0x78c)
> [0x559918d7d63c]",
>"(Server::kill_session(Session*, Context*)+0x212) [0x559918d7dd92]",
>"(Server::apply_blocklist()+0x10d) [0x559918d7e04d]",
>"(MDSRank::apply_blocklist(std::set std::less, std::allocator > const&, unsigned
> int)+0x34) [0x559918d39d74]",
>"(MDSRankDispatcher::handle_osd_map()+0xf6) [0x559918d3a0b6]",
>"(MDSDaemon::handle_core_message(boost::intrusive_ptr
> const&)+0x39b) [0x559918d2330b]",
>"(MDSDaemon::ms_dispatch2(boost::intrusive_ptr
> const&)+0xc3) [0x559918d23cc3]",
>"(DispatchQueue::entry()+0x14fa) [0x7f77035c240a]",
>"(DispatchQueue::DispatchThread::entry()+0x11) [0x7f7703679481]",
>"/lib64/libpthread.so.0(+0x81ca) [0x7f77023111ca]",
>"clone()"
>],
>"ceph_version": "17.2.3",
>"crash_id":
> "2022-09-22T11:26:24.013274Z_b005f3fc-7704-4cfc-96c5-f2a9c993f166",
>"entity_name": "mds.ksz-cephfs2.ceph03.vsyrbk",
>"os_id": "centos",
>"os_name": "CentOS Stream",
>"os_version": "8",
>"os_version_id": "8",
>"process_name": "ceph-mds",
>"stack_sig":
> "b75e46941b5f6b7c05a037f9af5d42bb19d82ab7fc6a3c168533fc31a42b4de8",
>"timestamp": "2022-09-22T11:26:24.013274Z",
>"utsname_hostname": "ceph03",
>"utsname_machine": "x86_64",
>"utsname_release": "5.4.0-125-generic",
>"utsname_sysname": "Linux",
>"utsname_version": "#141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022"
> }
>
> (Don't be confused by the time information, "ceph -w" is UTC+2, "crash
> info" is UTC)
>
> Should I report this a bug or did I miss something which caused the error?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster clone

2022-09-26 Thread Dhairya Parmar
Can you provide some more information on this? Can you show exactly what
error you get while trying to start the cluster?

On Mon, Sep 26, 2022 at 7:19 PM Ahmed Bessaidi 
wrote:

> Hello,
> I am working on cloning an existent Ceph Cluster (VMware).
> I fixed the IP/hostname part, but I cannot get the cloned cluster to start
> (Monitors issues).
> Any ideas ?
>
>
>
>
> Best Regards,
> Ahmed.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osds not bootstrapping: monclient: wait_auth_rotating timed out

2022-09-26 Thread Dhairya Parmar
Looking at the shared tracker, I can see people talking about restarting the
primary mon/mgr and getting this fixed; see note-4
<https://tracker.ceph.com/issues/17170#note-4> and note-8
<https://tracker.ceph.com/issues/17170#note-8>. Did you try that out?
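
If you want to give that a try, the restart itself is roughly the following
sketch (daemon names are placeholders - adjust to how your cluster is
deployed, e.g. cephadm vs. packages):

# fail over to a standby mgr; the old active comes back as a standby
ceph mgr fail
# restart the mon on the node in question (cephadm-managed daemon)
ceph orch daemon restart mon.<hostname>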

On Tue, Sep 27, 2022 at 12:44 AM Wyll Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Ceph Pacific (16.2.9) on a large cluster.  Approximately 60 (out of 700)
> osds fail to start and show an error:
>
> monclient: wait_auth_rotating timed out after 300
>
> We modified the "rotating_keys_bootstrap_timeout" from 30 to 300, but they
> still fail.  All nodes are time-synced with NTP and the skew has been
> verified to be < 1.0 seconds.
> It looks a lot like this bug: https://tracker.ceph.com/issues/17170
> which does not appear to be resolved yet.
>
> Any other suggestions on how to get these OSDs to sync up with the cluster?
>
>
> thanks!
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: why rgw generates large quantities orphan objects?

2022-10-14 Thread Dhairya Parmar
'.
> >
> >Intermediate files are './rados-20221008062356.intermediate' and
> > './radosgw-admin-20221008062356.intermediate'.
> > ***
> >
> > *** WARNING: This is EXPERIMENTAL code and the results should be used
> > ***  only with CAUTION!
> > ***
> > Done at Sat Oct  8 06:48:07 UTC 2022.
> >
> > [root@node01 /]# radosgw-admin gc list
> > []
> >
> > [root@node01 /]# cat orphan-list-20221008062356.out | wc -l
> > 39662551
> >
> > [root@node01 /]# rados df
> > POOL_NAME   USED   OBJECTS  CLONES COPIES
> > MISSING_ON_PRIMARY  UNFOUND  DEGRADED RD_OPS   RD WR_OPS
> > WR  USED COMPR  UNDER COMPR
> > .nfs 4.3 MiB 4   0 12
> >00 0  77398   76 MiB146   79
> > KiB 0 B  0 B
> > .rgw.root180 KiB16   0 48
> >00 0  28749   28 MiB  0
> > 0 B 0 B  0 B
> > cephfs-metadata  932 MiB 14772   0  44316
> >00 01569690  3.8 GiB1258651  3.4
> > GiB 0 B  0 B
> > cephfs-replicated-pool   738 GiB300962   0 902886
> >00 0 794612  470 GiB 770689  245
> > GiB 0 B  0 B
> > deeproute-replica-hdd-pool  1016 GiB104276   0 312828
> >00 0   18176216  298 GiB  441783780  6.7
> > TiB 0 B  0 B
> > deeproute-replica-ssd-pool30 GiB  3691   0  11073
> >00 02466079  2.1 GiB8416232  221
> > GiB 0 B  0 B
> > device_health_metrics 50 MiB   108   0324
> >00 0   1836  1.8 MiB   1944   18
> > MiB 0 B  0 B
> > os-test.rgw.buckets.data 5.6 TiB  39844453   0  239066718
> >00 0  552896177  3.0 TiB  999441015   60
> > TiB 0 B  0 B
> > os-test.rgw.buckets.index1.8 GiB33   0 99
> >00 0  153600295  154 GiB  110916573   62
> > GiB 0 B  0 B
> > os-test.rgw.buckets.non-ec   2.1 MiB45   0135
> >00 0 574240  349 MiB 153725  139
> > MiB 0 B  0 B
> > os-test.rgw.control  0 B 8   0 24
> >00 0  0  0 B  0
> > 0 B 0 B  0 B
> > os-test.rgw.log  3.7 MiB   346   0   1038
> >00 0   83877803   80 GiB6306730  7.6
> > GiB 0 B  0 B
> > os-test.rgw.meta 220 KiB23   0 69
> >00 0 640854  506 MiB 108229   53
> > MiB 0 B  0 B
> >
> > total_objects40268737
> > total_used   7.8 TiB
> > total_avail  1.1 PiB
> > total_space  1.1 PiB
> > ```
> > ceph verison:
> > ```
> > [root@node01 /]# ceph versions
> > {
> >"mon": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 3
> >},
> >"mgr": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 2
> >},
> >"osd": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 108
> >},
> >"mds": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 2
> >},
> >"rgw": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 9
> >},
> >"overall": {
> >"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> > pacific (stable)": 124
> >}
> > }
> > ```
> >
> > Thanks,
> > Best regards
> > Liang Zheng
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to determine if a filesystem is allow_standby_replay = true

2022-10-20 Thread Dhairya Parmar
Hi Wesley,

You can find out whether `allow_standby_replay` is turned on or off by looking
at the fs dump: run `ceph fs dump | grep allow_standby_replay`, and if it is
turned on you will find something like:

$ ./bin/ceph fs dump | grep allow_standby_replay
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-10-21T00:06:14.656+0530 7fbed4fc3640 -1 WARNING: all dangerous and
experimental features are enabled.
2022-10-21T00:06:14.663+0530 7fbed4fc3640 -1 WARNING: all dangerous and
experimental features are enabled.
dumped fsmap epoch 8
flags 32 joinable allow_snaps allow_multimds_snaps *allow_standby_replay*

turn it to false and it will be gone:

$ ./bin/ceph fs set a allow_standby_replay false
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-10-21T00:10:38.668+0530 7f68b66f0640 -1 WARNING: all dangerous and
experimental features are enabled.
2022-10-21T00:10:38.675+0530 7f68b66f0640 -1 WARNING: all dangerous and
experimental features are enabled.
$ ./bin/ceph fs dump | grep allow_standby_replay
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-10-21T00:10:43.938+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
experimental features are enabled.
2022-10-21T00:10:43.945+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
experimental features are enabled.
dumped fsmap epoch 15

Hope it helps.


On Thu, Oct 20, 2022 at 11:09 PM Wesley Dillingham 
wrote:

> I am building some automation for version upgrades of MDS and part of the
> process I would like to determine if a filesystem has allow_standby_replay
> set to true and if so then disable it. Granted I could just issue: "ceph fs
> set MyFS allow_standby_replay false" and be done with it but Its got me
> curious that there is not the equivalent command: "ceph fs get MyFS
> allow_standby_replay" to check this information. So where can an operator
> determine this?
>
> I tried a diff of "ceph fs get MyFS" with this configurable in both true
> and false and found:
>
> diff /tmp/true /tmp/false
> 3,4c3,4
> < epoch 66
> < flags 32
> ---
> > epoch 67
> > flags 12
>
> and Im guessing this information is encoded  in the "flags" field. I am
> working with 16.2.10. Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> ___________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to determine if a filesystem is allow_standby_replay = true

2022-10-20 Thread Dhairya Parmar
Hi Wesley,

It's 17.0.0-14319-ga686eb80799 (a686eb80799dc503a45002f4b9181f4573e8e0b3)
quincy (dev)
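
Since your 16.2.10 dump doesn't print the flag names, you can also decode the
numeric value from `ceph fs get`. As far as I can tell it is printed in hex
and the standby-replay bit is 0x20 (CEPH_MDSMAP_ALLOW_STANDBY_REPLAY in the
source), so a rough check - please verify against your release - would be:

# flags 32 (0x32) -> allow_standby_replay set, flags 12 (0x12) -> not set
flags=$(ceph fs get MyFS | awk '/^flags/ {print $2}')
if (( 0x$flags & 0x20 )); then echo "standby-replay enabled"; else echo "standby-replay disabled"; fi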

On Fri, Oct 21, 2022 at 3:29 AM Wesley Dillingham 
wrote:

> Thanks Dhairya, what version are you using? I am 16.2.10
>
> [root@alma3-4 ~]# ceph fs dump | grep -i replay
> dumped fsmap epoch 90
> [mds.alma3-6{0:10340349} state up:standby-replay seq 1 addr [v2:
> 10.0.24.6:6803/937383171,v1:10.0.24.6:6818/937383171] compat
> {c=[1],r=[1],i=[7ff]}]
>
> as you can see i have a MDS in replay mode and standby replay is enabled
> but my output is different from yours.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Thu, Oct 20, 2022 at 2:43 PM Dhairya Parmar  wrote:
>
>> Hi Wesley,
>>
>> You can find if the `allow_standby_replay` is turned on or off by looking
>> at the fs dump,
>> run `ceph fs dump | grep allow_standby_replay` and if it is turned on you
>> will find something like:
>>
>> $ ./bin/ceph fs dump | grep allow_standby_replay
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 2022-10-21T00:06:14.656+0530 7fbed4fc3640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> 2022-10-21T00:06:14.663+0530 7fbed4fc3640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> dumped fsmap epoch 8
>> flags 32 joinable allow_snaps allow_multimds_snaps *allow_standby_replay*
>>
>> turn it to false and it will be gone:
>>
>> $ ./bin/ceph fs set a allow_standby_replay false
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 2022-10-21T00:10:38.668+0530 7f68b66f0640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> 2022-10-21T00:10:38.675+0530 7f68b66f0640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> $ ./bin/ceph fs dump | grep allow_standby_replay
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 2022-10-21T00:10:43.938+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> 2022-10-21T00:10:43.945+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
>> experimental features are enabled.
>> dumped fsmap epoch 15
>>
>> Hope it helps.
>>
>>
>> On Thu, Oct 20, 2022 at 11:09 PM Wesley Dillingham 
>> wrote:
>>
>>> I am building some automation for version upgrades of MDS and part of the
>>> process I would like to determine if a filesystem has
>>> allow_standby_replay
>>> set to true and if so then disable it. Granted I could just issue: "ceph
>>> fs
>>> set MyFS allow_standby_replay false" and be done with it but Its got me
>>> curious that there is not the equivalent command: "ceph fs get MyFS
>>> allow_standby_replay" to check this information. So where can an operator
>>> determine this?
>>>
>>> I tried a diff of "ceph fs get MyFS" with this configurable in both true
>>> and false and found:
>>>
>>> diff /tmp/true /tmp/false
>>> 3,4c3,4
>>> < epoch 66
>>> < flags 32
>>> ---
>>> > epoch 67
>>> > flags 12
>>>
>>> and Im guessing this information is encoded  in the "flags" field. I am
>>> working with 16.2.10. Thanks.
>>>
>>> Respectfully,
>>>
>>> *Wes Dillingham*
>>> w...@wesdillingham.com
>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>>
>> --
>> *Dhairya Parmar*
>>
>> He/Him/His
>>
>> Associate Software Engineer, CephFS
>>
>> Red Hat Inc. <https://www.redhat.com/>
>>
>> dpar...@redhat.com
>> <https://www.redhat.com/>
>>
>

-- 
*Dhairya Parmar*

He/Him/His

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-07 Thread Dhairya Parmar
Hi Charles,

There are many scenarios where a write or close operation can fail, but
failures/errors are generally logged (normally every time) to help debug the
case. So there are no silent failures as such, unless you encountered a very
rare bug.
- Dhairya


On Wed, Dec 7, 2022 at 11:38 PM Charles Hedrick  wrote:

> I believe asynchronous operations are used for some operations in cephfs.
> That means the server acknowledges before data has been written to stable
> storage. Does that mean there are failure scenarios when a write or close
> will return an error? fail silently?
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Dhairya Parmar
Hi there,

You might want to look at [1] for this, also I found a relevant thread [2]
that could be helpful.

[1]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[2] https://www.spinics.net/lists/ceph-users/msg53202.html
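
To make the scrub command you asked about concrete, the usual sequence from
the disaster-recovery page boils down to something like this sketch (rank 0 of
your "disklib" file system; the path is taken from your damage output - please
evaluate the impact on production before running a repair):

# list the recorded metadata damage
ceph tell mds.disklib:0 damage ls
# scrub and repair the affected subtree
ceph tell mds.disklib:0 scrub start /volumes/_nogroup/ec-pool4p2 recursive,repair
# watch progress
ceph tell mds.disklib:0 scrub status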

- Dhairya


On Mon, Dec 12, 2022 at 7:10 PM Sascha Lucas  wrote:

> Hi,
>
> without any outage/disaster cephFS (17.2.5/cephadm) reports damaged
> metadata:
>
> [root@ceph106 ~]# zcat
> /var/log/ceph/3cacfa58-55cf-11ed-abaf-5cba2c03dec0/ceph-mds.disklib.ceph106.kbzjbg.log-20221211.gz
> 2022-12-10T10:12:35.161+ 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 958 from mon.1
> 2022-12-10T10:12:50.974+ 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 959 from mon.1
> 2022-12-10T15:18:36.609+ 7fa461791700  0
> mds.0.cache.dir(0x11516b1) _fetched missing object for [dir
> 0x11516b1
> /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505/
> [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n()
> hs=0+0,ss=0+0 | waiter=1 authpin=1 0x56541d3c5a80]
> 2022-12-10T15:18:36.615+ 7fa461791700 -1 log_channel(cluster) log
> [ERR] : dir 0x11516b1 object missing on disk; some files may be lost
> (/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505)
> 2022-12-10T15:18:40.010+ 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 960 from mon.1
> 2022-12-11T02:32:01.474+ 7fa468fa0700 -1 received  signal: Hangup from
> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() )
> UID: 0
>
> [root@ceph101 ~]# ceph tell mds.disklib:0 damage ls
> 2022-12-12T10:20:42.484+0100 7fa9e37fe700  0 client.165258 ms_handle_reset
> on v2:xxx.xxx.xxx.xxx:6800/519677707
> 2022-12-12T10:20:42.504+0100 7fa9e37fe700  0 client.165264 ms_handle_reset
> on v2:xxx.xxx.xxx.xxx:6800/519677707
> [
>  {
>  "damage_type": "dir_frag",
>  "id": 2085830739,
>  "ino": 1099513009841,
>  "frag": "*",
>  "path":
> "/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505"
>  }
> ]
>
> The mentioned path CV_MAGNETIC/V_7770505 is not visible, but I can't
> tell whether this is due to being lost, or removed by the application
> using the cephFS.
>
> Data is on EC4+2 pool, ROOT and METADATA are on replica=3 pools.
>
> Questions are: What happened? And how to fix the problem?
>
> Is running "ceph tell mds.disklib:0 scrub start /what/path?
> recursive,repair" the right thing? Is this a safe command? How is the
> impact on production?
>
> Can the file-system stay mounted/used by clients? How long will it take
> for 340T? What is a dir_frag damage?
>
> TIA, Sascha.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recent ceph.io Performance Blog Posts

2022-12-23 Thread Dhairya Parmar
If this is the same issue that affected a couple of PRs in the last few weeks,
then rebasing the PR onto the latest main branch and force-pushing it should
solve the problem.
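
For reference, the rebase itself is roughly this (remote and branch names are
assumptions - use whatever your PR branch is called):

git fetch origin                       # the remote pointing at ceph/ceph
git checkout wip-ceph-volume-change    # your PR branch
git rebase origin/main
git push --force-with-lease            # force push the rebased branch to the PR
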
- Dhairya


On Fri, Dec 23, 2022 at 7:12 PM Stefan Kooman  wrote:

> On 12/19/22 10:26, Stefan Kooman wrote:
> > On 12/14/22 19:04, Mark Nelson wrote:
> >
> >>
> >> This is great work!  Would you consider making a PR against main for
> >> the change to ceph-volume?  Given that you have performance data it
> >> sounds like good justification.  I'm not sure who's merging changes to
> >> ceph-volume these days, but I can try to find out if no one is biting.
> >>
> >>
> > Yes, will do.
>
> https://github.com/ceph/ceph/pull/49554
>
> Gr. Stefan
>
> P.s. failed "make check" does not seem to be related to my changes AFAICT.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph v15.2.14 - Dirty Object issue

2023-03-02 Thread Dhairya Parmar
Did you try the options from the cache-sizing or other-tunables sections of
the docs?
- Dhairya


On Fri, Mar 3, 2023 at 5:39 AM  wrote:

> Hi, we have a cluster with this ceph df
>
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> hdd240 GiB  205 GiB   29 GiB35 GiB  14.43
> hddvm  1.6 TiB  1.2 TiB  277 GiB   332 GiB  20.73
> TOTAL  1.8 TiB  1.4 TiB  305 GiB   366 GiB  19.91
>
> --- POOLS ---
> POOL   ID  PGS  STORED   (DATA)   (OMAP)   OBJECTS  USED
>(DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY
> USED COMPR  UNDER COMPR
> device_health_metrics   11  0 B  0 B  0 B0  0
> B  0 B  0 B  0308 GiB  N/AN/A0
>0 B  0 B
> rbd-pool2   32539 B 19 B520 B9539
> B 19 B520 B  0462 GiB  N/AN/A9
>0 B  0 B
> cephfs.sharedfs.meta3   32  299 MiB  190 MiB  109 MiB   87.10k  299
> MiB  190 MiB  109 MiB   0.03308 GiB  N/AN/A
>  87.10k 0 B  0 B
> cephfs.sharedfs.data4   32  2.2 GiB  2.2 GiB  0 B  121.56k  2.2
> GiB  2.2 GiB  0 B   0.23308 GiB  N/AN/A
> 121.56k 0 B  0 B
> rbd-pool-proddeb02  5   32  712 MiB  712 MiB568 B  201  712
> MiB  712 MiB568 B   0.08308 GiB  N/AN/A
> 201 0 B  0 B
>
>
> So as you can see we have 332GB RAW but data really are 299+2.2G+712M
>
> POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
> device_health_metrics   11  0 B0  0 B  0308 GiB
> rbd-pool2   32539 B9539 B  0462 GiB
> cephfs.sharedfs.meta3   32  299 MiB   87.10k  299 MiB   0.03308 GiB
> cephfs.sharedfs.data4   32  2.2 GiB  121.56k  2.2 GiB   0.23308 GiB
> rbd-pool-proddeb02  5   32  712 MiB  201  712 MiB   0.08308 GiB
>
> How to clean Dirty ? How is that possible ? any cache issue or not
> committed flush from client ?
> Best regards
> Alessandro
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to restart mds - mds crashes almost immediately after finishing recovery

2023-05-04 Thread Dhairya Parmar
Apart from the PR mentioned by Xiubo, #49691 also contains a good fix for this
issue.
- Dhairya


On Fri, May 5, 2023 at 6:32 AM Xiubo Li  wrote:

> Hi Emmanuel,
>
> This should be one known issue as https://tracker.ceph.com/issues/58392
> and there is one fix in https://github.com/ceph/ceph/pull/49652.
>
> Could you just stop all the clients first and then set the 'max_mds' to
> 1 and then restart the MDS daemons ?
>
> Thanks
>
> On 5/3/23 16:01, Emmanuel Jaep wrote:
> > Hi,
> >
> > I just inherited a ceph storage. Therefore, my level of confidence with
> the tool is certainly less than ideal.
> >
> > We currently have an mds server that refuses to come back online. While
> reviewing the logs, I can see that, upon mds start, the recovery goes well:
> > ```
> > -10> 2023-05-03T08:12:43.632+0200 7f345d00b700  1 mds.4.2638711
> cluster recovered.
> >   12: (MDCache::_open_ino_traverse_dir(inodeno_t,
> MDCache::open_ino_info_t&, int)+0xbf) [0x558323d602df]
> > ```
> >
> > However, rights after this message, ceph handles a couple of clients
> request:
> > ```
> >  -9> 2023-05-03T08:12:43.632+0200 7f345d00b700  4 mds.4.2638711
> set_osd_epoch_barrier: epoch=249241
> >  -8> 2023-05-03T08:12:43.632+0200 7f3459003700  2 mds.4.cache Memory
> usage:  total 2739784, rss 2321188, heap 348412, baseline 315644, 0 /
> 765023 inodes have caps, 0 caps, 0 caps per inode
> >  -7> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server
> handle_client_request client_request(client.108396030:57271 lookup
> #0x70001516236/012385530.npy 2023-05-02T20:37:19.675666+0200 RETRY=6
> caller_uid=135551, caller_gid=11157{0,4,27,11157,}) v5
> >  -6> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server
> handle_client_request client_request(client.104073212:5109945 readdir
> #0x70001516236 2023-05-02T20:36:29.517066+0200 RETRY=6 caller_uid=180090,
> caller_gid=11157{0,4,27,11157,}) v5
> >  -5> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server
> handle_client_request client_request(client.104288735:3008344 readdir
> #0x70001516236 2023-05-02T20:36:29.520801+0200 RETRY=6 caller_uid=135551,
> caller_gid=11157{0,4,27,11157,}) v5
> >  -4> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server
> handle_client_request client_request(client.8558540:46306346 readdir
> #0x700019ba15e 2023-05-01T21:26:34.303697+0200 RETRY=49 caller_uid=0,
> caller_gid=0{}) v2
> >  -3> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server
> handle_client_request client_request(client.96913903:2156912 create
> #0x1000b37db9a/street-photo-3.png 2023-05-01T17:27:37.454042+0200 RETRY=59
> caller_uid=271932, caller_gid=30034{}) v2
> >  -2> 2023-05-03T08:12:43.688+0200 7f345d00b700  5 mds.icadmin006
> handle_mds_map old map epoch 2638715 <= 2638715, discarding
> > ```
> >
> > and crashes:
> > ```
> >  -1> 2023-05-03T08:12:43.692+0200 7f345d00b700 -1
> /build/ceph-16.2.10/src/mds/Server.cc: In function 'void
> Server::handle_client_open(MDRequestRef&)' thread 7f345d00b700 time
> 2023-05-03T08:12:43.694660+0200
> > /build/ceph-16.2.10/src/mds/Server.cc: 4240: FAILED
> ceph_assert(cur->is_auth())
> >
> >   ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17)
> pacific (stable)
> >   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x152) [0x7f3462533d65]
> >   2: /usr/lib/ceph/libceph-common.so.2(+0x265f6d) [0x7f3462533f6d]
> >   3:
> (Server::handle_client_open(boost::intrusive_ptr&)+0x1834)
> [0x558323c89f04]
> >   4:
> (Server::handle_client_openc(boost::intrusive_ptr&)+0x28f)
> [0x558323c925ef]
> >   5:
> (Server::dispatch_client_request(boost::intrusive_ptr&)+0xa45)
> [0x558323cc3575]
> >   6:
> (MDCache::dispatch_request(boost::intrusive_ptr&)+0x3d)
> [0x558323d7460d]
> >   7: (MDSContext::complete(int)+0x61) [0x558323f68681]
> >   8: (MDCache::_open_remote_dentry_finish(CDentry*, inodeno_t,
> MDSContext*, bool, int)+0x3e) [0x558323d3edce]
> >   9: (C_MDC_OpenRemoteDentry::finish(int)+0x3e) [0x558323de6cce]
> >   10: (MDSContext::complete(int)+0x61) [0x558323f68681]
> >   11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&,
> int)+0xcf) [0x558323d5ff2f]
> >   12: (MDCache::_open_ino_traverse_dir(inodeno_t,
> MDCache::open_ino_info_t&, int)+0xbf) [0x558323d602df]
> >   13: (MDSContext::complete(int)+0x61) [0x558323f68681]
> >   14: (MDSRank::_advance_queues()+0x88) [0x558323c23c38]
> >   15: (MDSRank::_dispatch(boost::intrusive_ptr const&,
> bool)+0x1fa) [0x558323c24a1a]
> >   16: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr const> const&)+0x5e) [0x558323c254fe]
> >   17: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr
> const&)+0x1d6) [0x558323bfd906]
> >   18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr
> const&)+0x460) [0x7f34627854e0]
> >   19: (DispatchQueue::entry()+0x58f) [0x7f3462782d7f]
> >   20: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f346284eee1]
> >   21: /lib/x86_64-linux-gnu/libpthre

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-24 Thread Dhairya Parmar
On Wed, May 17, 2023 at 9:26 PM Henning Achterrath 
wrote:

> Hi all,
>
> we did a major update from Pacific to Quincy (17.2.5) a month ago
> without any problems.
>
> Now we have tried a minor update from 17.2.5 to 17.2.6 (ceph orch
> upgrade). It stucks at mds upgrade phase. At this point the cluster
> tries to scale down mds (ceph fs set max_mds 1). We waited a few hours.
>
Just an FYI (if you use cephadm to carry out upgrades): reducing max_mds to 1
can be disastrous (especially for huge CephFS deployments), because the
cluster cannot quickly reduce the number of active MDSs to 1 and a single
active MDS cannot easily handle the load of all clients. To overcome this, you
can upgrade the MDSs without reducing max_mds; the fail_fs option needs to be
set to true prior to the upgrade. There's a note at the beginning of the
"STARTING THE UPGRADE" section that might be helpful to understand this
better.

https://docs.ceph.com/en/latest/cephadm/upgrade/#starting-the-upgrade
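
If I read that section correctly, with cephadm it boils down to setting the
flag before starting the upgrade, along these lines (please double check the
config key against the linked doc for your release):

# let cephadm fail the fs and upgrade all MDS daemons at once
# instead of scaling max_mds down to 1
ceph config set mgr mgr/orchestrator/fail_fs true
ceph orch upgrade start --ceph-version 17.2.6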


> We are running two active mds with 1 standby. No subdir pinning
> configured. CephFS data pool: 575 TB
>
> While Upgrading, Rank 1 MDS remains in state stopping. During this state
> clients are not able to reconnect. So we paused this upgrade and set
> max_mds to 2 back again and fail rank 1. After that, standby becomes
> active.
> In the mds (rank 1 in stopping state) logs we can see: waiting for
> strays to migrate
>
> In our second try, we have evicted all clients first without success.
>
> We make daily snapshots on / and rotate them via snapshot scheduler
> after one week.
>
> Is there a way to get rid of stray entries without scale down mds or do
> we have to wait longer?
>
> We had about the same amount of strays before we did the major upgrade.
> So, it is a bit curious.
>
> Current output from ceph perf dump
>
> Rank0:
>
> "num_strays": 417304,
>  "num_strays_delayed": 3,
>  "num_strays_enqueuing": 0,
>  "strays_created": 567879,
>  "strays_enqueued": 561803,
>  "strays_reintegrated": 13751,
>  "strays_migrated": 4,
>
>
> Rank1:
>
> ceph daemon mds.fdi-cephfs.ceph-service-13.rwdkqs perf dump | grep stray
>
>  "num_strays": 172528,
>  "num_strays_delayed": 0,
>  "num_strays_enqueuing": 0,
>  "strays_created": 418365,
>  "strays_enqueued": 396142,
>  "strays_reintegrated": 67406,
>  "strays_migrated": 4,
>
>
>
> Any help would be appreciated.
>
>
> best regards
> Henning
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what are the options for config a CephFS client session

2023-06-12 Thread Dhairya Parmar
Hi,

There's just one option for `session config` (or `client config`; both are the
same) as of now, i.e. "timeout":
#> ceph tell mds.0 session config <client_id> timeout <value>
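
For example (the client id below is just a placeholder - pick it up from
`session ls` first):

# find the client id of the session you want to tweak
ceph tell mds.0 session ls
# set the timeout for that session (value assumed to be in seconds)
ceph tell mds.0 session config 4305 timeout 300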


*Dhairya Parmar*

Associate Software Engineer, CephFS


On Mon, Jun 12, 2023 at 2:29 PM Denis Polom  wrote:

> Hi,
>
> I didn't find any doc and any way how to get to know valid options to
> configure client session over mds socket:
>
> #> ceph tell mds.mds1 session config
>
> session config <client_id> <option> [<value>] :  Config a CephFS
> client session
>
>
> Any hint on this?
>
> Thank you
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Delete or move files from lost+found in cephfs

2023-07-04 Thread Dhairya Parmar
Hi,

These symptoms look related to [0]; its fix is already merged in main and
backported to quincy, but the pacific and reef backports are pending.

[0] https://tracker.ceph.com/issues/59569

- Dhairya


On Tue, Jul 4, 2023 at 1:54 AM Thomas Widhalm 
wrote:

> Hi,
>
> I had some trouble in the past with my CephFS which I was able to
> resolve - mostly with your help.
>
> Now I have about 150GB of data in lost+found in my CephFS. No matter
> what I try and how I change permissions, every time when I try to delete
> or move something from there I only get the reply: "mv: cannot remove
> 'lost+found/12b047c': Read-only file system".
>
> I searched the web and configuration items but I didn't find a way to
> get rid of these files. I copyied most of them to another place,
> identified them and have them back. So in lost+found there are mostly
> useless copies.
>
> Cheers,
> Thomas
> --
> http://www.widhalm.or.at
> GnuPG : 6265BAE6 , A84CB603
> Threema: H7AV7D33
> Telegram, Signal: widha...@widhalm.or.at
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what is the point of listing "auth: unable to find a keyring on /etc/ceph/ceph.client nfs-ganesha

2023-07-21 Thread Dhairya Parmar
Hi Marc,

Can you confirm that the mon IP in ceph.conf is correct and public, and that
the keyring path is specified correctly?


*Dhairya Parmar*

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>


On Thu, Jul 20, 2023 at 9:40 PM Marc  wrote:

>
> I need some help understanding this. I have configured nfs-ganesha for
> cephfs using something like this in ganesha.conf
>
> FSAL { Name = CEPH; User_Id = "testing.nfs"; Secret_Access_Key =
> "AAA=="; }
>
> But I contstantly have these messages in de ganesha logs, 6x per user_id
>
> auth: unable to find a keyring on /etc/ceph/ceph.client.testing
>
> I thought this was a ganesha authentication order issue, but they[1] say
> it has to do with ceph. I am still on Nautilus so maybe this has been fixed
> in newer releases. I still have a hard time understanding why this is an
> issue of ceph (libraries).
>
>
> [1]
> https://github.com/nfs-ganesha/nfs-ganesha/issues/974
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what is the point of listing "auth: unable to find a keyring on /etc/ceph/ceph.client nfs-ganesha

2023-07-21 Thread Dhairya Parmar
Okay, then I'd suggest adding the keyring to the client section in ceph.conf;
it is as simple as
keyring = /path/to/keyring

I hope the client (that the logs complain about) is in that keyring file. Do
let me know if that works for you; if not, some logs would be good to have to
diagnose further.
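
A minimal sketch of what I mean, assuming the user id from your FSAL block
("testing.nfs") and a standard keyring location - adjust the path to wherever
the key actually lives:

[client.testing.nfs]
    keyring = /etc/ceph/ceph.client.testing.nfs.keyring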

On Fri, Jul 21, 2023 at 7:44 PM Marc  wrote:

> Hi Dhairya,
>
> Yes I have in ceph.conf (only copied the lines below, there are more in
> these sections). I do not have a keyring path setting in ceph.conf
>
>
> public network = a.b.c.111/24
>
> [mon]
> mon host = a.b.c.111,a.b.c.112,a.b.c.113
>
> [mon.a]
> mon addr = a.b.c.111
>
> [mon.b]
> mon addr = a.b.c.112
>
> [mon.c]
> mon addr = a.b.c.113
>
>
> >
> > Can you confirm if the mon ip in ceph.conf is correct and is public;
> > also the keyring path is specified correctly?
> >
> >
> >
> >   I need some help understanding this. I have configured nfs-ganesha
> > for cephfs using something like this in ganesha.conf
> >
> >   FSAL { Name = CEPH; User_Id = "testing.nfs"; Secret_Access_Key =
> > "AAA=="; }
> >
> >   But I contstantly have these messages in de ganesha logs, 6x per
> > user_id
> >
> >   auth: unable to find a keyring on /etc/ceph/ceph.client.testing
> >
> >   I thought this was a ganesha authentication order issue, but
> > they[1] say it has to do with ceph. I am still on Nautilus so maybe this
> > has been fixed in newer releases. I still have a hard time understanding
> > why this is an issue of ceph (libraries).
> >
> >
> >   [1]
> >   https://github.com/nfs-ganesha/nfs-ganesha/issues/974
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Client failing to respond to capability release

2023-08-23 Thread Dhairya Parmar
Hi Frank,

This usually happens when the client is buggy or unresponsive. The warning is
triggered when a client fails to respond to the MDS's request to release caps
in time, which is determined by session_timeout (defaults to 60 seconds). Did
you make any config changes?
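
To double-check on your side, something like this shows the configured
timeout and one of the sessions the warning points at (the fs name is a
placeholder; the MDS rank and client id are taken from your output):

# session_timeout is a per-filesystem setting (default 60)
ceph fs get <fsname> | grep session_timeout
# inspect one of the listed sessions
ceph tell mds.1 session ls | jq '.[] | select(.id == 145698301)'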


*Dhairya Parmar*

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>


On Tue, Aug 22, 2023 at 9:12 PM Frank Schilder  wrote:

> Hi all,
>
> I have this warning the whole day already (octopus latest cluster):
>
> HEALTH_WARN 4 clients failing to respond to capability release; 1 pgs not
> deep-scrubbed in time
> [WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to capability
> release
> mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc failing
> to respond to capability release client_id: 145698301
> mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc failing
> to respond to capability release client_id: 189511877
> mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc failing
> to respond to capability release client_id: 189511887
> mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc failing
> to respond to capability release client_id: 231250695
>
> If I look at the session info from mds.1 for these clients I see this:
>
> # ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h:
> .client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps:
> .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep -e 145698301
> -e 189511877 -e 189511887 -e 231250695
> {"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887 v1:
> 192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
> {"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695 v1:
> 192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
> {"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877 v1:
> 192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
> {"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 v1:
> 192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}
>
> We have mds_min_caps_per_client=4096, so it looks like the limit is well
> satisfied. Also, the file system is pretty idle at the moment.
>
> Why and what exactly is the MDS complaining about here?
>
> Thanks and best regards.
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS warning: clients laggy due to laggy OSDs

2023-09-20 Thread Dhairya Parmar
Hi Janek,

The PR Venky mentioned makes use of the OSDs' laggy parameters
(laggy_interval and laggy_probability) to decide whether any OSD is laggy.
These laggy parameters are reset to 0 if the interval between the last
modification of the OSDMap and the timestamp when the OSD was marked down
exceeds the grace interval threshold, which is `mon_osd_laggy_halflife * 48`.
mon_osd_laggy_halflife is a configurable value that defaults to 3600, so the
laggy parameters are only reset to 0 once that interval exceeds 172800
seconds. I'd recommend taking a look at what your configured value is (using
`ceph config get osd mon_osd_laggy_halflife`).

There is also a "hack" to reset the parameters manually (*not recommended,
just for info*): set mon_osd_laggy_weight to 1 using `ceph config set osd
mon_osd_laggy_weight 1` and reboot the OSD(s) that are reported as laggy, and
you will see the lagginess go away.
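
If you want to look at the raw values that decision is based on, something
along these lines should do it (the exact JSON layout can differ between
releases, so treat this as a sketch):

# per-OSD laggy hints kept in the osdmap
ceph osd dump -f json | jq '.osd_xinfo[] | {osd, laggy_probability, laggy_interval}'
# what the MDS itself reports (op_laggy / osd_laggy)
ceph tell mds.<name> perf dump | grep -i laggy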


*Dhairya Parmar*

Associate Software Engineer, CephFS

Red Hat Inc. <https://www.redhat.com/>

dpar...@redhat.com
<https://www.redhat.com/>


On Wed, Sep 20, 2023 at 3:25 PM Venky Shankar  wrote:

> Hey Janek,
>
> I took a closer look at various places where the MDS would consider a
> client as laggy and it seems like a wide variety of reasons are taken
> into consideration and not all of them might be a reason to defer client
> eviction, so the warning is a bit misleading. I'll post a PR for this. In
> the meantime, could you share the debug logs stated in my previous email?
>
> On Wed, Sep 20, 2023 at 3:07 PM Venky Shankar  wrote:
>
> > Hi Janek,
> >
> > On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff <
> > janek.bevendo...@uni-weimar.de> wrote:
> >
> >> Hi Venky,
> >>
> >> As I said: There are no laggy OSDs. The maximum ping I have for any OSD
> >> in ceph osd perf is around 60ms (just a handful, probably aging disks).
> The
> >> vast majority of OSDs have ping times of less than 1ms. Same for the
> host
> >> machines, yet I'm still seeing this message. It seems that the affected
> >> hosts are usually the same, but I have absolutely no clue why.
> >>
> >
> > It's possible that you are running into a bug which does not clear the
> > laggy clients list which the MDS sends to monitors via beacons. Could you
> > help us out with debug mds logs (by setting debug_mds=20) for the active
> > mds for around 15-20 seconds and share the logs please? Also reset the
> log
> > level once done since it can hurt performance.
> >
> > # ceph config set mds.<> debug_mds 20
> >
> > and reset via
> >
> > # ceph config rm mds.<> debug_mds
> >
> >
> >> Janek
> >>
> >>
> >> On 19/09/2023 12:36, Venky Shankar wrote:
> >>
> >> Hi Janek,
> >>
> >> On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <
> >> janek.bevendo...@uni-weimar.de> wrote:
> >>
> >>> Thanks! However, I still don't really understand why I am seeing this.
> >>>
> >>
> >> This is due to a changes that was merged recently in pacific
> >>
> >> https://github.com/ceph/ceph/pull/52270
> >>
> >> The MDS would not evict laggy clients if the OSDs report as laggy. Laggy
> >> OSDs can cause cephfs clients to fail to flush dirty data (during cap
> >> revokes by the MDS), thereby showing up as laggy and getting evicted by
> >> the MDS. This behaviour was changed, and therefore you get warnings that
> >> some clients are laggy but they are not evicted, since the OSDs are laggy.
> >>
> >>
> >>> The first time I had this, one of the clients was a remote user
> dialling
> >>> in via VPN, which could indeed be laggy. But I am also seeing it from
> >>> neighbouring hosts that are on the same physical network with reliable
> ping
> >>> times way below 1ms. How is that considered laggy?
> >>>
> >>  Are some of your OSDs reporting as laggy? This can be checked via
> >> `perf dump`
> >>
> >> > ceph tell mds.<> perf dump
> >> (search for op_laggy/osd_laggy)
> >>
> >>
> >>> On 18/09/2023 18:07, Laura Flores wrote:
> >>>
> >>> Hi Janek,
> >>>
> >>> There was some documentation added about it here:
> >>> https://docs.ceph.com/en/pacific/cephfs/health-messages/
> >>>
> >>> There is a description of what it means, and it's tied to an mds
> >>> configurable.
> >>>
> >>> On M

[ceph-users] Re: zap an osd and it appears again

2022-03-31 Thread Dhairya Parmar
Can you try using the --force option with your command?
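
For example, something along these lines (a sketch, assuming osd.0 is the
one you are removing):

  ceph orch osd rm 0 --zap --force

  # check progress of the removal
  ceph orch osd rm status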

On Thu, Mar 31, 2022 at 1:25 AM Alfredo Rezinovsky 
wrote:

> I want to create osds manually
>
> If I zap the osd  0 with:
>
> ceph orch osd rm 0 --zap
>
> as soon as the dev is available the orchestrator creates it again
>
> If I use:
>
> ceph orch apply osd --all-available-devices --unmanaged=true
>
> and then zap the osd.0 it also appears again.
>
> There is a real way to disable the orch apply persistency or disable it
> temporarily?
>
> --
> Alfrenovsky
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommendations on books

2022-04-27 Thread Dhairya Parmar
Hi Angelo,

Publications and research papers: you can follow this link; it contains all
the Ceph publications and research papers, which will substantially help you
understand Ceph and its components.

Ceph Architecture: link

Crash Course in CRUSH by Sage Weil: link

I hope this helps you understand Ceph one way or another.

Regards,
Dhairya

On Wed, Apr 27, 2022 at 8:47 AM Angelo Höngens  wrote:

> Hey guys and girls,
>
> Can you recommend some books to get started with ceph? I know the docs are
> probably a good source, but books, in my experience, do a better job of
> glueing it all together and painting the big picture. And I can take a book
> to places where reading docs on a laptop is inconvenient. I know Amazon has
> some books, but what do you think are the best books?
>
> I hope to read about the different deployment methods (cephadm? Docker?
> Native?), what pg’s and crush maps are, best practices in building
> clusters, ratios between osd, wal, db, etc, what they do and why, use cases
> for cephfs vs rdb vs s3, etc.
>
> Looking forward to your tips!
>
> Angelo.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Maintenance mode?

2022-05-29 Thread Dhairya Parmar
Hi Jeremy,

I think there is a maintenance mode for Ceph; maybe check this out, or maybe
this could help too.
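
If it is a short, planned network outage, the usual pattern is to set a few
cluster flags before the work and clear them afterwards. A rough sketch, not
specific to your setup:

  # before the switch maintenance
  ceph osd set noout
  ceph osd set norebalance

  # ... do the network work ...

  # once connectivity is back and the OSDs are up again
  ceph osd unset norebalance
  ceph osd unset noout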

Thanks,
Dhairya

On Mon, May 30, 2022 at 9:41 AM Jeremy Hansen  wrote:

> So in my experience so far, if I take out a switch after a firmware update
> and a reboot of the switch, meaning all ceph nodes lose network
> connectivity and communication with each other, Ceph becomes unresponsive
> and my only fix up to this point has been to, one by one, reboot the
> compute nodes. Are you saying I just need to wait? I don’t know how long
> I’ve waited in the past, but if you’re saying at least 10 minutes, I
> probably haven’t waited that long.
>
> Thanks
> -jeremy
>
> > On Sunday, May 29, 2022 at 3:40 PM, Tyler Stachecki <
> stachecki.ty...@gmail.com (mailto:stachecki.ty...@gmail.com)> wrote:
> > Ceph always aims to provide high availability. So, if you do not set
> cluster flags that prevent Ceph from trying to self-heal, it will self-heal.
> >
> > Based on your description, it sounds like you want to consider the
> 'noout' flag. By default, after 10(?) minutes of an OSD being down, Ceph
> will begin the process of outing the affected OSD to ensure high
> availability.
> >
> > But be careful, as far as latency goes -- you likely still want to
> pre-emptively mark OSDs down ahead of the planned maintenance for latency
> purposes, and you must be cognisant of whether or not your replication
> policy puts you in a position where an unrelated failure during the
> maintenance can result in inactive PGs.
> >
> > Cheers,
> > Tyler
> >
> >
> > > On Sun, May 29, 2022, 5:30 PM Jeremy Hansen <jer...@skidrow.la> wrote:
> > > Is there a maintenance mode for Ceph that would allow me to do work on
> underlying network equipment without causing Ceph to panic? In our test
> lab, we don’t have redundant networking and when doing switch upgrades and
> such, Ceph has a panic attack and we end up having to reboot Ceph nodes
> anyway. Like an hdfs style readonly mode or something?
> > >
> > > Thanks!
> > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io (mailto:
> ceph-users@ceph.io)
> > > To unsubscribe send an email to ceph-users-le...@ceph.io (mailto:
> ceph-users-le...@ceph.io)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Troubleshooting cephadm - not deploying any daemons

2022-06-08 Thread Dhairya Parmar
Hi Zach,

Try running `ceph orch apply mgr 2` or `ceph orch apply mgr
--placement="<placement spec>"`. Refer to this doc for more information;
hope it helps.
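
For example, either of these should get you a second mgr (a sketch; the host
names are placeholders for two of your actual hosts):

  # count-based placement
  ceph orch apply mgr 2

  # or explicit hosts
  ceph orch apply mgr --placement="host1 host2"

  # then check what cephadm is doing with the spec
  ceph orch ls mgr
  ceph log last cephadm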

Regards,
Dhairya

On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)  wrote:

> Our 16.2.7 cluster was deployed using cephadm from the start, but now it
> seems like deploying daemons with it is broken. Running 'ceph orch apply
> mgr --placement=2' causes '6/8/22 2:34:18 PM[INF]Saving service mgr spec
> with placement count:2' to appear in the logs, but a 2nd mgr does not
> get created.
>
> I also confirmed the same with mds daemons - using the dashboard, I
> tried creating a new set of MDS daemons "220606" count:3, but they never
> got deployed. The service type appears in the dashboard, though, just
> with no daemons deployed under it. Then I tried to delete it with the
> dashboard, and now 'ceph orch ls' outputs:
>
> NAME   PORTSRUNNING  REFRESHED   AGE PLACEMENT
> mds.220606  0/3   15h  count:3
>
> More detail in YAML format doesn't even give me that much information:
>
> ceph01> ceph orch ls --service_name=mds.220606 --format yaml
> service_type: mds
> service_id: '220606'
> service_name: mds.220606
> placement:
>count: 3
> status:
>created: '2022-06-07T03:42:57.234124Z'
>running: 0
>size: 3
> events:
> - 2022-06-07T03:42:57.301349Z service:mds.220606 [INFO] "service was
> created"
>
> 'ceph health detail' reports HEALTH_OK but cephadm doesn't seem to be
> doing its job. I read through the Cephadm troubleshooting page on ceph's
> website but since the daemons I'm trying to create don't even seem to
> try to spawn containers (podman ps shows the existing containers just
> fine) I don't know where to look next for logs, to see if cephadm +
> podman are trying to create new containers and failing, or not even trying.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Dhairya Parmar
Hi,

The log you shared indicates that the MDS is waiting for the latest OSDMap
epoch. The epoch number in that log line (123138) is the epoch of the last
failure. Any MDS entering the replay state needs at least this osdmap epoch
to ensure the blocklist has propagated; if the current epoch is less than
this, it just goes back to waiting.

I have limited knowledge about the OSDs, but you mentioned in your initial
mail that you executed some OSD commands, and I'm not sure whether the issue
lies there. You can check and share the OSD logs, or maybe `ceph -s` could
reveal some potential warnings.
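
If you want to see where things stand, comparing the current osdmap epoch
with the one the MDS is waiting for, and listing the blocklist entries, is a
reasonable start. A sketch (the 123138 comes from your log line):

  # current osdmap epoch; the MDS wants to see >= 123138
  ceph osd stat

  # blocklist entries added when the previous MDS instances were failed
  ceph osd blocklist ls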


*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.



On Mon, Nov 25, 2024 at 1:29 PM 
wrote:

> Good afternoon
>
> We tried to leave only one mds, stopped others, even deleted one, and
> turned off the requirement for stand-by mds. Nothing helped, mds remained
> in the status of replays.
> Current situation: we now have two active mds in the status of replays,
> and one in stand-by.
> At the same time, in the logs we see a message
> mds.0.660178  waiting for osdmap 123138 (which blocklists prior instance)
> At the same time, there is no activity on both mds.
> The launch of the cephfs-journal-tool journal inspect utility does not
> produce any results - the utility worked for 12 hours and did not produce
> anything, we stopped it.
>
> Maybe the problem is this blocking? How to remove it?
>
> Best regards!
>
> Alexey Tsivinsky
> e-mail: a.tsivin...@baikalelectronics.com
> 
> From: Marc 
> Sent: 25 November 2024, 1:47
> To: Alexey Tsivinsky; ceph-users@ceph.io
> Subject: RE: CephFS 16.2.10 problem
>
> >
> > The following problem occurred.
> > There is a cluster ceph 16.2.10
> > The cluster was operating normally on Friday. Shut down cluster:
> > -Excluded all clients
> > Executed commands:
> > ceph osd set noout
> > ceph osd set nobackfill
> > ceph osd set norecover
> > ceph osd set norebalance
> > ceph osd set nodown
> > ceph osd set pause
> > Turned off the cluster, checked server maintenance.
> > Enabled cluster. He gathered himself, found all the nodes, and here the
> > problem began. After all OSD went up and all pg became available, cephfs
> > refused to start.
> > Now mds are in the replay status, and do not go to the ready status.
> > Previously, one of them was in the replay (laggy) status, but we
> > executed command:  ceph config set mds mds_wipe_sessions true
> > After that, mds switched to the status of replays, the third in standby
> > status started, and mds crashes with an error stopped.
> > But cephfs is still unavailable.
> > What else can we do?
> > The cluster is very large, almost 200 million files.
> >
>
> I assume you tried to start just one mds and wait until it would come up
> as active (before starting the others)?
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS 16.2.10 problem

2024-11-27 Thread Dhairya Parmar
t.13654791
> 172.16.1.76:0/3475543361 from time 2024-11-22T19:30:44.930601+,
> ignoring
> debug 2024-11-25T11:47:02.157+ 7fe61047a700 -1 log_channel(cluster)
> log [ERR] : replayed stray Session close event for client.15103382
> 172.16.1.76:0/2783362396 from time 2024-11-22T19:30:46.768393+,
> ignoring
> debug 2024-11-25T11:47:02.157+ 7fe61047a700 -1 log_channel(cluster)
> log [ERR] : replayed stray Session close event for client.15718734
> 172.16.1.76:0/897474938 from time 2024-11-22T19:30:50.850188+,
> ignoring
> debug 2024-11-25T11:47:02.157+ 7fe61047a700 -1 log_channel(cluster)
> log [ERR] : replayed stray Session close event for client.11288093
> 172.16.1.76:0/2876628524 from time 2024-11-22T19:30:54.081311+,
> ignoring
> debug 2024-11-25T11:49:10.106+ 7fe619c8d700  1 mds.cephfs.cmon3.isftcc
> asok_command: client ls {prefix=client ls} (starting...)
> debug 2024-11-25T13:43:48.963+ 7fe619c8d700  1 mds.cephfs.cmon3.isftcc
> asok_command: status {prefix=status} (starting...)
>
>
> Best Regards!
>
>
> Alexey Tsivinsky
>
>
> e-mail: a.tsivin...@baikalelectronics.com
>
>
> 
> From: Dhairya Parmar 
> Sent: 25 November 2024, 16:07
> To: Alexey Tsivinsky
> Cc: m...@f1-outsourcing.eu; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: CephFS 16.2.10 problem
>
>
>
>
> On Mon, Nov 25, 2024 at 3:33 PM <alexey.tsivin...@baikalelectronics.ru> wrote:
>
> Thanks for your answer!
>
>
> Current status of our cluster
>
>
> cluster:
> id: c3d33e01-dfcd-4b39-8614-993370672504
> health: HEALTH_WARN
> 1 failed cephadm daemon(s)
> 1 filesystem is degraded
>
>   services:
> mon: 3 daemons, quorum cmon1,cmon2,cmon3 (age 15h)
> mgr: cmon3.ixtbep(active, since 19h), standbys: cmon1.efktsr
> mds: 2/2 daemons up
> osd: 168 osds: 168 up (since 2d), 168 in (since 3w)
>
>   data:
> volumes: 0/1 healthy, 1 recovering
> pools:   4 pools, 4641 pgs
> objects: 181.91M objects, 235 TiB
> usage:   708 TiB used, 290 TiB / 997 TiB avail
> pgs: 4630 active+clean
>  11   active+clean+scrubbing+deep
>
> This doesn't reveal much. Can you share MDS logs?
>
>
>
> We are trying to do cephfs-journal-tool --rank cephfs: 0 journal inspect
> and the utility does nothing.
>
> If the ranks are unavailable, it won't do anything. Do you see any log
> statements like "Couldn't determine MDS rank."?
>
>
> We thought that the MDS daemons were locking their journals, and turned
> them off. But the utility does not work, and ceph -s says that one mds is
> running, although we checked that we stopped all processes.
> It turns out the journals are being locked somewhere else.
> What else can be done? Should we restart the monitors?
>
>
> Best Regards!
>
>
> Alexey Tsivinsky
>
>
> e-mail: a.tsivin...@baikalelectronics.com
>
>
>
> From: Dhairya Parmar <dpar...@redhat.com>
> Sent: 25 November 2024, 12:19
> To: Alexey Tsivinsky
> Cc: m...@f1-outsourcing.eu; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: CephFS 16.2.10 problem
>
> Hi,
>
> The log you shared indicates that MDS is waiting for the latest OSDMap
> epoch. The epoch number in log line 123138 is the epoch of last failure.
> Any MDS entering replay state needs at least this osdmap epoch to ensure
> the blocklist propagates. If the epoch is less than this then it just goes
> back to waiting.
>
> I have limited knowledge about the OSDs but you had mentioned in your
> initial mail about executing some OSD commands, I'm not sure if the issue
> lies there. You can check and share OSD logs or maybe `ceph -s` could
> reveal some potential warnings.
>
>
> Dhairya Parmar
>
> Associate Software Engineer, CephFS
>
> IBM, Inc.
>
>
>
> On Mon, Nov 25, 2024 at 1:29 PM <alexey.tsivin...@baikalelectronics.ru> wrote:
> Good afternoon
>
> We tried to leave only one mds, stopped others, even deleted one, and
> turned off the requirement for stand-by mds. Nothing helped, mds remained
> in the status of replays.
> Current situation: we now have two active mds in the status of replays,
> and one in stand-by.
> At the same time, in the logs we see a message
> mds.0.660178  waiting for osdmap 123138 (which blocklists prior instance)
> At the same time, there is no activity on both mds.
> The launch of 

[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Dhairya Parmar
On Mon, Nov 25, 2024 at 3:33 PM 
wrote:

> Thanks for your answer!
>
>
> Current status of our cluster
>
> cluster:
> id: c3d33e01-dfcd-4b39-8614-993370672504
> health: HEALTH_WARN
> 1 failed cephadm daemon(s)
> 1 filesystem is degraded
>
>   services:
> mon: 3 daemons, quorum cmon1,cmon2,cmon3 (age 15h)
> mgr: cmon3.ixtbep(active, since 19h), standbys: cmon1.efktsr
> mds: 2/2 daemons up
> osd: 168 osds: 168 up (since 2d), 168 in (since 3w)
>
>   data:
> volumes: 0/1 healthy, 1 recovering
> pools:   4 pools, 4641 pgs
> objects: 181.91M objects, 235 TiB
> usage:   708 TiB used, 290 TiB / 997 TiB avail
> pgs: 4630 active+clean
>  11   active+clean+scrubbing+deep
>

This doesn't reveal much. Can you share MDS logs?


>
> We are trying to do cephfs-journal-tool --rank cephfs: 0 journal inspect
> and the utility does nothing.
>

If the ranks are unavailable, it won't do anything. Do you see any log
statements like "Couldn't determine MDS rank."?
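
As a quick sanity check that the tool can reach rank 0's journal objects at
all, reading just the journal header should return almost immediately. A
sketch, assuming your filesystem is named "cephfs":

  cephfs-journal-tool --rank cephfs:0 header get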

> We thought that the MDS daemons were locking their journals, and turned
> them off. But the utility does not work, and ceph -s says that one mds is
> running, although we checked that we stopped all processes.
> It turns out the journals are being locked somewhere else.
> What else can be done? Should we restart the monitors?
>
>
> Best Regards!
>
>
> Alexey Tsivinsky
>
>
> e-mail:a.tsivin...@baikalelectronics.com
>
>
>
> *From:* Dhairya Parmar 
> *Sent:* 25 November 2024, 12:19
> *To:* Alexey Tsivinsky
> *Cc:* m...@f1-outsourcing.eu; ceph-users@ceph.io
> *Subject:* Re: [ceph-users] Re: CephFS 16.2.10 problem
>
> Hi,
>
> The log you shared indicates that MDS is waiting for the latest OSDMap
> epoch. The epoch number in log line 123138 is the epoch of last failure.
> Any MDS entering replay state needs at least this osdmap epoch to ensure
> the blocklist propagates. If the epoch is less than this then it just goes
> back to waiting.
>
> I have limited knowledge about the OSDs but you had mentioned in your
> initial mail about executing some OSD commands, I'm not sure if the issue
> lies there. You can check and share OSD logs or maybe `ceph -s` could
> reveal some potential warnings.
>
>
> *Dhairya Parmar*
>
> Associate Software Engineer, CephFS
>
> IBM, Inc.
>
>
>
> On Mon, Nov 25, 2024 at 1:29 PM 
> wrote:
>
>> Good afternoon
>>
>> We tried to leave only one mds, stopped others, even deleted one, and
>> turned off the requirement for stand-by mds. Nothing helped, mds remained
>> in the status of replays.
>> Current situation: we now have two active mds in the status of replays,
>> and one in stand-by.
>> At the same time, in the logs we see a message
>> mds.0.660178  waiting for osdmap 123138 (which blocklists prior instance)
>> At the same time, there is no activity on both mds.
>> The launch of the cephfs-journal-tool journal inspect utility does not
>> produce any results - the utility worked for 12 hours and did not produce
>> anything, we stopped it.
>>
>> Maybe the problem is this blocking? How to remove it?
>>
>> Best regards!
>>
>> Alexey Tsivinsky
>> e-mail: a.tsivin...@baikalelectronics.com
>> 
>> From: Marc 
>> Sent: 25 November 2024, 1:47
>> To: Alexey Tsivinsky; ceph-users@ceph.io
>> Subject: RE: CephFS 16.2.10 problem
>>
>> >
>> > The following problem occurred.
>> > There is a cluster ceph 16.2.10
>> > The cluster was operating normally on Friday. Shut down cluster:
>> > -Excluded all clients
>> > Executed commands:
>> > ceph osd set noout
>> > ceph osd set nobackfill
>> > ceph osd set norecover
>> > ceph osd set norebalance
>> > ceph osd set nodown
>> > ceph osd set pause
>> > Turned off the cluster, checked server maintenance.
>> > Enabled cluster. He gathered himself, found all the nodes, and here the
>> > problem began. After all OSD went up and all pg became available, cephfs
>> > refused to start.
>> > Now mds are in the replay status, and do not go to the ready status.
>> > Previously, one of them was in the replay (laggy) status, but we
>> > executed command:  ceph config set mds mds_wipe_sessions true
>> > After that, mds switched to the status of replays, the third in standby
>> > status started, and mds crashes with an error stopped.
>> > But cephfs is still unavailable.
>> > What else can we do?
>> > The cluster is very large, almost 200 million files.
>> >
>>
>> I assume you tried to start just one mds and wait until it would come up
>> as active (before starting the others)?
>>
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to reduce CephFS num_strays effectively?

2025-02-18 Thread Dhairya Parmar
Hi,

If the strays are increasing, it mostly means there are references lingering
around. You can try evaluating the strays in ~mdsdir [0]. If strays keep
increasing at a staggering rate, check whether the deleted files/dirs are
still referenced anywhere (for example by snapshots) and, as Eugen
mentioned, note the correlation between mds_bal_fragment_size_max and
mds_cache_memory_limit.

Also, since the client is doing a huge 10TiB file deletion, can you show me
what the purge_queue looks like?

[0]
https://docs.ceph.com/en/reef/cephfs/scrub/#evaluate-strays-using-recursive-scrub
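
Concretely, something like the sketch below is what I have in mind. The
names in angle brackets are placeholders for your environment, and since you
are on 14.2.6 the exact scrub syntax may differ slightly from the reef doc
in [0]:

  # kick off a recursive scrub of the stray directories (rank 0)
  ceph tell mds.<fs_name>:0 scrub start ~mdsdir recursive

  # on the active MDS host, watch the stray and purge-queue counters
  ceph daemon mds.<mds_name> perf dump | grep num_strays
  ceph daemon mds.<mds_name> perf dump | grep -A 12 '"purge_queue"'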

*Dhairya Parmar*

Software Engineer, CephFS


On Tue, Feb 18, 2025 at 5:32 AM Jinfeng Biao 
wrote:

> Hello Eugen and all,
>
> Thanks for the reply. We’ve checked the SuSE doc before raising it twice.
> From 100k to 125k, then to 150k.
>
> We are a bit worried about the continuous growth of strays at 50K a day
> and would like to find an effective way to reduce the strays.
>
> Last night another 30K increase in the strays.
>
> Thanks
> Jinfeng
>
>
> From: Eugen Block 
> Date: Sunday, 16 February 2025 at 7:32 PM
> To: ceph-users@ceph.io 
> Subject: [ceph-users] Re: How to reduce CephFS num_strays effectively?
> ⚠ EXTERNAL EMAIL: Do not click links or open any attachments unless you
> trust the sender and know the content is safe. ⚠
>
>
> Hi,
>
> this SUSE article [0] covers that, it helped us with a customer a few
> years ago. The recommendation was to double the
> mds_bal_fragment_size_max (default 100k) to 200k, which worked nicely
> for them. Also note the mentioned correlation between
> mds_bal_fragment_size_max and mds_cache_memory_limit.
>
> Regards,
> Eugen
>
> [0]
> https://www.suse.com/de-de/support/kb/doc/?id=20569
>
> Zitat von jinfeng.b...@cba.com.au:
>
> > Hello folks,
> >
> > We had an issue with num_strays hitting 1 million recently. As a
> > workaround, max bal was increased to 125,000.
> >
> > The stray_num keeps growing at 25k per day.  After a recent
> > observation of 10TiB file deletion,  the relevant application was
> > stopped.
> >
> > Then we increased purging options to below values
> >
> >   mdsadvanced filer_max_purge_ops
> 40
> >   mdsadvanced mds_max_purge_files
>  1024
> >   mdsadvanced mds_max_purge_ops
> 32768
> >   mdsadvanced mds_max_purge_ops_per_pg  3
> >
> > And run "du -hsx" to the top level directory mounted to the app that
> > does massive deletion.
> >
> > Despite all above, strays still growing at 60K per day.
> >
> > There are a lot more applications using this CephFS filesystem  and
> > only this app is observed perform deletion at this scale.
> >
> > I'm wondering what would be the effective way to cleanup the strays
> > in this situation  while making the least impact to production.
> >
> > Note: We are on 14.2.6
> >
> > thanks
> > James Biao
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io