Need logs to dig into this further.
Or just call a script during wakeup?
On Tue, 26 Mar 2024 at 04:16, wrote:
> Hi All,
>
> So I've got a Ceph Reef Cluster (latest version) with a CephFS system set
> up with a number of directories on it.
>
> On a Laptop (running Rocky Linux (latest version)) I've
Hello all.
I have a cluster with ~80TB of spinning disk. Its primary role is
CephFS. Recently I had multiple drive failures (not simultaneous),
which have left me with 20 incomplete PGs.
I know this data is toast, but I need to be able to get what isn't
toast out of the CephFS. Well out of t
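For reference, a minimal sketch of how one might enumerate the incomplete PGs with standard Ceph tooling before salvaging; <pgid> is a placeholder:

  # list the PGs stuck incomplete and inspect one of them
  ceph health detail | grep incomplete
  ceph pg ls incomplete
  ceph pg <pgid> query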
On 3/29/24 04:57, Erich Weiler wrote:
MDS logs show:
Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
3676.400077 secs
Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
mds.slugfs.pr-md-02.sbblqq Updat
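When slow requests pile up like this, a sketch of the usual MDS introspection (the rank is a placeholder; on some releases these are admin-socket commands run via `ceph daemon` on the MDS host instead of `ceph tell`):

  # list requests currently in flight on the MDS
  ceph tell mds.0 ops
  # equivalent detail, including how long each op has been blocked
  ceph tell mds.0 dump_ops_in_flight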
Hi,
we have similar problems from time to time, running Reef on the servers
and the latest Ubuntu 20.04 HWE kernel on the clients.
There are probably two scenarios with slightly different observations:
1. MDS reports slow ops
Some client is holding caps for a certain file / directory and blocks
o
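A minimal sketch of identifying the cap-holding client in that scenario, using tell-style MDS commands as elsewhere in this thread (mds.0 is a placeholder rank; older releases expose the same commands via the daemon's admin socket):

  # list client sessions to spot the suspect holding caps
  ceph tell mds.0 session ls
  # dump the requests currently blocked on the MDS
  ceph tell mds.0 dump_blocked_ops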
Hello Erich,
What you are experiencing is definitely a bug - but possibly a client
bug. Not sure. Upgrading Ceph packages on the clients, though, will
not help, because the actual CephFS client is the kernel. You can try
upgrading it to the latest 6.8.x (or, better, trying the same workload
from d
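Since the kernel is the actual CephFS client for kernel mounts, a quick sketch for checking what a client is really running:

  # the kernel version is the effective CephFS client version
  uname -r
  # confirm the mount uses the kernel client ("type ceph"), not ceph-fuse
  mount | grep ceph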
Running on Octopus:
While attempting to install a bunch of new OSDs on multiple hosts, I ran some
ceph orchestrator commands to install them, such as
ceph orch apply osd --all-available-devices
ceph orch apply osd -i HDD_drive_group.yaml
I assumed these were just helper processes, and they wou
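For reference, a minimal sketch of what such a drive-group spec might contain (the service_id and the rotational filter are illustrative, not the poster's actual file):

  service_type: osd
  service_id: hdd_drive_group
  placement:
    host_pattern: '*'
  data_devices:
    rotational: 1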
Could there be an issue with the fact that the servers (MDS, MGR, MON,
OSD) are running Reef while all the clients are running Quincy?
I can easily enough get the new Reef repo in for all our clients (Ubuntu
22.04) and upgrade the clients to Reef if that might help?
On 3/28/24 3:05 PM, Erich
I asked the user and they said no, no rsync involved. Although I
rsync'd 500TB into this filesystem in the beginning without incident, so
hopefully it's not a big deal here.
I'm asking the user what their workflow does to try and pin this down.
Are there any other known reasons why a slow requ
Hello Erich,
Does the workload, by any chance, involve rsync? It is unfortunately
well-known for triggering such issues. A workaround is to export the
directory via NFS and run rsync against the NFS mount instead of
directly against CephFS.
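A minimal sketch of that workaround, assuming the directory is already exported over NFS (server name and paths are placeholders):

  # mount the NFS export of the CephFS directory
  sudo mount -t nfs nfs-server:/export/data /mnt/data-nfs
  # run rsync against the NFS mount instead of CephFS directly
  rsync -a /source/ /mnt/data-nfs/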
On Fri, Mar 29, 2024 at 4:58 AM Erich Weiler wrote:
>
>
MDS logs show:
Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
3676.400077 secs
Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
mds.slugfs.pr-md-02.sbblqq Updating MDS map to version 22775 from mon.3
Wow, those are extremely useful commands. Next time this happens I'll be
sure to use them. A quick test shows they work just great!
cheers,
erich
On 3/28/24 11:16 AM, Alexander E. Patrakov wrote:
Hi Erich,
Here is how to map the client ID to some extra info:
ceph tell mds.0 client ls id=994
Hey,
We make use of the ctdb_mutex_ceph_rados_helper, so the lock just gets
stored within the CephFS metadata pool rather than as a file on a shared
CephFS mount.
We don't recommend storing it directly on CephFS: if the mount hosting the lock
file goes down, we have seen the MDS mark as stal
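For reference, a minimal sketch of wiring up the helper in ctdb.conf; it takes the Ceph cluster name, Ceph user, RADOS pool, and RADOS object as arguments (all values here are illustrative):

  [cluster]
      recovery lock = !/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.samba cephfs_metadata ctdb_reclock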
Hi Erich,
Here is how to map the client ID to some extra info:
ceph tell mds.0 client ls id=99445
Here is how to map inode ID to the path:
ceph tell mds.0 dump inode 0x100081b9ceb | jq -r .path
On Fri, Mar 29, 2024 at 1:12 AM Erich Weiler wrote:
>
> Here are some of the MDS logs:
>
> Mar 27 1
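A sketch of chaining the two commands to go from a slow-request log line to a hostname and a path, using IDs quoted in this thread (the jq field names may vary by release):

  # who is client.99445?
  ceph tell mds.0 client ls id=99445 | jq -r '.[0].client_metadata.hostname'
  # which file is inode 0x100081b9ceb?
  ceph tell mds.0 dump inode 0x100081b9ceb | jq -r .path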
Here are some of the MDS logs:
Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
log [WRN] : slow request 511.703289 seconds old, received at
2024-03-27T18:49:53.623192+: client_request(client.99375:459393
getattr AsXsFs #0x100081b9ceb 2024-03-27T18:49:53.620806+
Thanks so much!
I'll give it a shot.
Hello,
Maybe your issue is related to this: https://tracker.ceph.com/issues/63642
On Wed, Mar 27, 2024 at 7:31 PM xu chenhui wrote:
> Hi Eric Ivancich,
> I have a similar problem in ceph version 16.2.5. Has this problem been
> completely resolved in the Pacific release?
> Our bucket has no lifecy
No, you can't use the image id for the upgrade command; it has to be the
image name. So it should start, based on what you have,
registry.redhat.io/rhceph/. As for the full name, it depends which image
you want to go with. As for trying this on an OSD first, there is `ceph
orch daemon redeploy --i
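A sketch of what that might look like, assuming the truncated flag is --image and using an illustrative image path and tag (check registry.redhat.io for the exact name):

  # try the new image on a single OSD daemon first
  ceph orch daemon redeploy osd.0 --image registry.redhat.io/rhceph/rhceph-6-rhel9:latest
  # then run the full upgrade against the image name, not the id
  ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-6-rhel9:latest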
On March 26, 2024 5:02:16 PM GMT+01:00, "Szabo, Istvan (Agoda)"
wrote:
>Hi,
>
>Wondering what we are missing from the netplan configuration on Ubuntu
>that Ceph needs in order to tolerate it properly.
>We are using this bond configuration on ubuntu 20.04 with octopus ceph:
>
>
>bond1:
> macaddress: x
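For comparison, a minimal netplan bond fragment (interface names, mode, and monitor interval are illustrative, not the poster's actual values):

  bonds:
    bond1:
      macaddress: xx:xx:xx:xx:xx:xx
      interfaces: [eno1, eno2]
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100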
On Tue, Mar 26, 2024 at 7:30 PM Yongseok Oh
wrote:
> Hi,
>
> CephFS is provided as a shared file system service in a private cloud
> environment of our company, LINE. The number of sessions is approximately
> more than 5,000, and session evictions occur several times a day. When
> session evictio
What I would recommend is to disable sleep when the laptop lid is closed.
You will need to open the "logind.conf" file (
sudo nano /etc/systemd/logind.conf
) and then change
#HandleLidSwitch=suspend
to
HandleLidSwitch=ignore
(i.e. remove the # and change the value).
Remember to save and exit.
You ca
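Putting the steps together (a sketch; note that restarting systemd-logind to apply the change is an assumption and may briefly interrupt the login session):

  sudo nano /etc/systemd/logind.conf
  # change:  #HandleLidSwitch=suspend
  # to:      HandleLidSwitch=ignore
  sudo systemctl restart systemd-logind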
So the client session was dropped when the laptop went into sleep mode.
What could've happened is that, since the client went silent, it failed
to renew its caps in time and hit `session_autoclose` (defaults to 300
secs), and thus got evicted. As Kotresh mentioned, client logs would reveal
better
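A sketch for checking and, if appropriate, raising that timeout (the filesystem name and the 600-second value are illustrative):

  # show the current value (defaults to 300 secs)
  ceph fs get cephfs | grep session_autoclose
  # raise it so that short sleeps don't trigger eviction
  ceph fs set cephfs session_autoclose 600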
I think the client should reconnect when it's out of sleep. Could you
please share the client logs to check what's happening?
On Tue, Mar 26, 2024 at 4:16 AM wrote:
> Hi All,
>
> So I've got a Ceph Reef Cluster (latest version) with a CephFS system set
> up with a number of directories on it.
Luis Domingues
Proton AG
On Thursday, 28 March 2024 at 10:10, Sridhar Seshasayee
wrote:
> Hi Luis,
>
> > So our question, is mClock taking into account the reads as well as the
> > writes? Or are the reads calculated to be less expensive than the writes?
>
> mClock treats both reads and
Hi Luis,
> So our question, is mClock taking into account the reads as well as the
> writes? Or are the reads calculated to be less expensive than the writes?
>
>
mClock treats both reads and writes equally. When you say "massive reads",
do you mean a predominantly
read workload? Also, the size of
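For reference, a sketch of inspecting and switching the mClock profile, the usual knob for biasing client I/O against background work (the profile name is one of the documented built-ins):

  # show the active profile on one OSD
  ceph config get osd.0 osd_mclock_profile
  # bias towards client traffic cluster-wide (illustrative choice)
  ceph config set osd osd_mclock_profile high_client_ops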