Hi,

I'm fairly new to CephFS; at my new job there is a CephFS cluster that I have
to administer.

The problem is that I can't write to the CephFS mount from some clients.
When I try from the affected clients, I get the following in the log file:
>Apr 24 13:14:00 cuda002 kernel: ceph: mds0 hung
>Apr 24 13:14:00 cuda002 kernel: ceph: mds0 caps stale
>Apr 24 13:14:00 cuda002 kernel: ceph: mds0 came back
>Apr 24 13:14:00 cuda002 kernel: ceph: mds0 caps still stale

Restarting the MDS doesn't make any difference.
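
In case it helps with diagnosing: the kernel client exposes its MDS state
via debugfs, so something like this should show pending requests and the
caps the client holds (a rough sketch; it assumes debugfs is mounted):

    # on the affected client, e.g. cuda002:
    mount -t debugfs none /sys/kernel/debug 2>/dev/null
    cat /sys/kernel/debug/ceph/*/mdsc   # requests still waiting on the MDS
    cat /sys/kernel/debug/ceph/*/caps   # capabilities this client holds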

>ceph -s
says:

[root@cuda001:/var/log/ceph]# ceph -s
    cluster cde1487e-f930-417a-9403-28e9ebf406b8
     health HEALTH_OK
     monmap e6: 1 mons at {cephcontrol=172.22.12.241:6789/0}
            election epoch 1, quorum 0 cephcontrol
     mdsmap e1574: 1/1/1 up {0=A1214-2950-01=up:active}
     osdmap e9571: 6 osds: 6 up, 6 in
      pgmap v11438317: 320 pgs, 3 pools, 20427 GB data, 7102 kobjects
            62100 GB used, 52968 GB / 112 TB avail
                 319 active+clean
                   1 active+clean+scrubbing+deep


So everything looks right, but it is not working.

The only thing I found that differs from the working hosts is the output of:
> ceph daemon mds.A1214-2950-01 session ls

On the working clients I get:
    {
        "id": 670317,
        "num_leases": 0,
        "num_caps": 35386,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.670317 172.22.7.52:0\/4290071627",
        "client_metadata": {
            "ceph_sha1": "mySHA1-ID",
            "ceph_version": "ceph version 0.94.9 (mySHA1-ID)",
            "entity_id": "admin",
            "hostname": "PE8",
            "mount_point": "\/cephfs01"
        }
    }


On the non-working clients it looks like:
    {
        "id": 670648,
        "num_leases": 0,
        "num_caps": 60,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.670648 172.22.20.5:0\/2770536198",
        "client_metadata": {
            "entity_id": "cephfs",
            "hostname": "slurmgate",
            "kernel_version": "3.10.0-514.16.1.el7.x86_64"
        }
    }

The biggest difference is that there are no 'ceph_sha1' or 'ceph_version'
entries and no 'mount_point', and the entity_id is different as well.
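
If I read the metadata right, the working client (PE8) is a ceph-fuse
(userspace) mount, which is why it reports ceph_version and mount_point,
while the failing one (slurmgate) is a kernel mount, which only reports
kernel_version. For comparison, the two mount styles look roughly like this
(the key name and secret file path are my assumptions):

    # userspace client, as PE8 apparently uses:
    ceph-fuse -n client.admin /cephfs01

    # kernel client, as slurmgate apparently uses:
    mount -t ceph 172.22.12.241:6789:/ /cephfs01 \
          -o name=cephfs,secretfile=/etc/ceph/cephfs.secret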

Could someone please shed some light on what I did wrong?
The guy who installed it is no longer here, and there is no documentation
either.
I just mount it via automount/autofs, roughly as sketched below.
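
For completeness, the autofs setup looks about like this (the map file name
and secret file path are from memory, so treat them as assumptions):

    # /etc/auto.master
    /-    /etc/auto.cephfs

    # /etc/auto.cephfs (direct map)
    /cephfs01  -fstype=ceph,name=cephfs,secretfile=/etc/ceph/cephfs.secret  172.22.12.241:6789:/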

If you need more info, just let me know.

Thanks in advance,
Steininger Herbert
