Hi Nathan,

this could be a regression. The write-append bug was a known issue for older
kernel clients; I can try to find the link. We run one of the affected kernel
versions and have asked our users to do all writes to a given file from a
single node.

In general, this is expected behaviour for distributed/parallel file systems:
it is explicitly the developer's responsibility to coordinate the writing
processes so that they never write to the same file region at the same time.
In principle, CephFS should do this for you via client capabilities (caps),
but in my opinion at far too high a performance cost.
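
For illustration only, here is a minimal sketch in Python of what that
coordination can look like on the application side. The function name, path
and record are made up, and it assumes all clients honour fcntl advisory
locks on the shared file system:

# Hedged sketch: serialize appends from multiple writers with a POSIX
# advisory lock on the target file. Path and record are placeholders.
import fcntl
import os

def locked_append(path: str, record: bytes) -> None:
    # O_APPEND positions every write at end-of-file; the exclusive lock
    # keeps two writers from interleaving or overwriting each other.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until we own the lock
        os.write(fd, record)
        os.fsync(fd)                     # flush the data before unlocking
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

locked_append("/mnt/cephfs/shared.log", b"node-a: job 42 finished\n")

Even when this works, every such append bounces locks and caps between
clients, which is exactly the cost mentioned above.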

Depending on what you are writing, try to use a collector process on a single
node that receives all input from the remote nodes and writes a single
consistent stream, which is basically what rsyslogd does (maybe you can even
repurpose rsyslogd for this). This will also greatly outperform the same
application running on a non-broken CephFS client, because coordinating
metadata updates and write locks between clients is unreasonably expensive.
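
If rsyslogd does not fit, below is a minimal sketch in Python of such a
collector; the path, port and names are placeholders of my own choosing.
Remote nodes stream newline-terminated records over TCP and only this one
process ever appends to the output file:

# Hedged sketch of a single-node collector: remote nodes send lines over
# TCP, and only this process writes the shared file.
import socketserver
import threading

OUTPUT = "/mnt/cephfs/collected.log"   # placeholder path; single writer
write_lock = threading.Lock()          # serialize appends across connections

class LineCollector(socketserver.StreamRequestHandler):
    def handle(self):
        # Each remote node connects and streams newline-terminated records.
        for line in self.rfile:
            with write_lock, open(OUTPUT, "ab") as out:
                out.write(line)

if __name__ == "__main__":
    # Remote writers could send data with e.g.:
    #   tail -F /var/log/myapp.log | nc collector-host 5140
    with socketserver.ThreadingTCPServer(("0.0.0.0", 5140), LineCollector) as srv:
        srv.serve_forever()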

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Nathan Fish <lordci...@gmail.com>
Sent: 07 September 2021 21:17:05
To: ceph-users
Subject: [ceph-users] Data loss on appends, prod outage

As of this morning, when two CephFS clients append to the same file in
quick succession, one append sometimes overwrites the other. This
happens on some clients but not others; we're still trying to track
down the pattern, if any.  We've failed all production filesystems to
prevent further data loss. We added 3 new OSD servers last week, they
finished backfilling a few days ago. Servers are Ubuntu 18.04, clients
mostly 18.04 and 20.04, with HWE kernels (5.4 and 5.11 respectively).
Ceph was upgraded from nautilus to octopus months ago. There were no
relevant errors or even warnings in "ceph health" before we stopped
the filesystems:

HEALTH_ERR mons are allowing insecure global_id reclaim; 20 OSD(s)
experiencing BlueFS spillover; 6 filesystems are degraded; 6
filesystems are offline

ceph versions
{
    "mon": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 200
    },
    "mds": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 48
    },
    "rgw": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "overall": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1,
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 254
    }
}

I looked for bugs on the tracker but didn't see anything that seemed
like our issue. Any advice would be appreciated.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io