We decided to go ahead and try truncating the journal, but first we tried to back it up. However, there are ridiculous values in the header: the tool can't write out a journal this large because (I presume) my ext4 filesystem can't seek to that position in the (sparse) backup file.
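If that presumption is right, the destination filesystem itself should refuse the offset: with 4 KiB blocks, ext4 caps a single file at 16 TiB, and the failing offset (0x166be9401291, shown below) is around 22.4 TiB. A quick hypothetical probe (my naming; the scratch filename is illustrative and not from the tool), run on the same ext4 filesystem backup.bin would land on:

    import errno
    import os

    # Try to extend a scratch file to the offset the export failed on.
    OFFSET = 0x166be9401291  # ~22.4 TiB, past ext4's 16 TiB limit (4 KiB blocks)

    fd = os.open("probe.bin", os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.ftruncate(fd, OFFSET)
        print("filesystem accepted the offset")
    except OSError as e:
        # Expected to fail; whether the kernel reports EFBIG or EINVAL
        # can vary by code path.
        print("refused:", errno.errorcode[e.errno])
    finally:
        os.close(fd)
        os.unlink("probe.bin")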
I would not be surprised to learn that memory allocation is trying to do something similar, hence the tool allocating all available memory. This seems like a new kind of journal corruption that isn't being reported correctly.

[root@lima /]# time cephfs-journal-tool --cluster=prodstore journal export backup.bin
journal is 24652730602129~673601102
2019-04-01 17:49:52.776977 7fdcb999e040 -1 Error 22 ((22) Invalid argument) seeking to 0x166be9401291
Error ((22) Invalid argument)

real    0m27.832s
user    0m2.028s
sys     0m3.438s

[root@lima /]# cephfs-journal-tool --cluster=prodstore event get summary
Events by type:
  EXPORT: 187
  IMPORTFINISH: 182
  IMPORTSTART: 182
  OPEN: 3133
  SUBTREEMAP: 129
  UPDATE: 42185
Errors: 0

[root@lima /]# cephfs-journal-tool --cluster=prodstore header get
{
    "magic": "ceph fs volume v011",
    "write_pos": 24653404029749,
    "expire_pos": 24652730602129,
    "trimmed_pos": 24652730597376,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 2,
        "pool_ns": ""
    }
}

[root@lima /]# printf "%x\n" "24653404029749"
166c1163c335
[root@lima /]# printf "%x\n" "24652730602129"
166be9401291