[ceph-users] Re: Failing heartbeats when no backfill is running

2019-08-19 Thread Robert LeBlanc
The only other thing I can think of is that a firewall is dropping idle connections, although Ceph should be sending heartbeats more often than the 5-minute idle timeout common on most firewalls. In the logs, is it the monitor marking the OSDs out, or is it their OSD peers? That would give you an idea of where to look.
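Robert's point comes down to comparing two timers: if heartbeats flow more often than the firewall's idle timeout, the connection never looks idle. Below is a minimal Python sketch of that comparison. It assumes Ceph's default `osd_heartbeat_interval` of 6 seconds and a 300-second firewall idle timeout (the firewall figure is an assumption, not a Ceph setting). `idle_gap_exceeds_timeout` is a hypothetical helper.

```python
# Hypothetical helper: does the gap between heartbeats reach a firewall's idle timeout?
# Assumptions: 6 s matches Ceph's default osd_heartbeat_interval;
# 300 s is a typical stateful-firewall idle timeout, not a Ceph value.
OSD_HEARTBEAT_INTERVAL_S = 6.0
FIREWALL_IDLE_TIMEOUT_S = 300.0


def idle_gap_exceeds_timeout(heartbeat_interval_s: float, idle_timeout_s: float) -> bool:
    """True if a connection carrying only heartbeats would sit idle long enough to be dropped."""
    return heartbeat_interval_s >= idle_timeout_s


if __name__ == "__main__":
    dropped = idle_gap_exceeds_timeout(OSD_HEARTBEAT_INTERVAL_S, FIREWALL_IDLE_TIMEOUT_S)
    print(f"heartbeat connection idled out: {dropped}")
```

With these numbers the heartbeat path never idles out, which is why Robert treats the firewall theory as a long shot.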

[ceph-users] Multisite RGW data corruption (not 14.2.1 curl issue)

2019-08-19 Thread vladimir
Hello, I have set up two separate Ceph clusters, each with an RGW instance, and am trying to achieve multisite data synchronization. The primary runs 13.2.5 and the slave runs 14.2.2 (I upgraded the slave side from 14.2.1 because of the known data corruption during transfer caused by curl errors). I have emptied slave zone

[ceph-users] cephfs creation error

2019-08-19 Thread Ramanathan S
Hi all, I have just created a Ceph cluster to use CephFS. When I create the filesystem, I get the error below.
# ceph osd pool create cephfs_data 128
pool 'cephfs_data' created
# ceph osd pool create cephfs_metadata 128
pool 'cephfs_metadata' created
# ceph fs new cephfs cephfs_meta

[ceph-users] Re: cephfs creation error

2019-08-19 Thread Patrick Donnelly
Hello Ram,

On Mon, Aug 19, 2019 at 9:51 AM Ramanathan S wrote:
> mds: cephfs-0/0/1 up

You have no MDS available. Are you not running the daemon?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
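Patrick's diagnosis is read straight off the `mds:` summary line. Below is a minimal Python sketch of that check. It assumes the quoted line has the layout `<fs_name>-<up>/<in>/<max_mds> up` (an interpretation of the Nautilus-era status format, not something stated in the thread). `parse_mds_summary` and `has_active_mds` are hypothetical helpers, not part of any Ceph tooling.

```python
import re
from typing import NamedTuple, Optional


class MdsSummary(NamedTuple):
    fs_name: str
    n_up: int     # MDS daemons up for this filesystem
    n_in: int     # ranks "in"
    max_mds: int  # configured maximum number of active ranks


# Assumed layout: "mds: <fs_name>-<up>/<in>/<max_mds> up ..."
# The greedy fs-name group backtracks to the last "-" before the counters,
# so filesystem names containing hyphens still parse.
_MDS_RE = re.compile(
    r"mds:\s*(?P<fs>\S+)-(?P<n_up>\d+)/(?P<n_in>\d+)/(?P<max_mds>\d+)\s+up"
)


def parse_mds_summary(line: str) -> Optional[MdsSummary]:
    """Parse the mds: summary line, or return None if it is not present."""
    m = _MDS_RE.search(line)
    if m is None:
        return None
    return MdsSummary(m.group("fs"), int(m.group("n_up")),
                      int(m.group("n_in")), int(m.group("max_mds")))


def has_active_mds(line: str) -> bool:
    """True only if at least one MDS daemon is reported up."""
    summary = parse_mds_summary(line)
    return summary is not None and summary.n_up > 0


if __name__ == "__main__":
    print(parse_mds_summary("mds: cephfs-0/0/1 up"))
    print(has_active_mds("mds: cephfs-0/0/1 up"))
```

Under this reading, `cephfs-0/0/1` means zero daemons up against one wanted rank, which matches "You have no MDS available".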

[ceph-users] Re: Multisite RGW data corruption (not 14.2.1 curl issue)

2019-08-19 Thread Xiaoxi Chen
Yes, there is no checksum check in RadosSync at this stage... We discussed it a bit when handling the curl issue. The challenge is that for a multipart object, the ETag is not the checksum of the object itself; instead, it is the checksum of the manifest. A special (internal) API is needed to expos
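The sketch below shows why a multipart ETag cannot serve as a content checksum. It is a minimal Python example that assumes RGW follows the common Amazon S3 convention for multipart ETags: the MD5 of the concatenated per-part binary MD5 digests, suffixed with `-<part count>`. The function names are hypothetical.

```python
import hashlib
from typing import Sequence


def single_part_etag(data: bytes) -> str:
    """ETag of a plain (non-multipart) upload: hex MD5 of the object body."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(parts: Sequence[bytes]) -> str:
    """Assumed S3-style multipart ETag: MD5 over the concatenated part digests, plus the part count."""
    joined_digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(joined_digests).hexdigest()}-{len(parts)}"


if __name__ == "__main__":
    parts = [b"a" * 10, b"b" * 5]
    body = b"".join(parts)
    print(single_part_etag(body))  # digest of the actual bytes
    print(multipart_etag(parts))   # digest of digests, suffixed with "-2"
```

Because the value depends on where the part boundaries fell, the receiving zone cannot recompute it from the object body alone, which is why extra (internal) information is needed to verify a synced copy.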

[ceph-users] Re: Multisite RGW data corruption (not 14.2.1 curl issue)

2019-08-19 Thread vladimir
That's a bit disappointing, to say the least... It pretty much renders RGW unusable for production use at the moment, then. Do you know how much work it would take, and when manifest checksumming is likely to happen? Should I hold my breath? In the meantime, perhaps I should remove corrupted objects from sl
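Until sync verifies content, one hypothetical way to find the objects to remove from the secondary zone is to compare a locally computed digest of each object body in both zones, rather than trusting the ETag. Below is a minimal Python sketch. It assumes the object bodies have already been downloaded from each zone; the dictionaries stand in for that step. `content_digests` and `find_mismatched_objects` are illustrative names, not RGW or S3 APIs.

```python
import hashlib
from typing import Dict, List


def content_digests(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Map each object key to the SHA-256 of its body (bodies assumed already downloaded)."""
    return {key: hashlib.sha256(body).hexdigest() for key, body in objects.items()}


def find_mismatched_objects(primary: Dict[str, str], secondary: Dict[str, str]) -> List[str]:
    """Keys whose secondary copy is missing or whose digest differs from the primary's."""
    return sorted(
        key for key, digest in primary.items()
        if secondary.get(key) != digest
    )


if __name__ == "__main__":
    primary = content_digests({"a": b"x", "b": b"y", "c": b"z"})
    secondary = content_digests({"a": b"x", "b": b"Y"})
    print(find_mismatched_objects(primary, secondary))
```

Comparing digests rather than full bodies means each side can be hashed where it lives, and only the short digests need to be brought together.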