On Fri, Apr 12, 2019 at 4:37 PM Jason Dillaman <jdill...@redhat.com> wrote:
> On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund <mag...@gronlund.se> wrote:
> >
> > Hi Jason,
> >
> > Tried to follow the instructions, and setting the debug level to 15 worked OK, but the daemon appeared to silently ignore the restart command (nothing indicating a restart seen in the log).
> > So I set the log level to 15 in the config file and restarted the rbd-mirror daemon. The output surprised me though; my previous perception of the issue might be completely wrong...
> > Lots of "image_replayer::BootstrapRequest:.... failed to create local image: (2) No such file or directory" and ":ImageReplayer: .... replay encountered an error: (42) No message of desired type"
>
> What is the result from "rbd mirror pool status --verbose nova" against your DR cluster now? Are they in up+error now? The ENOENT errors are most likely related to a parent image that hasn't been mirrored. The ENOMSG error seems to indicate that there might be some corruption in a journal and it's missing expected records (like a production client crashed), but it should be able to recover from that.

# rbd mirror pool status --verbose nova
health: WARNING
images: 2479 total
    2479 unknown

002344ab-c324-4c01-97ff-de32868fa712_disk:
  global_id:   c02e0202-df8f-46ce-a4b6-1a50a9692804
  state:       down+unknown
  description: status not found
  last_update:

002a8fde-3a63-4e32-9c18-b0bf64393d0f_disk:
  global_id:   d412abc4-b37e-44a2-8aba-107f352dec60
  state:       down+unknown
  description: status not found
  last_update:

<Repeat 2477 times>

> > https://pastebin.com/1bTETNGs
> >
> > Best regards
> > /Magnus
> >
> > On Tue, Apr 9, 2019 at 6:35 PM Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> Can you pastebin the results from running the following on your backup
> >> site rbd-mirror daemon node?
> >>
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
> >> ceph --admin-socket /path/to/asok rbd mirror restart nova
> >> .... wait a minute to let some logs accumulate ...
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
> >>
> >> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
> >> lots of "rbd::mirror"-like log entries).
> >>
> >> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund <mag...@gronlund.se> wrote:
> >> >
> >> > On Tue, Apr 9, 2019 at 5:48 PM Jason Dillaman <jdill...@redhat.com> wrote:
> >> >>
> >> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> >> (defaults to /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok)? If
> >> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
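As a side note, a minimal sketch of how that admin-socket query can be run on the backup-site rbd-mirror node, and of how the ENOENT theory above could be checked; the socket path is a placeholder, the image name is only an example taken from the verbose status output earlier in this message, and the commands assume the stock ceph/rbd CLIs:

# list the admin sockets on the rbd-mirror host, then query the daemon through one of them
ls -l /var/run/ceph/*.asok
ceph --admin-daemon /path/to/asok rbd mirror status

# on the production cluster: check whether a problematic image is a clone of a
# parent (e.g. a glance image) that was never mirrored to the DR site
rbd info nova/002344ab-c324-4c01-97ff-de32868fa712_disk | grep parent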
> >> >
> >> > {
> >> >     "pool_replayers": [
> >> >         {
> >> >             "pool": "glance",
> >> >             "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup",
> >> >             "instance_id": "869081",
> >> >             "leader_instance_id": "869081",
> >> >             "leader": true,
> >> >             "instances": [],
> >> >             "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> >> >             "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225675210000.asok",
> >> >             "sync_throttler": {
> >> >                 "max_parallel_syncs": 5,
> >> >                 "running_syncs": 0,
> >> >                 "waiting_syncs": 0
> >> >             },
> >> >             "image_replayers": [
> >> >                 {
> >> >                     "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> >> >                     "state": "Replaying"
> >> >                 },
> >> >                 {
> >> >                     "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> >> >                     "state": "Replaying"
> >> >                 },
> >> > -------------------cut----------
> >> >                 {
> >> >                     "name": "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
> >> >                     "state": "Replaying"
> >> >                 }
> >> >             ]
> >> >         },
> >> >         {
> >> >             "pool": "nova",
> >> >             "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: production client: client.productionbackup",
> >> >             "instance_id": "889074",
> >> >             "leader_instance_id": "889074",
> >> >             "leader": true,
> >> >             "instances": [],
> >> >             "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
> >> >             "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
> >> >             "sync_throttler": {
> >> >                 "max_parallel_syncs": 5,
> >> >                 "running_syncs": 0,
> >> >                 "waiting_syncs": 0
> >> >             },
> >> >             "image_replayers": []
> >> >         }
> >> >     ],
> >> >     "image_deleter": {
> >> >         "image_deleter_status": {
> >> >             "delete_images_queue": [
> >> >                 {
> >> >                     "local_pool_id": 3,
> >> >                     "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45"
> >> >                 }
> >> >             ],
> >> >             "failed_deletes_queue": []
> >> >         }
> >> >>
> >> >> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund <mag...@gronlund.se> wrote:
> >> >> >
> >> >> > On Tue, Apr 9, 2019 at 5:14 PM Jason Dillaman <jdill...@redhat.com> wrote:
> >> >> >>
> >> >> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund <mag...@gronlund.se> wrote:
> >> >> >> >
> >> >> >> > > On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund <mag...@gronlund.se> wrote:
> >> >> >> > >>
> >> >> >> > >> Hi,
> >> >> >> > >> We have configured one-way replication of pools between a production cluster and a backup cluster, but unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach the replaying state.
> >> >> >> > >
> >> >> >> > > Hmm, it's odd that they don't at least reach the replaying state. Are they still performing the initial sync?
> >> >> >> >
> >> >> >> > There are three pools we try to mirror (glance, cinder, and nova, no points for guessing what the cluster is used for :) ). The glance and cinder pools are smaller and see limited write activity, and the mirroring works; the nova pool, which is the largest and has 90% of the write activity, never leaves the "unknown" state.
> >> >> >> >
> >> >> >> > # rbd mirror pool status cinder
> >> >> >> > health: OK
> >> >> >> > images: 892 total
> >> >> >> >     890 replaying
> >> >> >> >     2 stopped
> >> >> >> > #
> >> >> >> > # rbd mirror pool status nova
> >> >> >> > health: WARNING
> >> >> >> > images: 2479 total
> >> >> >> >     2479 unknown
> >> >> >> > #
> >> >> >> > The production cluster has 5k writes/s on average and the backup cluster has 1-2k writes/s on average. The production cluster is bigger and has better specs. I thought that the backup cluster would be able to keep up, but it looks like I was wrong.
> >> >> >>
> >> >> >> The fact that they are in the unknown state just means that the remote
> >> >> >> "rbd-mirror" daemon hasn't started any journal replayers against the
> >> >> >> images. If it couldn't keep up, it would still report a status of
> >> >> >> "up+replaying". What Ceph release are you running on your backup
> >> >> >> cluster?
> >> >> >>
> >> >> > The backup cluster is running Luminous 12.2.11 (the production cluster 12.2.10).
> >> >> >
> >> >> >>
> >> >> >> > >> And the journals on the rbd volumes keep growing...
> >> >> >> > >>
> >> >> >> > >> Is it enough to simply disable the mirroring of the pool (rbd mirror pool disable <pool>) so that the lagging reader is removed from the journals and they shrink, or is there anything else that has to be done?
> >> >> >> > >
> >> >> >> > > You can either disable the journaling feature on the image(s), since
> >> >> >> > > there is no point in leaving it on if you aren't using mirroring, or run
> >> >> >> > > "rbd mirror pool disable <pool>" to purge the journals.
> >> >> >> >
> >> >> >> > Thanks for the confirmation.
> >> >> >> > I will stop the mirroring of the nova pool and try to figure out if there is anything we can do to get the backup cluster to keep up.
> >> >> >> >
> >> >> >> > >> Best regards
> >> >> >> > >> /Magnus
> >> >> >> > >> _______________________________________________
> >> >> >> > >> ceph-users mailing list
> >> >> >> > >> ceph-users@lists.ceph.com
> >> >> >> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >> > >
> >> >> >> > > --
> >> >> >> > > Jason
> >> >> >>
> >> >> >> --
> >> >> >> Jason
> >> >>
> >> >> --
> >> >> Jason
> >>
> >> --
> >> Jason
>
> --
> Jason
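As a side note, a rough sketch of the two cleanup options Jason describes above, for when mirroring of the nova pool is given up; the loop is only illustrative and assumes the journaling feature was enabled solely for mirroring:

# option 1: disable mirroring for the whole pool, which purges the per-image journals
rbd mirror pool disable nova

# option 2: per image, drop the journaling feature so no further journal data accumulates
for img in $(rbd ls nova); do
    rbd feature disable "nova/$img" journaling
done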
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com