On Fri, Apr 12, 2019 at 16:37, Jason Dillaman <jdill...@redhat.com> wrote:

> On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >
> > Hi Jason,
> >
> > Tried to follow the instructions; setting the debug level to 15
> worked OK, but the daemon appeared to silently ignore the restart command
> (nothing in the log indicated a restart).
> > So I set the log level to 15 in the config file and restarted the
> rbd-mirror daemon. The output surprised me, though; my previous perception
> of the issue might be completely wrong...
> > Lots of "image_replayer::BootstrapRequest:.... failed to create local
> image: (2) No such file or directory" and ":ImageReplayer: ....  replay
> encountered an error: (42) No message of desired type"
>
> What is the result from "rbd mirror pool status --verbose nova"
> against your DR cluster now? Are they in up+error now? The ENOENT
> errors are most likely related to a parent image that hasn't been
> mirrored. The ENOMSG error seems to indicate that there might be some
> corruption in a journal and that it's missing expected records (as if a
> production client crashed), but it should be able to recover from
> that.
>

# rbd mirror pool status --verbose nova
health: WARNING
images: 2479 total
    2479 unknown

002344ab-c324-4c01-97ff-de32868fa712_disk:
  global_id:   c02e0202-df8f-46ce-a4b6-1a50a9692804
  state:       down+unknown
  description: status not found
  last_update:

002a8fde-3a63-4e32-9c18-b0bf64393d0f_disk:
  global_id:   d412abc4-b37e-44a2-8aba-107f352dec60
  state:       down+unknown
  description: status not found
  last_update:

<Repeat 2477 times>
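
If the unmirrored-parent theory holds, I assume something like this, run
against the production cluster, would show whether one of the failing
images is a clone (the image name is just the first entry from the status
output above):

# rbd info nova/002344ab-c324-4c01-97ff-de32868fa712_disk | grep parent

And for the ENOMSG/journal theory, I guess "rbd journal inspect" could be
used to check that journal's integrity:

# rbd journal inspect --pool nova --image 002344ab-c324-4c01-97ff-de32868fa712_disk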



> > https://pastebin.com/1bTETNGs
> >
> > Best regards
> > /Magnus
> >
> > On Tue, Apr 9, 2019 at 18:35, Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> Can you pastebin the results from running the following on your backup
> >> site rbd-mirror daemon node?
> >>
> >> ceph --admin-daemon /path/to/asok config set debug_rbd_mirror 15
> >> ceph --admin-daemon /path/to/asok rbd mirror restart nova
> >> .... wait a minute to let some logs accumulate ...
> >> ceph --admin-daemon /path/to/asok config set debug_rbd_mirror 0/5
> >>
> >> ... and collect the rbd-mirror log from /var/log/ceph/ (it should have
> >> lots of "rbd::mirror"-like log entries).
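> >>
> >> (If the socket path isn't obvious, "ls /var/run/ceph/*.asok" on the
> >> rbd-mirror node should list the candidates, assuming the default
> >> admin-socket location; the exact file name depends on the client id
> >> and pid.)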
> >>
> >>
> >> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Apr 9, 2019 at 17:48, Jason Dillaman <
> jdill...@redhat.com> wrote:
> >> >>
> >> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> >> (defaults to /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok)?
> If
> >> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror
> status".
> >> >
> >> >
> >> > {
> >> >     "pool_replayers": [
> >> >         {
> >> >             "pool": "glance",
> >> >             "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00
> cluster: production client: client.productionbackup",
> >> >             "instance_id": "869081",
> >> >             "leader_instance_id": "869081",
> >> >             "leader": true,
> >> >             "instances": [],
> >> >             "local_cluster_admin_socket":
> "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> >> >             "remote_cluster_admin_socket":
> "/var/run/ceph/client.productionbackup.1936211.production.94225675210000.asok",
> >> >             "sync_throttler": {
> >> >                 "max_parallel_syncs": 5,
> >> >                 "running_syncs": 0,
> >> >                 "waiting_syncs": 0
> >> >             },
> >> >             "image_replayers": [
> >> >                 {
> >> >                     "name":
> "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> >> >                     "state": "Replaying"
> >> >                 },
> >> >                 {
> >> >                     "name":
> "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> >> >                     "state": "Replaying"
> >> >                 },
> >> > -------------------cut----------
> >> >                 {
> >> >                     "name":
> "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
> >> >                     "state": "Replaying"
> >> >                 }
> >> >             ]
> >> >         },
> >> >          {
> >> >             "pool": "nova",
> >> >             "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702
> cluster: production client: client.productionbackup",
> >> >             "instance_id": "889074",
> >> >             "leader_instance_id": "889074",
> >> >             "leader": true,
> >> >             "instances": [],
> >> >             "local_cluster_admin_socket":
> "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
> >> >             "remote_cluster_admin_socket":
> "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
> >> >             "sync_throttler": {
> >> >                 "max_parallel_syncs": 5,
> >> >                 "running_syncs": 0,
> >> >                 "waiting_syncs": 0
> >> >             },
> >> >             "image_replayers": []
> >> >         }
> >> >     ],
> >> >     "image_deleter": {
> >> >         "image_deleter_status": {
> >> >             "delete_images_queue": [
> >> >                 {
> >> >                     "local_pool_id": 3,
> >> >                     "global_image_id":
> "ff531159-de6f-4324-a022-50c079dedd45"
> >> >                 }
> >> >             ],
> >> >             "failed_deletes_queue": []
> >> >         }
> >> >>
> >> >>
> >> >> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Apr 9, 2019 at 17:14, Jason Dillaman <
> jdill...@redhat.com> wrote:
> >> >> >>
> >> >> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund <
> mag...@gronlund.se> wrote:
> >> >> >> >
> >> >> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund <
> mag...@gronlund.se> wrote:
> >> >> >> > >>
> >> >> >> > >> Hi,
> >> >> >> > >> We have configured one-way replication of pools between a
> production cluster and a backup cluster. Unfortunately, either the
> rbd-mirror daemon or the backup cluster itself is unable to keep up with
> the production cluster, so the replication never reaches the replaying
> state.
> >> >> >> > >
> >> >> >> > >Hmm, it's odd that they don't at least reach the replaying
> state. Are
> >> >> >> > >they still performing the initial sync?
> >> >> >> >
> >> >> >> > There are three pools we try to mirror (glance, cinder, and
> nova; no points for guessing what the cluster is used for :) ).
> >> >> >> > The glance and cinder pools are smaller and see limited write
> activity, and their mirroring works; the nova pool, which is the largest
> and has 90% of the write activity, never leaves the "unknown" state.
> >> >> >> >
> >> >> >> > # rbd mirror pool status cinder
> >> >> >> > health: OK
> >> >> >> > images: 892 total
> >> >> >> >     890 replaying
> >> >> >> >     2 stopped
> >> >> >> > #
> >> >> >> > # rbd mirror pool status nova
> >> >> >> > health: WARNING
> >> >> >> > images: 2479 total
> >> >> >> >     2479 unknown
> >> >> >> > #
> >> >> >> > The production cluster has 5k writes/s on average and the
> backup cluster has 1-2k writes/s on average. The production cluster is
> bigger and has better specs. I thought that the backup cluster would be
> able to keep up, but it looks like I was wrong.
> >> >> >>
> >> >> >> The fact that they are in the unknown state just means that the
> remote
> >> >> >> "rbd-mirror" daemon hasn't started any journal replayers against
> the
> >> >> >> images. If it couldn't keep up, it would still report a status of
> >> >> >> "up+replaying". What Ceph release are you running on your backup
> >> >> >> cluster?
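> >> >> >>
> >> >> >> (On Luminous, "ceph service dump" run on the backup cluster should
> >> >> >> also show whether the rbd-mirror daemon has registered itself; I
> >> >> >> assume that would help rule out a daemon-side problem.)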
> >> >> >>
> >> >> > The backup cluster is running Luminous 12.2.11 (the production
> cluster 12.2.10)
> >> >> >
> >> >> >>
> >> >> >> > >> And the journals on the rbd volumes keep growing...
> >> >> >> > >>
> >> >> >> > >> Is it enough to simply disable the mirroring of the pool
> (rbd mirror pool disable <pool>), which should remove the lagging reader
> from the journals and shrink them, or is there anything else that has to
> be done?
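> >> >> >> > >> (For reference, the lag should be visible per image with
> >> >> >> > >> "rbd journal status --pool nova --image <image>", which I
> >> >> >> > >> assume lists the registered clients and their commit positions.)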
> >> >> >> > >
> >> >> >> > >You can either disable the journaling feature on the image(s)
> since
> >> >> >> > >there is no point in leaving it on if you aren't using
> mirroring, or run
> >> >> >> > >"rbd mirror pool disable <pool>" to purge the journals.
> >> >> >> >
> >> >> >> > Thanks for the confirmation.
> >> >> >> > I will stop the mirror of the nova pool and try to figure out
> if there is anything we can do to get the backup cluster to keep up.
> >> >> >> >
> >> >> >> > >> Best regards
> >> >> >> > >> /Magnus
>
>
>
> --
> Jason
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
