Christian,

That indeed looks like the bug! We tried moving the monitor
host/address into [global] and everything works as expected - see
https://github.com/deis/deis/issues/2711#issuecomment-66566318
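
For anyone following along, the fix was essentially adding a 'mon host'
line to [global]. A sketch based on the user's config posted below, using
the addresses from his [mon.X] sections:

```ini
[global]
fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
mon initial members = nodo-3
# New line: lets clients (including radosgw) build the monmap from
# [global] instead of depending on the per-monitor [mon.X] sections
mon host = 192.168.2.200:6789,192.168.2.201:6789,192.168.2.202:6789
```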

This seems like a potentially bad bug - how has it not come up before?
Anything we can do to help with a patch?

Chris

On Wed, Dec 10, 2014 at 5:14 PM, Christian Balzer <ch...@gol.com> wrote:

>
> Hello,
>
> I think this might very well be my poor, unacknowledged bug report:
> http://tracker.ceph.com/issues/10012
>
> People with a mon_hosts entry in [global] (as created by ceph-deploy) will
> be fine, people with mons specified outside of [global] will not.
>
> Regards,
>
> Christian
>
> On Thu, 11 Dec 2014 00:49:03 +0000 Joao Eduardo Luis wrote:
>
> > On 12/10/2014 09:05 PM, Gregory Farnum wrote:
> > > What version is he running?
> > >
> > > Joao, does this make any sense to you?
> >
> >  From the MonMap code I'm pretty sure that the client should have built
> > the monmap from the [mon.X] sections, and solely based on 'mon addr'.
> >
> > 'mon_initial_members' is only useful to the monitors anyway, so it can
> > be disregarded.
> >
> > Thus, there are two ways for a client to build a monmap:
> > 1) based on 'mon_hosts' on the config (or -m on cli); or
> > 2) based on 'mon addr = ip1,ip2...' from the [mon.X] sections
> >
> > I don't see a 'mon hosts = ip1,ip2,...' in the config file, and I'm
> > assuming no '-m ip1,ip2...' was supplied on the cli, so we would
> > have been left with the 'mon addr' options in each individual [mon.X]
> > section.
> >
> > We are left with two options here: assume there was unexpected behavior
> > on this code path -- logs or steps to reproduce would be appreciated in
> > this case! -- or assume something else failed:
> >
> > - are the ips on the remaining mon sections correct (nodo-1 && nodo-2)?
> > - were all the remaining monitors up and running when the failure
> > occurred?
> > - were the remaining monitors reachable by the client?
> >
> > In case you are able to reproduce this behavior, it would be nice if
> > you could provide logs with 'debug monc = 10' and 'debug ms = 1'.
> >
> > Cheers!
> >
> >    -Joao
> >
> >
> > > -Greg
> > >
> > > On Wed, Dec 10, 2014 at 11:54 AM, Christopher Armstrong
> > > <ch...@opdemand.com> wrote:
> > >> Thanks Greg - I thought the same thing, but confirmed with the user
> > >> that it appears the radosgw client is indeed using initial members -
> > >> when he added all of his hosts to initial members, things worked just
> > >> fine. In either event, all of the monitors were always fully
> > >> enumerated later in the config file. Is this potentially a bug
> > >> specific to radosgw? Here's his config file:
> > >>
> > >> [global]
> > >> fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
> > >> mon initial members = nodo-3
> > >> auth cluster required = cephx
> > >> auth service required = cephx
> > >> auth client required = cephx
> > >> osd pool default size = 3
> > >> osd pool default min_size = 1
> > >> osd pool default pg_num = 128
> > >> osd pool default pgp_num = 128
> > >> osd recovery delay start = 15
> > >> log file = /dev/stdout
> > >> mon_clock_drift_allowed = 1
> > >>
> > >>
> > >> [mon.nodo-1]
> > >> host = nodo-1
> > >> mon addr = 192.168.2.200:6789
> > >>
> > >> [mon.nodo-2]
> > >> host = nodo-2
> > >> mon addr = 192.168.2.201:6789
> > >>
> > >> [mon.nodo-3]
> > >> host = nodo-3
> > >> mon addr = 192.168.2.202:6789
> > >>
> > >>
> > >>
> > >> [client.radosgw.gateway]
> > >> host = deis-store-gateway
> > >> keyring = /etc/ceph/ceph.client.radosgw.keyring
> > >> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> > >> log file = /dev/stdout
> > >>
> > >>
> > >> On Wed, Dec 10, 2014 at 11:40 AM, Gregory Farnum <g...@gregs42.com>
> > >> wrote:
> > >>>
> > >>> On Tue, Dec 9, 2014 at 3:11 PM, Christopher Armstrong
> > >>> <ch...@opdemand.com> wrote:
> > >>>> Hi folks,
> > >>>>
> > >>>> I think we have a bit of confusion around how initial members is
> > >>>> used. I understand that we can specify a single monitor (or a
> > >>>> subset of monitors) so
> > >>>> that the cluster can form a quorum when it first comes up. This is
> > >>>> how we're
> > >>>> using the setting now - so the cluster can come up with just one
> > >>>> monitor,
> > >>>> with the other monitors to follow later.
> > >>>>
> > >>>> However, a Deis user reported that when the monitor in his initial
> > >>>> members
> > >>>> list went down, radosgw stopped functioning, even though there are
> > >>>> three mons in his config file. I would think that the radosgw
> > >>>> client would connect
> > >>>> to any of the nodes in the config file to get the state of the
> > >>>> cluster, and
> > >>>> that the initial members list is only used when the monitors first
> > >>>> come up
> > >>>> and are trying to achieve quorum.
> > >>>>
> > >>>> The issue he filed is here:
> https://github.com/deis/deis/issues/2711
> > >>>>
> > >>>> He also found this Ceph issue filed:
> > >>>> https://github.com/ceph/ceph/pull/1233
> > >>>
> > >>> Nope, this has nothing to do with it.
> > >>>
> > >>>>
> > >>>> Is that what we're seeing here? Can anyone point us in the right
> > >>>> direction?
> > >>>
> > >>> I didn't see the actual conf file posted anywhere to look at, but my
> > >>> guess is simply that (since it looks like you're using generated conf
> > >>> files which can differ across hosts) that the one on the server(s) in
> > >>> question don't have the monitors listed in them. I'm only skimming
> > >>> the code, but from it and my recollection, when a Ceph client starts
> > >>> up it will try to assemble a list of monitors to contact from:
> > >>> 1) the contents of the "mon host" config entry
> > >>> 2) the "mon addr" value in any of the "global", "mon" or "mon.X"
> > >>> sections
> > >>>
> > >>> The clients don't even look at mon_initial_members that I can see,
> > >>> actually — so perhaps your client config only lists the initial
> > >>> monitor, without adding the others?
> > >>> -Greg
> > >>
> > >>
> > >
> >
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com