If someone could point me to where this fix should go in the code, I'd
actually love to dive in - I've been wanting to contribute back to Ceph,
and this bug has hit us personally so I think it's a good candidate :)

On Wed, Dec 10, 2014 at 8:25 PM, Christopher Armstrong <ch...@opdemand.com>
wrote:

> We're running Ceph entirely in Docker containers, so we couldn't use
> ceph-deploy due to the requirement of having a process management daemon
> (upstart, in Ubuntu's case). So, I wrote things out and templated them
> myself following the documentation.
>
> Thanks for linking the bug, Christian! You saved us a lot of time and
> troubleshooting. I'll post a comment on the bug.
>
> Chris
>
> On Wed, Dec 10, 2014 at 8:18 PM, Christian Balzer <ch...@gol.com> wrote:
>
>> On Wed, 10 Dec 2014 20:09:01 -0800 Christopher Armstrong wrote:
>>
>> > Christian,
>> >
>> > That indeed looks like the bug! We tried moving the monitor
>> > host/address into [global] and everything works as expected - see
>> > https://github.com/deis/deis/issues/2711#issuecomment-66566318
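>> >
>> > For anyone else who hits this, a minimal sketch of the workaround
>> > (reusing the addresses from the config quoted further down; the key
>> > change is listing all monitors via 'mon host' under [global] instead
>> > of relying on the [mon.X] sections alone):
>> >
>> > [global]
>> > mon initial members = nodo-3
>> > mon host = 192.168.2.200:6789,192.168.2.201:6789,192.168.2.202:6789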
>> >
>> > This seems like a potentially bad bug - how has it not come up before?
>>
>> Ah, but as you can see from the issue report it has come up before.
>> But that discussion, as well as that report, clearly fell through the
>> cracks.
>>
>> It's another reason I dislike ceph-deploy: people using only it
>> (probably the vast majority) will be unaffected, since it stuffs
>> everything into [global].
>>
>> People reading the documentation examples or coming from older versions
>> (and making changes to their config) will get bitten.
>>
>> Christian
>>
>> > Anything we can do to help with a patch?
>> >
>> > Chris
>> >
>> > On Wed, Dec 10, 2014 at 5:14 PM, Christian Balzer <ch...@gol.com>
>> > wrote:
>> >
>> > >
>> > > Hello,
>> > >
>> > > I think this might very well be my poor, unacknowledged bug report:
>> > > http://tracker.ceph.com/issues/10012
>> > >
>> > > People with a mon_hosts entry in [global] (as created by ceph-deploy)
>> > > will be fine; people with mons specified outside of [global] will not.
>> > >
>> > > Regards,
>> > >
>> > > Christian
>> > >
>> > > On Thu, 11 Dec 2014 00:49:03 +0000 Joao Eduardo Luis wrote:
>> > >
>> > > > On 12/10/2014 09:05 PM, Gregory Farnum wrote:
>> > > > > What version is he running?
>> > > > >
>> > > > > Joao, does this make any sense to you?
>> > > >
>> > > > From the MonMap code I'm pretty sure that the client should have
>> > > > built the monmap from the [mon.X] sections, and solely based on 'mon
>> > > > addr'.
>> > > >
>> > > > 'mon_initial_members' is only useful to the monitors anyway, so it
>> > > > can be disregarded.
>> > > >
>> > > > Thus, there are two ways for a client to build a monmap:
>> > > > 1) based on 'mon_hosts' in the config (or -m on the cli); or
>> > > > 2) based on the 'mon addr' entries from the [mon.X] sections
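>> > > >
>> > > > For illustration, the two forms would look roughly like this (a
>> > > > sketch reusing the addresses from the config quoted below):
>> > > >
>> > > > # 1) in [global], or '-m 192.168.2.200:6789,...' on the cli
>> > > > [global]
>> > > > mon host = 192.168.2.200:6789,192.168.2.201:6789,192.168.2.202:6789
>> > > >
>> > > > # 2) one 'mon addr' per monitor section
>> > > > [mon.nodo-1]
>> > > > mon addr = 192.168.2.200:6789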
>> > > >
>> > > > I don't see a 'mon hosts = ip1,ip2,...' in the config file, and I'm
>> > > > assuming no '-m ip1,ip2...' has been supplied on the cli, so we would
>> > > > have been left with the 'mon addr' options on each individual
>> > > > [mon.X] section.
>> > > >
>> > > > We are left with two options here: assume there was unexpected
>> > > > behavior on this code path -- logs or steps to reproduce would be
>> > > > appreciated in this case! -- or assume something else failed:
>> > > >
>> > > > - are the IPs in the remaining mon sections correct (nodo-1 &&
>> > > > nodo-2)?
>> > > > - were all the remaining monitors up and running when the failure
>> > > > occurred?
>> > > > - were the remaining monitors reachable by the client?
>> > > >
>> > > > In case you are able to reproduce this behavior, it would be nice
>> > > > if you could provide logs with 'debug monc = 10' and 'debug ms = 1'.
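>> > > >
>> > > > E.g., something like this in the client's section of ceph.conf (or
>> > > > the equivalent '--debug-monc 10 --debug-ms 1' on the command line):
>> > > >
>> > > > [client.radosgw.gateway]
>> > > > debug monc = 10
>> > > > debug ms = 1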
>> > > >
>> > > > Cheers!
>> > > >
>> > > >    -Joao
>> > > >
>> > > >
>> > > > > -Greg
>> > > > >
>> > > > > On Wed, Dec 10, 2014 at 11:54 AM, Christopher Armstrong
>> > > > > <ch...@opdemand.com> wrote:
>> > > > >> Thanks Greg - I thought the same thing, but confirmed with the
>> > > > >> user that the radosgw client does appear to be using initial
>> > > > >> members: when he added all of his hosts to initial members,
>> > > > >> things worked just fine. In either case, all of the monitors
>> > > > >> were always fully enumerated later in the config file. Is this
>> > > > >> potentially a bug specific to radosgw? Here's his config file:
>> > > > >>
>> > > > >> [global]
>> > > > >> fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
>> > > > >> mon initial members = nodo-3
>> > > > >> auth cluster required = cephx
>> > > > >> auth service required = cephx
>> > > > >> auth client required = cephx
>> > > > >> osd pool default size = 3
>> > > > >> osd pool default min_size = 1
>> > > > >> osd pool default pg_num = 128
>> > > > >> osd pool default pgp_num = 128
>> > > > >> osd recovery delay start = 15
>> > > > >> log file = /dev/stdout
>> > > > >> mon_clock_drift_allowed = 1
>> > > > >>
>> > > > >>
>> > > > >> [mon.nodo-1]
>> > > > >> host = nodo-1
>> > > > >> mon addr = 192.168.2.200:6789
>> > > > >>
>> > > > >> [mon.nodo-2]
>> > > > >> host = nodo-2
>> > > > >> mon addr = 192.168.2.201:6789
>> > > > >>
>> > > > >> [mon.nodo-3]
>> > > > >> host = nodo-3
>> > > > >> mon addr = 192.168.2.202:6789
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> [client.radosgw.gateway]
>> > > > >> host = deis-store-gateway
>> > > > >> keyring = /etc/ceph/ceph.client.radosgw.keyring
>> > > > >> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> > > > >> log file = /dev/stdout
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Dec 10, 2014 at 11:40 AM, Gregory Farnum
>> > > > >> <g...@gregs42.com> wrote:
>> > > > >>>
>> > > > >>> On Tue, Dec 9, 2014 at 3:11 PM, Christopher Armstrong
>> > > > >>> <ch...@opdemand.com> wrote:
>> > > > >>>> Hi folks,
>> > > > >>>>
>> > > > >>>> I think we have a bit of confusion around how initial members
>> > > > >>>> is used. I understand that we can specify a single monitor (or
>> > > > >>>> a subset of monitors) so that the cluster can form a quorum
>> > > > >>>> when it first comes up. This is how we're using the setting
>> > > > >>>> now - so the cluster can come up with just one monitor, with
>> > > > >>>> the other monitors to follow later.
>> > > > >>>>
>> > > > >>>> However, a Deis user reported that when the monitor in his
>> > > > >>>> initial members list went down, radosgw stopped functioning,
>> > > > >>>> even though there are three mons in his config file. I would
>> > > > >>>> think that the radosgw client would connect to any of the nodes
>> > > > >>>> in the config file to get the state of the cluster, and that
>> > > > >>>> the initial members list is only used when the monitors first
>> > > > >>>> come up and are trying to achieve quorum.
>> > > > >>>>
>> > > > >>>> The issue he filed is here:
>> > > > >>>> https://github.com/deis/deis/issues/2711
>> > > > >>>>
>> > > > >>>> He also found this Ceph issue filed:
>> > > > >>>> https://github.com/ceph/ceph/pull/1233
>> > > > >>>
>> > > > >>> Nope, this has nothing to do with it.
>> > > > >>>
>> > > > >>>>
>> > > > >>>> Is that what we're seeing here? Can anyone point us in the
>> > > > >>>> right direction?
>> > > > >>>
>> > > > >>> I didn't see the actual conf file posted anywhere to look at,
>> > > > >>> but my guess is simply that (since it looks like you're using
>> > > > >>> generated conf files, which can differ across hosts) the one on
>> > > > >>> the server(s) in question doesn't have the monitors listed in
>> > > > >>> it. I'm only skimming the code, but from it and my recollection,
>> > > > >>> when a Ceph client starts up it will try to assemble a list of
>> > > > >>> monitors to contact from:
>> > > > >>> 1) the contents of the "mon host" config entry
>> > > > >>> 2) the "mon addr" value in any of the "global", "mon" or "mon.X"
>> > > > >>> sections
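>> > > > >>>
>> > > > >>> (As a quick sanity check, you can also bypass the conf file and
>> > > > >>> hand the client a monitor list directly, e.g.
>> > > > >>> 'ceph -m 192.168.2.200:6789,192.168.2.201:6789 -s' with the
>> > > > >>> addresses from the config above.)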
>> > > > >>>
>> > > > >>> The clients don't even look at mon_initial_members that I can
>> > > > >>> see, actually — so perhaps your client config only lists the
>> > > > >>> initial monitor, without adding the others?
>> > > > >>> -Greg
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Christian Balzer        Network/Systems Engineer
>> > > ch...@gol.com           Global OnLine Japan/Fusion Communications
>> > > http://www.gol.com/
>> > >
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
