Thanks Gregory and Burkhard. In Kubernetes we use the rbd create and rbd map/unmap commands. From that perspective, are you referring to rbd itself as the client, or, once the image is created and mapped, is there a different client running inside the kernel that can receive OSD and mon updates?

My question is mainly: after we have run the rbd create and rbd map commands, does a client still exist, or is it gone? If the rbd image is mapped on a host and the OSD or mon IPs then change, what happens in that case?
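For concreteness, this is roughly the sequence we run on the host (the pool name, image name and size below are placeholders, not our real values):

    # create the image, then map it through the kernel rbd driver
    rbd create kube/pvc-example --size 1024
    rbd map kube/pvc-example        # returns a block device, e.g. /dev/rbd0
    rbd showmapped                  # lists the images the kernel client currently has mapped
    rbd unmap /dev/rbd0             # run when the volume is detached

So by "client" I am not sure whether you mean the rbd command itself (which exits right away) or whatever stays behind in the kernel after the map step.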
-Mayank

On Mon, Jan 29, 2018 at 10:25 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> Ceph assumes monitor IP addresses are stable, as they're the identity for the monitor and clients need to know them to connect.
>
> Clients maintain a TCP connection to the monitors while they're running, and monitors publish monitor maps containing all the known monitors in the cluster. These are pushed out to running clients over those stable connections whenever the map changes. When a client isn't connected to the cluster, it relies on the monitor IP address(es) in its ceph.conf (or supplied on the command line) to connect. I'm not sure about Kubernetes, but in OpenStack the monitor IPs need to remain stable once an RBD image is configured, because they're permanently stored in the config. (Or you can update the OpenStack config data, but it takes some pretty serious doing and doesn't have good tooling.)
>
> There's more to the monitor IPs than just the clients, though. Like I said, the IP is considered the monitor's identity. I'm not sure offhand what happens if you change it and then boot up an existing store; it may automatically connect, or you may need to do some manual commands. Either way, the prior IP will certainly remain in the monitor map (unless you or Kubernetes does something to remove it), and that means you've added a "monitor" that nobody will ever be able to connect to. Do that to all of the monitors, and they won't be able to do any paxos consensus and things will grind to a halt.
>
> In contrast, the OSD IPs don't matter at all on their own. I'd just be worried about whether whatever's changing the IP also changes the hostname or otherwise causes the OSD to move around in the crush map, as that will generate a great deal of data movement.
>
> -Greg
>
> On Fri, Jan 26, 2018 at 11:50 AM Mayank Kumar <krmaya...@gmail.com> wrote:
>
>> Resending in case this email was lost
>>
>> On Tue, Jan 23, 2018 at 10:50 PM Mayank Kumar <krmaya...@gmail.com> wrote:
>>
>>> Thanks Burkhard for the detailed explanation. Regarding the following:
>>>
>>> "The ceph client (librbd accessing a volume in this case) gets asynchronous notification from the ceph mons in case of relevant changes, e.g. updates to the osd map reflecting the failure of an OSD."
>>>
>>> I have some more questions:
>>> 1: Does the asynchronous notification for both the osdmap and the monmap come from the mons?
>>> 2: Are these asynchronous notifications retriable?
>>> 3: Is it possible that these asynchronous notifications are lost?
>>> 4: Do the monmap and osdmap reside in the kernel or in user space? The reason I am asking is: for an rbd volume that is already mounted on a host, will it continue to receive those asynchronous notifications for changes to both OSD and mon IPs or not? If all mon IPs change, but the mon configuration file is updated to reflect the new mon IPs, should an already mounted rbd volume still be able to contact the OSDs and mons, or is there some form of caching in kernel space for an already mounted rbd volume?
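(Inline note on question 4, in case it is useful: for a volume mapped via the kernel rbd driver, the maps are held by the kernel client itself. A rough way to inspect them, assuming debugfs is mounted and that these files exist on the kernel version in use:)

    # the libceph kernel client exposes its cached maps via debugfs
    mount -t debugfs none /sys/kernel/debug     # only if not already mounted
    ls /sys/kernel/debug/ceph/                  # one directory per <fsid>.client<id>
    cat /sys/kernel/debug/ceph/*/monmap         # mon addresses the kernel client knows about
    cat /sys/kernel/debug/ceph/*/osdmap         # OSD addresses and epoch the kernel client knows about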
>>> Some more context for why I am getting all these doubts: we internally had a Ceph cluster with rbd volumes being provisioned by Kubernetes. With existing rbd volumes still mounted, we wiped out the old Ceph cluster and created a brand new one, but the existing rbd volumes from the old cluster remained. Any Kubernetes pod that landed on the same host as an old rbd volume would fail to be created, because the volume failed to attach and mount. Looking at the kernel messages we saw the following:
>>>
>>> -- Logs begin at Fri 2018-01-19 02:05:38 GMT, end at Fri 2018-01-19 19:23:14 GMT. --
>>> Jan 19 19:20:39 host1.com kernel: libceph: osd2 10.231.171.131:6808 socket closed (con state CONNECTING)
>>> Jan 19 19:18:30 host1.com kernel: libceph: osd28 10.231.171.52:6808 socket closed (con state CONNECTING)
>>> Jan 19 19:18:30 host1.com kernel: libceph: osd0 10.231.171.131:6800 socket closed (con state CONNECTING)
>>> Jan 19 19:15:40 host1.com kernel: libceph: osd21 10.231.171.99:6808 wrong peer at address
>>> Jan 19 19:15:40 host1.com kernel: libceph: wrong peer, want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168
>>> Jan 19 19:15:34 host1.com kernel: libceph: osd11 10.231.171.114:6816 wrong peer at address
>>> Jan 19 19:15:34 host1.com kernel: libceph: wrong peer, want 10.231.171.114:6816/130908, got 10.231.171.114:6816/85562
>>>
>>> The new Ceph cluster had new OSD and mon IPs.
>>>
>>> So my question is: since these messages are coming from the kernel module, why can't the kernel module figure out that the mon and OSD IPs have changed? Is there some caching in the kernel? When rbd create/attach is called on that host, it is passed the new mon IPs, so doesn't that update the old, already mounted rbd volumes?
>>>
>>> I hope I made my doubts clear; I am a beginner in Ceph with very limited knowledge.
>>>
>>> Thanks for your help again,
>>> Mayank
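(Side note: the addresses the new cluster actually advertises can be dumped with the CLI and compared against the stale addresses the kernel client keeps retrying in the log above, for example:)

    ceph mon dump                  # monmap epoch and the address of every mon
    ceph osd dump | grep "^osd"    # per-OSD state and the addresses each OSD is bound to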
>>> On Tue, Jan 23, 2018 at 1:24 AM, Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 01/23/2018 09:53 AM, Mayank Kumar wrote:
>>>>
>>>>> Hi Ceph Experts,
>>>>>
>>>>> I am a new user of Ceph and am currently using Kubernetes to deploy Ceph RBD volumes. We are doing some initial work rolling it out to internal customers, and in doing that we are using the IP of the host as the IP of the OSDs and mons. This means that if a host goes down, we lose that IP. While we are still experimenting with these behaviors, I wanted to see what the community thinks about the following scenario:
>>>>>
>>>>> 1: An rbd volume is already attached and mounted on host A.
>>>>> 2: The OSD on which this rbd volume resides dies and never comes back up.
>>>>> 3: Another OSD is put in its place. I don't know the intricacies here, but I am assuming the data for this rbd volume either moves to different OSDs or goes back to the newly installed OSD.
>>>>> 4: The new OSD has a completely new IP.
>>>>> 5: Will the rbd volume attached to host A learn the new OSD IP on which its data resides, so that everything just continues to work?
>>>>>
>>>>> What if all the mons have also changed IP?
>>>>
>>>> A volume does not reside "on an OSD". The volume is striped, and each stripe is stored in a placement group; the placement group in turn is distributed across several OSDs depending on the crush rules and the number of replicas.
>>>>
>>>> If an OSD dies, Ceph will backfill the now missing replicas to another OSD, given that another OSD satisfying the crush rules is available. The same process is also triggered if an OSD is added.
>>>>
>>>> This process is somewhat transparent to the Ceph client, as long as enough replicas are present. The Ceph client (librbd accessing a volume in this case) gets asynchronous notifications from the Ceph mons in case of relevant changes, e.g. updates to the osd map reflecting the failure of an OSD. Traffic to the OSDs will be rerouted automatically depending on the crush rules, as explained above. The osd map also contains the IP addresses of all OSDs, so a change of IP address is just another update to the map.
>>>>
>>>> The only problem you might run into is changing the IP addresses of the mons. There is also a mon map listing all active mons; if the mon a Ceph client is using dies or is removed, the client will switch to another active mon from the map. This works fine in a running system; you can change the IP addresses of the mons one by one without any interruption to the client (theoretically...).
>>>>
>>>> The problem is starting the Ceph client. In this case the client uses the list of mons from the Ceph configuration file to contact one mon and receive the initial mon map. If you change the hostnames/IP addresses of the mons, you also need to update the Ceph configuration file.
>>>>
>>>> The above outline is how it should work, given a valid Ceph and network setup. YMMV.
>>>>
>>>> Regards,
>>>> Burkhard
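(Following up on Burkhard's point about the configuration file: a rough sketch of what keeping ceph.conf and the monmap in sync looks like; the addresses below are placeholders, not real ones.)

    # /etc/ceph/ceph.conf (excerpt) -- placeholder addresses
    [global]
    mon_host = 10.0.0.1,10.0.0.2,10.0.0.3

    # confirm what the cluster itself currently reports as the monmap
    ceph mon getmap -o /tmp/monmap
    monmaptool --print /tmp/monmap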
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com