Thanks Gregory and Burkhard. In Kubernetes we use the rbd create and rbd map/unmap commands. From that perspective, are you referring to rbd itself as the client, or, once the image is created and mapped, is there a different client running inside the kernel that can receive OSD and mon updates?

My question is mainly: after we have run the rbd create and rbd map commands, does a client still exist, or is it gone? If the rbd image is mapped on a host and the OSD or mon IPs then change, what happens in that case?
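For concreteness, this is roughly the sequence we run on the host (the pool name, image name and size below are placeholders, not our real values):

    # create the image, then map it through the kernel rbd driver
    rbd create kube/pvc-example --size 1024
    rbd map kube/pvc-example        # returns a block device, e.g. /dev/rbd0
    rbd showmapped                  # lists the images the kernel client currently has mapped
    rbd unmap /dev/rbd0             # run when the volume is detached

So by "client" I am not sure whether you mean the rbd command itself (which exits right away) or whatever stays behind in the kernel after the map step.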
-Mayank

On Mon, Jan 29, 2018 at 10:25 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> Ceph assumes monitor IP addresses are stable, as they're the identity for the monitor and clients need to know them to connect.
>
> Clients maintain a TCP connection to the monitors while they're running, and monitors publish monitor maps containing all the known monitors in the cluster. These are pushed out to running clients over those stable connections whenever the map changes. When a client isn't connected to the cluster, it relies on the monitor IP address(es) in its ceph.conf (or supplied on the command line) to connect. I'm not sure about Kubernetes, but in OpenStack the monitor IPs need to remain stable once an RBD image is configured, because they're permanently stored in the config. (Or you can update the OpenStack config data, but it takes some pretty serious doing and doesn't have good tooling.)
>
> There's more to the monitor IPs than just the clients, though. Like I said, the IP is considered the monitor's identity. I'm not sure offhand what happens if you change it and then boot up an existing store; it may automatically connect, or you may need to do some manual commands. Either way, the prior IP will certainly remain in the monitor map (unless you or Kubernetes does something to remove it), and that means you've added a "monitor" that nobody will ever be able to connect to. Do that to all of the monitors, and they won't be able to do any paxos consensus and things will grind to a halt.
>
> In contrast, the OSD IPs don't matter at all on their own. I'd just be worried about whether whatever's changing the IP also changes the hostname or otherwise causes the OSD to move around in the crush map, as that will generate a great deal of data movement.
>
> -Greg
>
> On Fri, Jan 26, 2018 at 11:50 AM Mayank Kumar <krmaya...@gmail.com> wrote:
>
>> Resending in case this email was lost
>>
>> On Tue, Jan 23, 2018 at 10:50 PM Mayank Kumar <krmaya...@gmail.com> wrote:
>>
>>> Thanks Burkhard for the detailed explanation. Regarding the following:
>>>
>>> "The ceph client (librbd accessing a volume in this case) gets asynchronous notification from the ceph mons in case of relevant changes, e.g. updates to the osd map reflecting the failure of an OSD."
>>>
>>> I have some more questions:
>>> 1: Does the asynchronous notification for both the osdmap and the monmap come from the mons?
>>> 2: Are these asynchronous notifications retriable?
>>> 3: Is it possible that these asynchronous notifications are lost?
>>> 4: Do the monmap and osdmap reside in the kernel or in user space? The reason I am asking is: for an rbd volume that is already mounted on a host, will it continue to receive those asynchronous notifications for changes to both OSD and mon IPs or not? If all mon IPs change, but the mon configuration file is updated to reflect the new mon IPs, should an already mounted rbd volume still be able to contact the OSDs and mons, or is there some form of caching in kernel space for an already mounted rbd volume?
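(Inline note on question 4, in case it is useful: for a volume mapped via the kernel rbd driver, the maps are held by the kernel client itself. A rough way to inspect them, assuming debugfs is mounted and that these files exist on the kernel version in use:)

    # the libceph kernel client exposes its cached maps via debugfs
    mount -t debugfs none /sys/kernel/debug     # only if not already mounted
    ls /sys/kernel/debug/ceph/                  # one directory per <fsid>.client<id>
    cat /sys/kernel/debug/ceph/*/monmap         # mon addresses the kernel client knows about
    cat /sys/kernel/debug/ceph/*/osdmap         # OSD addresses and epoch the kernel client knows about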
>>> Some more context for why I am getting all these doubts: we internally had a Ceph cluster with rbd volumes being provisioned by Kubernetes. With existing rbd volumes still mounted, we wiped out the old Ceph cluster and created a brand new one, but the existing rbd volumes from the old cluster remained. Any Kubernetes pod that landed on the same host as an old rbd volume would fail to be created, because the volume failed to attach and mount. Looking at the kernel messages we saw the following:
>>>
>>> -- Logs begin at Fri 2018-01-19 02:05:38 GMT, end at Fri 2018-01-19 19:23:14 GMT. --
>>> Jan 19 19:20:39 host1.com kernel: libceph: osd2 10.231.171.131:6808 socket closed (con state CONNECTING)
>>> Jan 19 19:18:30 host1.com kernel: libceph: osd28 10.231.171.52:6808 socket closed (con state CONNECTING)
>>> Jan 19 19:18:30 host1.com kernel: libceph: osd0 10.231.171.131:6800 socket closed (con state CONNECTING)
>>> Jan 19 19:15:40 host1.com kernel: libceph: osd21 10.231.171.99:6808 wrong peer at address
>>> Jan 19 19:15:40 host1.com kernel: libceph: wrong peer, want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168
>>> Jan 19 19:15:34 host1.com kernel: libceph: osd11 10.231.171.114:6816 wrong peer at address
>>> Jan 19 19:15:34 host1.com kernel: libceph: wrong peer, want 10.231.171.114:6816/130908, got 10.231.171.114:6816/85562
>>>
>>> The new Ceph cluster had new OSD and mon IPs.
>>>
>>> So my question is: since these messages are coming from the kernel module, why can't the kernel module figure out that the mon and OSD IPs have changed? Is there some caching in the kernel? When rbd create/attach is called on that host, it is passed the new mon IPs, so doesn't that update the old, already mounted rbd volumes?
>>>
>>> I hope I made my doubts clear; I am a beginner in Ceph with very limited knowledge.
>>>
>>> Thanks for your help again,
>>> Mayank
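(Side note: the addresses the new cluster actually advertises can be dumped with the CLI and compared against the stale addresses the kernel client keeps retrying in the log above, for example:)

    ceph mon dump                  # monmap epoch and the address of every mon
    ceph osd dump | grep "^osd"    # per-OSD state and the addresses each OSD is bound to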
>>> On Tue, Jan 23, 2018 at 1:24 AM, Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 01/23/2018 09:53 AM, Mayank Kumar wrote:
>>>>
>>>>> Hi Ceph Experts,
>>>>>
>>>>> I am a new user of Ceph and am currently using Kubernetes to deploy Ceph RBD volumes. We are doing some initial work rolling it out to internal customers, and in doing that we are using the IP of the host as the IP of the OSDs and mons. This means that if a host goes down, we lose that IP. While we are still experimenting with these behaviors, I wanted to see what the community thinks about the following scenario:
>>>>>
>>>>> 1: An rbd volume is already attached and mounted on host A.
>>>>> 2: The OSD on which this rbd volume resides dies and never comes back up.
>>>>> 3: Another OSD is put in its place. I don't know the intricacies here, but I am assuming the data for this rbd volume either moves to different OSDs or goes back to the newly installed OSD.
>>>>> 4: The new OSD has a completely new IP.
>>>>> 5: Will the rbd volume attached to host A learn the new OSD IP on which its data resides, so that everything just continues to work?
>>>>>
>>>>> What if all the mons have also changed IP?
>>>>
>>>> A volume does not reside "on an OSD". The volume is striped, and each stripe is stored in a placement group; the placement group in turn is distributed across several OSDs depending on the crush rules and the number of replicas.
>>>>
>>>> If an OSD dies, Ceph will backfill the now missing replicas to another OSD, given that another OSD satisfying the crush rules is available. The same process is also triggered if an OSD is added.
>>>>
>>>> This process is somewhat transparent to the Ceph client, as long as enough replicas are present. The Ceph client (librbd accessing a volume in this case) gets asynchronous notifications from the Ceph mons in case of relevant changes, e.g. updates to the osd map reflecting the failure of an OSD. Traffic to the OSDs will be rerouted automatically depending on the crush rules, as explained above. The osd map also contains the IP addresses of all OSDs, so a change of IP address is just another update to the map.
>>>>
>>>> The only problem you might run into is changing the IP addresses of the mons. There is also a mon map listing all active mons; if the mon a Ceph client is using dies or is removed, the client will switch to another active mon from the map. This works fine in a running system; you can change the IP addresses of the mons one by one without any interruption to the client (theoretically...).
>>>>
>>>> The problem is starting the Ceph client. In this case the client uses the list of mons from the Ceph configuration file to contact one mon and receive the initial mon map. If you change the hostnames/IP addresses of the mons, you also need to update the Ceph configuration file.
>>>>
>>>> The above outline is how it should work, given a valid Ceph and network setup. YMMV.
>>>>
>>>> Regards,
>>>> Burkhard
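(Following up on Burkhard's point about the configuration file: a rough sketch of what keeping ceph.conf and the monmap in sync looks like; the addresses below are placeholders, not real ones.)

    # /etc/ceph/ceph.conf (excerpt) -- placeholder addresses
    [global]
    mon_host = 10.0.0.1,10.0.0.2,10.0.0.3

    # confirm what the cluster itself currently reports as the monmap
    ceph mon getmap -o /tmp/monmap
    monmaptool --print /tmp/monmap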
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com