There is something that isn't clear to me from your patch and your description of the workflow below. It sounds like you are following the basic L3-to-ToR topology, so each rack is a broadcast domain. If that's the case, each rack should be a Neutron network, and the mapping should be between racks and Networks, not racks and Subnets.
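To make the distinction concrete, here is a minimal sketch of the mapping I have in mind; the names and values are hypothetical, not an existing Neutron schema. Each rack is one broadcast domain, so it maps to exactly one Neutron network, and that network can still carry several subnets (e.g. an IPv4 and an IPv6 subnet), which is why the mapping belongs at the network level:

    # Hypothetical sketch: one Neutron network per rack, with the rack's
    # subnets hanging off the network rather than being mapped directly.
    RACK_TO_NETWORK = {
        # rack_id -> Neutron network UUID (placeholder values)
        'rack-101': 'net-uuid-a',
        'rack-102': 'net-uuid-b',
    }

    NETWORK_SUBNETS = {
        # network UUID -> the subnets it carries (a rack can have several)
        'net-uuid-a': ['10.1.0.0/24', 'fd00:1::/64'],
        'net-uuid-b': ['10.2.0.0/24', 'fd00:2::/64'],
    }

    def subnets_for_rack(rack_id):
        """Resolve a rack to its network, then to that network's subnets."""
        return NETWORK_SUBNETS[RACK_TO_NETWORK[rack_id]]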
Also, can you elaborate a bit on the multiple-gateway use case? If a subnet is isolated to a rack, wouldn't all of the clients in that rack just want to use the ToR as their default gateway?

> On Nov 9, 2015, at 9:39 PM, Shraddha Pandhe <spandhe.openst...@gmail.com> wrote:
>
> Hi Carl,
>
> Please find my reply inline.
>
> On Mon, Nov 9, 2015 at 9:49 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
> On Fri, Nov 6, 2015 at 2:59 PM, Shraddha Pandhe <spandhe.openst...@gmail.com> wrote:
> We have a similar requirement where we want to pick a network that's accessible in the rack that the VM belongs to. We have L3 top-of-rack, so the network is confined to the rack. Right now we are achieving this by naming the physical network in a certain way, but that's not going to scale.
>
> We also want to be able to make scheduling decisions based on IP availability, so we need to know the rack <-> network mapping. We can't embed all factors in a name; it would be impossible to make scheduling decisions by parsing and comparing names. GoDaddy has also been doing something similar [1], [2].
>
> This is precisely the use case that the large deployers team (LDT) has brought to Neutron [1]. In fact, GoDaddy has been at the forefront of that request. We've had discussions about this on the ML since just after Vancouver. I've put up several specs to address it [2] and I'm working on another revision. My take on it is that Neutron needs a model for a layer 3 network (IpNetwork) which would group the rack networks. The IpNetwork would be visible to the end user, and there will be a network <-> host mapping. I am still aiming to have working code for this in Mitaka. I discussed this with the LDT in Tokyo and they seemed to agree. We had a session on it in the Neutron design track [3][4], though that discussion didn't produce anything actionable.
>
> That's great. The L3 network model is definitely one of our most important requirements. All our go-forward deployments are going to be L3, so this is a big deal for us.
>
> Solving this problem at the IPAM level has come up in discussion, but I don't have any references for that. It is something I'm still considering, but I haven't worked out all of the details of how it can work in a portable way. Could you describe how you imagine this flow would work from a user's perspective? Specifically, when a user wants to boot a VM, what precise API calls would be made to achieve this on your network, and where would the IPAM data come into play?
>
> Here's what the flow looks like to me:
>
> 1. The user sends a boot request as usual. The user need not know all the network and subnet information beforehand; all they do is send a boot request.
>
> 2. The scheduler will pick a node in an L3 rack. The way we map nodes <-> racks is as follows:
>    a. For VMs, we store rack_id in nova.conf on compute nodes (see the sketch just after this list).
>    b. For Ironic nodes, we currently have static IP allocation, so we practically know which IP we want to assign. But when we move to dynamic allocation, we would probably use the 'chassis' or 'driver_info' fields to store the rack id.
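> To make 2a concrete, here is a minimal sketch of what we mean; the option
> name 'rack_id' is ours, not an existing Nova option:
>
>     # Hypothetical sketch only: expose the compute node's rack through
>     # nova.conf via oslo.config.
>     from oslo_config import cfg
>
>     CONF = cfg.CONF
>     CONF.register_opts([
>         cfg.StrOpt('rack_id',
>                    help='Identifier of the L3 rack this compute node is in'),
>     ])
>
>     # nova.conf on each compute node would then carry, e.g.:
>     #   [DEFAULT]
>     #   rack_id = rack-101
>
>     def get_local_rack():
>         """Return the rack this compute node belongs to."""
>         return CONF.rack_id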
> 3. Nova compute will try to pick a network ID for this instance. At this point, it needs to know what networks (or subnets) are available in this rack. Based on that, it will pick a network ID and send a port-creation request to Neutron. At Yahoo, to avoid some back-and-forth, we send a fake network_id and let the plugin do all the work.
>
> 4. We need some information associated with the network/subnet that tells us what rack it belongs to. Right now, for VMs, we have that information embedded in the physnet name, but we would like to move away from that. If we had a column for subnets, e.g. a tag, it would solve our problem. Ideally, we would like a 'rack id' column, or a new 'racks' table that maps to subnets, or something similar. We are open to different ideas that work for everyone. This is where IPAM can help.
>
> 5. We have another requirement where we want to store multiple gateway addresses for a subnet, just like name servers.
>
> We also have a requirement where we want to make scheduling decisions based on IP availability: we want to be able to allocate multiple IPs, say X of them, to a host. The flow in that case would be:
>
> 1. The user sends a boot request with --num-ips X. The network/subnet-level complexities need not be exposed to the user; for a better experience, all we want our users to tell us is the number of IPs they want.
>
> 2. When the scheduler tries to find an appropriate host in the L3 racks, we want it to find a rack that can satisfy this IP requirement. So the scheduler will basically say, "give me all racks that have >X IPs available". If we have a 'racks' table in IPAM, that would help. Once the scheduler gets a rack, it will apply the remaining filters to narrow down to one host and call nova-compute. The IP count will be propagated from the scheduler to nova-compute.
>
> 3. Nova compute will call Neutron and send the node details and the IP count along. The Neutron IPAM driver will then look at the node details, query the database to find a network in that rack, and allocate X IPs from the subnet.
>
> Carl
>
> [1] https://bugs.launchpad.net/neutron/+bug/1458890
> [2] https://review.openstack.org/#/c/225384/
> [3] https://etherpad.openstack.org/p/mitaka-neutron-next-network-model
> [4] https://www.openstack.org/summit/tokyo-2015/schedule/design-summit
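Coming back to step 2 of the --num-ips flow quoted above: purely as an illustration, the "give me all racks that have >X IPs available" query could look something like the sketch below. Everything here is hypothetical; 'free_ip_counts' stands in for whatever per-rack availability data an IPAM backend would maintain, and nothing is an existing Neutron or Nova interface.

    # Hypothetical sketch of the rack filter from step 2 of the
    # --num-ips flow. free_ip_counts: dict mapping rack_id -> number
    # of unallocated IPs across that rack's subnets.
    def racks_with_capacity(free_ip_counts, num_ips):
        """Return the ids of racks that can still supply num_ips addresses."""
        return [rack for rack, free in free_ip_counts.items()
                if free >= num_ips]

    # Example: with X = 4 requested IPs,
    #   racks_with_capacity({'rack-101': 2, 'rack-102': 9}, 4)
    # yields ['rack-102']; the scheduler would then apply its remaining
    # filters (CPU, RAM, ...) only to hosts in those racks.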