Michael, I think by "compute_node hostname" you mean the 'hypervisor_hostname' field in the 'compute_nodes' table. What do you mean by "service hostname"? I don't see such a field in the 'services' table in the database. Is it in some other table, or do you suggest adding a 'service_hostname' field to the 'services' table?

 Thanks,
 David
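For reference, a minimal sketch (plain Python data, values invented for illustration) of the two fields in question as they appear later in this thread: the 'host' column of the 'services' table versus per-machine 'hypervisor_hostname' rows in 'compute_nodes':

# Illustrative only: one nova-compute service row fronting several
# bare-metal compute_node rows, keyed by hypervisor_hostname.
service_row = {
    'id': 3,
    'host': 'bespin101-0',        # the "service hostname"
    'binary': 'nova-compute',
    'topic': 'compute',
}

compute_node_rows = [
    # each bare-metal machine is a separate compute_nodes entry
    {'id': 1, 'service_id': 3, 'hypervisor_hostname': 'bare-metal-0001.xxx.com'},
    {'id': 2, 'service_id': 3, 'hypervisor_hostname': 'bare-metal-0002.xxx.com'},
]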
----- Original Message -----
> openstack-bounces+mjfork=us.ibm....@lists.launchpad.net wrote on
> 08/27/2012 02:58:56 PM:
>
> > From: David Kang <dk...@isi.edu>
> > To: Vishvananda Ishaya <vishvana...@gmail.com>,
> > Cc: OpenStack Development Mailing List
> > <openstack-d...@lists.openstack.org>, "openstack@lists.launchpad.net \
> > (openstack@lists.launchpad.net\)" <openstack@lists.launchpad.net>
> > Date: 08/27/2012 03:06 PM
> > Subject: Re: [Openstack] [openstack-dev] Discussion about where to
> > put database for bare-metal provisioning (review 10726)
> > Sent by: openstack-bounces+mjfork=us.ibm....@lists.launchpad.net
> >
> > Hi Vish,
> >
> > I think I understand your idea.
> > One service entry with multiple bare-metal compute_node entries is
> > registered at the start of the bare-metal nova-compute.
> > 'hypervisor_hostname' must be different for each bare-metal machine,
> > such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc.
> > But their IP addresses must be the IP address of the bare-metal
> > nova-compute, so that an instance is cast not to the bare-metal
> > machine directly but to the bare-metal nova-compute.
>
> I believe the change here is to cast out the message to the
> <topic>.<service-hostname>. Existing code sends it to the compute_node
> hostname (see line 202 of nova/scheduler/filter_scheduler.py,
> specifically host=weighted_host.host_state.host). Changing that to
> cast to the service hostname would send the message to the bare-metal
> proxy node and should not have an effect on current deployments, since
> the service hostname and the host_state.host would always be equal.
> This model will also let you keep the bare-metal compute node IP in
> the compute_nodes table.
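A simplified sketch of the cast change Michael describes; the names below (choose_cast_target, service_host) are placeholders, not the actual nova/scheduler/filter_scheduler.py code:

# Illustrative only: address the run_instance cast to the service host
# (the bare-metal proxy nova-compute) rather than the compute node's
# hypervisor hostname.
def choose_cast_target(weighted_host):
    host_state = weighted_host.host_state

    # Today (per the thread): the cast goes to the compute node's hostname.
    compute_node_target = host_state.host

    # Proposed: cast to the hostname of the owning service instead.  For
    # ordinary hypervisor hosts the two are equal, so nothing changes; for
    # bare metal the service host is the proxy nova-compute.
    service_target = getattr(host_state, 'service_host', compute_node_target)

    return 'compute.%s' % service_target   # i.e. <topic>.<service-hostname>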
> > One extension we need to make on the scheduler side is to use
> > (host, hypervisor_hostname) instead of (host) alone in host_manager.py.
> > 'HostManager.service_states' is { <host> : { <service> : { cap k : v }}}.
> > It needs to be changed to
> > { <host> : { <service> : { <hypervisor_hostname> : { cap k : v }}}}.
> > Most functions of HostState need to be changed to use the
> > (host, hypervisor_hostname) pair to identify a compute node.
>
> Would an alternative here be to change the top-level "host" to be the
> hypervisor_hostname and enforce uniqueness?
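A minimal sketch of that capability-dictionary change, using hostnames from this thread and invented capability values:

# Today: capabilities are keyed by (host, service) only, so all bare-metal
# nodes behind one nova-compute collapse into a single entry.
service_states = {
    'bespin101-0': {
        'compute': {'vcpus': 8, 'memory_mb': 16384},
    },
}

# Proposed: add a hypervisor_hostname level so each bare-metal node behind
# the proxy nova-compute keeps its own capability entry.
service_states_proposed = {
    'bespin101-0': {
        'compute': {
            'bare-metal-0001.xxx.com': {'vcpus': 8, 'memory_mb': 16384},
            'bare-metal-0002.xxx.com': {'vcpus': 4, 'memory_mb': 8192},
        },
    },
}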
> > Are we on the same page now?
> >
> > Thanks,
> > David
> >
> > ----- Original Message -----
> > > Hi David,
> > >
> > > I just checked out the code more extensively and I don't see why you
> > > need to create a new service entry for each compute_node entry. The
> > > code in host_manager to get all host states explicitly gets all
> > > compute_node entries. I don't see any reason why multiple
> > > compute_node entries can't share the same service. I don't see any
> > > place in the scheduler that is grabbing records by "service" instead
> > > of by "compute node", but if there is one that I missed, it should
> > > be fairly easy to change it.
> > >
> > > The compute_node record is created in compute/resource_tracker.py
> > > as of a recent commit, so I think the path forward would be to make
> > > sure that one of the records is created for each bare-metal node by
> > > the bare-metal compute, perhaps by having multiple resource_trackers.
> > >
> > > Vish
> > >
> > > On Aug 27, 2012, at 9:40 AM, David Kang <dk...@isi.edu> wrote:
> > > >
> > > > Vish,
> > > >
> > > > I think I don't understand your statement fully.
> > > > Unless we use different hostnames, (hostname, hypervisor_hostname)
> > > > must be the same for all bare-metal nodes under a bare-metal
> > > > nova-compute.
> > > >
> > > > Could you elaborate on the following statement a little bit more?
> > > >
> > > >> You would just have to use a little more than hostname. Perhaps
> > > >> (hostname, hypervisor_hostname) could be used to update the
> > > >> entry?
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > > ----- Original Message -----
> > > >> I would investigate changing the capabilities to key off of
> > > >> something other than hostname. It looks from the table structure
> > > >> like compute_nodes could have a many-to-one relationship with
> > > >> services. You would just have to use a little more than hostname.
> > > >> Perhaps (hostname, hypervisor_hostname) could be used to update
> > > >> the entry?
> > > >>
> > > >> Vish
> > > >>
> > > >> On Aug 24, 2012, at 11:23 AM, David Kang <dk...@isi.edu> wrote:
> > > >>>
> > > >>> Vish,
> > > >>>
> > > >>> I've tested your code and did more testing.
> > > >>> There are a couple of problems.
> > > >>> 1. The host name must be unique. If not, repeated updates of new
> > > >>> capabilities with the same host name simply overwrite one another.
> > > >>> 2. We cannot generate arbitrary host names on the fly.
> > > >>> The scheduler (I tested the filter scheduler) gets host names
> > > >>> from the db. So, if a host name is not in the 'services' table,
> > > >>> it is not considered by the scheduler at all.
> > > >>>
> > > >>> So, to make your suggestion possible, nova-compute should
> > > >>> register N different host names in the 'services' table, and N
> > > >>> corresponding entries in the 'compute_nodes' table.
> > > >>> Here is an example:
> > > >>>
> > > >>> mysql> select id, host, binary, topic, report_count, disabled,
> > > >>>        availability_zone from services;
> > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > >>> | id | host        | binary         | topic     | report_count | disabled | availability_zone |
> > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > >>> |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
> > > >>> |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
> > > >>> |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
> > > >>> |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
> > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > >>>
> > > >>> mysql> select id, service_id, hypervisor_hostname from compute_nodes;
> > > >>> +----+------------+------------------------+
> > > >>> | id | service_id | hypervisor_hostname    |
> > > >>> +----+------------+------------------------+
> > > >>> |  1 |          3 | bespin101.east.isi.edu |
> > > >>> |  2 |          4 | bespin101.east.isi.edu |
> > > >>> +----+------------+------------------------+
> > > >>>
> > > >>> Then the nova db (compute_nodes table) has entries for all
> > > >>> bare-metal nodes.
> > > >>> What do you think of this approach?
> > > >>> Do you have a better approach?
> > > >>>
> > > >>> Thanks,
> > > >>> David
> > > >>>
> > > >>> ----- Original Message -----
> > > >>>> To elaborate, something like the below. I'm not absolutely sure
> > > >>>> you need to be able to set service_name and host, but this
> > > >>>> gives you the option to do so if needed.
> > > >>>>
> > > >>>> diff --git a/nova/manager.py b/nova/manager.py
> > > >>>> index c6711aa..c0f4669 100644
> > > >>>> --- a/nova/manager.py
> > > >>>> +++ b/nova/manager.py
> > > >>>> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
> > > >>>>
> > > >>>>      def update_service_capabilities(self, capabilities):
> > > >>>>          """Remember these capabilities to send on next periodic update."""
> > > >>>> +        if not isinstance(capabilities, list):
> > > >>>> +            capabilities = [capabilities]
> > > >>>>          self.last_capabilities = capabilities
> > > >>>>
> > > >>>>      @periodic_task
> > > >>>> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
> > > >>>>          """Pass data back to the scheduler at a periodic interval."""
> > > >>>>          if self.last_capabilities:
> > > >>>>              LOG.debug(_('Notifying Schedulers of capabilities ...'))
> > > >>>> -            self.scheduler_rpcapi.update_service_capabilities(context,
> > > >>>> -                    self.service_name, self.host, self.last_capabilities)
> > > >>>> +            for capability_item in self.last_capabilities:
> > > >>>> +                name = capability_item.get('service_name', self.service_name)
> > > >>>> +                host = capability_item.get('host', self.host)
> > > >>>> +                self.scheduler_rpcapi.update_service_capabilities(context,
> > > >>>> +                        name, host, capability_item)
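To connect the patch to the bare-metal case, a rough sketch (field names and values invented for illustration, not taken from the actual driver) of what a bare-metal driver's get_host_stats could return once it hands back a list, one entry per node:

# Illustrative only: one capability dict per bare-metal node, all reported
# through the single proxy nova-compute service.
def get_host_stats_for_bare_metal_nodes():
    return [
        {
            'host': 'bespin101-0',                           # proxy service host
            'hypervisor_hostname': 'bare-metal-0001.xxx.com',
            'hypervisor_type': 'NONE',
            'vcpus': 8,
            'memory_mb': 16384,
            'local_gb': 250,
        },
        {
            'host': 'bespin101-0',
            'hypervisor_hostname': 'bare-metal-0002.xxx.com',
            'hypervisor_type': 'NONE',
            'vcpus': 4,
            'memory_mb': 8192,
            'local_gb': 120,
        },
    ]

# With the patch above, update_service_capabilities() wraps a single dict in
# a list, and the periodic task sends each item separately, optionally
# overriding 'service_name'/'host' per item.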
> > > >>>> On Aug 21, 2012, at 1:28 PM, David Kang <dk...@isi.edu> wrote:
> > > >>>>
> > > >>>>> Hi Vish,
> > > >>>>>
> > > >>>>> We are trying to change our code according to your comment.
> > > >>>>> I want to ask a question.
> > > >>>>>
> > > >>>>>>>> a) modify driver.get_host_stats to be able to return a list
> > > >>>>>>>> of host stats instead of just one. Report the whole list
> > > >>>>>>>> back to the scheduler. We could modify the receiving end to
> > > >>>>>>>> accept a list as well or just make multiple calls to
> > > >>>>>>>> self.update_service_capabilities(capabilities)
> > > >>>>>
> > > >>>>> Modifying driver.get_host_stats to return a list of host stats
> > > >>>>> is easy.
> > > >>>>> Making multiple calls to
> > > >>>>> self.update_service_capabilities(capabilities) doesn't seem to
> > > >>>>> work, because 'capabilities' is overwritten each time.
> > > >>>>>
> > > >>>>> Modifying the receiving end to accept a list seems to be easy.
> > > >>>>> However, since 'capabilities' is assumed to be a dictionary by
> > > >>>>> all other scheduler routines, it looks like we would have to
> > > >>>>> change all of them to handle 'capabilities' as a list of
> > > >>>>> dictionaries.
> > > >>>>>
> > > >>>>> If my understanding is correct, it would affect many parts of
> > > >>>>> the scheduler.
> > > >>>>> Is that what you recommended?
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> David
> > > >>>>>
> > > >>>>> ----- Original Message -----
> > > >>>>>> This was an immediate goal; the bare-metal nova-compute node
> > > >>>>>> could keep an internal database, but report capabilities
> > > >>>>>> through nova in the common way with the changes below. Then
> > > >>>>>> the scheduler wouldn't need access to the bare metal database
> > > >>>>>> at all.
> > > >>>>>>
> > > >>>>>> On Aug 15, 2012, at 4:23 PM, David Kang <dk...@isi.edu> wrote:
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>> Hi Vish,
> > > >>>>>>>
> > > >>>>>>> Is this discussion for a long-term goal or for this Folsom
> > > >>>>>>> release?
> > > >>>>>>>
> > > >>>>>>> We still believe that the bare-metal database is needed,
> > > >>>>>>> because there is no automated way for bare-metal nodes to
> > > >>>>>>> report their capabilities to their bare-metal nova-compute
> > > >>>>>>> node.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> David
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> I am interested in finding a solution that enables
> > > >>>>>>>> bare-metal and virtualized requests to be serviced through
> > > >>>>>>>> the same scheduler, where the compute_nodes table has a
> > > >>>>>>>> full view of schedulable resources. This would seem to
> > > >>>>>>>> simplify the end-to-end flow while opening up some
> > > >>>>>>>> additional use cases (e.g. dynamic allocation of a node
> > > >>>>>>>> from bare-metal to hypervisor and back).
> > > >>>>>>>>
> > > >>>>>>>> One approach would be to have a proxy running a single
> > > >>>>>>>> nova-compute daemon fronting the bare-metal nodes. That
> > > >>>>>>>> nova-compute daemon would report up many HostState objects
> > > >>>>>>>> (one per bare-metal node) to become entries in the
> > > >>>>>>>> compute_nodes table and accessible through the scheduler
> > > >>>>>>>> HostManager object.
> > > >>>>>>>>
> > > >>>>>>>> The HostState object would set cpu_info, vcpus, memory_mb
> > > >>>>>>>> and local_gb values to be used for scheduling, with the
> > > >>>>>>>> hypervisor_host field holding the bare-metal machine
> > > >>>>>>>> address (e.g. for IPMI-based commands) and hypervisor_type
> > > >>>>>>>> = NONE. The bare-metal Flavors are created with an
> > > >>>>>>>> extra_spec of hypervisor_type=NONE, and the corresponding
> > > >>>>>>>> compute_capabilities_filter would reduce the available
> > > >>>>>>>> hosts to those bare-metal nodes. The scheduler would need
> > > >>>>>>>> to understand that hypervisor_type = NONE means you need an
> > > >>>>>>>> exact-fit (or best-fit) host vs. weighting them (perhaps
> > > >>>>>>>> through the multi-scheduler). The scheduler would cast out
> > > >>>>>>>> the message to the <topic>.<service-hostname> (code today
> > > >>>>>>>> uses the HostState hostname), with the compute driver
> > > >>>>>>>> having to understand if it must be serviced elsewhere (but
> > > >>>>>>>> this does not break any existing implementations since it
> > > >>>>>>>> is 1 to 1).
> > > >>>>>>>>
> > > >>>>>>>> Does this solution seem workable? Anything I missed?
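A rough sketch of the data this proposal implies, with invented values and field names (not a confirmed schema): a bare-metal flavor carrying the hypervisor_type extra_spec, and the per-node figures the proxy nova-compute would report:

# Illustrative only: a bare-metal flavor restricted to hosts reporting
# hypervisor_type NONE, plus the stats reported for one bare-metal machine.
bare_metal_flavor = {
    'name': 'bm.small',
    'vcpus': 4,
    'memory_mb': 8192,
    'root_gb': 120,
    'extra_specs': {'hypervisor_type': 'NONE'},
}

bare_metal_host_entry = {
    'hypervisor_type': 'NONE',
    'hypervisor_hostname': 'bare-metal-0002.xxx.com',
    'hypervisor_host': '10.0.0.12',     # e.g. the IPMI/management address
    'cpu_info': '...',
    'vcpus': 4,
    'memory_mb': 8192,
    'local_gb': 120,
}

# A capabilities filter would then only pass hosts whose reported
# hypervisor_type matches the flavor's extra_spec.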
> > > >>>>>>>>
> > > >>>>>>>> The bare metal driver is already proxying for the other
> > > >>>>>>>> nodes, so it sounds like we need a couple of things to make
> > > >>>>>>>> this happen:
> > > >>>>>>>>
> > > >>>>>>>> a) modify driver.get_host_stats to be able to return a list
> > > >>>>>>>> of host stats instead of just one. Report the whole list
> > > >>>>>>>> back to the scheduler. We could modify the receiving end to
> > > >>>>>>>> accept a list as well or just make multiple calls to
> > > >>>>>>>> self.update_service_capabilities(capabilities)
> > > >>>>>>>>
> > > >>>>>>>> b) make a few minor changes to the scheduler to make sure
> > > >>>>>>>> filtering still works. Note the changes here may be very
> > > >>>>>>>> helpful:
> > > >>>>>>>>
> > > >>>>>>>> https://review.openstack.org/10327
> > > >>>>>>>>
> > > >>>>>>>> c) we have to make sure that instances launched on those
> > > >>>>>>>> nodes take up the entire host state somehow. We could
> > > >>>>>>>> probably do this by making sure that the instance_type ram,
> > > >>>>>>>> mb, gb etc. matches what the node has, but we may want a
> > > >>>>>>>> new boolean field "used" if those aren't sufficient.
> > > >>>>>>>>
> > > >>>>>>>> This approach seems pretty good. We could potentially get
> > > >>>>>>>> rid of the shared bare_metal_node table. I guess the only
> > > >>>>>>>> other concern is how you populate the capabilities that the
> > > >>>>>>>> bare metal nodes are reporting. I guess an api extension
> > > >>>>>>>> that rpcs to a baremetal node to add the node. Maybe
> > > >>>>>>>> someday this could be autogenerated by the bare metal host
> > > >>>>>>>> looking in its arp table for dhcp requests! :)
> > > >>>>>>>>
> > > >>>>>>>> Vish
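To illustrate (c) and the exact-fit idea above, a small sketch (not actual nova filter code; names are placeholders) of a check that only passes a bare-metal host when its resources exactly match the requested instance type:

# Illustrative only: a bare-metal host (hypervisor_type 'NONE') is either
# free and exactly matching, or not schedulable at all.
def bare_metal_exact_fit(host_entry, instance_type):
    if host_entry.get('hypervisor_type') != 'NONE':
        return True   # non-bare-metal hosts keep the usual weighting path
    if host_entry.get('used'):
        return False  # the optional boolean "used" field mentioned above
    return (host_entry['vcpus'] == instance_type['vcpus'] and
            host_entry['memory_mb'] == instance_type['memory_mb'] and
            host_entry['local_gb'] == instance_type['root_gb'])

# e.g. bare_metal_exact_fit(bare_metal_host_entry, bare_metal_flavor) -> True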
>
> Michael
>
> -------------------------------------------------
> Michael Fork
> Cloud Architect, Emerging Solutions
> IBM Systems & Technology Group

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp