Great to see efforts to ensure we only send useful notifications.

On 5/23/2014 7:48 AM, Denis Makogon wrote:



Good day, Trove community.



    I would like to start a thread about the Trove notification framework.

    Notification design was defined as: “Trove will emit events for resources 
as they are manipulated. These events can be used to meter the service and 
possibly used to calculate bills.”

    The actual reason for this mail is to start a discussion about 
re-implementing/refactoring notifications. For now, notifications are 
hard-pinned to nova provisioning.


    What kinds of issues/problems do notifications have?

    Let's first take a look at how they are implemented. 
[1]<https://wiki.openstack.org/wiki/Trove/trove-notifications> is how the 
notification design was defined and approved. 
[2]<https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L73-L133> 
is how notifications are currently implemented. 
[5]<https://wiki.openstack.org/wiki/Trove/trove-notifications-v2> is how 
notifications should look.


    First of all, there are a lot of issues with 
[2]<https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L73-L133>:

  *   pinning notifications to the nova client: this is the wrong approach, 
because Trove is going to support heat for resource 
management<https://blueprints.launchpad.net/trove/+spec/resource-manager-interface>;

  *   availability zone: should be used in the “trove.instance.create” 
notification only; there is no need to send it each time 
“trove.instance.modify_*” happens (* is flavor or volume);

  *   instance_size: this payload attribute refers to the amount of RAM 
defined by the flavor;

  *   instance_type: this payload attribute refers to the flavor name, which 
seems odd;

  *   instance_type_id: same thing, this payload attribute refers to the 
flavor id, which seems odd;

  *   nova_instance_id: to be more generic, we should avoid engine-specific 
names;

  *   state_description and state: both refer to the instance service status, 
so they duplicate each other;

  *   nova_volume_id: same as for nova_instance_id, should be more generic, 
since an instance can have a cinder volume that has nothing in common with 
nova at all.


    We need to have more generic, more flexible notifications that can be used 
with any provisioning engine, no matter what it actually is (nova/heat).
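
To make the list above concrete, here is a minimal sketch of a payload builder that is not tied to any provisioning engine. Every field name in it (`compute_instance_id`, `volume_id`, `flavor_id`, `status`) is an assumption made for illustration and is not taken from the gists or wiki pages linked in this mail. It only shows the three ideas: no nova-prefixed names, a single status field, and the availability zone attached to the create event only.

```python
def build_payload(event_type, instance, availability_zone=None):
    """Build a notification payload from data Trove already holds.

    `instance` is a plain dict here; all of its keys are hypothetical.
    """
    payload = {
        "instance_id": instance["id"],
        "tenant_id": instance["tenant_id"],
        # Generic names instead of nova_instance_id / nova_volume_id.
        "compute_instance_id": instance["compute_instance_id"],
        "volume_id": instance.get("volume_id"),
        "flavor_id": instance["flavor_id"],
        # One status field instead of both state and state_description.
        "status": instance["status"],
    }
    # The availability zone is only attached to the create notification.
    if event_type == "trove.instance.create" and availability_zone is not None:
        payload["availability_zone"] = availability_zone
    return payload
```

A caller would pass the same instance data for every event type and let the builder decide which optional fields belong in the payload.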


Also, ensuring you have .start and .end notifications really helps with a 
number of problems:
1. early detection of problems, even before timeouts occur.
2. profiling a particular operation. Even AMQP in-flight time can be computed 
from the last .end to the .start of another service, which is cool.
3. un-reported exceptions. A .start without a corresponding ERROR or .end 
notification can be troublesome.
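
One way to guarantee that every .start is answered is to wrap the operation in a context manager. The following is only a sketch: `emit` stands for whatever callable actually sends the notification (hypothetical here), and the `.error` suffix is an assumed naming for the ERROR case.

```python
import contextlib


@contextlib.contextmanager
def notify_operation(emit, event_type, payload):
    """Emit <event_type>.start, then .end on success or .error on failure."""
    emit(event_type + ".start", payload)
    try:
        yield
    except Exception as exc:
        # Report the failure so the .start never goes unanswered,
        # then let the exception propagate to the caller.
        emit(event_type + ".error", dict(payload, message=str(exc)))
        raise
    else:
        emit(event_type + ".end", payload)
```

The operation itself then goes inside a `with notify_operation(emit, "trove.instance.create", payload):` block, and a consumer can flag any .start that is never followed by an .end or .error.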


    How can we rewrite notifications, taking the described issues into account?

  1.  We need to rewrite the 
send_usage_event<https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L88> 
method.

  2.  It should not ask nova for the flavor, server and AZ, because that is 
redundant. So, the beginning of the method should look like 
[3]<https://gist.github.com/denismakogon/9c2d802e2a61eb6164d2>.

Yeah, this is always a tricky one. You don't want to have to call out to a 
million other services just to get the data for a notification. Notifications 
should be light-weight (they can be large, but not computationally expensive). 
The big thing is that they have the necessary information to answer the 
question "what were the factors contributing to X happening" and not just "X 
happened". So long as you put some references in the payload to answer those 
questions, you don't really have to go back to the source to get the detailed 
information. We should be able to sew those relationships up on the consuming 
side.
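
As a rough illustration of sewing things up on the consuming side, the sketch below joins later events to the create event through the `instance_id` reference in the payload. The event shape (a list of `(event_type, payload)` tuples) and the field names are assumptions for the example only.

```python
def enrich(events):
    """Attach create-time details to later events, joined on instance_id."""
    created = {}   # instance_id -> payload of the create event
    enriched = []
    for event_type, payload in events:
        if event_type == "trove.instance.create":
            created[payload["instance_id"]] = payload
        origin = created.get(payload["instance_id"], {})
        # Later events inherit fields they did not carry themselves;
        # values in the event's own payload take precedence.
        enriched.append((event_type, {**origin, **payload}))
    return enriched
```

With this kind of join, a modify event does not need to repeat the availability zone: the consumer can recover it from the earlier create event.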


  3.  The payload should be rewritten. It should have the following form 
[4]<https://gist.github.com/denismakogon/c4a784d364f0af0fc543>.


What is the actual value-add of this refactoring?

    Notifications would be reusable for any kind of action (create, delete, 
resize), no matter which provisioning engine was used.

+1



    Next steps after the suggested refactoring?

    The next steps will cover the required notifications that were described as 
part of the ceilometer 
integration<https://blueprints.launchpad.net/trove/+spec/ceilometer-integration>.


Great to see ... go, go, go!

-Sandy



Best regards,

Denis Makogon

www.mirantis.com<http://www.mirantis.com>

dmako...@mirantis.com<mailto:dmako...@mirantis.com>

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
