Apologies for the top-posting, but just wanted to call out some potential confusion that arose on the #os-ceilometer channel earlier today.

TL;DR: the UI shouldn't assume a 1:1 mapping between alarms and resources,
since this mapping does not exist in general.

Background: See ML post [1]

Discussion: See IRC log [2]
Ctrl+F: "Let's see what the UI guys think about it"

Cheers,
Eoghan

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037788.html
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-ceilometer/%23openstack-ceilometer.2014-06-16.log

----- Original Message -----
> Hi all,
>
> Thanks again for the great comments on the initial cut of wireframes. I’ve
> updated them a fair amount based on feedback in this e-mail thread along
> with the feedback written up here:
> https://etherpad.openstack.org/p/alarm-management-page-design-discussion
>
> Here is a link to the new version:
> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-06-05.pdf
>
> And a quick explanation of the updates that I made from the last version:
>
> 1) Removed severity.
>
> 2) Added Status column. I also added details around the fact that users can
> enable/disable alerts.
>
> 3) Updated Alarm creation workflow to include choosing the project and user
> (optionally for filtering the resource list), choosing resource, and
> allowing for a choice of the amount of time to monitor for alarming.
> -Perhaps we could be even more sophisticated for how we let users filter
> down to find the right resources that they want to monitor for alarms?
>
> 4) As for notifying users…I’ve updated the “Alarms” section to be “Alarms
> History”. The point here is to show any Alarms that have occurred to notify
> the user. Other notification ideas could be to allow users to get notified
> of alerts via e-mail (perhaps a user setting?). I’ve added a wireframe for
> this update in User Settings. Then the Alarms Management section would just
> be where the user creates, deletes, enables, and disables alarms. Do you
> still think we don’t need the “alarms” tab?
> Perhaps this just becomes
> iteration 2 and is left out for now as you mention in your etherpad.
>
> 5) Question about combined alarms…currently I’ve designed it so that a user
> could create multiple levels in the “Alarm When…” section. They could
> combine these with AND/ORs. Is this going far enough? Or do we actually need
> to allow users to combine Alarms that might watch different resources?
>
> 6) I updated the Actions column to have the “More” drop down which is
> consistent with other tables in Horizon.
>
> 7) Added in a section in the “Add Alarm” workflow for “Actions after Alarm”.
> I’m thinking we could have some sort of If State is X, do X type selections,
> but I’m looking to understand more details about how the backend works for
> this feature. Eoghan gave examples of logging and potentially scaling out
> via Heat. Would simple drop downs support these events?
>
> 8) I can definitely add in a “scheduling” feature with respect to Alarms. I
> haven’t added it in yet, but I could see this being very useful in future
> revisions of this feature.
>
> 9) Another thought is that we could add in some padding for outlier data as
> Eoghan mentioned. Perhaps a setting for “This has happened 3 times over the
> last minute, so now send an alarm.”?
>
> A new round of feedback is of course welcome :)
>
> Best,
> Liz
>
> On Jun 4, 2014, at 1:27 PM, Liz Blanchard <lsure...@redhat.com> wrote:
>
> > Thanks for the excellent feedback on these, guys! I’ll be working on making
> > updates over the next week and will send a fresh link out when done.
> > Anyone else with feedback, please feel free to fire away.
> >
> > Best,
> > Liz
> > On Jun 4, 2014, at 12:33 PM, Eoghan Glynn <egl...@redhat.com> wrote:
> >
> >>
> >> Hi Liz,
> >>
> >> Two further thoughts occurred to me after hitting send on
> >> my previous mail.
> >>
> >> First, is the concept of alarm dimensioning; see my RDO Ceilometer
> >> getting started guide[1] for an explanation of that notion.
> >>
> >> "A key associated concept is the notion of dimensioning which defines the
> >> set of matching meters that feed into an alarm evaluation. Recall that
> >> meters are per-resource-instance, so in the simplest case an alarm might
> >> be defined over a particular meter applied to all resources visible to a
> >> particular user. More useful however would be the option to explicitly
> >> select which specific resources we're interested in alarming on. On one
> >> extreme we would have narrowly dimensioned alarms where this selection
> >> would have only a single target (identified by resource ID). On the other
> >> extreme, we'd have widely dimensioned alarms where this selection
> >> identifies many resources over which the statistic is aggregated, for
> >> example all instances booted from a particular image or all instances
> >> with matching user metadata (the latter is how Heat identifies
> >> autoscaling groups)."
> >>
> >> We'd have to think about how that concept is captured in the
> >> UX for alarm creation/update.
> >>
> >> Second, there are a couple of more advanced alarming features
> >> that were added in Icehouse:
> >>
> >> 1. The ability to constrain alarms on time ranges, such that they
> >>    would only fire say during 9-to-5 on a weekday. This would
> >>    allow for example different autoscaling policies to be applied
> >>    out-of-hours, when resource usage is likely to be cheaper and
> >>    manual remediation less straight-forward.
> >>
> >> 2. The ability to exclude low-quality datapoints with anomalously
> >>    low sample counts. This allows the leading edge of the trend of
> >>    widely dimensioned alarms not to be skewed by eagerly-reporting
> >>    outliers.
> >>
> >> Perhaps not in a first iteration, but at some point it may make sense
> >> to expose these more advanced features in the UI.
> >>
> >> Cheers,
> >> Eoghan
> >>
> >> [1] http://openstack.redhat.com/CeilometerQuickStart
> >>
> >>
> >>
> >> ----- Original Message -----
> >>>
> >>> Hi Liz,
> >>>
> >>> Looks great!
> >>>
> >>> Some thoughts on the wireframe doc:
> >>>
> >>> * The description of form:
> >>>
> >>>     "If CPU Utilization exceeds 80%, send alarm."
> >>>
> >>>   misses the time-window aspect of the alarm definition.
> >>>
> >>>   Whereas the boilerplate default descriptions generated by
> >>>   ceilometer itself:
> >>>
> >>>     "cpu_util > 70.0 during 3 x 600s"
> >>>
> >>>   capture this important info.
> >>>
> >>> * The metric names, e.g. "CPU Utilization", are not an exact
> >>>   match for the meter names used by ceilometer, e.g. "cpu_util".
> >>>
> >>> * Non-admin users can create alarms in ceilometer:
> >>>
> >>>     "This is where admins can come in and
> >>>      define and edit any alarms they want
> >>>      the environment to use."
> >>>
> >>>   (though these alarms will only have visibility onto the stats
> >>>   that would be accessible to the user on behalf of whom the
> >>>   alarm is being evaluated)
> >>>
> >>> * There's no concept currently of alarm severity.
> >>>
> >>> * "Should users be able to enable/dis-able alarms."
> >>>
> >>>   Yes, the API allows for disabled (i.e. non-evaluated) alarms.
> >>>
> >>> * "Should users be able to own/assign alarms?"
> >>>
> >>>   Only admin users can create an alarm on behalf of another
> >>>   user/tenant.
> >>>
> >>> * "Should users be able to acknowledge, close alarms?"
> >>>
> >>>   No, we have no concept of ACKing an alarm.
> >>>
> >>> * "Admins can also see a full list of all Alarms that have
> >>>   taken place in the past."
> >>>
> >>>   In ceilometer terminology, we refer to this as alarm history
> >>>   or alarm change events.
> >>>
> >>> * "CPU Utilization exceeded 80%."
> >>>
> >>>   Again good to capture the duration in that description of the
> >>>   event.
> >>>
> >>> * "Within the Overview section, there should be a new tab that allows the
> >>>   user to click and view all Alarms that have occurred in their
> >>>   environment."
> >>>
> >>>   Not sure really what "environment" means here. Non-admin tenants only
> >>>   have visibility to their own alarms, whereas admins have visibility to
> >>>   all alarms.
> >>>
> >>> * "This list would keep the latest alarms."
> >>>
> >>>   Presumably this would be based on querying the alarm-history API,
> >>>   as opposed to an assumption that Horizon is consuming the actual
> >>>   alarm notifications?
> >>>
> >>> Cheers,
> >>> Eoghan
> >>>
> >>> ----- Original Message -----
> >>>> Hi All,
> >>>>
> >>>> I’ve recently put together a set of wireframes[1] around Alarm
> >>>> Management that would support the following blueprint:
> >>>> https://blueprints.launchpad.net/horizon/+spec/ceilometer-alarm-management-page
> >>>>
> >>>> If you have a chance it would be great to hear any feedback that folks
> >>>> have on this direction moving forward with Alarms.
> >>>>
> >>>> Best,
> >>>> Liz
> >>>>
> >>>> [1]
> >>>> http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-05-30.pdf
> >>>>
> >>>> _______________________________________________
> >>>> OpenStack-dev mailing list
> >>>> OpenStack-dev@lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
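
For anyone wiring this into a UI, the dimensioning notion quoted in the thread can be sketched roughly as below. This is only an illustration under stated assumptions: the Resource and ThresholdAlarm classes, their field names, and the "image" metadata key are all hypothetical and are not ceilometer's actual data model or API. The sketch shows why one alarm may match one, several, or all visible resources, and it builds a description in the "cpu_util > 70.0 during 3 x 600s" form mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """Hypothetical stand-in for a metered resource (e.g. an instance)."""
    resource_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ThresholdAlarm:
    """Illustrative alarm definition; NOT ceilometer's real data model."""
    meter_name: str
    comparison: str
    threshold: float
    evaluation_periods: int
    period_seconds: int
    # Dimensioning constraints: every key/value pair must match.
    # An empty dict means "all resources visible to the user".
    constraints: Dict[str, str] = field(default_factory=dict)

    def matching_resources(self, visible: List[Resource]) -> List[Resource]:
        """Return the visible resources whose meters feed this alarm."""
        def matches(res: Resource) -> bool:
            for key, wanted in self.constraints.items():
                if key == "resource_id":
                    actual = res.resource_id
                else:
                    actual = res.metadata.get(key)
                if actual != wanted:
                    return False
            return True
        return [res for res in visible if matches(res)]

    def description(self) -> str:
        """Description that keeps the time-window part of the definition."""
        return (f"{self.meter_name} {self.comparison} {self.threshold} "
                f"during {self.evaluation_periods} x {self.period_seconds}s")


visible = [
    Resource("inst-1", {"image": "img-a"}),
    Resource("inst-2", {"image": "img-a"}),
    Resource("inst-3", {"image": "img-b"}),
]
# Narrowly dimensioned: a single target identified by resource ID.
narrow = ThresholdAlarm("cpu_util", ">", 70.0, 3, 600,
                        {"resource_id": "inst-1"})
# Widely dimensioned: every instance booted from one image.
wide = ThresholdAlarm("cpu_util", ">", 70.0, 3, 600, {"image": "img-a"})

for alarm in (narrow, wide):
    ids = [res.resource_id for res in alarm.matching_resources(visible)]
    print(alarm.description(), "->", ids)
```

The point for the wireframes is that a table or form keyed on "the resource" of an alarm would only work for the narrow case; a list of matched resources (possibly empty, possibly large) seems a safer thing to model.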