I've got a review in progress for adding a telemetry scenario test:
https://review.openstack.org/#/c/115971/

It can't pass the *-icehouse tests because ceilometer-api is not
present on the icehouse side of a havana->icehouse upgrade.

In the process of trying to figure out what's going on I discovered
so many confusing things that I'm no longer clear on:

* Whether this is a fixable problem?
* Whether it is worth fixing?
* How (or if) it is possible to disable the test in question for
  older branches?
* Maybe I should scrap the whole thing?[1]

The core problem is that older branches of grenade do not have an
upgrade-ceilometer, so though some ceilometer services do run in
Havana they are not restarted over the upgrade gap.

Presumably that could be fixed by backporting some stuff to the
relevant branch. I admit, though, that at times it can be rather hard
to tell which branch during a grenade run is providing the
configuration and environment variables. In part this is due to an
apparent difference in default local behavior and gate behavior.
Suppose I wanted to replicate exactly on a local setup what happens
on a gate run, where do I go to figure that out?

That seems a bit fragile, though. Wouldn't it be better to upgrade
services based on what services are actually running, rather than
some lines in a shell script?

I looked into how this might be done and the mapping from
ENABLED_SERVICES to actually-running-processes to
some-generic-name-to-identify-an-upgrade is not at all
straightforward. I suspect this is a known problem that people would
like to fix, but I don't know where to look for more discussion on
the matter. Please help?

[1] And finally, the other grenade runs, those that are currently
passing, are only passing because a very long loop is waiting up to
two minutes for notification messages (from the middleware) to show
up at the ceilometer collector. Is this because the instance is just
that overloaded and process contention is so high that it is just
going to take that long? If so, is there much point in having a test
which introduces this kind of potential legacy? A scenario test
appears to be exactly what's needed here, but at what cost?

What I'm after here is basically threefold:

* Pointers to written info on how I can resolve these issues, if it
  exists.
* If it doesn't, some discussion here on options to reach some
  resolution.
* A cup of tea or other beverage of our choice and some sympathy and
  commiseration. A bit of "I too have suffered at the hands of
  grenade". Then we can all be friends.

From my side I can provide a promise to follow through on
improvements we discover.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent