On 28/04/17 03:48 AM, Julien Danjou wrote:
>
> Yes, I wrote that in a review somewhere. We need to rework 1. so
> deletion happens at the same time we lock the sack to process metrics
> basically. We might want to merge the janitor into the worker I imagine.
> Currently a janitor can grab metrics and do dumb things like:
> - metric1 from sackA
> - metric2 from sackB
> - metric3 from sackA
>
> and do 3 different lock+delete -_-
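
to make the quoted example concrete, here is a minimal sketch of grouping deleted metrics by sack so that each sack only needs one lock+delete pass. everything here is illustrative: `group_by_sack`, `sack_of` and the `SACKS` mapping are hypothetical names, not gnocchi's actual code.

```python
from collections import defaultdict


def group_by_sack(deleted_metrics, sack_of):
    """Group deleted metrics by sack so each sack needs one lock+delete pass.

    sack_of is a hypothetical callable mapping a metric to its sack name.
    """
    groups = defaultdict(list)
    for metric in deleted_metrics:
        groups[sack_of(metric)].append(metric)
    return dict(groups)


# hypothetical metric-to-sack assignment matching the example in the quote
SACKS = {"metric1": "sackA", "metric2": "sackB", "metric3": "sackA"}

grouped = group_by_sack(["metric1", "metric2", "metric3"], SACKS.get)

# one lock per sack: len(grouped) passes instead of one pass per metric
print(len(grouped), grouped)
```

with the three metrics above this needs one pass per sack rather than one per metric.
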

so the tradeoff here is that now we're doing a lot more calls to the
indexer. additionally, we're pulling a lot more unused results from the db.
a single janitor currently just grabs all deleted metrics and starts
attempting to clean them up one at a time. if we merge, we will have n
calls to the indexer, where n is the number of workers, each pulling in all
the deleted metrics and then checking whether each metric is in its sack,
and if not, moving on. that's a lot of extra, wasted work. we could reduce
that work by adding sack information to the indexer ;) but that will still
add significantly more calls to the indexer (which we could reduce by not
triggering cleanup every job interval)

>>
>> alternatively, this could be solved by keeping per-metric locks in
>> addition to per-sack locks. this would effectively double the number of
>> active locks we have so instead of each metricd worker having a single
>> per-sack lock, it will also have a per-metric lock for whatever metric
>> it may be publishing at the time.
>
> If we got a timeout set for scenario 3, I'm not that worried. I guess
> worst thing is that people would be unhappy with the API spending time
> doing computation anyway so we'd need to rework how refresh work or add
> an ability to disable it.
>

refresh is currently disabled by default so i think we're ok.

what's the timeout for? to time out the api's attempt to aggregate a
metric? i think it's a bad experience if we add any timeout, since i assume
it will still return what it can return and then the results become
somewhat ambiguous.

now that i think about it more, this issue still exists in the per-metric
scenario (but to a lesser extent). 'refresh' can still be blocked by
metricd, but the chance is significantly smaller and the window for missed
unprocessed measures is smaller.
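
going back to the indexer tradeoff above, a back-of-the-envelope sketch of the extra work. this is only a rough model, assuming every worker runs cleanup once per interval and each deleted metric falls in exactly one worker's sacks; `indexer_cost` and its parameters are made-up names for illustration.

```python
def indexer_cost(num_workers, num_deleted, sack_filter_in_indexer=False):
    """Return (indexer calls, rows pulled) for one cleanup round.

    Rough model: each worker issues one query for deleted metrics.
    Without sack information in the indexer, every worker pulls every
    deleted metric and discards those outside its sacks. With it, each
    worker only pulls its own share, so the total rows equal num_deleted.
    """
    calls = num_workers
    if sack_filter_in_indexer:
        rows = num_deleted
    else:
        rows = num_workers * num_deleted
    return calls, rows


# single janitor today: one call, one copy of the deleted metrics
janitor = indexer_cost(1, 1000)
# janitor merged into 8 workers, no sack information in the indexer
merged = indexer_cost(8, 1000)
# same, but with sack information available to filter on
merged_filtered = indexer_cost(8, 1000, sack_filter_in_indexer=True)
```

so filtering by sack removes the wasted rows, but the number of calls still scales with the number of workers unless cleanup is triggered less often.
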

--
gord