Hi,

I'll be redesigning our current observability topology and have just 
started looking into what an acceptable solution would be.

Context:
- Kubernetes
- 40+ clusters
- 50+ leaf prom instances per cluster, ~2k total
- 6 root-level prom instances per cluster monitoring the leaves (3x2 
regional, zone redundant), 240 total
- 6 Alertmanagers per cluster, processing alerts from leaf and root proms 
(3x2 regional, zone redundant), 240 total
- Root-level instances are monitored by an HA Cortex cluster
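For concreteness, the leaf-to-Alertmanager wiring above might look roughly 
like this in a leaf's prometheus.yml (a sketch only; the hostnames are 
placeholders I made up, and each Prometheus must list all Alertmanagers it 
should notify so deduplication can happen on the Alertmanager side):

```yaml
# prometheus.yml on a leaf instance (sketch; hostnames are placeholders)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager-0.cluster-a.example:9093
            - alertmanager-1.cluster-a.example:9093
            - alertmanager-2.cluster-a.example:9093
```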

----------------------------------------------------------------------------------------------------

*Pros of the current setup:*

   - it's very robust
   - easy to configure
   - easy to set up


*Issues with it:*

   - Lack of global view
   - clusters are already in overlapping regions and there will be even 
   more overlap, leading to a high amount of alert duplication
   - poor traceability
   - promotes monkey patching: we have to introduce labels and software 
   constructs just for deduplication and grouping


----------------------------------------------------------------------------------------------------


Potential solutions I was thinking of:


   1. move the Alertmanagers up to a higher, regional-only layer, without 
   them gossiping with each other
   2. create a clustered HA Alertmanager setup in 3-5 regions
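Option [2] would come down to peering one Alertmanager per region into a 
single HA cluster via Alertmanager's built-in clustering flags. A minimal 
sketch, assuming made-up regional hostnames:

```shell
# One Alertmanager per region, all peered into one HA cluster
# (hostnames are placeholders; run the symmetric command in each region)
alertmanager \
  --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=am.eu-west.example:9094 \
  --cluster.peer=am.us-east.example:9094 \
  --cluster.peer=am.ap-south.example:9094
```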


The ideal solution is probably [2], since it offers both simplicity and 
robustness at the same time; however, I still have many unknowns:

- will it bear the load? We currently see around 5k alerts per hour. (I'm 
not sure which gossip protocol Alertmanager uses; if it's one of the 
randomized variants, load is probably not an issue)
- bandwidth pressure, etc.
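A quick back-of-envelope check suggests the raw numbers are small. The 
alert payload size and worst-case fan-out below are my assumptions, not 
measurements:

```python
# Back-of-envelope load estimate for a clustered Alertmanager setup.
# Assumptions (not measured): ~2 KiB per serialized alert, and a small
# full-mesh cluster where, in the worst case, every alert is forwarded
# to every other peer.

ALERTS_PER_HOUR = 5_000
ALERT_SIZE_BYTES = 2 * 1024   # assumed average payload size
PEERS = 5                     # upper end of the proposed 3-5 regions

alerts_per_second = ALERTS_PER_HOUR / 3600
# Worst case: each alert re-sent to every other peer in the mesh.
gossip_bytes_per_second = alerts_per_second * ALERT_SIZE_BYTES * (PEERS - 1)

print(f"{alerts_per_second:.2f} alerts/s")
print(f"~{gossip_bytes_per_second / 1024:.1f} KiB/s gossip traffic (worst case)")
```

So even under pessimistic assumptions this is on the order of 1-2 alerts 
per second and tens of KiB/s between regions, which seems negligible 
compared to the cross-region latency question.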


wdyt?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.