[prometheus-users] Re: User level disk usage monitoring and notification - with prometheus and alertmanager

'Brian Candler' via Prometheus Users Thu, 29 Feb 2024 01:23:06 -0800

> I don't think *condition1* and *condition2* will work as labels and 
> label values returned by condition1 and condition2 are different.


condition1 and on (instance,mountpoint) condition2

This assumes that both expressions have "instance" and "mountpoint" 
labels; these are the only ones considered when matching. "and" is a set 
operator, so it already handles the many-to-1 relationship from the 
left-hand side (users) to the right-hand side (filesystem) and does not 
accept group_left. The result keeps all labels from the LHS, including 
"username".
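
As a rough sketch with the two conditions written out: this assumes the 
custom metric user_disk_usage_bytes from the original post (a hypothetical 
name, not something node_exporter provides) carries the same "instance" 
and "mountpoint" label values as the node_filesystem_* metrics, for 
example because it is exposed through node_exporter's textfile collector.

```
# Per-user usage, returned only for filesystems with less than 10% free.
# "and" keeps the LHS series and their labels (including username) and
# only uses the RHS to decide which (instance, mountpoint) pairs survive.
user_disk_usage_bytes
and on (instance, mountpoint)
(node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1)
```

If the custom exporter is scraped on a different port, the "instance" 
values will not match and would need relabelling first.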

> So i need 3 rules  - 1 each for server1,server2 and server3

I don't think so. The vector of results can include values for each 
(user,filesystem,instance) on the LHS, and each (filesystem,instance) on 
the RHS, and alert separately for every filesystem that reaches 90%.
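
A single rule along these lines might cover all three servers, since the 
expression has no instance="..." selector. This is only a sketch: the 
alert name, the severity label and the metric user_disk_usage_bytes are 
assumptions, and it presumes the custom metric shares "instance" and 
"mountpoint" labels with the node_filesystem_* metrics.

```yaml
groups:
- name: example
  rules:
  - alert: UserDiskUsageHigh          # hypothetical alert name
    expr: |
      # Top 10 users per filesystem, as a fraction of filesystem size ...
      topk by (instance, mountpoint) (10,
          user_disk_usage_bytes
        / on (instance, mountpoint) group_left
          node_filesystem_size_bytes
      )
      # ... but only where the filesystem has less than 10% available.
      and on (instance, mountpoint)
      (node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1)
    for: 10m
    labels:
      severity: warning               # assumed label, adjust to your routing
    annotations:
      summary: "{{ $labels.username }} is using {{ $value | humanizePercentage }} of {{ $labels.mountpoint }} on {{ $labels.instance }}"
```

Each resulting alert carries its own "instance", "mountpoint" and 
"username" labels, so Alertmanager can group or route on them.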

On Wednesday 28 February 2024 at 22:55:11 UTC+7 Puneet Singh wrote:

> Hi All, 
> I have a monitoring requirement related to user-level disk usage and 
> alerting, and I am wondering whether Prometheus is the correct tool for 
> this requirement, or whether a custom Python script (which uses the os, 
> subprocess and smtp modules) to handle monitoring and alerting would be 
> the optimal solution in this context.
>
>
> Here is the problem description - 
> In our setup we have 3 servers. Each has a single mount point "/", and each 
> user's directory, such as "/home/user1", "/home/user2", and so forth, 
> resides within this mount point.
> [image: Untitled11.png]
>   We enforce disk quotas for individual users, and our goal is to monitor 
> each user's disk usage and trigger alerts to the top 10 users when overall 
> quota exceeds 90%.
>
>
> Challenges:
> 1. Afaik, prometheus monitors the overall storage status and the 
> mountpoint information, so individual user's disk consumption is not being  
> tracked by Prometheus. Example - 
> [image: Untitled12.png]
>
> a) Do i need to write custom exporter here which uses du -sh to figure out 
> the disk usage  ? where 
> user_disk_usage_bytes{username="ravi"} 390000
>
> b) or node exporter can do this?
>
>
>
>
> after data collection, i need to deal with alerting rule 
> 2. Here is the alert condition on the custom exporter-
>
> *condition1:* can help determine the users who have high usage
> topk(10, user_disk_usage_bytes / scalar(
> node_filesystem_size_bytes{instance="server1:9100",mountpoint='/'}))
>
> *condition2:*  this can help determine if the usage has reached 90% 
> (available space less than 10%)
>  (    node_filesystem_avail_bytes{instance="server1:9100",mountpoint='/'}  
> /   node_filesystem_size_bytes{ instance="server1:9100",mountpoint='/'  }  
>   ) < 0.1
>
> I don't think  *condition1* and *condition2* will work as labels and 
> label values returned by condition1 and condition2 are different.
>
> Is there a way to achieve this with PromQL ?
>
> Now, assuming that i am able to get a list of users if system utilization 
> is 90% as - 
> {username="ravi"}  80
> {username="user1"}  90
> {username="user2"}  70
> {username="user3"}  80
> {username="user4"}  90
>
> the alerting rule will be 
> groups:
> - name: example
>   rules:
>   - alert: Storage space is low on server1
>     expr: *condition1* and *condition2*
>     for: 10m
>     labels:
>       alertname: "Server1's Storage space is running low, Please cleanup the disk space - {{ $labels.username }}"
>     annotations:
>       summary: "you are using {{ $value }}% space on the / space.please cleanup."
> So i need 3 rules  - 1 each for server1,server2 and server3
>
> 3. Alertmanager is then responsible for sending out the alerts. 
> To send the alerts, I think this should be the configuration in the 
> current context - 
> [image: Untitled14.png]
> as i have already included username in the alert name , and by default 
> grouping of alert happens by alertname so i think with this setting 1:1 
> email should be sent to each user.
>
>
>
> Apologies for the lengthy post , but I have tried expressing the flow to 
> solve this problem based on my understanding of Prometheus so far.
>
> I would greatly appreciate any insights, recommendations, or best 
> practices you can offer for achieving dynamic user disk usage 
> monitoring with Prometheus and Alertmanager.
>
> Thank you in advance .
>
> Best regards,
> Puneet
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/049b709b-8a09-4a49-9a71-f29a24314f30n%40googlegroups.com.
