[prometheus-users] Re: User level disk usage monitoring and notification - with prometheus and alertmanager

Puneet Singh Thu, 29 Feb 2024 03:46:11 -0800

Thank you Brian,
I will explore the group_left query once I have the data flowing in.
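
As a rough sketch of the label matching described in the quoted reply: the snippet below is a plain Python model, not PromQL, and it does not reproduce the Prometheus evaluator. It only shows the matching step, keeping each per-user series whose (instance, mountpoint) pair also appears in the low-space filesystem result. The helper names match_key and keep_matching and all sample values are made up for illustration.

```python
def match_key(labels, on):
    """Build the tuple of label values that decides whether two series match."""
    return tuple(labels.get(name, "") for name in on)


def keep_matching(lhs, rhs, on=("instance", "mountpoint")):
    """Keep LHS samples whose match key also occurs on the RHS.

    The LHS labels (including username) and values are returned unchanged.
    """
    rhs_keys = {match_key(labels, on) for labels, _value in rhs}
    return [(labels, value) for labels, value in lhs
            if match_key(labels, on) in rhs_keys]


# Hypothetical per-user usage ratios, several users per filesystem.
users = [
    ({"instance": "server1:9100", "mountpoint": "/", "username": "ravi"}, 0.40),
    ({"instance": "server1:9100", "mountpoint": "/", "username": "user1"}, 0.35),
    ({"instance": "server2:9100", "mountpoint": "/", "username": "user2"}, 0.50),
]

# Hypothetical result of the "less than 10% available" condition:
# only server1 is low on space. Extra labels are ignored by the match.
low_space = [
    ({"instance": "server1:9100", "mountpoint": "/",
      "device": "/dev/sda1", "fstype": "ext4"}, 0.08),
]

alerts = keep_matching(users, low_space)
for labels, value in alerts:
    print(labels["instance"], labels["username"], value)
```

Because the instance label takes part in the match, the same expression covers every server that reports both metrics, which is why a separate rule per server should not be needed.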


Can the default node exporter report disk usage at the user level in my 
context, for example by extending it via a flag? (I came across the 
textfile collector and I plan to explore that.)
Or would writing a custom exporter be the better approach?
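
For the textfile collector route, here is a minimal sketch. It assumes node exporter is started with --collector.textfile.directory pointing at the directory the file is written to, and that home directories sit directly under one root. The function names, the mountpoint label and any paths are illustrative assumptions. Sizes are apparent file sizes from lstat, not the block counts that du reports.

```python
import os
import tempfile


def directory_usage_bytes(path):
    """Sum apparent sizes of files under path, skipping unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; ignore it
    return total


def render_metrics(usage_by_user, mountpoint="/"):
    """Render usage in the Prometheus text exposition format.

    Assumes usernames contain no quotes, backslashes or newlines,
    so no label-value escaping is done here.
    """
    lines = [
        "# HELP user_disk_usage_bytes Bytes used under each user's home directory.",
        "# TYPE user_disk_usage_bytes gauge",
    ]
    for username in sorted(usage_by_user):
        lines.append(
            'user_disk_usage_bytes{username="%s",mountpoint="%s"} %d'
            % (username, mountpoint, usage_by_user[username])
        )
    return "\n".join(lines) + "\n"


def write_atomically(text, target_path):
    """Write to a temp file in the same directory, then rename into place,
    so the collector never reads a half-written file."""
    directory = os.path.dirname(target_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, target_path)


def collect(home_root, target_path):
    """Measure each directory directly under home_root and write the metrics."""
    usage = {
        entry.name: directory_usage_bytes(entry.path)
        for entry in os.scandir(home_root)
        if entry.is_dir(follow_symlinks=False)
    }
    write_atomically(render_metrics(usage), target_path)
```

Run from cron, a call such as collect("/home", "<textfile directory>/user_disk.prom") would refresh the file; the target name has to end in .prom for the collector to pick it up. The instance label is added by Prometheus at scrape time, and the mountpoint label is written here so the series can later be matched against node_filesystem_* on (instance, mountpoint).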


Regards
Puneet
On Thursday 29 February 2024 at 14:53:01 UTC+5:30 Brian Candler wrote:

> > I don't think *condition1* and *condition2* will work, as the labels and 
> label values returned by condition1 and condition2 are different.
>
> condition1 if on (instance,mountpoint) group_left(username) condition2
>
> This assumes that both expressions have "instance" and "mountpoint" 
> labels; these are the only ones considered when matching. It also assumes 
> there is a many-to-1 relationship from the left-hand side (users) to the 
> right-hand side (filesystem), and that there is a label "username" that you 
> would like carried forward from the LHS into the result.
>
> > So i need 3 rules  - 1 each for server1,server2 and server3
>
> I don't think so. The vector of results can include values for each 
> (user,filesystem,instance) on the LHS, and each (filesystem,instance) on 
> the RHS, and alert separately for every filesystem that reaches 90%.
>
> On Wednesday 28 February 2024 at 22:55:11 UTC+7 Puneet Singh wrote:
>
>> Hi All, 
>> I have a monitoring requirement related to user-level disk usage and 
>> alerting, and I am wondering whether Prometheus is the right tool for 
>> this requirement, or whether a custom Python script (which uses the os, 
>> subprocess and smtp modules) to handle monitoring and alerting would be 
>> the better solution in this context.
>>
>>
>> Here is the problem description - 
>> In our setup we have 3 servers, each with a single mount point "/", and 
>> each user's directory, such as "/home/user1", "/home/user2", and so forth, 
>> resides within this mount point.
>> [image: Untitled11.png]
>> We enforce disk quotas for individual users, and our goal is to monitor 
>> each user's disk usage and send alerts to the top 10 users when overall 
>> disk usage exceeds 90%.
>>
>>
>> Challenges:
>> 1. AFAIK, Prometheus monitors the overall storage status and the 
>> mountpoint information, so an individual user's disk consumption is not 
>> tracked by Prometheus. Example - 
>> [image: Untitled12.png]
>>
>> a) Do I need to write a custom exporter here which uses du -sh to figure 
>> out the disk usage and exposes something like 
>> user_disk_usage_bytes{username="ravi"} 390000 
>>
>> b) or can node exporter do this?
>>
>>
>>
>>
>> After data collection, I need to deal with the alerting rule. 
>> 2. Here is the alert condition on the custom exporter -
>>
>> *condition1:* can help determine the users who have high usage
>> topk(10, user_disk_usage_bytes / scalar(
>> node_filesystem_size_bytes{instance="server1:9100",mountpoint='/'}))
>>
>> *condition2:* can help determine whether usage has reached 90% 
>> (available space less than 10%)
>> (node_filesystem_avail_bytes{instance="server1:9100",mountpoint='/'} / 
>> node_filesystem_size_bytes{instance="server1:9100",mountpoint='/'}) 
>> < 0.1
>>
>> I don't think *condition1* and *condition2* will work, as the labels and 
>> label values returned by condition1 and condition2 are different.
>>
>> Is there a way to achieve this with PromQL ?
>>
>> Now, assuming that I am able to get a list of users when system 
>> utilization is 90%, as - 
>> {username="ravi"}  80
>> {username="user1"}  90
>> {username="user2"}  70
>> {username="user3"}  80
>> {username="user4"}  90
>>
>> the alerting rule will be 
>> groups:
>> - name: example
>>   rules:
>>   - alert: Storage space is low on server1
>>     expr: *condition1* and *condition2*
>>     for: 10m
>>     labels:
>>       alertname: "Server1's storage space is running low, please clean up the disk space - {{ $labels.username }}"
>>     annotations:
>>       summary: "You are using {{ $value }}% of the space on /. Please clean up."
>> So i need 3 rules  - 1 each for server1,server2 and server3
>>
>> 3. Now Alertmanager is responsible for sending out the alerts. 
>> To send the alerts, I think this should be the configuration in the 
>> current context - 
>> [image: Untitled14.png]
>> As I have already included the username in the alert name, and alerts 
>> are grouped by alertname by default, I think with this setting a 1:1 
>> email should be sent to each user.
>>
>>
>>
>> Apologies for the lengthy post, but I have tried to express the flow for 
>> solving this problem based on my understanding of Prometheus so far.
>>
>> I would greatly appreciate any insights, recommendations, or best 
>> practices you can offer for achieving dynamic user disk usage 
>> monitoring with Prometheus and Alertmanager.
>>
>> Thank you in advance .
>>
>> Best regards,
>> Puneet
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/e06a8b93-e006-4e3e-be0f-227a3159b911n%40googlegroups.com.
