Thank you Brian,
I will explore the group_left query once I have the data flowing in.
Can the default node exporter report disk usage at user level in my
context, by extending it via a flag (I came across the textfile collector
and plan to explore that)? Or would writing a custom exporter be the
better approach?
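
For reference, a minimal sketch of the textfile collector route, assuming node exporter is started with --collector.textfile.directory pointing at a directory that a cron job can write to. The function names, the /home layout and the output file name are illustrative assumptions, and walking the directory tree is only a rough stand-in for du:

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the apparent size of all files under path (rough stand-in for du)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def render_metrics(usage):
    """Render {username: bytes} in the Prometheus text exposition format."""
    lines = [
        "# HELP user_disk_usage_bytes Disk space used by each user's home directory.",
        "# TYPE user_disk_usage_bytes gauge",
    ]
    for user in sorted(usage):
        # Assumes usernames contain no quotes or backslashes (no label escaping here).
        lines.append('user_disk_usage_bytes{username="%s"} %d' % (user, usage[user]))
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename it into place,
    so that node exporter never reads a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


def collect(home_root, out_file):
    """Measure every directory directly under home_root and write the metrics file."""
    usage = {
        name: dir_size_bytes(os.path.join(home_root, name))
        for name in os.listdir(home_root)
        if os.path.isdir(os.path.join(home_root, name))
    }
    write_atomically(out_file, render_metrics(usage))
```

A cron job could call collect("/home", "<textfile directory>/user_disk_usage.prom") every few minutes; the file name must end in .prom for node exporter to pick it up.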
Regards
Puneet
On Thursday 29 February 2024 at 14:53:01 UTC+5:30 Brian Candler wrote:
> > I don't think *condition1* and *condition2* will work, as the labels and
> label values returned by condition1 and condition2 are different.
>
> condition1 if on (instance,mountpoint) group_left(username) condition2
>
> This assumes that both expressions have "instance" and "mountpoint"
> labels; these are the only ones considered when matching. It also assumes
> there is a many-to-1 relationship from the left-hand side (users) to the
> right-hand side (filesystem), and that there is a label "username" that you
> would like carried forward from the LHS into the result.
>
> > So i need 3 rules - 1 each for server1,server2 and server3
>
> I don't think so. The vector of results can include values for each
> (user,filesystem,instance) on the LHS, and each (filesystem,instance) on
> the RHS, and alert separately for every filesystem that reaches 90%.
>
> On Wednesday 28 February 2024 at 22:55:11 UTC+7 Puneet Singh wrote:
>
>> Hi All,
>> I have a monitoring requirement related to user-level disk usage and
>> alerting, and I am wondering whether Prometheus is the correct tool to
>> handle this requirement, or whether a custom Python script (which uses the
>> os, subprocess and smtplib modules) to handle monitoring and alerting
>> would be the optimal solution in this context.
>>
>>
>> Here is the problem description -
>> In our setup we have 3 servers, each with a single mount point "/", and
>> each user's directory, such as "/home/user1", "/home/user2", and so forth,
>> resides within this mount point.
>> [image: Untitled11.png]
>> We enforce disk quotas for individual users, and our goal is to monitor
>> each user's disk usage and send alerts to the top 10 users when overall
>> disk usage exceeds 90%.
>>
>>
>> Challenges:
>> 1. AFAIK, Prometheus monitors the overall storage status and the
>> mountpoint information, so individual users' disk consumption is not being
>> tracked by Prometheus. Example -
>> [image: Untitled12.png]
>>
>> a) Do I need to write a custom exporter here that uses du -sh to figure
>> out the disk usage, exposing something like
>> user_disk_usage_bytes{username="ravi"} 390000
>>
>> b) Or can node exporter do this?
>>
>>
>>
>>
>> After data collection, I need to deal with the alerting rule.
>> 2. Here is the alert condition on the custom exporter -
>>
>> *condition1:* can help determine the users who have high usage
>> topk(10, user_disk_usage_bytes / scalar(
>> node_filesystem_size_bytes{instance="server1:9100",mountpoint='/'}))
>>
>> *condition2:* this can help determine if the usage has reached 90%
>> (available space less than 10%)
>> (
>> node_filesystem_avail_bytes{instance="server1:9100",mountpoint='/'} /
>> node_filesystem_size_bytes{ instance="server1:9100",mountpoint='/' } )
>> < 0.1
>>
>> I don't think *condition1* and *condition2* will work, as the labels and
>> label values returned by condition1 and condition2 are different.
>>
>> Is there a way to achieve this with PromQL ?
>>
>> Now, assuming that I am able to get a list of users when system
>> utilization is 90% as -
>> {username="ravi"} 80
>> {username="user1"} 90
>> {username="user2"} 70
>> {username="user3"} 80
>> {username="user4"} 90
>>
>> the alerting rule will be
>> groups:
>> - name: example
>>   rules:
>>   - alert: Storage space is low on server1
>>     expr: *condition1* and *condition2*
>>     for: 10m
>>     labels:
>>       alertname: "Server1's storage space is running low, please clean up the disk space - {{ $labels.username }}"
>>     annotations:
>>       summary: "You are using {{ $value }}% space on the / filesystem. Please clean up."
>> So i need 3 rules - 1 each for server1,server2 and server3
>>
>> 3. Now Alertmanager is responsible for sending out the alerts.
>> To send the alerts, I think this should be the configuration in the
>> current context -
>> [image: Untitled14.png]
>> As I have already included the username in the alert name, and by default
>> grouping of alerts happens by alertname, I think with this setting a 1:1
>> email should be sent to each user.
>>
>>
>>
>> Apologies for the lengthy post, but I have tried to express the flow to
>> solve this problem based on my understanding of Prometheus so far.
>>
>> I would greatly appreciate any insights, recommendations, or best
>> practices you can offer for achieving dynamic user disk usage
>> monitoring with Prometheus and Alertmanager.
>>
>> Thank you in advance.
>>
>> Best regards,
>> Puneet
>>
>
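
The matching described in the quoted reply (join on "instance" and "mountpoint" only, with one result per user on every filesystem that is below 10% available) can be sanity-checked outside PromQL. The following is a rough Python model of that intended behaviour, not PromQL itself; it assumes the per-user metric carries a "mountpoint" label, and the function names and sample numbers are hypothetical:

```python
def low_space_filesystems(avail, size, threshold=0.1):
    """Return the (instance, mountpoint) keys whose available fraction is below threshold."""
    return {
        key
        for key in size
        if key in avail and size[key] > 0 and avail[key] / size[key] < threshold
    }


def users_to_alert(user_usage, avail, size, k=10, threshold=0.1):
    """For every filesystem that is low on space, return its k largest users.

    user_usage maps (instance, mountpoint, username) -> bytes used.
    avail and size map (instance, mountpoint) -> bytes.
    The result maps (instance, mountpoint) -> [(username, percent of filesystem), ...].
    """
    low = low_space_filesystems(avail, size, threshold)
    per_fs = {}
    for (instance, mountpoint, username), used in user_usage.items():
        key = (instance, mountpoint)  # the only labels considered when matching
        if key in low:
            per_fs.setdefault(key, []).append((username, 100.0 * used / size[key]))
    return {
        key: sorted(rows, key=lambda row: row[1], reverse=True)[:k]
        for key, rows in per_fs.items()
    }


# Hypothetical sample data: server1 has 5% available, server2 has 50%.
size = {("server1:9100", "/"): 1000, ("server2:9100", "/"): 1000}
avail = {("server1:9100", "/"): 50, ("server2:9100", "/"): 500}
user_usage = {
    ("server1:9100", "/", "ravi"): 400,
    ("server1:9100", "/", "user1"): 300,
    ("server2:9100", "/", "ravi"): 100,
}
alerts = users_to_alert(user_usage, avail, size, k=1)
```

Because the instance is part of the matching key, one pass covers all three servers, which mirrors the point that a single rule is enough.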