Hi Brian,
Thanks for getting back to me. OK, so it seems that all of the above
queries come back with empty results when run. Does that mean there are no
alerts or errors, or does it mean that no data has been collected?
I can see the following on the metrics page:
# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total{action="listMetrics",namespace="AWS/RDS",}
1164900.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="Account",}
3880587.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/ApplicationELB",}
582450.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/ApplicationELB",}
9809199.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/StorageGateway",}
291225.0
cloudwatch_requests_total{action="listMetrics",namespace="Account",}
1456125.0
cloudwatch_requests_total{action="listMetrics",namespace="System/Linux",}
291225.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/RDS",}
3.9647125E7
cloudwatch_requests_total{action="listMetrics",namespace="AWS/Usage",}
291225.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/Lambda",}
291225.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/Lambda",}
291225.0
and I get the below:
# HELP aws_lambda_concurrent_executions_average CloudWatch metric AWS/Lambda ConcurrentExecutions Dimensions: [] Statistic: Average Unit: Count
# TYPE aws_lambda_concurrent_executions_average gauge
aws_lambda_concurrent_executions_average{job="aws_lambda",instance="",} 4.781021897810219 1673340420000
When I run that query it pulls back data in the graph, but if I run
aws_lambda_concurrent_executions_average{job="aws_lambda",instance=""}
and add in the relevant Lambda function, it pulls back nothing.
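For example, a matcher along these lines (the function name here is just
the sort of thing I'm adding, not the exact one):

aws_lambda_concurrent_executions_average{job="aws_lambda",instance="",functionname="function-QueueProcessor"}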
Do you have any examples of configs I could take a look at and learn from?
Stu
On Monday, 9 January 2023 at 15:03:03 UTC Brian Candler wrote:
> You'll have to find out which bit isn't working:
> - the data collection from AWS?
> - the alerting expression?
> - the alert delivery from alertmanager to pagerduty?
>
> Start by doing PromQL queries in Prometheus' own web interface:
>
> aws_lambda_errors_sum # do you get any results?
>
> aws_lambda_errors_sum{functionname="function-QueueProcessor"} # do you get any results?
>
> (aws_lambda_errors_sum{functionname="function-QueueProcessor"} offset 8m) > 0 # do you get any results when there's an error?
>
> (This last expression, by the way, is a bit silly. By including "offset
> 8m" you've just delayed the alert by 8 minutes; but you've not done
> anything to hide spurious alerts. You'll still get the same alert, just 8
> minutes late!)
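>
> (If the intent was "only alert when errors persist for a while", the
> usual tool is a "for:" duration on the alerting rule rather than an
> offset. A rough sketch based on your rule, with the 8m purely as an
> example:
>
>   - alert: FUNCTION-QUEUEPROCESSOR_ERRORS
>     expr: aws_lambda_errors_sum{functionname="function-QueueProcessor"} > 0
>     for: 8m
>
> This only fires once the expression has been true continuously for 8
> minutes.)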
>
> If you *do* get data from the last query at times when there's a problem,
> then focus on why your alerts aren't being delivered. There are
> metrics from alertmanager(*) which will tell you how many alerts have been
> received, how many delivery attempts have been made, and how many delivery
> failures there have been.
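>
> For example, these are standard alertmanager metric names you can query:
>
> alertmanager_alerts_received_total
> alertmanager_notifications_total{integration="pagerduty"}
> alertmanager_notifications_failed_total{integration="pagerduty"}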
>
> If you *don't* get data from these queries, then your problem is with
> data collection - dig further on that side.
>
> Good luck,
>
> Brian.
>
> (*) Since you didn't show the full prometheus.yml I don't know if you're
> collecting alertmanager metrics. You'd need something like this:
>
>   - job_name: alertmanager
>     scrape_interval: 1m
>     static_configs:
>       - targets: ['localhost:9093']
>
> Then you can find out what metrics are being collected with this query:
>
> {job="alertmanager"}
>
> On Monday, 9 January 2023 at 11:50:59 UTC [email protected]
> wrote:
>
>> Hello all,
>> I have inherited a Prometheus system to look after. It was set up by the
>> person before me, and I am just finding my feet with how it all works.
>>
>> So as far as I can see, CloudWatch Exporter gets the info from
>> CloudWatch, passes it to Prometheus, then on to Alertmanager, which
>> posts to (in this case) PagerDuty.
>>
>> My question is: the Lambda alerting does not seem to report when there
>> is an issue, so I am unsure whether this is set up correctly.
>>
>> Does anyone have example CloudWatch Exporter and Prometheus alert files
>> they can share for scraping Lambda errors? I have the setup below, but
>> it doesn't seem to work (sorry, I'm a noob at this). Or is there a
>> better option than CloudWatch Exporter?
>>
>> CloudWatch Exporter file:
>>
>> - aws_namespace: AWS/Lambda
>>   aws_metric_name: ConcurrentExecutions
>>   aws_dimensions: []
>>   aws_statistics: [Average]
>>
>> - aws_namespace: AWS/Lambda
>>   aws_metric_name: Errors
>>   aws_dimensions: [FunctionName, Resource]
>>   aws_statistics: [Sum]
>>
>> - aws_namespace: AWS/Lambda
>>   aws_metric_name: lambda_auth_errors
>>   aws_dimensions: []
>>   aws_statistics: [Sum]
>>
>> Prometheus yml file content:
>>
>> #QUEUEPROCESSOR_ERRORS
>> - alert: FUNCTION-QUEUEPROCESSOR_ERRORS
>>   expr: (aws_lambda_errors_sum{functionname="function-QueueProcessor"} offset 8m) > 0
>>   labels:
>>     severity: error
>>     capability: function
>>     service: aws/lambda
>>   annotations:
>>     summary: "Multiple LAMBDA Errors"
>>     description: "There has been more than 1 LAMBDA error within 30 minutes for Function Capability"
>>     category: "Software/System"
>>     subcategory: "Problem/Bug"
>>     instance: "Function Capability - P"
>>     environment: "Production"
>>
>> Alertmanager yml content:
>>
>> #FUNCTION-LAMBDA
>> - match:
>>     capability: function
>>     service: aws/lambda
>>   receiver: function-lambda
>>
>> #FUNCTION-LAMBDA
>> - name: 'function-lambda'
>>   pagerduty_configs:
>>     - routing_key: 'xxxxxxx'
>>       severity: '{{if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower}}{{ else }}error{{ end}}'
>>       description: '[FIRING:{{ .Alerts.Firing | len }}] {{ .CommonAnnotations.summary }}'
>>
>>