Hi, found the issue and now it's pulling back the Lambda data OK - it seems
the config.yml had some stray spaces in it that it didn't like.
<phew>
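
For anyone hitting the same thing: YAML is whitespace-sensitive, so a single
misaligned key inside one of the exporter's metric entries is enough to break
parsing. A minimal sketch of the failure mode (metric names taken from the
config further down in this thread):

metrics:
  - aws_namespace: AWS/Lambda
     aws_metric_name: Errors     # one extra leading space - YAML parse error
    aws_statistics: [Sum]

versus the correctly aligned version:

metrics:
  - aws_namespace: AWS/Lambda
    aws_metric_name: Errors
    aws_statistics: [Sum]
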
On Tuesday, 10 January 2023 at 09:29:43 UTC Stuart Pelton wrote:
> Hi Brian,
>
> Thanks for getting back to me. OK, so it seems that all the above queries,
> when run, come back with empty results - does that mean there are no alerts
> or errors, or does it mean that no data is being collected?
> I can see the following on the metrics page:
>
> # HELP cloudwatch_requests_total API requests made to CloudWatch
> # TYPE cloudwatch_requests_total counter
> cloudwatch_requests_total{action="listMetrics",namespace="AWS/RDS",} 1164900.0
> cloudwatch_requests_total{action="getMetricStatistics",namespace="Account",} 3880587.0
> cloudwatch_requests_total{action="listMetrics",namespace="AWS/ApplicationELB",} 582450.0
> cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/ApplicationELB",} 9809199.0
> cloudwatch_requests_total{action="listMetrics",namespace="AWS/StorageGateway",} 291225.0
> cloudwatch_requests_total{action="listMetrics",namespace="Account",} 1456125.0
> cloudwatch_requests_total{action="listMetrics",namespace="System/Linux",} 291225.0
> cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/RDS",} 3.9647125E7
> cloudwatch_requests_total{action="listMetrics",namespace="AWS/Usage",} 291225.0
> cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/Lambda",} 291225.0
> cloudwatch_requests_total{action="listMetrics",namespace="AWS/Lambda",} 291225.0
>
>
> and I get the below:
>
> # HELP aws_lambda_concurrent_executions_average CloudWatch metric AWS/Lambda ConcurrentExecutions Dimensions: [] Statistic: Average Unit: Count
> # TYPE aws_lambda_concurrent_executions_average gauge
> aws_lambda_concurrent_executions_average{job="aws_lambda",instance="",} 4.781021897810219 1673340420000
>
> When I run that, it pulls back data in the graph, but if I run
> aws_lambda_concurrent_executions_average{job="aws_lambda",instance=""}
> and add in the relevant Lambda instance, it pulls back nothing.
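>
> For example (the functionname label and value here are just my guess at
> what to filter on, based on the Errors config below), something like:
>
> aws_lambda_concurrent_executions_average{job="aws_lambda",instance="",functionname="function-QueueProcessor"}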
>
> do you have any examples of configs I could take a look at and learn from?
>
> Stu
>
> On Monday, 9 January 2023 at 15:03:03 UTC Brian Candler wrote:
>
>> You'll have to find out which bit isn't working:
>> - the data collection from aws?
>> - the alerting expression?
>> - the alert delivery from alertmanager to pagerduty?
>>
>> Start by doing PromQL queries in Prometheus's own web interface:
>>
>> aws_lambda_errors_sum   # do you get any results?
>>
>> aws_lambda_errors_sum{functionname="function-QueueProcessor"}   # do you get any results?
>>
>> (aws_lambda_errors_sum{functionname="function-QueueProcessor"} offset 8m) > 0   # do you get any results when there's an error?
>>
>> (This last expression, by the way, is a bit silly. By including "offset
>> 8m" you've just delayed the alert by 8 minutes; but you've not done
>> anything to hide spurious alerts. You'll still get the same alert, just 8
>> minutes late!)
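>>
>> (A sketch of what you might use instead, if the intent really is the "more
>> than 1 error within 30 minutes" from your annotation - the window and
>> threshold here are assumptions taken from that text:
>>
>> max_over_time(aws_lambda_errors_sum{functionname="function-QueueProcessor"}[30m]) > 1
>>
>> Or keep the plain "> 0" expression and add "for: 8m" to the alert rule,
>> which only fires if the condition holds continuously for 8 minutes.)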
>>
>> If you *do* get data from the last query at times when there's a
>> problem, then you focus on why your alerts aren't being delivered. There
>> are metrics from alertmanager(*) which will tell you how many alerts have
>> been received, how many delivery attempts have been made, and how many
>> delivery failures there have been.
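>>
>> For example, these are the standard Alertmanager metrics I'd check (the
>> integration label value assumes your PagerDuty receiver):
>>
>> alertmanager_alerts_received_total
>> alertmanager_notifications_total{integration="pagerduty"}
>> alertmanager_notifications_failed_total{integration="pagerduty"}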
>>
>> If you *don't* get data from these queries, then your problem is with
>> data collection - dig further on that side.
>>
>> Good luck,
>>
>> Brian.
>>
>> (*) Since you didn't show the full prometheus.yml I don't know if you're
>> collecting alertmanager metrics. You'd need something like this:
>>
>>   - job_name: alertmanager
>>     scrape_interval: 1m
>>     static_configs:
>>       - targets: ['localhost:9093']
>>
>> Then you can find out what metrics are being collected with this query:
>>
>> {job="alertmanager"}
>>
>> On Monday, 9 January 2023 at 11:50:59 UTC [email protected]
>> wrote:
>>
>>> Hello all,
>>> I have inherited a Prometheus system to look after; it was set up by the
>>> person before me and I'm just finding my feet with how it all works.
>>>
>>> So as far as I can see, the CloudWatch Exporter gets the info from
>>> CloudWatch and passes it to Prometheus, then to Alertmanager, which then
>>> posts to (in this case) PagerDuty.
>>>
>>> My question is: Lambda does not seem to report when there is an issue,
>>> so I'm unsure whether this is set up correctly.
>>>
>>> Does anyone have any examples of CloudWatch exporter and Prometheus alert
>>> files they can show as an example for scraping Lambda errors? I have the
>>> below setup but it doesn't seem to work (sorry, noob to this). Or is
>>> there a better option than the CloudWatch exporter?
>>>
>>> CloudWatch Exporter file:
>>>
>>> - aws_namespace: AWS/Lambda
>>>   aws_metric_name: ConcurrentExecutions
>>>   aws_dimensions: []
>>>   aws_statistics: [Average]
>>>
>>> - aws_namespace: AWS/Lambda
>>>   aws_metric_name: Errors
>>>   aws_dimensions: [FunctionName, Resource]
>>>   aws_statistics: [Sum]
>>>
>>> - aws_namespace: AWS/Lambda
>>>   aws_metric_name: lambda_auth_errors
>>>   aws_dimensions: []
>>>   aws_statistics: [Sum]
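>>>
>>> (For context, in the full file these entries sit under the exporter's
>>> top-level metrics: key, roughly like this - the region value is just a
>>> placeholder:
>>>
>>> region: eu-west-1
>>> metrics:
>>>   - aws_namespace: AWS/Lambda
>>>     aws_metric_name: Errors
>>>     aws_dimensions: [FunctionName, Resource]
>>>     aws_statistics: [Sum]
>>> )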
>>>
>>> Prometheus yml file content
>>>
>>> #QUEUEPROCESSOR_ERRORS
>>> - alert: FUNCTION-QUEUEPROCESSOR_ERRORS
>>>   expr: (aws_lambda_errors_sum{functionname="function-QueueProcessor"} offset 8m) > 0
>>>   labels:
>>>     severity: error
>>>     capability: function
>>>     service: aws/lambda
>>>   annotations:
>>>     summary: "Multiple LAMBDA Errors"
>>>     description: "There has been more than 1 LAMBDA error within 30 minutes for Function Capability"
>>>     category: "Software/System"
>>>     subcategory: "Problem/Bug"
>>>     instance: "Function Capability - P"
>>>     environment: "Production"
>>>
>>> AlertManager yml content
>>>
>>> #FUNCTION-LAMBDA
>>> - match:
>>>     capability: function
>>>     service: aws/lambda
>>>   receiver: function-lambda
>>>
>>>
>>> #FUNCTION-LAMBDA
>>> - name: 'function-lambda'
>>>   pagerduty_configs:
>>>     - routing_key: 'xxxxxxx'
>>>       severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}error{{ end }}'
>>>       description: '[FIRING:{{ .Alerts.Firing | len }}] {{ .CommonAnnotations.summary }}'
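>>>
>>> (For context, these two snippets sit in different parts of my
>>> alertmanager.yml - the match block under route: -> routes:, and the
>>> receiver under the top-level receivers: key. Roughly, with "default" as a
>>> placeholder for whatever the fallback receiver is:
>>>
>>> route:
>>>   receiver: default
>>>   routes:
>>>     - match:
>>>         capability: function
>>>         service: aws/lambda
>>>       receiver: function-lambda
>>>
>>> receivers:
>>>   - name: 'function-lambda'
>>>     pagerduty_configs:
>>>       - routing_key: 'xxxxxxx'
>>> )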
>>>
>>>