You'll have to find out which bit isn't working:
- the data collection from aws?
- the alerting expression?
- the alert delivery from alertmanager to pagerduty?

Start by running PromQL queries in Prometheus's own web interface:

    aws_lambda_errors_sum     # do you get any results?

    aws_lambda_errors_sum{functionname="function-QueueProcessor"}    # do you get any results?

    (aws_lambda_errors_sum{functionname="function-QueueProcessor"} offset 8m) > 0    # do you get any results when there's an error?

(This last expression, by the way, is a bit silly.  By including "offset 
8m" you've only delayed the alert by 8 minutes; you've done nothing to 
suppress spurious alerts.  You'll still get the same alert, just 8 
minutes late!)
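If the aim of the offset was to avoid firing on a one-off blip, the usual 
tool is the rule's "for:" clause, which holds the alert back until the 
condition has been true continuously; and if the aim was "more than 1 
error within 30 minutes", as your description says, sum_over_time 
expresses that directly.  A sketch only - adjust names and thresholds to 
your setup:

      - alert: FUNCTION-QUEUEPROCESSOR_ERRORS
        # fire only once the condition has held for 8 minutes straight
        expr: aws_lambda_errors_sum{functionname="function-QueueProcessor"} > 0
        for: 8m

or, matching the description literally:

        expr: sum_over_time(aws_lambda_errors_sum{functionname="function-QueueProcessor"}[30m]) > 1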

If you *do* get data from the last query at times when there's a problem, 
then focus on why your alerts aren't being delivered.  There are 
metrics from alertmanager(*) which will tell you how many alerts have been 
received, how many delivery attempts have been made, and how many delivery 
failures there have been.
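
For example (assuming you've set up scraping of alertmanager as in the 
footnote - these metric names are what a recent alertmanager exposes, so 
check what yours actually reports):

    alertmanager_alerts_received_total                                # alerts received
    alertmanager_notifications_total{integration="pagerduty"}         # delivery attempts
    alertmanager_notifications_failed_total{integration="pagerduty"}  # delivery failures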

If you *don't* get data from these queries, then your problem is with data 
collection - dig further on that side.
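
One quick check on that side: the "up" metric tells you whether Prometheus 
could scrape the exporter at all.  The job name here is a guess - use 
whatever your scrape config calls the CloudWatch exporter job:

    up{job="cloudwatch"}    # 1 = last scrape succeeded, 0 = scrape failed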

Good luck,

Brian.

(*) Since you didn't show the full prometheus.yml, I don't know if you're 
collecting alertmanager metrics.  You'd need something like this:

  - job_name: alertmanager
    scrape_interval: 1m
    static_configs:
      - targets: ['localhost:9093']

Then you can find out what metrics are being collected with this query:

    {job="alertmanager"}

On Monday, 9 January 2023 at 11:50:59 UTC [email protected] wrote:

> Hello all,
> I have inherited a Prometheus system to look after; it was set up by the 
> person before me, and I'm just finding my feet with how it all works.
>
> So as far as I can see, CloudWatch Exporter gets the info from CloudWatch 
> and passes it to Prometheus, then to Alertmanager, which then posts to (in 
> this case) PagerDuty.
>
> My question is - the Lambda does not seem to report when there is an issue, 
> so I'm unsure if this is set up correctly?
>
> Does anyone have any examples of CloudWatch exporter and Prometheus alert 
> files they can show as an example for scraping Lambda errors?  I have the 
> below setup but it doesn't seem to work (sorry, noob to this).  Or is there 
> a better option than CloudWatch exporter?
>
> CloudWatch Exporter file:
>
>   - aws_namespace: AWS/Lambda
>     aws_metric_name: ConcurrentExecutions
>     aws_dimensions: []
>     aws_statistics: [Average]
>   - aws_namespace: AWS/Lambda
>     aws_metric_name: Errors
>     aws_dimensions: [FunctionName,Resource]
>     aws_statistics: [Sum]
>
>   - aws_namespace: AWS/Lambda
>     aws_metric_name: lambda_auth_errors
>     aws_dimensions: []
>     aws_statistics: [Sum]
>
> Prometheus yml file content
>
>   #QUEUEPROCESSOR_ERRORS
>   - alert: FUNCTION-QUEUEPROCESSOR_ERRORS
>     expr: (aws_lambda_errors_sum{functionname="function-QueueProcessor"} 
> offset 8m) > 0
>     labels:
>       severity: error
>       capability: function
>       service: aws/lambda
>     annotations:
>       summary: "Multiple LAMBDA Errors "
>       description: "There has been more than 1 LAMBDA errors within 30 
> minutes for Function Capability"
>       category: "Software/System"
>       subcategory: "Problem/Bug"
>       instance: "Function Capability - P"
>       environment: "Production"
>
> AlertManager yml content
>
>     #FUNCTION-LAMBDA
>     - match:
>         capability: function
>         service: aws/lambda
>       receiver: function-lambda
>
>
> #FUNCTION-LAMBDA
> - name: 'function-lambda'
>   pagerduty_configs:
>   - routing_key: 'xxxxxxx'
>     severity: '{{if .CommonLabels.severity }}{{ .CommonLabels.severity | 
> toLower}}{{ else }}error{{ end}}'
>     description: '[FIRING:{{ .Alerts.Firing | len }}] {{ 
> .CommonAnnotations.summary }}'
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/bb83720e-e9c3-49bc-a637-4d53d9865b9fn%40googlegroups.com.
