This is an automated email from the ASF dual-hosted git repository. astefanutti pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/camel-k.git
commit a63eef06989eeb6c67e475665aad20355c5476f3
Author: Antonin Stefanutti <[email protected]>
AuthorDate: Tue Nov 17 10:15:18 2020 +0100

    chore(doc): Add Camel K operator monitoring documentation
---
 docs/modules/ROOT/nav.adoc                         |   2 +
 .../{monitoring.adoc => integration.adoc}          |  84 +---------
 .../ROOT/pages/observability/monitoring.adoc       | 141 +----------------
 .../modules/ROOT/pages/observability/operator.adoc | 176 +++++++++++++++++++++
 4 files changed, 187 insertions(+), 216 deletions(-)

diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc
index df04c1d..e0a91a7 100644
--- a/docs/modules/ROOT/nav.adoc
+++ b/docs/modules/ROOT/nav.adoc
@@ -24,6 +24,8 @@
 ** xref:configuration/configmap-secret.adoc[ConfigMap/Secret]
 * Observability
 ** xref:observability/monitoring.adoc[Monitoring]
+*** xref:observability/operator.adoc[Operator Monitoring]
+*** xref:observability/integration.adoc[Integration Monitoring]
 * xref:traits:traits.adoc[Traits]
 // Start of autogenerated code - DO NOT EDIT! (trait-nav)
 ** xref:traits:3scale.adoc[3scale]
diff --git a/docs/modules/ROOT/pages/observability/monitoring.adoc b/docs/modules/ROOT/pages/observability/integration.adoc
similarity index 64%
copy from docs/modules/ROOT/pages/observability/monitoring.adoc
copy to docs/modules/ROOT/pages/observability/integration.adoc
index 15c6ce7..c5d3294 100644
--- a/docs/modules/ROOT/pages/observability/monitoring.adoc
+++ b/docs/modules/ROOT/pages/observability/integration.adoc
@@ -1,82 +1,7 @@
-[[monitoring]]
-= Camel K Monitoring
+[[integration-monitoring]]
+= Camel K Integration Monitoring
 
-The Camel K monitoring architecture relies on https://prometheus.io[Prometheus] and the eponymous operator.
-
-The https://github.com/coreos/prometheus-operator[Prometheus Operator] serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options.
-
-[[prerequisites]]
-== Prerequisites
-
-To take full advantage of the Camel K monitoring capabilities, it is recommended to have a Prometheus Operator instance, that can be configured to integrate Camel K integrations.
-
-[[kubernetes]]
-=== Kubernetes
-
-You can deploy the Prometheus operator by running:
-
-[source,sh]
-----
-$ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.38.0/bundle.yaml
-----
-
-WARNING: Beware this installs the operator in the `default` namespace. You must download the file locally and replace the `namespace` fields to deploy the resources into another namespace.
-
-Then, you can create a `Prometheus` resource, that the operator will use as configuration to deploy a managed Prometheus instance:
-
-[source,sh]
-----
-$ cat <<EOF | kubectl apply -f -
-apiVersion: monitoring.coreos.com/v1
-kind: Prometheus
-metadata:
-  name: prometheus
-spec:
-  serviceMonitorSelector:
-    matchExpressions:
-    - key: camel.apache.org/integration
-      operator: Exists
-EOF
-----
-
-By default, the Prometheus instance discovers applications to be monitored in the same namespace.
-You can use the `serviceMonitorNamespaceSelector` field from the `Prometheus` resource to enable cross-namespace monitoring.
-You may also need to specify a ServiceAccount with the `serviceAccountName` field, that's bound to a Role with the necessary permissions.
-
-[[openshift]]
-=== OpenShift
-
-Starting OpenShift 4.3, the Prometheus Operator, that's already deployed as part of the monitoring stack, can be used to https://docs.openshift.com/container-platform/4.3/monitoring/monitoring-your-own-services.html[monitor application services].
-This needs to be enabled by following these instructions:
-
-. Check whether the `cluster-monitoring-config` ConfigMap object exists in the `openshift-monitoring` project:
-
-  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
-
-. If it does not exist, create it:
-
-  $ oc -n openshift-monitoring create configmap cluster-monitoring-config
-
-. Start editing the cluster-monitoring-config ConfigMap:
-
-  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
-
-. Set the `techPreviewUserWorkload` setting to `true` under `data/config.yaml`:
-+
-[source,yaml]
-----
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: cluster-monitoring-config
-  namespace: openshift-monitoring
-data:
-  config.yaml: |
-    techPreviewUserWorkload:
-      enabled: true
-----
-
-On OpenShift versions prior to 4.3, or if you do not want to change your cluster monitoring stack configuration, you can refer to the <<Kubernetes>> section in order to deploy a separate Prometheus Operator instance.
+NOTE: The Camel K monitoring architecture relies on https://prometheus.io[Prometheus] and the eponymous operator. Make sure you've checked the xref:observability/monitoring.adoc#prerequisites[Camel K monitoring prerequisites].
 
 [[instrumentation]]
 == Instrumentation
@@ -179,7 +104,8 @@ spec:
 EOF
 ----
 
-More information can be found in the Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/alerting.md[Alerting] user guide. You can also find more details in https://docs.openshift.com/container-platform/4.4/monitoring/monitoring-your-own-services.html#creating-alerting-rules_monitoring-your-own-services[Creating alerting rules] from the OpenShift documentation.
+More information can be found in the Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/alerting.md[Alerting] user guide.
+You can also find more details in https://docs.openshift.com/container-platform/4.4/monitoring/monitoring-your-own-services.html#creating-alerting-rules_monitoring-your-own-services[Creating alerting rules] from the OpenShift documentation.
 
 == Autoscaling
 
diff --git a/docs/modules/ROOT/pages/observability/monitoring.adoc b/docs/modules/ROOT/pages/observability/monitoring.adoc
index 15c6ce7..b990ee2 100644
--- a/docs/modules/ROOT/pages/observability/monitoring.adoc
+++ b/docs/modules/ROOT/pages/observability/monitoring.adoc
@@ -72,145 +72,12 @@ metadata:
   namespace: openshift-monitoring
 data:
   config.yaml: |
-    techPreviewUserWorkload:
-      enabled: true
+    enableUserWorkload: true
 ----
 
 On OpenShift versions prior to 4.3, or if you do not want to change your cluster monitoring stack configuration, you can refer to the <<Kubernetes>> section in order to deploy a separate Prometheus Operator instance.
 
-[[instrumentation]]
-== Instrumentation
+=== What's Next
 
-The xref:traits:prometheus.adoc[Prometheus trait] automates the configuration of integration pods to expose a _metrics_ endpoint, that can be discovered and scraped by a Prometheus server.
-
-The Prometheus trait can be enabled when running an integration, e.g.:
-
-[source,sh]
-----
-$ kamel run -t prometheus.enabled=true ...
-----
-
-Alternatively, the Prometheus trait can be enabled globally once, by updating the integration platform, e.g.:
-
-[source,sh]
-----
-$ kubectl patch ip camel-k --type=merge -p '{"spec":{"traits":{"prometheus":{"configuration":{"enabled":"true"}}}}}'
-----
-
-The underlying instrumentation mechanism depends on the configured integration runtime.
-As a result, the set of registered metrics, as well as the naming convention they follow, also depends on it.
-
-=== Main
-
-When the default, a.k.a. _main_, runtime is configured for the integration, the https://github.com/prometheus/jmx_exporter[JMX exporter] is responsible for collecting and exposing metrics from JMX mBeans.
-
-A custom configuration for the JMX exporter can be used by setting the `prometheus.configmap` parameter from the Prometheus trait with the name of a ConfigMap containing a `prometheus-jmx-exporter.yaml` key, e.g.:
-
-[source,sh]
-----
-$ kamel run -t prometheus.enabled=true -t prometheus.configmap=<jmx_exporter_config>...
-----
-
-Otherwise, the Prometheus trait uses a default configuration.
-
-=== Quarkus
-
-When the Quarkus runtime is configured for the integration, the xref:latest@camel-quarkus::reference/extensions/microprofile-metrics.adoc[Camel Quarkus MicroProfile Metrics extension] is responsible for collecting and exposing metrics in the https://github.com/OpenObservability/OpenMetrics[OpenMetrics] text format.
-
-The MicroProfile Metrics extension registers and exposes the following metrics out-of-the-box:
-
-* https://github.com/eclipse/microprofile-metrics/blob/master/spec/src/main/asciidoc/required-metrics.adoc#required-metrics[JVM and operating system related metrics]
-
-* xref:latest@camel-quarkus::reference/extensions/microprofile-metrics.adoc#_camel_route_metrics[Camel specific metrics]
-
-It is possible to extend this set of metrics by using either, or both:
-
-* The xref:latest@components::microprofile-metrics-component.adoc[MicroProfile Metrics component]
-
-* The https://github.com/eclipse/microprofile-metrics/blob/master/spec/src/main/asciidoc/app-programming-model.adoc#annotations[MicroProfile Metrics annotations], in external dependencies
-
-== Discovery
-
-The Prometheus trait automatically configures the resources necessary for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the integration _metrics_ endpoint.
-
-By default, the Prometheus trait creates a `ServiceMonitor` resource, with the `camel.apache.org/integration` label, which must match the `serviceMonitorSelector` field from the `Prometheus` resource.
-Additional labels can be specified with the `service-monitor-labels` parameter from the Prometheus trait, e.g.:
-
-[source,sh]
-----
-$ kamel run -t prometheus.service-monitor-labels="label_to_be_match_by=prometheus_selector" ...
-----
-
-The creation of the `ServiceMonitor` resource can be disabled using the `service-monitor` parameter, e.g.:
-
-[source,sh]
-----
-$ kamel run -t prometheus.service-monitor=false ...
-----
-
-More information can be found in the xref:traits:prometheus.adoc[Prometheus trait] documentation.
-
-The Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/getting-started.md#related-resources[getting started] guide documents the discovery mechanism, as well as the relationship between the operator resources.
-
-In case your integration metrics are not discovered, you may want to rely on https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/troubleshooting.md#troubleshooting-servicemonitor-changes[Troubleshooting `ServiceMonitor` changes].
-
-== Alerting
-
-The Prometheus Operator declares the `AlertManager` resource that can be used to configure _Alertmanager_ instances, along with `Prometheus` instances.
-
-Assuming an `AlertManager` resource already exists in your cluster, you can register a `PrometheusRule` resource that is used by Prometheus to trigger alerts, e.g.:
-
-[source,sh]
-----
-$ cat <<EOF | kubectl apply -f -
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
-  labels:
-    prometheus: example
-    role: alert-rules
-  name: prometheus-rules
-spec:
-  groups:
-  - name: camel-k.rules
-    rules:
-    - alert: CamelKAlert
-      expr: application_camel_context_exchanges_failed_total > 0
-EOF
-----
-
-More information can be found in the Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/alerting.md[Alerting] user guide. You can also find more details in https://docs.openshift.com/container-platform/4.4/monitoring/monitoring-your-own-services.html#creating-alerting-rules_monitoring-your-own-services[Creating alerting rules] from the OpenShift documentation.
-
-== Autoscaling
-
-Integration metrics can be exported for horizontal pod autoscaling (HPA), using the https://github.com/DirectXMan12/k8s-prometheus-adapter[custom metrics Prometheus adapter].
-If you have an OpenShift cluster, you can follow https://docs.openshift.com/container-platform/4.4/monitoring/exposing-custom-application-metrics-for-autoscaling.html[Exposing custom application metrics for autoscaling] to set it up.
-
-Assuming you have the Prometheus adapter up and running, you can create a `HorizontalPodAutoscaler` resource, e.g.:
-
-[source,sh]
-----
-$ cat <<EOF | kubectl apply -f -
-apiVersion: autoscaling/v2beta2
-kind: HorizontalPodAutoscaler
-metadata:
-  name: camel-k-autoscaler
-spec:
-  scaleTargetRef:
-    apiVersion: camel.apache.org/v1
-    kind: Integration
-    name: example
-  minReplicas: 1
-  maxReplicas: 10
-  metrics:
-  - type: Pods
-    pods:
-      metric:
-        name: application_camel_context_exchanges_inflight_count
-      target:
-        type: AverageValue
-        averageValue: 1k
-EOF
-----
-
-More information can be found in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/[Horizontal Pod Autoscaler] from the Kubernetes documentation.
+- xref:observability/operator.adoc[Camel K operator monitoring]
+- xref:observability/integration.adoc[Camel K integration monitoring]
diff --git a/docs/modules/ROOT/pages/observability/operator.adoc b/docs/modules/ROOT/pages/observability/operator.adoc
new file mode 100644
index 0000000..a46e6f5
--- /dev/null
+++ b/docs/modules/ROOT/pages/observability/operator.adoc
@@ -0,0 +1,176 @@
+[[operator-monitoring]]
+= Camel K Operator Monitoring
+
+NOTE: The Camel K monitoring architecture relies on https://prometheus.io[Prometheus] and the eponymous operator. Make sure you've checked the xref:observability/monitoring.adoc#prerequisites[Camel K monitoring prerequisites].
+
+[[installation]]
+== Installation
+
+The `kamel install` command provides the `--monitoring` option flag, that can be used to automatically create the default resources required to monitor the Camel K operator, e.g.:
+
+[source,sh]
+----
+$ kamel install --monitoring=true
+----
+
+This creates:
+
+* a `PodMonitor` resource targeting the operator _metrics_ endpoint, so that the Prometheus server can scrape the <<metrics>> exposed by the operator;
+* a `PrometheusRule` resource with default alerting rules based on the exposed metrics. The <<alerting>> section provides more details about these default rules.
+
+The `kamel install` command also provides the `--monitoring-port` option, that can be used to change the port of the operator monitoring endpoint, e.g.:
+
+[source,sh]
+----
+$ kamel install --monitoring=true --monitoring-port=8888
+----
+
+You can refer to the <<discovery>> and <<alerting>> sections in case you don't want to rely on the default monitoring configuration.
+
+[[metrics]]
+== Metrics
+
+The Camel K operator monitoring endpoint exposes the following metrics:
+
+.Camel K operator metrics
+|===
+|Name |Type |Description |Buckets |Labels
+
+| `camel_k_reconciliation_duration_seconds`
+| `HistogramVec`
+| Reconciliation request duration
+| 0.25s, 0.5s, 1s, 5s
+| `namespace`, `group`, `version`, `kind`, `result`: `Reconciled`\|`Errored`\|`Requeued`, `tag`: `""`\|`PlatformError`\|`UserError`
+
+| `camel_k_build_duration_seconds`
+| `HistogramVec`
+| Build duration
+| 30s, 1m, 1.5m, 2m, 5m, 10m
+| `result`: `Succeeded`\|`Error`
+
+| `camel_k_build_recovery_attempts`
+| `Histogram`
+| Build recovery attempts
+| 0, 1, 2, 3, 4, 5
+| `result`: `Succeeded`\|`Error`
+
+| `camel_k_build_queue_duration_seconds`
+| `Histogram`
+| Build queue duration
+| 5s, 15s, 30s, 1m, 5m
+| N/A
+
+| `camel_k_integration_first_readiness_seconds`
+| `Histogram`
+| Time to first integration readiness
+| 5s, 10s, 30s, 1m, 2m
+| N/A
+
+|===
+
+[[discovery]]
+== Discovery
+
+A `PodMonitor` resource must be created for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the Camel K operator _metrics_ endpoint.
+
+As an example, hereafter is the `PodMonitor` resource that is created when executing the `kamel install --monitoring=true` command:
+
+.operator-pod-monitor.yaml
+[source,yaml]
+----
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: camel-k-operator
+  labels: # <1>
+    ...
+spec:
+  selector:
+    matchLabels: # <2>
+      app: "camel-k"
+      camel.apache.org/component: operator
+  podMetricsEndpoints:
+  - port: metrics
+----
+<1> The labels must match the `podMonitorSelector` field from the `Prometheus` resource
+<2> This label selector matches the Camel K operator Deployment labels
+
+The Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/getting-started.md#related-resources[getting started] guide documents the discovery mechanism, as well as the relationship between the operator resources.
+
+In case your operator metrics are not discovered, you may want to rely on https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/troubleshooting.md#troubleshooting-servicemonitor-changes[Troubleshooting `ServiceMonitor` changes], which also applies to `PodMonitor` resources troubleshooting.
+
+[[alerting]]
+== Alerting
+
+NOTE: The Prometheus Operator declares the `AlertManager` resource that can be used to configure _Alertmanager_ instances, along with `Prometheus` instances. The following section assumes an `AlertManager` resource already exists in your cluster.
+
+A `PrometheusRule` resource can be created for the Prometheus Operator to reconcile, so that the managed AlertManager instance can trigger alerts, based on the metrics exposed by the Camel K operator.
+
+As an example, hereafter are the alerting rules that are defined in the `PrometheusRule` resource that is created when executing the `kamel install --monitoring=true` command:
+
+.Camel K operator alerts
+|===
+|Name |Severity |Description
+
+| `CamelKReconciliationDuration`
+| warning
+| More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min.
+
+| `CamelKReconciliationFailure`
+| warning
+| More than 1% of the reconciliation requests have failed over at least 10 min.
+
+| `CamelKSuccessBuildDuration2m`
+| warning
+| More than 10% of the successful builds have their duration above 2 min over at least 1 min.
+
+| `CamelKSuccessBuildDuration5m`
+| critical
+| More than 1% of the successful builds have their duration above 5 min over at least 1 min.
+
+| `CamelKBuildError`
+| critical
+| More than 1% of the builds have errored over at least 10 min.
+
+| `CamelKBuildQueueDuration1m`
+| warning
+| More than 1% of the builds have been queued for more than 1 min over at least 1 min.
+
+| `CamelKBuildQueueDuration5m`
+| critical
+| More than 1% of the builds have been queued for more than 5 min over at least 1 min.
+
+|===
+
+You can register your own `PrometheusRule` resources, to be used by Prometheus AlertManager instances to trigger alerts, e.g.:
+
+[source,yaml]
+----
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: camel-k-alerts
+spec:
+  groups:
+  - name: camel-k-alerts
+    rules:
+    - alert: CamelKIntegrationTimeToReadiness
+      expr: |
+        (
+          1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
+          /
+          sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
+        )
+        * 100
+        > 10
+      for: 1m
+      labels:
+        severity: warning
+      annotations:
+        message: |
+          {{ printf "%0.0f" $value }}% of the integrations
+          for {{ $labels.job }} have their first time to readiness above 1m.
+----
+
+More information can be found in the Prometheus Operator https://github.com/coreos/prometheus-operator/blob/v0.38.0/Documentation/user-guides/alerting.md[Alerting] user guide.
+You can also find more details in https://docs.openshift.com/container-platform/4.4/monitoring/monitoring-your-own-services.html#creating-alerting-rules_monitoring-your-own-services[Creating alerting rules] from the OpenShift documentation.
