Metrics

Canary Checker works well with Prometheus and exports metrics for every check. The standard metrics are:

| Metric | Type | Description |
|--------|------|-------------|
| canary_check | Gauge | Set to 0 when passing and 1 when failing |
| canary_check_success_count | Counter | |
| canary_check_failed_count | Counter | |
| canary_check_info | Info | |
| canary_check_duration | Histogram | Histogram of canary durations |

Some checks like pod and http expose additional metrics.
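Because these are ordinary Prometheus metrics, they can drive standard alerting rules. The sketch below assumes the Prometheus Operator is installed (see the Prometheus Operator section further down); the rule name, namespace, and duration are illustrative and not part of Canary Checker itself.

# Sketch only: fire when any canary has been failing for 10 minutes
# (canary_check is 0 when passing and 1 when failing, per the table above)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: canary-checker-alerts   # hypothetical name
  namespace: monitoring         # hypothetical namespace
spec:
  groups:
    - name: canary-checker
      rules:
        - alert: CanaryFailing
          expr: canary_check == 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: A canary check has been failing for 10 minutes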

Custom Metrics

Canary Checker can export custom metrics from any check type, replacing or consolidating multiple standalone Prometheus exporters into a single exporter.

In the example below, exchange rates against USD are exported by first calling an HTTP API and then using the values from the JSON response to create the metrics:

exchange-rates-exporter.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: exchange-rates
spec:
  schedule: "@every 30m"
  http:
    - name: exchange-rates
      url: https://api.frankfurter.app/latest?from=USD&to=GBP,EUR,ILS
      metrics:
        - name: exchange_rate
          type: gauge
          value: json.rates.GBP
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: GBP

        - name: exchange_rate
          type: gauge
          value: json.rates.EUR
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: EUR

        - name: exchange_rate
          type: gauge
          value: json.rates.ILS
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: ILS

        - name: exchange_rate_api
          type: histogram
          value: elapsed.getMilliseconds()

Which would output:

exchange_rate{from=USD, to=GBP} 0.819
exchange_rate{from=USD, to=EUR} 0.949
exchange_rate{from=USD, to=ILS} 3.849
exchange_rate_api 260.000

Fields

| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| metrics[].name | Name of the metric | string | Yes |
| metrics[].value | An expression to derive the metric value from | CEL with Check Context that returns a float | Yes |
| metrics[].type | Prometheus metric type | counter, gauge, histogram | Yes |
| metrics[].labels[].name | Name of the label | string | Yes |
| metrics[].labels[].value | A static value for the label value | float | |
| metrics[].labels[].valueExpr | An expression to derive the label value from | CEL with Check Context | |
| metrics[].labels[].labels | Labels for the Prometheus metric (values can be templated) | map[string]string | |

Expressions can make use of the following variables:

Check Context

| Field | Description | Scheme |
|-------|-------------|--------|
| * | All fields from the check result | See the result variables section of the check |
| last_result.results | The last result | |
| check.name | Check name | string |
| check.description | Check description | string |
| check.labels | Dynamic labels attached to the check | map[string]string |
| check.endpoint | Endpoint (usually a URL) | string |
| check.duration | Duration in milliseconds | int64 |
| canary.name | Canary name | string |
| canary.namespace | Canary namespace | string |
| canary.labels | Labels attached to the canary CRD (if any) | map[string]string |
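For example, label values can be derived from these variables with valueExpr. The following is a minimal sketch (the canary name and URL are illustrative) that records the check duration and attaches the canary's name and namespace as labels:

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: labelled-duration        # illustrative name
spec:
  schedule: "@every 5m"
  http:
    - name: homepage
      url: https://example.com   # illustrative URL
      metrics:
        - name: homepage_duration_ms
          type: gauge
          value: check.duration          # duration in milliseconds, from the table above
          labels:
            - name: canary
              valueExpr: canary.name
            - name: namespace
              valueExpr: canary.namespace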

Prometheus Operator

The Helm chart can install a ServiceMonitor for the Prometheus Operator by enabling the serviceMonitor flag:

--set serviceMonitor=true

Grafana

Default Grafana dashboards are available. After deploying Grafana, they can be installed with:

--set grafanaDashboards=true --set serviceMonitor=true
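The same settings can be kept in a values file instead of being passed as --set flags; a minimal sketch, assuming the flag names map directly to top-level chart values:

# values.yaml (sketch)
serviceMonitor: true      # create a ServiceMonitor for the Prometheus Operator
grafanaDashboards: true   # install the default Grafana dashboards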

Stateful Metrics

Metrics can be generated from time-based data (e.g. logs per minute or logins per second) by using the output of one check execution as the input to the next.

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: "container-log-counts"
spec:
  # The schedule can be as short or as long as you want; the query always searches for logs
  # created since the last query
  schedule: "@every 5m"
  http:
    - name: container_log_volume
      url: "http://elasticsearch.canaries.svc.cluster.local:9200/logstash-*/_search"
      headers:
        - name: Content-Type
          value: application/json
      templateBody: true
      test:
        # if no logs are found, fail the health check
        expr: json.?aggregations.logs.doc_count.orValue(0) > 0
      # query for log counts by namespace, container and pod that have been created since the last check
      body: >-
        {
          "size": 0,
          "aggs": {
            "logs": {
              "filter": {
                "range": {
                  "@timestamp" : {
                    {{- if last_result.results.max }}
                    "gte": "{{ last_result.results.max }}"
                    {{- else }}
                    "gte": "now-5m"
                    {{- end }}
                  }
                }
              },
              "aggs": {
                "age": {
                  "max": {
                    "field": "@timestamp"
                  }
                },
                "labels": {
                  "multi_terms": {
                    "terms": [
                      { "field": "kubernetes_namespace_name.keyword" },
                      { "field": "kubernetes_container_name.keyword" },
                      { "field": "kubernetes_pod_name.keyword" }
                    ],
                    "size": 1000
                  }
                }
              }
            }
          }
        }
      transform:
        # Save the maximum age for use in subsequent queries and create a metric for each bucket
        expr: |
          json.orValue(null) != null ?
          [{
            'detail': { 'max': string(json.?aggregations.logs.age.value_as_string.orValue(last_result().?results.max.orValue(time.Now()))) },
            'metrics': json.?aggregations.logs.labels.buckets.orValue([]).map(k, {
              'name': "namespace_log_count",
              'type': "counter",
              'value': double(k.doc_count),
              'labels': {
                "namespace": k.key[0],
                "container": k.key[1],
                "pod": k.key[2]
              }
            })
          }].toJSON()
          : '{}'

This snippet retrieves the last_result.results.max value from the previous execution, ensuring data is neither duplicated nor missed:

"@timestamp" : {
{{- if last_result.results.max }}
"gte": "{{ last_result.results.max }}"
{{- else }}
"gte": "now-5m"
{{- end }}
}

The max value is saved in the transform section using:

#...
'detail': { 'max': string(json.?aggregations.logs.age.value_as_string.orValue(last_result().?results.max.orValue(time.Now()))) },
#...
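With this in place, each bucket returned by the aggregation becomes a sample of the namespace_log_count counter. The namespaces, pods, and values below are purely illustrative:

namespace_log_count{namespace=kube-system, container=coredns, pod=coredns-565d847f94-k2xwq} 1024
namespace_log_count{namespace=default, container=nginx, pod=nginx-7c79c4bf97-zl4fn} 87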