# Agent

### Setting Up With Thanos

#### Configuring external\_labels for Thanos

Because the agent extracts node and container information from the Prometheus metrics that Thanos aggregates, we recommend renaming the keys configured for `podLabel`, `nodeLabel`, and `containerLabel` if they are already used in your Prometheus external labels.
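
A quick way to spot such a clash is to compare the two sets of keys. The sketch below is only an illustration: `find_label_collisions` is a hypothetical helper, not part of the agent or of Thanos, and the key lists are placeholder values that you would replace with your own external label keys and agent label keys.

```bash
# Print every agent label key that is also used as a Prometheus external label key.
# Both arguments are space-separated lists of label keys (hypothetical helper).
find_label_collisions() {
  local external_keys="$1" agent_keys="$2" key
  for key in $agent_keys; do
    case " $external_keys " in
      *" $key "*) echo "$key" ;;
    esac
  done
}

# Placeholder values: external labels "cluster region pod", agent keys "pod node container".
find_label_collisions "cluster region pod" "pod node container"
```

Any key printed by the helper is a candidate for renaming.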

#### Setting up the thanos receive component

Because of availability concerns with the sidecar, we strongly recommend using the `thanos receive` component, which emulates all the real-time Prometheus results that the agent needs to collect metrics.

Review the [setup docs](https://thanos.io/tip/components/receive.md/) and configure the `remote_write` section of each of your Prometheus installations.

Here is a sample Prometheus configuration that works with the `thanos receive` component:

```yaml
# prometheus.yaml
remote_write:
  - url: <thanos-receive-url>/api/v1/receive
    headers:
      THANOS-TENANT: <replex-cluster-name>
```

#### Configuring the Querier With The Replex Agent

The Querier (Query Gateway) is one of the [Thanos components](https://thanos.io/v0.6/thanos/getting-started.md/#components). It provides a Prometheus-compatible endpoint that works with the Replex agent.

If you have a Thanos instance set up, verify that this endpoint is accessible with a simple curl command. Replace `THANOS_QUERIER_URL` with the URL of your Thanos instance.

```bash
curl "$THANOS_QUERIER_URL/api/v1/query?query=up"
```

The response should contain series similar to the following, which indicates that the endpoint is Prometheus compatible and will work with the agent:

```
up{instance="replex.io:9090", job="prometheus"} 1
up{instance="replex.io:9091", job="pushgateway"} 1
up{instance="replex.io:9093", job="alertmanager"} 1
up{instance="replex.io:9100", job="node"} 1
```
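
The Prometheus HTTP API wraps query results in a JSON envelope with a `status` field, so the check can also be scripted. The following is a minimal sketch under that assumption: `is_prometheus_success` is a hypothetical helper name, and the sample body is illustrative rather than captured from a real Querier.

```bash
# Succeeds when the response body read from stdin is a Prometheus API
# envelope that reports "status": "success" (hypothetical helper).
is_prometheus_success() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

# Illustrative body in the shape returned by /api/v1/query.
sample='{"status":"success","data":{"resultType":"vector","result":[]}}'
if printf '%s' "$sample" | is_prometheus_success; then
  echo "endpoint looks Prometheus compatible"
fi
```

Against a live instance, the same helper could be fed from `curl -fsS "$THANOS_QUERIER_URL/api/v1/query?query=up"`.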

Finally, set the [prometheus.url](https://docs.replex.io/#metric-provider-parameters) parameter described in the agent docs to your `THANOS_QUERIER_URL`, and you are up and running with Thanos on Replex.

### Configuring Self-Signed SSL Certificates (On-Prem Only)

For on-prem pushgateway deployments, if the pushgateway is served with a self-signed SSL certificate, the agent may encounter errors when syncing with the pushgateway.

To resolve this, you can use the `sslCertificate` Helm chart parameter to pass your certificate into the agent.

Example:

```yaml
sslCertificate: "-----BEGIN CERTIFICATE-----\nMIIC1TCCAb2gAwIBAgIJAKbCs/2knCwGMA0GCSqGSIb3DQEBBQUAMBoxGDAWBgNV\nZAeRdaEZS6Bs\n-----END CERTIFICATE-----"
```
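
The example value is a PEM certificate collapsed onto one line, with each line break written as a literal `\n`. One way to produce that form is sketched below. `pem_to_single_line` is a hypothetical helper, and the certificate body in the demo is a placeholder, not a real certificate.

```bash
# Print PEM input as a single line, replacing each line break with a literal "\n".
# Reads the files given as arguments, or stdin when none are given (hypothetical helper).
pem_to_single_line() {
  awk '{ sub(/\r$/, ""); printf "%s%s", sep, $0; sep = "\\n" } END { print "" }' "$@"
}

# Demo with a placeholder body; for a real file, pass its path as the argument.
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'QUJD' '-----END CERTIFICATE-----' \
  | pem_to_single_line
```

Wrap the printed line in double quotes when pasting it into your values file, so that YAML interprets each `\n` as a line break.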

#### Filesystem Metrics (Prometheus)

This section applies only to setups that use Prometheus as the metrics provider. The agent can collect PVC information from different metric sources. The default setting uses [cAdvisor's](https://github.com/google/cadvisor) `kubelet_volume*` metrics.

**cAdvisor**

The default setup uses the `cAdvisor` metrics to get PVC information. In that case, the `METRICS_FILESYSTEM` environment variable can be left at its default value, `cadvisor`.

Metrics used:

| Storage Metric | `cAdvisor` Metrics                    |
| -------------- | ------------------------------------- |
| Capacity       | `kubelet_volume_stats_capacity_bytes` |
| Used           | `kubelet_volume_stats_used_bytes`     |

**CSI**

If `kubelet_volume*` metrics are not available and you are using [CSI](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/) plugins, you must set the `METRICS_FILESYSTEM` environment variable to `csi`. In that case, [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is required. For `csi`, the agent gets PVC information from the `node_exporter` and `kube-state-metrics` metrics.

Metrics used:

| Storage Metric | `node_exporter` Metrics                                     | `kube-state-metrics` Metrics      |
| -------------- | ----------------------------------------------------------- | --------------------------------- |
| Capacity       | `node_filesystem_size_bytes`                                | `kube_persistentvolumeclaim_info` |
| Used           | `node_filesystem_size_bytes` - `node_filesystem_free_bytes` | `kube_persistentvolumeclaim_info` |
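
As a worked example of the "Used" row, the value is the filesystem size minus the free bytes. The sketch below only illustrates that arithmetic: `volume_used_bytes` is a hypothetical helper, the byte counts are made-up sample values, and it does not show how the agent matches a filesystem to a PVC.

```bash
# Used bytes for a volume: node_filesystem_size_bytes minus node_filesystem_free_bytes.
volume_used_bytes() {
  local size_bytes="$1" free_bytes="$2"
  echo $(( size_bytes - free_bytes ))
}

# Sample values: 10 GiB size, 4 GiB free.
volume_used_bytes 10737418240 4294967296   # prints 6442450944 (6 GiB)
```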

### Exposed Metrics

The agent exposes its own metrics, which can be accessed via the `/metrics` route on port `:8083`.

| Metric                              | Type    | Labels                           | Description                                                                                 |
| ----------------------------------- | ------- | -------------------------------- | ------------------------------------------------------------------------------------------- |
| `replex_agent_provider_status`      | gauge   | `name`: metrics provider name    | Indicates whether or not a metrics provider is reachable. 1 if it is reachable and 0 if not |
| `replex_agent_sync_duration_count`  | counter | `agent_version`, `response_code` | Total number of times the metrics were synchronized with the Replex server                  |
| `replex_agent_sync_duration_sum`    | gauge   | `agent_version`, `response_code` | The total duration of all sync requests to the Replex server in seconds                     |
| `replex_agent_retry_cache_size`     | gauge   | `cluster_id`                     | Count of cached metrics that are waiting to be re-sent to the Replex server                 |
| `replex_agent_failed_metrics_total` | counter | `cluster_id`                     | Total count of metrics that failed at least once                                            |
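
The `_sum` and `_count` series can be combined into an average sync duration. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format without trailing timestamps. `avg_sync_duration` is a hypothetical helper, and the label values in the sample are made up.

```bash
# Read exposition-format text on stdin and print the average sync duration
# in seconds, summed over all label combinations (hypothetical helper).
avg_sync_duration() {
  awk '
    /^replex_agent_sync_duration_sum/   { sum   += $NF }
    /^replex_agent_sync_duration_count/ { count += $NF }
    END { if (count > 0) printf "%.3f\n", sum / count; else print "n/a" }
  '
}

# Sample input with made-up label values; prints 2.500.
avg_sync_duration <<'EOF'
replex_agent_sync_duration_sum{agent_version="x",response_code="200"} 12.5
replex_agent_sync_duration_count{agent_version="x",response_code="200"} 5
EOF
```

Against a running agent, the helper could be fed from a `curl -fsS` request to the `/metrics` route on port `8083`. The host name and URL scheme depend on your deployment.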

### Used Metrics

These are the metrics the agent currently uses:

| Property | Description            | Prometheus                                                             | Instana `(plugin: metric)` | [Stackdriver](https://cloud.google.com/monitoring/api/metrics_kubernetes) | [Datadog](https://docs.datadoghq.com/integrations/kubernetes/) |
| -------- | ---------------------- | ---------------------------------------------------------------------- | -------------------------- | ------------------------------------------------------------------------- | -------------------------------------------------------------- |
| 1        | Container CPU Usage    | container\_cpu\_usage\_seconds\_total                                  | *docker*: cpu.total\_usage | kubernetes.io/container/cpu/core\_usage\_time                             | kubernetes.cpu.usage.total                                     |
| 2        | Container MEM Usage    | container\_memory\_working\_set\_bytes                                 | *docker*: memory.usage     | kubernetes.io/container/memory/used\_bytes                                | kubernetes.memory.working\_set                                 |
| 3        | Node CPU Usage         | node\_cpu\_seconds\_total                                              | *docker*: cpu.total\_usage | kubernetes.io/node/cpu/core\_usage\_time                                  | kubernetes.cpu.usage.total                                     |
| 4        | Node MEM Usage         | node\_memory\_MemTotal\_bytes - node\_memory\_MemAvailable\_bytes      | *docker*: memory.usage     | kubernetes.io/node/memory/used\_bytes                                     | kubernetes.memory.usage                                        |
| 5        | Storage (Capacity)     | kubelet\_volume\_stats\_capacity\_bytes, node\_filesystem\_size\_bytes | -                          | kubernetes.io/pod/volume/total\_bytes                                     | kubernetes.kubelet.volume.stats.capacity\_bytes                |
| 6        | Storage (Used)         | kubelet\_volume\_stats\_used\_bytes, node\_filesystem\_free\_bytes     | -                          | kubernetes.io/pod/volume/used\_bytes                                      | kubernetes.kubelet.volume.stats.used\_bytes                    |
| 7        | Disk Capacity          | container\_fs\_limit\_bytes                                            | -                          | -                                                                         | system.disk.total                                              |
| 8        | Disk Used              | container\_fs\_usage\_bytes                                            | -                          | -                                                                         | system.disk.used                                               |
| 9        | Network I/O (Received) | container\_network\_receive\_bytes\_total                              | *docker*: network.rx.bytes | kubernetes.io/pod/network/received\_bytes\_count                          | kubernetes.network.rx\_bytes                                   |
| 10       | Network I/O (Sent)     | container\_network\_transmit\_bytes\_total                             | *docker*: network.tx.bytes | kubernetes.io/pod/network/sent\_bytes\_count                              | kubernetes.network.tx\_bytes                                   |
| 11       | Disk I/O (Written)     | container\_fs\_writes\_bytes\_total                                    | *docker*: blkio.blk\_write | -                                                                         | kubernetes.io.write\_bytes                                     |
| 12       | Disk I/O (Read)        | container\_fs\_reads\_bytes\_total                                     | *docker*: blkio.blk\_read  | -                                                                         | kubernetes.io.read\_bytes                                      |

### Environment Variables

|    | Variable                          | Required                                        | Default                              | Comment                                                                            |
| -- | --------------------------------- | ----------------------------------------------- | ------------------------------------ | ---------------------------------------------------------------------------------- |
| 1  | REPLEX\_TOKEN                     | Yes                                             |                                      |                                                                                    |
| 2  | METRIC\_PROVIDER                  | Yes                                             |                                      | Options: `prometheus`, `datadog`, `stackdriver`, `instana`, `thanos`               |
| 3  | CLUSTER\_ID                       | If "KUBERNETES\_INFO\_PROVIDER" == "kubernetes" |                                      |                                                                                    |
| 4  | CLUSTER\_NAME                     | If "KUBERNETES\_INFO\_PROVIDER" == "kubernetes" |                                      |                                                                                    |
| 5  | PUSHGATEWAY\_URL                  | No                                              | <https://pushgateway.replex.io/push> |                                                                                    |
| 6  | PROMETHEUS\_SERVER\_URL           | If "METRIC\_PROVIDER" == "prometheus"           |                                      |                                                                                    |
| 7  | DATADOG\_API\_KEY                 | If "METRIC\_PROVIDER" == "datadog"              |                                      |                                                                                    |
| 8  | DATADOG\_APPLICATION\_KEY         | If "METRIC\_PROVIDER" == "datadog"              |                                      |                                                                                    |
| 9  | DATADOG\_SITE                     | No                                              | com                                  | Options: `com`, `eu`                                                               |
| 10 | GCP\_PROJECT\_ID                  | If "METRIC\_PROVIDER" == "stackdriver"          |                                      |                                                                                    |
| 11 | INSTANA\_BASE\_URL                | If "METRIC\_PROVIDER" == "instana"              |                                      | Format: `https://tenant-unit.instana.io`                                           |
| 12 | INSTANA\_API\_TOKEN               | If "METRIC\_PROVIDER" == "instana"              |                                      |                                                                                    |
| 13 | KUBERNETES\_INFO\_PROVIDER        | No                                              | kubernetes                           | Options: `kubernetes`, `instana`                                                   |
| 14 | INSTANA\_CLUSTER\_ID              | No                                              |                                      |                                                                                    |
| 15 | ONLY\_USE\_READY\_NODES           | No                                              | false                                | Track only nodes that are in "Ready" state                                         |
| 16 | PROMETHEUS\_NODE\_LABEL           | No                                              | node                                 | The label that represents the node in the Prometheus metrics                       |
| 17 | PROMETHEUS\_CONTAINER\_LABEL      | No                                              | container                            | The label that represents the container in the Prometheus metrics                  |
| 18 | PROMETHEUS\_POD\_LABEL            | No                                              | pod                                  | The label that represents the pod in the Prometheus metrics                        |
| 19 | CLOUD\_PROVIDER\_OVERRIDE         | No                                              | Detecting automatically              | Options: `aws`, `azure`, `gce`, `custom`, `alibaba`                                |
| 20 | USE\_CONTROL\_PLANE\_COST         | No                                              | false                                | Track costs of the Kubernetes Control Plane                                        |
| 21 | METRICS\_FILESYSTEM               | No                                              | cadvisor                             | Specify the filesystem metric source. Options: `cadvisor`, `csi`                   |
| 22 | SYNC\_INTERVAL\_SECONDS           | No                                              | 300                                  |                                                                                    |
| 23 | LOG\_LEVEL                        | No                                              | 3                                    | Higher value means higher verbosity                                                |
| 24 | METRICS\_RETRY\_INTERVAL\_SECONDS | No                                              | 300                                  |                                                                                    |
| 25 | METRICS\_CACHE\_DISK              | No                                              | true                                 | Cache failed metrics on disk                                                       |
| 26 | METRICS\_CACHE\_DISK\_DIR         | No                                              | /data/metrics                        | Directory to cache metrics if `METRICS_CACHE_DISK` == `true`                       |
| 27 | PROMETHEUS\_BEARER\_TOKEN         | No                                              |                                      | Prometheus server requests bearer token. Only if `METRIC_PROVIDER` == `prometheus` |
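
The conditional rules in the "Required" column can be checked before deploying. The sketch below encodes only the conditions listed in the table. `missing_agent_env` is a hypothetical helper, not something shipped with the agent. No extra variables are assumed for `thanos`, because the table does not list any.

```bash
# Print the names of required variables that are unset or empty,
# following the "Required" column of the table (hypothetical helper).
missing_agent_env() {
  local required="REPLEX_TOKEN METRIC_PROVIDER" name
  case "${METRIC_PROVIDER:-}" in
    prometheus)  required+=" PROMETHEUS_SERVER_URL" ;;
    datadog)     required+=" DATADOG_API_KEY DATADOG_APPLICATION_KEY" ;;
    stackdriver) required+=" GCP_PROJECT_ID" ;;
    instana)     required+=" INSTANA_BASE_URL INSTANA_API_TOKEN" ;;
  esac
  # KUBERNETES_INFO_PROVIDER defaults to "kubernetes" per the table.
  if [ "${KUBERNETES_INFO_PROVIDER:-kubernetes}" = "kubernetes" ]; then
    required+=" CLUSTER_ID CLUSTER_NAME"
  fi
  for name in $required; do
    [ -n "${!name:-}" ] || echo "$name"
  done
}

# Demo in a subshell with placeholder values; lists whatever is still missing.
( REPLEX_TOKEN=example-token METRIC_PROVIDER=prometheus; missing_agent_env )
```

An empty result means every variable that the table marks as required for the chosen providers is set.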

