# Quick Start Guide

### Purpose

The Replex Kubernetes Agent collects metadata from the local cluster via the Kubernetes API and metrics from a metric provider (a local Prometheus or Thanos instance, Datadog, Stackdriver or Instana), and sends this information to the Replex pushgateway, which stores it for further use.

### Requirements

* A Kubernetes cluster in which to install the agent.
* A metric provider. One of:
  * A Prometheus instance that is running in the cluster and accessible via URL.
  * A Thanos instance configured with a Querier component.
  * A Datadog account with API key and Application key provided.
  * A Stackdriver account.
  * An Instana account with an API token.
* Helm 3 for the installation.
* A Replex token.

### Required parameters for the installation

If the Replex agent is the only means of data collection for your clusters, it must be installed in each cluster individually. In that case, the `kubernetesInfoProvider` parameter can be left at its default value of `kubernetes`.

The Helm installation takes the following parameters. All are mandatory except those marked (Optional):

| Parameter                      | Description                                                                                                                                                                                                                                                                                                                |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| cluster.id                     | A string that uniquely identifies the cluster. It distinguishes clusters that share a name: for example, if a cluster called "development" is destroyed and later replaced by a new cluster also called "development", the two clusters should have different IDs.                                                         |
| cluster.name                   | A human readable name for the cluster.                                                                                                                                                                                                                                                                                     |
| replex.token                   | A token that is provided by Replex and is used by the pushgateway to authenticate requests from the agent. If the agent is installed in multiple clusters the same token can be used for all deployments.                                                                                                                  |
| metrics.provider               | The metric provider to be used. Can be either *prometheus*, *thanos*, *stackdriver*, *instana* or *datadog*.                                                                                                                                                                                                               |
| (Optional) pushgateway.url     | The full URL to the replex pushgateway. Format: <https://pushgateway.client.com/push> (Note the `/push` path).                                                                                                                                                                                                             |
| (Optional) onlyUseReadyNodes   | Consider nodes without 'Ready' status as not running at all.                                                                                                                                                                                                                                                               |
| (Optional) useControlPlaneCost | Track costs of the Kubernetes Control Plane.                                                                                                                                                                                                                                                                               |
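
As a rough sketch, the optional parameters from the table above could be added to a `values.yaml` file as shown here. The URL is the format example from the table, not a real endpoint, and the boolean values are only illustrative. The chart values table lists `useControlPlaneCost` with type string and a "(bool)" note, so writing it as a plain YAML boolean is an assumption.

```yaml
# Optional settings (sketch); every value below is illustrative
pushgateway:
  url: https://pushgateway.client.com/push  # format example; note the /push path
onlyUseReadyNodes: true     # treat nodes without 'Ready' status as not running
useControlPlaneCost: true   # assumption: a plain boolean is accepted here
```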

### Metric Provider Parameters

* **prometheus** or **thanos**:

  | Parameter                 | Description                                                                                                            | Default   |
  | ------------------------- | ---------------------------------------------------------------------------------------------------------------------- | --------- |
  | prometheus.url            | The API URL of the local Prometheus instance, for example <http://prometheus-server.monitoring.svc.cluster.local:9090> |           |
  | prometheus.nodeLabel      | Sets the value which represents the 'node label'. This label is usually `node` or `instance`.                          | node      |
  | prometheus.containerLabel | Sets the value which represents the 'container label'. This label is usually `container` or `container_name`.          | container |
  | prometheus.podLabel       | Sets the value which represents the 'pod label'. This label is usually `pod` or `pod_name`.                            | pod       |

  The `prometheus.*Label` parameters override the default label names for your Prometheus installation. Each one names the label that holds the actual entity name in a Prometheus time series such as `container_cpu_usage_seconds_total`.
* **datadog**:

  | Parameter               | Description                                                                       | Default |
  | ----------------------- | --------------------------------------------------------------------------------- | ------- |
  | datadog.apiKey          | Your Datadog API Key.                                                             |         |
  | datadog.applicationKey  | Your Datadog Application Key.                                                     |         |
  | (Optional) datadog.site | Either `com` if you are on Datadog US site or `eu` if you are on Datadog EU site. | com     |

  **Note**: Retrieve your Datadog keys from [the Datadog settings page](https://app.datadoghq.com/account/settings#api).
* **stackdriver**:

  | Parameter             | Description                                                                                                         | Default |
  | --------------------- | ------------------------------------------------------------------------------------------------------------------- | ------- |
  | stackdriver.projectId | Your GCP project ID. The agent will authenticate with Google using the service account provided by the environment. |         |
* **instana**:

  | Parameter        | Description                                                                    | Default |
  | ---------------- | ------------------------------------------------------------------------------ | ------- |
  | instana.baseUrl  | This is the base URL of a tenant unit, e.g. <https://test-example.instana.io>. |         |
  | instana.apiToken | Valid Instana API token.                                                       |         |
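
To make the `prometheus.*Label` overrides concrete, here is a minimal sketch for a Prometheus installation whose time series use the alternative label names mentioned in the table above (`instance`, `container_name`, `pod_name`). The URL is the example from the table. Whether your installation actually uses these labels is an assumption you should check against your own time series.

```yaml
# Sketch: Prometheus series that use the alternative label names
metrics:
  provider: prometheus
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local:9090  # example URL from the table
  nodeLabel: instance              # instead of the default `node`
  containerLabel: container_name   # instead of the default `container`
  podLabel: pod_name               # instead of the default `pod`
```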

#### Instana Single-Agent Setup

This section is relevant if you already use Instana to monitor your Kubernetes clusters. With a single deployment of the Replex agent, Replex can pull all metrics and Kubernetes information for all your clusters from the Instana API. This way, you don't need to install the Replex agent on every cluster whose costs you want to monitor.

To activate this feature, set the Helm parameter **kubernetesInfoProvider** to `instana`. The Instana base URL (`instana.baseUrl`) and API token (`instana.apiToken`) are also required.

For the single-agent setup, do not set the `cluster.id` and `cluster.name` parameters: the agent automatically uses the cluster IDs from Instana.
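
A `values.yaml` for the single-agent setup might look like the following sketch. The placeholders follow the same style as the installation examples. Setting `metrics.provider` to `instana` is an assumption based on the statement that metrics are also pulled from the Instana API.

```yaml
# Single-agent setup (sketch); cluster.id and cluster.name are intentionally omitted
kubernetesInfoProvider: instana
replex:
  token: <replex-token>
metrics:
  provider: instana   # assumption: metrics are read from Instana as well
instana:
  baseUrl: <base-url>
  apiToken: <api-token>
```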

### Installation

1. Add the Replex Helm repository

   ```bash
   helm repo add replex https://registry.replex.io/chartrepo/public
   ```
2. Create a namespace to install the agent in

   ```bash
   kubectl create namespace replex-k8s-agent
   ```
3. Create a file called `values.yaml` to specify Helm parameters. The file should look like this:
   * **prometheus** or **thanos**:

     ```yaml
     cluster:
       id: <cluster-id>
       name: <cluster-name>
     replex:
       token: <replex-token>
     metrics:
       provider: prometheus  # or thanos
     prometheus:
       url: <prometheus-url>
       containerLabel: <prometheus-container-label>
       podLabel: <prometheus-pod-label>
       nodeLabel: <prometheus-node-label>
     ```
   * **datadog**:

     ```yaml
     cluster:
       id: <cluster-id>
       name: <cluster-name>
     replex:
       token: <replex-token>
     metrics:
       provider: datadog
     datadog:
       apiKey: <datadog-api-key>
       applicationKey: <datadog-application-key>
       site: <datadog-site>
     ```
   * **stackdriver**:

     ```yaml
     cluster:
       id: <cluster-id>
       name: <cluster-name>
     replex:
       token: <replex-token>
     metrics:
       provider: stackdriver
     stackdriver:
       projectId: <gcp-project-id>
     ```
   * **instana**:

     ```yaml
     cluster:
       id: <cluster-id>
       name: <cluster-name>
     replex:
       token: <replex-token>
     metrics:
       provider: instana
     instana:
       baseUrl: <base-url>
       apiToken: <api-token>
     ```
4. Install the agent with Helm

   ```bash
   helm install <release-name> replex/replex-k8s-agent --namespace replex-k8s-agent -f values.yaml
   ```

   Here, `<release-name>` is an arbitrary string that identifies the Helm installation, for example `replex-agent`.
5. Alternatively, install the agent without Helm

   You still need Helm to render the templates locally, but the installation itself is done with `kubectl` rather than Helm:

   ```bash
   helm template replex replex/replex-k8s-agent --namespace replex-k8s-agent -f values.yaml | kubectl apply -f -
   ```

   After these steps the agent will start sending data to the pushgateway.

### Updating

To update the agent, follow these steps:

1. Update the Helm repo.

   ```bash
   helm repo update
   ```
2. Upgrade the Replex release.

   ```bash
   helm upgrade <release-name> replex/replex-k8s-agent --namespace <namespace> -f values.yaml
   ```

### Logging

The Replex Kubernetes Agent runs as a pod in the namespace it was installed in, and its logs can be viewed with `kubectl`.

For example: `kubectl -n replex-k8s-agent logs <pod-name>`

### Retry policy

The Replex agent retries metric pushes that fail. The retry policy is enabled by default: if a push to the pushgateway fails, the metrics are cached on disk in the `persistentVolume.mountPath` directory. If `retry.diskCache` is `false`, the metrics are kept in the agent's memory instead. `retry.intervalSeconds` is the interval at which the agent tries to resend the cached metrics; the default is 300 seconds (5 minutes).

The retry policy is configured with the following three values:

* (Optional) **retry.intervalSeconds**: Interval in seconds between attempts to resend metrics that previously failed to push. Default: `300`.
* (Optional) **retry.diskCache**: Whether to cache failed metrics on disk. Default: `true`.
* (Optional) **persistentVolume.mountPath**: Path where failed metrics are stored when `retry.diskCache` is `true`. Default: `/data/metrics`.
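
As an illustration, the three retry values could be set in `values.yaml` as in this sketch. The 120-second interval is an arbitrary example value. The other two values are simply the chart defaults, written out explicitly.

```yaml
# Retry policy (sketch)
retry:
  intervalSeconds: 120   # illustrative: retry every 2 minutes instead of the default 300
  diskCache: true        # chart default: cache failed pushes on disk
persistentVolume:
  mountPath: /data/metrics   # chart default: where cached metrics are written
```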

### Chart Values

| Key                               | Type   | Default                                        | Description                                                                                                                                               |
| --------------------------------- | ------ | ---------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| cloudProviderOverride             | string | `""`                                           | Force agent to use specified cloud provider                                                                                                               |
| cluster.id                        | string | `""`                                           | Unique cluster identifier                                                                                                                                 |
| cluster.name                      | string | `""`                                           | Cluster name displayed in dashboard                                                                                                                       |
| datadog.apiKey                    | string | `""`                                           | Your Datadog API Key                                                                                                                                      |
| datadog.applicationKey            | string | `""`                                           | Your Datadog Application Key                                                                                                                              |
| datadog.site                      | string | `"com"`                                        | Either `com` if you are on Datadog US site or `eu` if you are on Datadog EU site.                                                                         |
| extraInitContainers               | object | `{}`                                           | Add init containers to replex agent container                                                                                                             |
| extraVolumeMounts                 | object | `{}`                                           | Add extra volumeMounts to replex agent container                                                                                                          |
| extraVolumes                      | list   | `[]`                                           | Add extra volumes to replex agent container                                                                                                               |
| image.pullPolicy                  | string | `"Always"`                                     | Image pull policy                                                                                                                                         |
| image.repository                  | string | `"registry.replex.io/public/replex-k8s-agent"` | Image repository                                                                                                                                          |
| image.tag                         | string | `""`                                           | Override AppVersion with specific tag from image repository                                                                                               |
| instana.apiToken                  | string | `""`                                           | Valid Instana API token                                                                                                                                   |
| instana.baseUrl                   | string | `""`                                           | Base URL of a tenant unit, of form: <https://test-example.instana.io>                                                                                     |
| kubernetesInfoProvider            | string | `"kubernetes"`                                 | Specify Kubernetes info provider. Options are `kubernetes` or `instana`.                                                                                  |
| metrics.filesystem                | string | `"cadvisor"`                                   | Either `cadvisor` for using cAdvisor filesystem metrics or `csi` if using CSI drivers. `csi` requires <https://github.com/kubernetes/kube-state-metrics>. |
| metrics.provider                  | string | `""`                                           | Metrics provider to use.                                                                                                                                  |
| nodeSelector                      | object | `{}`                                           | Pod node selector Key-Value pair                                                                                                                          |
| onlyUseReadyNodes                 | bool   | `false`                                        | Consider nodes without 'Ready' status as not running at all.                                                                                              |
| persistentVolume.mountPath        | string | `"/data/metrics"`                              | Persistent volume mount path                                                                                                                              |
| persistentVolume.size             | string | `"10Gi"`                                       | Size of agent persistent volume                                                                                                                           |
| persistentVolume.storageClassName | string | `""`                                           | Name of the storage class to configure the PVC on                                                                                                         |
| prometheus.bearerToken            | string | `""`                                           | Bearer token for prometheus server requests authentication.                                                                                               |
| prometheus.containerLabel         | string | `"container"`                                  | The label representing "container name" on prometheus time series.                                                                                        |
| prometheus.nodeLabel              | string | `"node"`                                       | The label representing "node name" on prometheus time series.                                                                                             |
| prometheus.podLabel               | string | `"pod"`                                        | The label representing "pod name" on prometheus time series.                                                                                              |
| prometheus.url                    | string | `""`                                           | URL of the Prometheus instance                                                                                                                            |
| pushgateway.url                   | string | `""`                                           | Full URL to the replex pushgateway. Format: <https://pushgateway.client.com/push> (Note the `/push` path).                                                |
| replex.token                      | string | `""`                                           | Agent authentication token.                                                                                                                               |
| resources.requests.cpu            | string | `"50m"`                                        | Specify the cpu units requests of the agent container                                                                                                     |
| resources.requests.memory         | string | `"100Mi"`                                      | Specify the memory bytes requests of the agent container                                                                                                  |
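| retry.diskCache                   | bool   | `true`                                         | Cache failed metric pushes on disk; if `false`, they are kept in the agent's memory.                                                                      |
| retry.intervalSeconds             | int    | `300`                                          | Interval in seconds between attempts to resend metrics that previously failed to push.                                                                    |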
| securityContext                   | object | `{}`                                           | Deployment security context                                                                                                                               |
| sslCertificate                    | string | `""`                                           | SSL certificate string. Can be used to add a custom ssl certificate to the agent.                                                                         |
| stackdriver.projectId             | string | `""`                                           | Your GCP project ID                                                                                                                                       |
| tokenSecret.create                | bool   | `true`                                         | Set to 'false' to skip creation of Secret object                                                                                                          |
| tokenSecret.key                   | string | `""`                                           | Override key in token Secret object (for custom secret keys)                                                                                              |
| tokenSecret.name                  | string | `""`                                           | Override name of token Secret object                                                                                                                      |
| tolerations                       | list   | `[]`                                           | Array of tolerations for the pod scheduling                                                                                                               |
| useControlPlaneCost               | string | `""`                                           | (bool) Track costs of the Kubernetes Control Plane.                                                                                                       |
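
The scheduling and resource keys in the table above could be combined in `values.yaml` roughly as follows. This sketch assumes the chart passes `nodeSelector` and `tolerations` through to the pod spec in the standard Kubernetes form. The taint key and value are hypothetical, and the resource and volume sizes are arbitrary examples.

```yaml
# Scheduling and resources (sketch); all values are illustrative
resources:
  requests:
    cpu: 100m
    memory: 200Mi
nodeSelector:
  kubernetes.io/os: linux   # well-known node label
tolerations:
  - key: dedicated          # hypothetical taint key
    operator: Equal
    value: monitoring       # hypothetical taint value
    effect: NoSchedule
persistentVolume:
  size: 20Gi
```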
