Cluster API Operator

The Cluster API Operator is a Kubernetes Operator designed to empower cluster administrators to handle the lifecycle of Cluster API providers within a management cluster using a declarative approach. It aims to improve user experience in deploying and managing Cluster API, making it easier to handle day-to-day tasks and automate workflows with GitOps.

This operator leverages a declarative API and extends the capabilities of the clusterctl CLI, allowing greater flexibility and configuration options for cluster administrators.

Features

Offers a declarative API that simplifies the management of Cluster API providers and enables GitOps workflows.
Facilitates provider upgrades and downgrades, making them more convenient for distributed teams and CI pipelines.
Aims to support air-gapped environments without direct access to GitHub/GitLab.
Leverages controller-runtime configuration API for a more flexible Cluster API providers setup.
Provides a transparent and effective way to interact with various Cluster API components on the management cluster.

Getting started

User guide

This section contains a quick start and the concepts relevant to a new operator user.

Concepts

CoreProvider

A component responsible for providing the fundamental building blocks of the Cluster API. It defines and implements the main Cluster API resources such as Clusters, Machines, and MachineSets, and manages their lifecycle. This includes:

Defining the main Cluster API resources and their schemas.
Implementing the logic for creating, updating, and deleting these resources.
Managing the overall lifecycle of Clusters, Machines, and MachineSets.
Providing the base upon which other providers like BootstrapProvider and InfrastructureProvider build.

BootstrapProvider

A component responsible for turning a server into a Kubernetes node as well as for:

Generating the cluster certificates, if not otherwise specified
Initializing the control plane, and gating the creation of other nodes until it is complete
Joining control plane and worker nodes to the cluster

ControlPlaneProvider

A component responsible for managing the control plane of a Kubernetes cluster. This includes:

Provisioning the control plane nodes.
Managing the lifecycle of the control plane, including upgrades and scaling.

InfrastructureProvider

A component responsible for the provisioning of infrastructure/computational resources required by the Cluster or by Machines (e.g. VMs, networking, etc.). For example, cloud Infrastructure Providers include AWS, Azure, and Google, and bare metal Infrastructure Providers include VMware, MAAS, and metal3.io.

AddonProvider

A component that extends the functionality of Cluster API by providing a solution for managing the installation, configuration, upgrade, and deletion of Cluster add-ons using Helm charts.

IPAMProvider

A component that manages pools of IP addresses using Kubernetes resources. It serves as a reference implementation for IPAM providers, but can also be used as a simple replacement for DHCP.

Quickstart

This is a quickstart guide for getting Cluster API Operator up and running on your Kubernetes cluster.

For more detailed information, please refer to the full documentation.

Prerequisites

Running Kubernetes cluster.
kubectl for interacting with the management cluster.
Cert Manager for managing operator certificates.
Helm for installing operator on the cluster (optional).

Install and configure Cluster API Operator

Configuring credentials for cloud providers

Instead of using environment variables as clusterctl does, Cluster API Operator uses Kubernetes secrets to store credentials for cloud providers. Refer to each provider's documentation for the credentials it requires.

This example uses the AWS provider, but the same approach can be used for other providers.

export CREDENTIALS_SECRET_NAME="credentials-secret"
export CREDENTIALS_SECRET_NAMESPACE="default"

kubectl create secret generic "${CREDENTIALS_SECRET_NAME}" --from-literal=AWS_B64ENCODED_CREDENTIALS="${AWS_B64ENCODED_CREDENTIALS}" --namespace "${CREDENTIALS_SECRET_NAMESPACE}"

Installing Cluster API Operator

Add helm repository:

helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update

Deploy Cluster API components with the Docker provider using a single command during operator installation:

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker --set configSecret.name=${CREDENTIALS_SECRET_NAME} --set configSecret.namespace=${CREDENTIALS_SECRET_NAMESPACE}  --wait --timeout 90s

The Docker provider can be replaced by any provider supported by clusterctl.

Other options for installing Cluster API Operator are described in installation documentation.

Example API Usage

Deploy latest version of core Cluster API components:

apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system

Deploy the Cluster API AWS provider with a specific version and a config secret:

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: aws
 namespace: capa-system
spec:
 version: v2.1.4
 configSecret:
   name: aws-variables

Installation

This section describes cluster-api-operator components installation instructions.

Prerequisites

Before installing the Cluster API Operator, you must first ensure that cert-manager is installed, as the operator does not manage cert-manager installations. To install cert-manager, run the following command:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml

Wait for cert-manager to be ready before proceeding.
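
One way to wait is sketched below. It assumes the manifest installs cert-manager into the cert-manager namespace and that a five minute timeout is acceptable; wait_for_cert_manager is a hypothetical helper name, not something shipped with the operator.

```shell
# Block until every Deployment in the cert-manager namespace is Available.
# Assumptions: the "cert-manager" namespace and the 300s timeout are
# illustrative defaults; pass other values as arguments if your setup differs.
wait_for_cert_manager() {
  namespace="${1:-cert-manager}"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace "${namespace}" --timeout="${timeout}"
}

# Usage: wait_for_cert_manager
```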

After cert-manager is successfully installed, you can proceed to install the Cluster API operator.

Plugin installation

The cluster-api-operator plugin can be installed using krew, the kubectl plugin manager.

Prerequisites

krew installed on your system. See the krew installation guide for instructions.

Steps

Add the cluster-api-operator plugin index to krew:

kubectl krew index add operator https://github.com/kubernetes-sigs/cluster-api-operator.git

Install the cluster-api-operator plugin:

kubectl krew install operator/clusterctl-operator

Verify the installation:

kubectl operator

This should print help information for the kubectl operator plugin.

The cluster-api-operator plugin is now installed and ready to use with kubectl.

Optionally: installing as a `clusterctl` plugin

Typically the plugin is installed under ~/.krew/bin/kubectl-operator, which is on your $PATH after a correct krew installation. To use the plugin with clusterctl, the binary must be prefixed with clusterctl- instead, for example by copying it:

cp ~/.krew/bin/kubectl-operator ~/.krew/bin/clusterctl-operator

After that, the plugin is available as a clusterctl plugin:

clusterctl operator --help

Upgrade

To upgrade your plugin to a new release of cluster-api-operator, run:

kubectl krew upgrade

Using Manifests from Release Assets

You can install the Cluster API operator directly by applying the latest release assets:

kubectl apply -f https://github.com/kubernetes-sigs/cluster-api-operator/releases/latest/download/operator-components.yaml

Using Helm Charts

Alternatively, you can install the Cluster API operator using Helm charts:

helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system

Installing providers using Helm chart

The operator Helm chart supports a "quickstart" option for bootstrapping a management cluster. The user experience is relatively similar to clusterctl init:

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker:v1.4.2  --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure="docker;azure"  --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure="capd-custom-ns:docker:v1.4.2;capz-custom-ns:azure:v1.10.0"  --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set core=cluster-api:v1.4.2 --set controlPlane=kubeadm:v1.4.2 --set bootstrap=kubeadm:v1.4.2  --set infrastructure=docker:v1.4.2  --wait --timeout 90s
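
Judging from the commands above, each provider value is a semicolon-separated list of entries shaped as name, name:version, or namespace:name:version. The sketch below only illustrates that format. describe_providers is a hypothetical helper, and the three shapes are inferred from the examples rather than taken from the chart source.

```shell
# describe_providers: split a chart value such as
# "capd-custom-ns:docker:v1.4.2;azure" into namespace, name and version.
# Missing parts are printed as placeholders, not as real defaults.
describe_providers() {
  printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r entry; do
    case "${entry}" in
      *:*:*)  # namespace:name:version
        ns="${entry%%:*}"
        rest="${entry#*:}"
        name="${rest%%:*}"
        version="${rest#*:}"
        ;;
      *:*)    # name:version
        ns="(chart default)"
        name="${entry%%:*}"
        version="${entry#*:}"
        ;;
      *)      # name only
        ns="(chart default)"
        name="${entry}"
        version="(latest)"
        ;;
    esac
    printf 'namespace=%s name=%s version=%s\n' "${ns}" "${name}" "${version}"
  done
}

describe_providers "capd-custom-ns:docker:v1.4.2;azure"
```

Printing the parsed entries before running helm install can help catch a misplaced colon or semicolon.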

For more complex operations, please refer to our API documentation.

Topics

This section contains information about enabling and configuring various features of Cluster API Operator.

Cluster API Provider Lifecycle

This section contains lifecycle operations a user can perform on a provider manifest, such as:

Install
Upgrade
Modify
Delete

Installing a Provider

To install a new Cluster API provider with the Cluster API Operator, create a provider object, together with a secret holding its variables if the provider needs one, as shown in the first example in Examples of API Usage.

The operator processes a provider object by applying the following rules:

The CoreProvider is installed first; other providers will be requeued until the core provider exists.
Before installing any provider, the following pre-flight checks are executed:
- No other instance of the same provider (same Kind, same name) should exist in any namespace.
- The Cluster API contract (e.g., v1beta1) must match the contract of the core provider.
The operator sets conditions on the provider object to surface any installation issues, including pre-flight checks and/or order of installation.
If the FetchConfiguration is not defined, the operator applies the embedded fetch configuration for the given kind and ObjectMeta.Name specified in the Cluster API code.
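
The conditions mentioned above can be inspected with kubectl. This is a minimal sketch, assuming the conditions are published under .status.conditions (the usual Kubernetes convention). provider_conditions is a hypothetical helper name.

```shell
# provider_conditions KIND NAME NAMESPACE
# Print one line per condition: type, status and message.
# Assumption: conditions live under .status.conditions of the provider object.
provider_conditions() {
  kubectl get "$1" "$2" --namespace "$3" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'
}

# Usage: provider_conditions coreprovider cluster-api capi-system
```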

The installation process, managed by the operator, aligns with the implementation underlying the clusterctl init command and includes these steps:

Fetching provider artifacts (the components.yaml and metadata.yaml files).
Applying image overrides, if any.
Replacing variables in the infrastructure-components from EnvVar and Secret.
Applying the resulting YAML to the cluster.

Differences between the operator and clusterctl init include:

The operator installs one provider at a time while clusterctl init installs a group of providers in a single operation.
The operator stores fetched artifacts in a config map for reuse during subsequent reconciliations.
The operator uses a Secret, while clusterctl init relies on environment variables and a local configuration file.

Upgrading a Provider

To trigger an upgrade for a Cluster API provider, change the spec.Version field. All providers must follow the golden rule of respecting the same Cluster API contract supported by the core provider.

The operator performs the upgrade by:

Deleting the current provider components, while preserving CRDs, namespaces, and user objects.
Installing the new provider components.
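
Besides editing the manifest, the version field can be changed with a merge patch. The sketch below assumes the provider object already exists. version_patch and upgrade_provider are hypothetical helper names.

```shell
# version_patch VERSION: print a JSON merge patch that sets spec.version.
version_patch() {
  printf '{"spec":{"version":"%s"}}' "$1"
}

# upgrade_provider KIND NAME NAMESPACE VERSION
# Apply the patch; the operator then reconciles the changed provider object.
upgrade_provider() {
  kubectl patch "$1" "$2" --namespace "$3" --type merge \
    --patch "$(version_patch "$4")"
}

# Usage: upgrade_provider coreprovider cluster-api capi-system v1.4.3
```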

Differences between the operator and clusterctl upgrade apply include:

The operator upgrades one provider at a time while clusterctl upgrade apply upgrades a group of providers in a single operation.
With the declarative approach, users are responsible for manually editing the Provider objects' YAML, while clusterctl upgrade apply --contract automatically determines the latest available versions for each provider.

Modifying a Provider

In addition to changing a provider version (upgrades), the operator supports modifying other provider fields such as controller flags and variables. This can be achieved through kubectl edit or kubectl apply to the provider object.

The operation works similarly to upgrades: The current provider instance is deleted while preserving CRDs, namespaces, and user objects. Then, a new provider instance with the updated flags/variables is installed.

Note: clusterctl currently does not support this operation.

Deleting a Provider

To remove the installed providers and all related Kubernetes objects, delete the following CRs:

kubectl delete infrastructureprovider azure
kubectl delete coreprovider cluster-api

Configuration

This section contains a list of frequent configuration tasks for CAPI Operator providers.

Air-gapped Environment

To install Cluster API providers in an air-gapped environment using the operator, address the following issues:

Configure the operator for an air-gapped environment:
- Manually fetch and store a helm chart for the operator.
- Provide image overrides for the operator from an accessible image repository.
Configure providers for an air-gapped environment:
- Provide fetch configuration for each provider from an accessible location (e.g., an internal GitHub repository) or from pre-created ConfigMaps within the cluster.
- Provide image overrides for each provider to pull images from an accessible image repository.

Example Usage:

As an admin, I need to fetch the Azure provider components from within the cluster because I am working in an air-gapped environment.

In this example, there is a ConfigMap in the capz-system namespace that defines the components and metadata of the provider.

The Azure InfrastructureProvider is configured with a fetchConfig specifying the label selector, allowing the operator to determine the available versions of the Azure provider. Since the provider's version is marked as v1.9.3, the operator uses the components information from the ConfigMap with matching label to install the Azure provider.

---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    provider-components: azure
  name: v1.9.3
  namespace: capz-system
data:
  components: |
    # Components for v1.9.3 YAML go here
  metadata: |
    # Metadata information goes here
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  configSecret:
    name: azure-variables
  fetchConfig:
    selector:
      matchLabels:
        provider-components: azure

Situation when manifests do not fit into configmap

A ConfigMap is limited to 1MiB in size. If the manifests exceed this limit, Kubernetes returns an error and the provider installation fails. To avoid this, you can compress the manifests before putting them in the ConfigMap.

For example, suppose you have two files: components.yaml and metadata.yaml. To create a working ConfigMap:

Compress components.yaml using the gzip CLI tool

gzip -c components.yaml > components.gz

Create a configmap manifest from the archived data

kubectl create configmap v1.9.3 --namespace=capz-system --from-file=components=components.gz --from-file=metadata=metadata.yaml --dry-run=client -o yaml > configmap.yaml

Edit the file by adding "provider.cluster.x-k8s.io/compressed: true" annotation

yq eval -i '.metadata.annotations += {"provider.cluster.x-k8s.io/compressed": "true"}' configmap.yaml

Note: without this annotation, the operator cannot determine whether the data is compressed.

Add labels that will be used to match the configmap in fetchConfig section of the provider

yq eval -i '.metadata.labels += {"my-label": "label-value"}' configmap.yaml

Create a configmap in your kubernetes cluster using kubectl

kubectl create -f configmap.yaml
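
Whether compression is needed at all can be checked up front. This sketch compares the file size with the 1MiB limit mentioned above. needs_compression is a hypothetical helper, and because the limit applies to the whole ConfigMap object, a file just under the limit may still be rejected.

```shell
# 1 MiB expressed in bytes.
CONFIGMAP_LIMIT_BYTES=1048576

# needs_compression FILE: succeed when FILE alone reaches the ConfigMap limit.
needs_compression() {
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "${size}" -ge "${CONFIGMAP_LIMIT_BYTES}" ]
}

# Usage:
#   if needs_compression components.yaml; then
#     gzip -c components.yaml > components.gz
#   fi
```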

Injecting additional manifests

It is possible to inject additional manifests when installing/upgrading a provider. This can be useful when you need to add extra RBAC resources to the provider controller, for example. The field AdditionalManifests is a reference to a ConfigMap that contains additional manifests, which will be applied together with the provider components. The key for storing these manifests has to be manifests. The manifests are applied only once when a certain release is installed/upgraded. If the namespace is not specified, the namespace of the provider will be used. There is no validation of the YAML content inside the ConfigMap.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: additional-manifests
  namespace: capi-system
data:
  manifests: |
    # Additional manifests go here
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  additionalManifests:
    name: additional-manifests

Examples of API Usage

In this section we provide some concrete examples of CAPI Operator API usage for various use-cases.

As an admin, I want to install the aws infrastructure provider with specific controller flags.

apiVersion: v1
kind: Secret
metadata:
 name: aws-variables
 namespace: capa-system
type: Opaque
data:
 AWS_B64ENCODED_CREDENTIALS: ...
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: aws
 namespace: capa-system
spec:
 version: v2.1.4
 configSecret:
   name: aws-variables
 manager:
   # These are top-level controller manager flags, supported by all providers.
   # These flags come with sensible defaults, thus requiring no or minimal
   # changes for the most common scenarios.
   metrics:
    bindAddress: ":8181"
   syncPeriod: "500s"
 fetchConfig:
   url: https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases
 deployment:
   containers:
   - name: manager
     args:
      # These are controller flags that are specific to a provider; usage
      # is reserved for advanced scenarios only.
      "--awscluster-concurrency": "12"
      "--awsmachine-concurrency": "11"

As an admin, I want to install aws infrastructure provider but override the container image of the CAPA deployment.

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: aws
 namespace: capa-system
spec:
 version: v2.1.4
 configSecret:
   name: aws-variables
 deployment:
   containers:
   - name: manager
     imageUrl: "gcr.io/myregistry/capa-controller:v2.1.4-foo"

As an admin, I want to change the resource limits for the manager pod in my control plane provider deployment.

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: ControlPlaneProvider
metadata:
 name: kubeadm
 namespace: capi-kubeadm-control-plane-system
spec:
 version: v1.4.3
 configSecret: 
   name: capi-variables
 deployment:
   containers:
   - name: manager
     resources:
       limits:
         cpu: 100m
         memory: 30Mi
       requests:
         cpu: 100m
         memory: 20Mi

As an admin, I would like to fetch my azure provider components from a specific repository which is not the default.

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: myazure
 namespace: capz-system
spec:
 version: v1.9.3
 configSecret:
   name: azure-variables
 fetchConfig:
   url: https://github.com/myorg/awesome-azure-provider/releases

As an admin, I would like to use the default fetch configurations by simply specifying the expected Cluster API provider names such as aws, vsphere, azure, kubeadm, talos, or cluster-api instead of having to explicitly specify the fetch configuration. In the example below, since we are using 'vsphere' as the name of the InfrastructureProvider, the operator will fetch its configuration from url: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/releases by default.

See more examples in the air-gapped environment section

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: vsphere
 namespace: capv-system
spec:
 version: v1.6.1
 configSecret:
   name: vsphere-variables

Patching provider manifests

Provider manifests can be patched using JSON merge patches. This can be useful when you need to modify the provider manifests that are fetched from the repository. To patch provider manifests, use spec.ResourcePatches, which accepts an array of patches:

---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  resourcePatches:
    - |
      apiVersion: v1
      kind: Service
      metadata:
        labels:
          test-label: test-value

More information about JSON merge patches can be found here https://datatracker.ietf.org/doc/html/rfc7396

There are a couple of rules for the patch to match a manifest:

The kind field must match the target object.
If apiVersion is specified it will only be applied to matching objects.
If metadata.name and metadata.namespace are not specified, the patch will be applied to all objects of the specified kind.
If metadata.name is specified, the patch will be applied to the object with the specified name. This is for cluster scoped objects.
If both metadata.name and metadata.namespace are specified, the patch will be applied to the object with the specified name and namespace.
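
The rules above can be read as "kind must match, and every other field that is set must match too". The sketch below expresses that reading as a predicate. patch_matches is a hypothetical helper for illustration only and is not the operator's implementation.

```shell
# patch_matches P_KIND P_APIVERSION P_NAME P_NAMESPACE O_KIND O_APIVERSION O_NAME O_NAMESPACE
# The first four arguments come from the patch, the last four from the
# candidate object. An empty patch field acts as a wildcard, except kind.
patch_matches() {
  [ "$1" = "$5" ] || return 1                  # kind must always match
  [ -z "$2" ] || [ "$2" = "$6" ] || return 1   # apiVersion, if specified
  [ -z "$3" ] || [ "$3" = "$7" ] || return 1   # metadata.name, if specified
  [ -z "$4" ] || [ "$4" = "$8" ] || return 1   # metadata.namespace, if specified
  return 0
}

# A patch that only sets kind applies to every Service:
patch_matches Service "" "" "" Service v1 example-service capi-system && echo "patch applies"
```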

Provider Spec

ProviderSpec: desired state of the Provider, consisting of:
- Version (string): provider version (e.g., "v0.1.0")
- Manager (optional ManagerSpec): controller manager properties for the provider
- Deployment (optional DeploymentSpec): deployment properties for the provider
- ConfigSecret (optional SecretReference): reference to the config secret
- FetchConfig (optional FetchConfiguration): how the operator will fetch components and metadata
YAML example:
```
...
spec:
 version: "v0.1.0"
 manager:
   maxConcurrentReconciles: 5
 deployment:
   replicas: 1
 configSecret:
   name: "provider-secret"
 fetchConfig:
   url: "https://github.com/owner/repo/releases"
...
```
ManagerSpec: controller manager properties for the provider, consisting of:
- ProfilerAddress (optional string): pprof profiler bind address (e.g., "localhost:6060")
- MaxConcurrentReconciles (optional int): maximum number of concurrent reconciles
- Verbosity (optional int): logs verbosity
- FeatureGates (optional map[string]bool): provider specific feature flags
YAML example:
```
...
spec:
 manager:
   profilerAddress: "localhost:6060"
   maxConcurrentReconciles: 5
   verbosity: 1
   featureGates:
     FeatureA: true
     FeatureB: false
...
```

DeploymentSpec: deployment properties for the provider, consisting of:

Replicas (optional int): number of desired pods
NodeSelector (optional map[string]string): node label selector
Tolerations (optional []corev1.Toleration): pod tolerations
Affinity (optional corev1.Affinity): pod scheduling constraints
Containers (optional []ContainerSpec): list of deployment containers
ServiceAccountName (optional string): pod service account
ImagePullSecrets (optional []corev1.LocalObjectReference): list of image pull secrets specified in the Deployment

YAML example:

...
spec:
  deployment:
    replicas: 2
    nodeSelector:
      disktype: ssd
    tolerations:
    - key: "example"
      operator: "Exists"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "example"
              operator: "In"
              values:
              - "true"
    containers:
      - name: "containerA"
        imageUrl: "example.com/repo/image-name:v1.0.0"
        args:
          exampleArg: "value"
...

ContainerSpec: container properties for the provider, consisting of:

Name (string): container name
ImageURL (optional string): container image URL
Args (optional map[string]string): extra provider specific flags
Env (optional []corev1.EnvVar): environment variables
Resources (optional corev1.ResourceRequirements): compute resources
Command (optional []string): override container's entrypoint array

YAML example:

...
spec:
  deployment:
    containers:
      - name: "example-container"
        imageUrl: "example.com/repo/image-name:v1.0.0"
        args:
          exampleArg: "value"
        env:
          - name: "EXAMPLE_ENV"
            value: "example-value"
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "500Mi"
        command:
          - "/bin/bash"
...

FetchConfiguration: components and metadata fetch options, consisting of:
- URL (optional string): URL for remote GitHub repository releases (e.g., "https://github.com/owner/repo/releases")
- Selector (optional metav1.LabelSelector): label selector to use for fetching provider components and metadata from ConfigMaps stored in the cluster
YAML example:
```
...
spec:
  fetchConfig:
    url: "https://github.com/owner/repo/releases"
    selector:
      matchLabels:
...
```
SecretReference: pointer to a secret object, consisting of:

Name (string): name of the secret
Namespace (optional string): namespace of the secret, defaults to the provider object namespace

YAML example:
```
...
spec:
  configSecret:
    name: capa-secret
    namespace: capa-system
...
```

Deleting providers

To remove all installed providers and all related Kubernetes objects, delete the following CRs:

kubectl delete coreprovider --all --all-namespaces
kubectl delete infrastructureprovider --all --all-namespaces
kubectl delete bootstrapprovider --all --all-namespaces
kubectl delete controlplaneprovider --all --all-namespaces
kubectl delete ipamprovider --all --all-namespaces
kubectl delete addonprovider --all --all-namespaces

Basic Cluster API provider installation

This section provides an example of a CAPZ provider installation.

Installing the CoreProvider

The first step is to install the CoreProvider, which is responsible for managing the Cluster API CRDs and the Cluster API controller.

You can use any existing namespace in your Kubernetes cluster for providers. However, before creating a provider object, make sure the specified namespace has been created. In the example below, we use the capi-system namespace. You can create this namespace either from the command line by running kubectl create namespace capi-system, or by using the declarative approach described in the official Kubernetes documentation.

Example:

apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.3

Note: Only one CoreProvider can be installed at the same time on a single cluster.

Installing Azure Infrastructure Provider

Next, install the Azure Infrastructure Provider. Before doing so, ensure that the capz-system namespace exists.

Since the provider requires variables to be set, create a secret containing them in the same namespace as the provider. It is also recommended to include a github-token in the secret. This token is used to fetch the provider repository, and it is required for the provider to be installed. The operator may exceed the rate limit of the GitHub API without the token. Like clusterctl, the token needs only the repo scope.

---
apiVersion: v1
kind: Secret
metadata:
  name: azure-variables
  namespace: capz-system
type: Opaque
stringData:
  AZURE_CLIENT_ID_B64: Zm9vCg==
  AZURE_CLIENT_SECRET_B64: Zm9vCg==
  AZURE_SUBSCRIPTION_ID_B64: Zm9vCg==
  AZURE_TENANT_ID_B64: Zm9vCg==
  github-token: ghp_fff
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: azure
 namespace: capz-system
spec:
 version: v1.9.3
 configSecret:
   name: azure-variables

Developer

This section contains regular developer tasks, such as:

Release
Development guide
Version migration

Releasing New Versions

This document describes the release process for the Cluster API Operator.

Create a new release branch and cut a release tag.

git checkout -b release-0.1
git push -u upstream release-0.1

# Export the tag of the release to be cut, e.g.:
export RELEASE_TAG=v0.1.1

# Create tags locally
# Warning: The test tag MUST NOT be an annotated tag.
git tag -s -a ${RELEASE_TAG} -m ${RELEASE_TAG}
git tag test/${RELEASE_TAG}

# Push tags
# Note: `upstream` must be the remote pointing to `github.com/kubernetes-sigs/cluster-api-operator`.
git push upstream ${RELEASE_TAG}
git push upstream test/${RELEASE_TAG}

Note: You may encounter an ioctl error during tagging. To resolve this, you need to set the GPG_TTY environment variable as export GPG_TTY=$(tty).

This will trigger a release GitHub action that creates a release with operator components and the Helm chart. Concurrently, a Prow job will start to publish operator images to the staging registry.

Wait for the images to appear in the staging registry.
Create a GitHub Personal access token if you don't already have one. We're going to use this for opening a PR to promote the images from staging to production.

export GITHUB_TOKEN=<your GH token>
export USER_FORK=<your GH account name>
make promote-images

After it has been tested, merge the PR and verify that the image is present in the production registry.

docker pull registry.k8s.io/capi-operator/cluster-api-operator:${RELEASE_TAG}

Switch back to the main branch and update index.yaml and clusterctl-operator.yaml. These are the sources for the operator Helm chart repository and the local krew plugin manifest index, respectively.

git checkout main
make update-helm-plugin-repo

Create a PR with the changes.

Setup jobs and dashboards for a new release branch

The goal of this task is to have test coverage for the new release branch and its results in testgrid. We currently run CI jobs only on the main branch and the latest stable release branch (release-0.5 is used as an example below), and all configurations are hosted in the test-infra repo.

Create new jobs based on the jobs running against our main branch:
1. Copy test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-periodics-main.yaml to test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-periodics-release-0-5.yaml.
2. Copy test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-presubmits-main.yaml to test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-presubmits-release-0-5.yaml.
3. Modify the following:
  1. Rename the jobs, e.g.: periodic-cluster-api-operator-test-main => periodic-cluster-api-operator-test-release-0-5.
  2. Change annotations.testgrid-dashboards to sig-cluster-lifecycle-cluster-api-operator-0.5.
  3. Change annotations.testgrid-tab-name, e.g. capi-operator-test-main => capi-operator-test-release-0-5.
  4. For periodics additionally:
    - Change extra_refs[].base_ref to release-0.5 (for repo: cluster-api-operator).
  5. For presubmits additionally: Adjust branches: ^main$ => ^release-0.5$.
Create a new dashboard for the new branch in: test-infra/config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml (dashboard_groups and dashboards).
- Add a new entry sig-cluster-lifecycle-cluster-api-operator-0.5 in both dashboard_groups and dashboards lists.
Remove tests for previous release branch.
- For example, let's assume we just created tests for v0.5, then we can now drop test coverage for the release-0.4 branch.
Verify the jobs and dashboards a day later by taking a look at: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-operator-0.5.

Prior art: https://github.com/kubernetes/test-infra/pull/30372

Version migration

This section provides an overview of relevant changes between versions of Cluster API Operator and their direct successors.

v1alpha1 to v1alpha2

Cluster API Operator v1alpha1 compared to v1alpha2

This document provides an overview of the relevant changes between Cluster API Operator API v1alpha1 and v1alpha2 for consumers of our Go API.

Changes by Kind

The changes below affect all v1alpha1 provider kinds: CoreProvider, ControlPlaneProvider, BootstrapProvider and InfrastructureProvider.

API Changes

This section describes the changes introduced in the v1alpha2 API and how to update your templates to the new version.

ImageMeta -> imageURL conversion

In v1alpha1 we use an ImageMeta object that consists of 3 parts:

Repository (optional string): image registry (e.g., "example.com/repo")
Name (optional string): image name (e.g., "provider-image")
Tag (optional string): image tag (e.g., "v1.0.0")

In v1alpha2 it is just a string, which represents the URL, e.g. example.com/repo/image-name:v1.0.0.

Example:

v1alpha1

spec:
 deployment:
   containers:
   - name: manager
     image:
       repository: "example.com/repo"
       name: "image-name"
       tag: "v1.0.0"

v1alpha2

spec:
 deployment:
   containers:
   - name: manager
     imageURL: "example.com/repo/image-name:v1.0.0"
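
When migrating many manifests, the conversion can be scripted. The following is a minimal sketch rather than code from the operator: ImageMeta here is a local stand-in with the three fields listed above, imageURL is a hypothetical helper, and skipping empty parts is an assumption about how optional fields should be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageMeta is a local stand-in for the v1alpha1 object described above.
// All three fields are optional.
type ImageMeta struct {
	Repository string
	Name       string
	Tag        string
}

// imageURL joins the v1alpha1 parts into the single string used by v1alpha2.
// Assumption: empty parts are skipped instead of producing "/" or ":" alone.
func imageURL(m ImageMeta) string {
	var parts []string
	if m.Repository != "" {
		parts = append(parts, strings.TrimSuffix(m.Repository, "/"))
	}
	if m.Name != "" {
		parts = append(parts, m.Name)
	}
	url := strings.Join(parts, "/")
	if m.Tag != "" && url != "" {
		url += ":" + m.Tag
	}
	return url
}

func main() {
	old := ImageMeta{Repository: "example.com/repo", Name: "image-name", Tag: "v1.0.0"}
	fmt.Println(imageURL(old))
}
```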

secretName/secretNamespace -> configSecret conversion

In v1alpha1 we have 2 separate top-level fields to point to a config secret: secretName and secretNamespace. In v1alpha2 we reworked them into an object configSecret that has 2 fields: name and namespace.

Example:

v1alpha1

spec:
 secretName: azure-variables
 secretNamespace: capz-system

v1alpha2

spec:
 configSecret:
   name: azure-variables
   namespace: capz-system
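
The same mapping, expressed in Go for readers who convert objects programmatically. The types below are local stand-ins named after the YAML fields, not the operator's own Go types, and returning nil when no secret name is set is an assumption made for this sketch.

```go
package main

import "fmt"

// v1alpha1Spec is a stand-in holding the two top-level v1alpha1 fields.
type v1alpha1Spec struct {
	SecretName      string
	SecretNamespace string
}

// configSecret is a stand-in for the v1alpha2 configSecret object.
type configSecret struct {
	Name      string
	Namespace string
}

// toConfigSecret moves secretName/secretNamespace into a configSecret.
// Assumption: a spec without a secret name has no config secret at all.
func toConfigSecret(old v1alpha1Spec) *configSecret {
	if old.SecretName == "" {
		return nil
	}
	return &configSecret{Name: old.SecretName, Namespace: old.SecretNamespace}
}

func main() {
	cs := toConfigSecret(v1alpha1Spec{SecretName: "azure-variables", SecretNamespace: "capz-system"})
	fmt.Printf("name=%s namespace=%s\n", cs.Name, cs.Namespace)
}
```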

Developer Guide

Prerequisites

Docker

Iterating on the Cluster API Operator involves repeatedly building Docker containers.

A Cluster

You'll likely want an existing cluster as your management cluster. The easiest way to do this is with kind v0.9 or newer, as explained in the quick start.

Make sure your cluster is set as the default for kubectl. If it's not, you will need to modify subsequent kubectl commands below.

kubectl

kubectl for interacting with the management cluster.

Helm

Helm for installing the operator on the cluster (optional).

A container registry

If you're using kind, you'll need a way to push your images to a registry so they can be pulled. You can instead side-load all images, but the registry workflow is lower-friction.

Most users test with GCR, but you could also use something like Docker Hub. If you choose not to use GCR, you'll need to set the REGISTRY environment variable.

Kustomize

You'll need to install kustomize. There is a version of kustomize built into kubectl, but it does not have all the features of kustomize v3 and will not work.

Kubebuilder

You'll need to install kubebuilder.

Cert-Manager

You'll need to deploy the cert-manager components on your management cluster using kubectl:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.2/cert-manager.yaml

Ensure the cert-manager webhook service is ready before creating the Cluster API Operator components.

This can be done by following instructions for manual verification from the cert-manager website. Note: make sure to follow instructions for the release of cert-manager you are installing.

Development

Option 1: Tilt

Tilt is a tool for quickly building, pushing, and reloading Docker containers as part of a Kubernetes deployment.

Once you have a running Kubernetes cluster, you can run:

tilt up

That's it! Tilt will automatically reload the deployment to your local cluster every time you make a code change.

Option 2: The kustomize way

# Build all the images
make docker-build

# Push images
make docker-push

# Apply the manifests
kustomize build config/default | ./hack/tools/bin/envsubst | kubectl apply -f -

Reference

API Reference

Cluster API Operator currently exposes the following APIs:

Cluster API Operator Custom Resource Definitions (CRDs): documentation
Golang APIs: godoc

Glossary

The lexicon used in this document is described in more detail here. Any discrepancies should be rectified in the main Cluster API glossary.

Code of Conduct

Kubernetes Community Code of Conduct

Please refer to our Kubernetes Community Code of Conduct.

Contributing

Contributing Guidelines

Welcome to Kubernetes. We are excited about the prospect of you joining our community! The Kubernetes community abides by the CNCF code of conduct. Here is an excerpt:

As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

Getting Started

We have full documentation on how to get started contributing here:

Contributor License Agreement - Kubernetes projects require that you sign a Contributor License Agreement (CLA) before we can accept your pull requests
Kubernetes Contributor Guide - Main contributor documentation, or you can just jump directly to the contributing section
Contributor Cheat Sheet - Common resources for existing developers

Mentorship

Mentoring Initiatives - We have a diverse set of mentorship programs available that are always looking for volunteers!

CI Jobs

This document provides an overview of our jobs running via Prow, GitHub Actions and Google Cloud Build. It also documents the cluster-api-operator specific configuration in test-infra.

Builds and Tests running on the main branch

NOTE: To see which test jobs execute which tests or e2e tests, you can click on the links which lead to the respective test overviews in testgrid.

The dashboards for the ProwJobs can be found here: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-operator

More details about ProwJob configurations can be found here.

Presubmits

Prow Presubmits:

mandatory for merge, always run:
- pull-cluster-api-operator-build-main ./scripts/ci-build.sh
- pull-cluster-api-operator-make-main ./scripts/ci-make.sh
- pull-cluster-api-operator-verify-main ./scripts/ci-verify.sh
mandatory for merge, run if go code changes:
- pull-cluster-api-operator-test-main ./scripts/ci-test.sh
- pull-cluster-api-operator-e2e-main ./scripts/ci-e2e.sh
optional for merge, run if go code changes:
- pull-cluster-api-operator-apidiff-main ./scripts/ci-apidiff.sh

GitHub Presubmit Workflows:

PR golangci-lint: golangci/golangci-lint-action
- Runs golangci-lint. Can be run locally via make lint.
PR verify: kubernetes-sigs/kubebuilder-release-tools verifier
- Verifies that the PR title has a valid format, i.e. contains one of the valid icons.
- Verifies the PR description is valid, i.e. is long enough.
PR dependabot (run on dependabot PRs)
- Regenerates Go modules and code.

Other GitHub workflows

release (runs when tags are pushed)
- Creates a GitHub release with release notes for the tag.
book publishing
- Deploys the operator book to GitHub Pages.

Postsubmits

Prow Postsubmits:

post-cluster-api-operator-push-images Google Cloud Build: make release-staging

Periodics

Prow Periodics:

periodic-cluster-api-operator-test-main ./scripts/ci-test.sh
periodic-cluster-api-operator-e2e-main ./scripts/ci-e2e.sh

Test-infra configuration

config/jobs/image-pushing/k8s-staging-cluster-api.yaml
- Configures postsubmit job to push images and manifests.
config/jobs/kubernetes-sigs/cluster-api-operator/
- Configures Cluster API Operator presubmit and periodic jobs.
config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml
- Configures Cluster API Operator testgrid dashboards.
config/prow/plugins.yaml
- approve: disable auto-approval of PR authors, ignore GitHub reviews (/approve is explicitly required)
- lgtm: enables retaining lgtm through squash
- require_matching_label: configures needs-triage
- plugins: enables require-matching-label plugin
- external_plugins: enables cherrypicker plugin
label_sync/labels.yaml
- Configures labels for the cluster-api-operator repository.

Provider List

The Cluster API Operator introduces new API types: CoreProvider, BootstrapProvider, ControlPlaneProvider, InfrastructureProvider, AddonProvider and IPAMProvider. These six provider types share common Spec and Status types, ProviderSpec and ProviderStatus, respectively.

The CRDs are scoped to be namespaced, allowing RBAC restrictions to be enforced if needed. This scoping also enables the installation of multiple versions of controllers (grouped within namespaces) in the same management cluster.

Related Golang structs can be found in the Cluster API Operator repository.

Below are the new API types being defined. The Core, Bootstrap, ControlPlane, and Infrastructure provider types use the shared ProviderSpec and ProviderStatus types directly:

CoreProvider

type CoreProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   ProviderSpec   `json:"spec,omitempty"`
  Status ProviderStatus `json:"status,omitempty"`
}

BootstrapProvider

type BootstrapProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   ProviderSpec   `json:"spec,omitempty"`
  Status ProviderStatus `json:"status,omitempty"`
}

ControlPlaneProvider

type ControlPlaneProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   ProviderSpec   `json:"spec,omitempty"`
  Status ProviderStatus `json:"status,omitempty"`
}

InfrastructureProvider

type InfrastructureProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   ProviderSpec   `json:"spec,omitempty"`
  Status ProviderStatus `json:"status,omitempty"`
}

AddonProvider

type AddonProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   AddonProviderSpec   `json:"spec,omitempty"`
  Status AddonProviderStatus `json:"status,omitempty"`
}

IPAMProvider

type IPAMProvider struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec   IPAMProviderSpec   `json:"spec,omitempty"`
  Status IPAMProviderStatus `json:"status,omitempty"`
}

The following sections provide details about ProviderSpec and ProviderStatus, which are shared among all the provider types.

Provider Status

ProviderStatus: observed state of the Provider, consisting of:

Contract (optional string): core provider contract being adhered to (e.g., "v1beta1")
Conditions (optional clusterv1.Conditions): current service state of the provider
ObservedGeneration (optional int64): latest generation observed by the controller
InstalledVersion (optional string): version of the provider that is installed

YAML example:

status:
  contract: "v1beta1"
  conditions:
    - type: "Ready"
      status: "True"
      reason: "ProviderAvailable"
      message: "Provider is available and ready"
  observedGeneration: 1
  installedVersion: "v0.1.0"
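
As an illustration of how a consumer might read these fields, the sketch below checks whether a provider looks ready. The types are simplified local stand-ins for the fields listed above (plain strings instead of clusterv1.Conditions and optional pointers), and isReady is a hypothetical helper. Combining the generation check with the Ready condition is a choice made for this example, not a rule stated by the operator.

```go
package main

import "fmt"

// Condition is a simplified stand-in for one entry of clusterv1.Conditions.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// ProviderStatus is a simplified stand-in for the status fields listed above.
type ProviderStatus struct {
	Contract           string
	Conditions         []Condition
	ObservedGeneration int64
	InstalledVersion   string
}

// isReady reports whether the controller has observed the given generation
// and the status carries a Ready condition whose status is "True".
func isReady(s ProviderStatus, generation int64) bool {
	if s.ObservedGeneration < generation {
		return false // status is stale relative to the spec
	}
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition reported yet
}

func main() {
	status := ProviderStatus{
		Contract: "v1beta1",
		Conditions: []Condition{{
			Type:    "Ready",
			Status:  "True",
			Reason:  "ProviderAvailable",
			Message: "Provider is available and ready",
		}},
		ObservedGeneration: 1,
		InstalledVersion:   "v0.1.0",
	}
	fmt.Println(isReady(status, 1))
}
```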