The Terraform VMware Cloud Director Provider v3.11.0 now supports installing and managing Container Service Extension (CSE) 4.1, with a new set of improvements, the new vcd_rde_behavior_invocation
data source and updated guides for VMware Cloud
Director users to deploy the required components.
In this blog post, we will be installing CSE 4.1 in an existing VCD and creating and managing a TKGm cluster.
Preparing the installation
First of all, we must make sure that all the prerequisites listed in the Terraform VCD Provider documentation are met. CSE 4.1 requires at least VCD 10.4.2; we can check our VCD version in the popup that shows up by clicking the About option inside the help "(?)" button next to our username in the top right corner:

Check that you also have ALB controllers available to be consumed from VMware Cloud Director, as the created clusters require them for load-balancing purposes.
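If we manage this with Terraform too, a quick way to assert that a controller is already registered is to look it up with a data source. This is a minimal sketch; the controller name is an assumption for illustration:
# Minimal sketch: the plan fails if no ALB controller with this (assumed) name is registered in VCD.
data "vcd_nsxt_alb_controller" "alb" {
  name = "avi-controller" # hypothetical name, adjust to your environment
}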
Step 1: Installing the prerequisites
The first step of the installation mimics the UI wizard step in which the prerequisites are created:

We will do this exact step programmatically with Terraform. To do that, let's clone the terraform-provider-vcd repository so we can obtain the required schemas, entities, and examples:
git clone https://github.com/vmware/terraform-provider-vcd.git
cd terraform-provider-vcd
git checkout v3.11.0
cd examples/container-service-extension/v4.1/install/step1
If we open 3.11-cse-install-2-cse-server-prerequisites.tf
we can see that these configuration files create all the RDE framework components that CSE needs to work, consuming the schemas that are hosted in the GitHub repository, plus all the rights and roles that are needed. We won't customize anything inside these files, as they create the same objects as the UI wizard step shown in the above screenshot, which doesn't allow customization either.
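As a reference of what those files contain, here is a simplified, hypothetical sketch of the kind of RDE framework objects they define; the exact names, versions and schema URLs come from the example files, not from this snippet:
# Simplified illustration of the RDE framework pieces that the step 1 files create.
# Versions and the schema URL below are placeholders, not the real values.
resource "vcd_rde_interface" "vcdkeconfig_interface" {
  vendor  = "vmware"
  nss     = "VCDKEConfig"
  version = "1.0.0"
  name    = "VCDKEConfig"
}

resource "vcd_rde_type" "vcdkeconfig_type" {
  vendor        = "vmware"
  nss           = "VCDKEConfig"
  version       = "1.1.0"
  name          = "VCD-KE RDE Schema"
  schema_url    = "https://raw.githubusercontent.com/vmware/terraform-provider-vcd/..." # placeholder URL, see the example files
  interface_ids = [vcd_rde_interface.vcdkeconfig_interface.id]
}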
Now we open 3.11-cse-install-3-cse-server-settings.tf
; this one is equivalent to the following UI wizard step:

We can observe that the UI wizard allows us to set some configuration parameters, and if we look at terraform.tfvars.example
we will see that the requested configuration values match.
Before applying all the Terraform configuration files available in this folder, we will rename terraform.tfvars.example
to terraform.tfvars
, and we will set the variables to the correct values. The defaults that we can see in variables.tf
and terraform.tfvars.example
match those of the UI wizard, which should be good for CSE 4.1. In our case, our VMware Cloud Director has full Internet access, so we are not setting any custom Docker registry or certificates here.
We should also keep in mind that terraform.tfvars.example
asks for a username and password to create a user that will be used to provision API tokens for the CSE Server to run. We leave these as they are, as we like the "cse_admin"
username.
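As an illustration only, the filled-in terraform.tfvars could end up looking roughly like the sketch below; the variable names are loosely modeled on the example file and may differ, so treat them as assumptions:
# Hypothetical sketch of step 1 values; variable names and values are illustrative only.
vcd_url                = "https://vcd.my-company.com"
administrator_user     = "administrator"
administrator_password = "******"

cse_admin_username = "cse_admin"
cse_admin_password = "******"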
Once we review the configuration, we can safely complete this step by running:
terraform init
terraform apply
The plan should display all the elements that are going to be created. We complete the operation (by writing yes
to the prompt), so the first step of the installation is done. This can be easily checked in the UI: the wizard no longer asks us to complete this step; instead, it shows the CSE Server configuration we just applied:

Step 2: Configuring VMware Cloud Director and running the CSE Server
We move to the next step, which is located at examples/container-service-extension/v4.1/install/step2
of our cloned repository.
cd examples/container-service-extension/v4.1/install/step2
This step is the most customizable one, as it depends on our specific needs. Ideally, as the CSE documentation implies, there should be two Organizations: a Solutions Organization
and a Tenant Organization
, with Internet access so all the required Docker images and packages can be downloaded (or with access to an internal Docker registry if we had chosen a custom registry in the previous step).
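For reference, creating those two Organizations with Terraform could look roughly like this minimal sketch; the resource names and full names are assumptions, and the real definitions live in the step 2 files (only the Tenant Organization name "tenant_org" reappears later in our cluster configuration):
# Minimal, illustrative sketch of the two Organizations CSE 4.1 expects.
resource "vcd_org" "solutions_organization" {
  name             = "solutions_org"
  full_name        = "Solutions Organization"
  is_enabled       = true
  delete_force     = true
  delete_recursive = true
}

resource "vcd_org" "tenant_organization" {
  name             = "tenant_org"
  full_name        = "Tenant Organization"
  is_enabled       = true
  delete_force     = true
  delete_recursive = true
}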
We can explore the different files available and change everything that doesn't match our needs. For example, if we already had the Organization VDCs created, we could change from using resources to using data sources instead.
In our case, the VMware Cloud Director appliance where we are installing CSE 4.1 is empty, so we need to create everything from scratch. That is what the files in this folder do: they create a basic and minimal set of components to make CSE 4.1 work.

Same as before, we rename terraform.tfvars.example
to terraform.tfvars
and check the file contents so we can set the correct configuration. As we mentioned, setting up the variables of this step depends on our needs and how we want to arrange the networking, the NSX ALB, and which TKGm OVAs we want to provide to our tenants. We should also keep in mind that some constraints must be met, like the VM Sizing Policies that are required for CSE to work being published to the VDCs, so let's read and understand the installation guide for that purpose.
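For example, a quick way to double-check one of those constraints from Terraform is to look up the required VM Sizing Policy so it can be published to the tenant VDC; a minimal sketch (the policy name "TKG small" matches the one we use later for the cluster, everything else is an assumption):
# Illustrative check: the sizing policy must exist so it can be published to the tenant VDC.
data "vcd_vm_sizing_policy" "tkg_small" {
  name = "TKG small"
}

output "tkg_small_policy_id" {
  value = data.vcd_vm_sizing_policy.tkg_small.id
}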
Once we review the configuration, we can complete this step by running:
terraform init
terraform apply
Now we should review that the plan is correct and matches what we want to achieve. It should create the two required Organizations, our VDCs, and most importantly, the networking configuration should allow Internet traffic so the TKGm clusters can retrieve the required packages and be provisioned without issues (remember that in the previous step we didn't set any internal registry nor certificates). We complete the operation (by writing yes
to the prompt), so the second step of the installation is done.
We can also double-check that everything is correct in the UI, or do a connectivity test by deploying a VM and using the console to ping an outside-world website.


Cluster creation with Terraform
Given that we have finished the installation process and we still have the cloned repository from the previous steps, we move to examples/container-service-extension/v4.1/cluster
.
cd examples/container-service-extension/v4.1/cluster
The cluster is created by the configuration file 3.11-cluster-creation.tf
, which also uses the RDE framework. We encourage readers to check both the vcd_rde
documentation and the cluster management guide before proceeding, as it is essential to know how this resource works in Terraform and, most importantly, how CSE 4.1 uses it.
We will open 3.11-cluster-creation.tf
and inspect it, to immediately see that it uses the JSON template located at examples/container-service-extension/v4.1/entities/tkgmcluster.json.template
. That is the payload that the CSE 4.1 RDE requires to initialize a TKGm cluster. We can customize this JSON to our needs; for example, we will remove the defaultStorageClassOptions
block from it, as we won't use storage in our clusters.
The initial JSON template tkgmcluster.json.template
looks like this now:
{
  "apiVersion": "capvcd.vmware.com/v1.1",
  "kind": "CAPVCDCluster",
  "name": "${name}",
  "metadata": {
    "name": "${name}",
    "orgName": "${org}",
    "site": "${vcd_url}",
    "virtualDataCenterName": "${vdc}"
  },
  "spec": {
    "vcdKe": {
      "isVCDKECluster": true,
      "markForDelete": ${delete},
      "forceDelete": ${force_delete},
      "autoRepairOnErrors": ${auto_repair_on_errors},
      "secure": {
        "apiToken": "${api_token}"
      }
    },
    "capiYaml": ${capi_yaml}
  }
}
There is nothing else that we can customize there, so we leave it like that.
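To see how this template is consumed, here is a simplified sketch of what the vcd_rde resource in 3.11-cluster-creation.tf roughly does: render the JSON with templatefile and send it as the input_entity of the cluster RDE. The data source for the RDE type and some variable names are assumptions; check the real file for the exact wiring:
# Simplified sketch: the real resource in 3.11-cluster-creation.tf passes more template variables.
resource "vcd_rde" "k8s_cluster_instance" {
  org         = var.cluster_organization
  name        = var.k8s_cluster_name
  rde_type_id = data.vcd_rde_type.capvcdcluster_type.id # assumed data source for the CAPVCD RDE type
  resolve     = false                                   # the CSE Server resolves the RDE, not Terraform

  input_entity = templatefile("../entities/tkgmcluster.json.template", {
    name                  = var.k8s_cluster_name
    org                   = var.cluster_organization
    vcd_url               = var.vcd_url
    vdc                   = var.cluster_vdc
    delete                = false
    force_delete          = false
    auto_repair_on_errors = var.auto_repair_on_errors
    api_token             = "..." # read from the API token file in the real example
    capi_yaml             = "..." # the rendered CAPVCD YAML, JSON-encoded in the real example
  })
}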
The next thing we notice is that we need a valid CAPVCD YAML; we can download it from here. We will deploy a v1.25.7 Tanzu cluster, so we download this one to start preparing it.
We open it with our editor and add the required snippets as stated in the documentation. We start with the kind: Cluster
blocks that are required by the CSE Server to provision clusters:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${TARGET_NAMESPACE}
  labels: # We add this block
    cluster-role.tkg.tanzu.vmware.com/management: ""
    tanzuKubernetesRelease: ${TKR_VERSION}
    tkg.tanzu.vmware.com/cluster-name: ${CLUSTER_NAME}
  annotations: # We add this block
    TKGVERSION: ${TKGVERSION}
# ...
We added the two labels
and annotations
blocks, with the required placeholders TKR_VERSION
, CLUSTER_NAME
, and TKGVERSION
. These placeholders are used to set the values via the Terraform configuration.
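As an illustration of how those placeholders get their values, the cluster example renders the YAML with templatefile; a simplified, hypothetical sketch (the real file passes many more variables, and the file name is the one we prepared above):
# Simplified sketch: only the placeholders discussed above are shown here.
locals {
  capi_yaml = templatefile("${path.module}/cluster-template-v1.25.7.yaml", {
    CLUSTER_NAME     = var.k8s_cluster_name
    TARGET_NAMESPACE = "${var.k8s_cluster_name}-ns" # assumed namespace convention
    TKR_VERSION      = var.tkr_version
    TKGVERSION       = var.tkg_version
    # ... the remaining placeholders (network, sizing policies, OVA, timeouts, etc.) are passed here too
  })
}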
Now we add the MachineHealthCheck block, which will allow us to use one of the powerful new features of CSE 4.1 that remediates nodes in a failed status by replacing them, enabling cluster self-healing:
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${TARGET_NAMESPACE}
  labels:
    clusterctl.cluster.x-k8s.io: ""
    clusterctl.cluster.x-k8s.io/move: ""
spec:
  clusterName: ${CLUSTER_NAME}
  maxUnhealthy: ${MAX_UNHEALTHY_NODE_PERCENTAGE}%
  nodeStartupTimeout: ${NODE_STARTUP_TIMEOUT}s
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: ${NODE_UNKNOWN_TIMEOUT}s
    - type: Ready
      status: "False"
      timeout: ${NODE_NOT_READY_TIMEOUT}s
---
Notice that the timeouts have an s
suffix, as the values introduced during the installation were in seconds. If we hadn't put the values in seconds, or had used values like 15m
, we could remove the s
suffix from these block options.
Let's add the final elements, which are most relevant when specifying custom certificates during the installation process. In kind: KubeadmConfigTemplate
we must add the preKubeadmCommands
and useExperimentalRetryJoin
blocks under the spec
> users
section:
      preKubeadmCommands:
        - mv /etc/ssl/certs/custom_certificate_*.crt /usr/local/share/ca-certificates && update-ca-certificates
      useExperimentalRetryJoin: true
In kind: KubeadmControlPlane
we must add the preKubeadmCommands
and controllerManager
blocks inside the kubeadmConfigSpec
section:
    preKubeadmCommands:
      - mv /etc/ssl/certs/custom_certificate_*.crt /usr/local/share/ca-certificates && update-ca-certificates
    controllerManager:
      extraArgs:
        enable-hostpath-provisioner: "true"
Once this is done, the resulting YAML should be similar to the one already provided in the examples/cluster
folder, cluster-template-v1.25.7.yaml
, as it uses the same Tanzu version and has all of these additions already introduced. It is a good exercise to check whether our YAML is correct before proceeding further.
After we review the crafted YAML, let's create a tenant user with the Kubernetes Cluster Author
role. This user will be required to provision clusters:
data "vcd_global_role" "k8s_cluster_author" {
  name = "Kubernetes Cluster Author"
}

resource "vcd_org_user" "cluster_author" {
  name     = "cluster_author"
  password = "dummyPassword" # This one should probably be a sensible variable and a bit safer.
  role     = data.vcd_global_role.k8s_cluster_author.name
}
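The cluster author also needs an API token, which the cluster example reads from a JSON file (referenced below as cluster_author_token_file). A hedged sketch of how such a token could be generated with the vcd_api_token resource, assuming the provider (or a provider alias) is configured with the cluster_author credentials:
# Illustrative only: creates an API token for the currently configured user and saves it to a file.
resource "vcd_api_token" "cluster_author_token" {
  name             = "cse_cluster_author_api_token"
  file_name        = "cse_cluster_author_api_token.json"
  allow_token_file = true # acknowledges that the saved file contains sensitive information
}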
Now we can complete the customization of the configuration file 3.11-cluster-creation.tf
by renaming terraform.tfvars.example
to terraform.tfvars
and configuring the parameters of our cluster. Let's check ours:
vcd_url                 = "https://..."
cluster_author_user     = "cluster_author"
cluster_author_password = "dummyPassword"

cluster_author_token_file = "cse_cluster_author_api_token.json"

k8s_cluster_name       = "example"
cluster_organization   = "tenant_org"
cluster_vdc            = "tenant_vdc"
cluster_routed_network = "tenant_net_routed"

control_plane_machine_count = "1"
worker_machine_count        = "1"

control_plane_sizing_policy    = "TKG small"
control_plane_placement_policy = "\"\""
control_plane_storage_profile  = "*"

worker_sizing_policy    = "TKG small"
worker_placement_policy = "\"\""
worker_storage_profile  = "*"

disk_size     = "20Gi"
tkgm_catalog  = "tkgm_catalog"
tkgm_ova_name = "ubuntu-2004-kube-v1.25.7+vmware.2-tkg.1-8a74b9f12e488c54605b3537acb683bc"

pod_cidr     = "100.96.0.0/11"
service_cidr = "100.64.0.0/13"

tkr_version = "v1.25.7---vmware.2-tkg.1"
tkg_version = "v2.2.0"

auto_repair_on_errors = true
We can notice that control_plane_placement_policy = "\"\""
; this is to avoid errors when we don't want to use a VM Placement Policy. We can check that the downloaded CAPVCD YAML forces us to place double quotes on this value when it is not used.
The tkr_version
and tkg_version
values were obtained from the ones already provided in the documentation.
Once we are happy with the different options, we apply the configuration:
terraform init
terraform apply
Now we should review the plan as much as possible to prevent errors. It should create the vcd_rde
resource with the elements we provided.
We complete the operation (by writing yes
to the prompt), so the cluster should start getting created. We can monitor the process either in the UI or with the two outputs provided as an example:
locals {
  k8s_cluster_computed = jsondecode(vcd_rde.k8s_cluster_instance.computed_entity)
  being_deleted        = tobool(jsondecode(vcd_rde.k8s_cluster_instance.input_entity)["spec"]["vcdKe"]["forceDelete"])
  has_status           = lookup(local.k8s_cluster_computed, "status", null) != null
}

output "computed_k8s_cluster_status" {
  value = local.has_status && !local.being_deleted ? local.k8s_cluster_computed["status"]["vcdKe"]["state"] : null
}

output "computed_k8s_cluster_events" {
  value = local.has_status && !local.being_deleted ? local.k8s_cluster_computed["status"]["vcdKe"]["eventSet"] : null
}
Then we can run terraform refresh
as many times as we want, to monitor the events with:
terraform output computed_k8s_cluster_status
terraform output computed_k8s_cluster_events
Once computed_k8s_cluster_status
states provisioned
, this step will be finished and the cluster will be ready to use. Let's retrieve the Kubeconfig, which in CSE 4.1 is done completely differently than in 4.0, as we are required to invoke a Behavior to get it. In 3.11-cluster-creation.tf
we can see a commented section that has a vcd_rde_behavior_invocation
data source. If we uncomment it and do another terraform apply
, we should be able to get the Kubeconfig by running:
terraform output kubeconfig
We can save it to a file to start interacting with our cluster using kubectl
.
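For reference, the uncommented section should look roughly like the following sketch; the Behavior ID and the path inside the result are assumptions based on how CAPVCD exposes the Kubeconfig, so rely on the commented code in the example file for the exact values. The local_file part is an optional addition to write the Kubeconfig to disk:
# Sketch: invoke the CAPVCD Behavior that returns the full entity, then extract the Kubeconfig.
data "vcd_rde_behavior_invocation" "get_kubeconfig" {
  rde_id      = vcd_rde.k8s_cluster_instance.id
  behavior_id = "urn:vcloud:behavior-interface:getFullEntity:cse:capvcd:1.0.0" # assumed Behavior ID
}

locals {
  kubeconfig = jsondecode(data.vcd_rde_behavior_invocation.get_kubeconfig.result)["entity"]["status"]["capvcd"]["private"]["kubeConfig"] # assumed path inside the result
}

output "kubeconfig" {
  value = local.kubeconfig
}

# Optional: write the Kubeconfig to a file so kubectl can use it (requires the hashicorp/local provider).
resource "local_file" "kubeconfig_file" {
  content  = local.kubeconfig
  filename = "${path.module}/kubeconfig"
}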
Cluster update
Example use case: we realized that our cluster is too small, so we need to scale it up. We will set up 3 worker nodes.
To update it, we need to make sure it is in a provisioned
status. For that, we can use the same mechanism we used when the cluster creation started:
terraform output computed_k8s_cluster_status
This should display provisioned
. If that is the case, we can proceed with the update.
As with the cluster creation, we first need to understand how the vcd_rde
resource works to avoid errors, so it is encouraged to check both the vcd_rde
documentation and the cluster management guide before proceeding. The main idea is that we must update the input_entity
argument with the information that CSE saves in the computed_entity
attribute; otherwise, we could break the cluster.
To do that, we can use the following output, which returns the computed_entity
attribute:
output "computed_k8s_cluster" {
  value = vcd_rde.k8s_cluster_instance.computed_entity # References the created cluster
}
Then we run this command to save it to a file for easier reading:
terraform output -json computed_k8s_cluster > computed.json
Let's open computed.json
for inspection. We can easily see that it looks pretty much the same as tkgmcluster.json.template
, but with the addition of a big "status"
object that contains vital information about the cluster. This must be sent back on updates, so we copy the whole "status"
object as it is and place it in the original tkgmcluster.json.template
.
After that, we can change worker_machine_count = 1
to worker_machine_count = 3
in the existing terraform.tfvars, and run terraform apply again to complete the update process.
Now it is vital to verify and make sure that the output plan shows that the "status"
object is being added to the input_entity
payload. If that is not the case, we should stop the operation immediately and check what went wrong. If "status"
is visible in the plan as being added, we can complete the update operation by writing yes
to the prompt.
Cluster deletion
The main idea of deleting a TKGm cluster is that we should not use terraform destroy
for that, even if that is the first thing that comes to mind. The reason is that the CSE Server creates a number of elements (VMs, Virtual Services, etc.) that would be left in an "orphan" state if we simply deleted the cluster RDE. We need to let the CSE Server do the cleanup for us.
For that matter, the vcd_rde
resource present in 3.11-cluster-creation.tf
contains two special arguments that mimic the deletion option from the UI:
delete       = false # Make this true to delete the cluster
force_delete = false # Make this true to forcefully delete the cluster
To trigger an asynchronous deletion process, we should change them to true
and execute terraform apply
to perform an update. We must also introduce the most recent "status"
object to the tkgmcluster.json.template
when applying, pretty much like in the update scenario described in the previous section.
Final thoughts
We hope you enjoyed the process of installing CSE 4.1 in your VMware Cloud Director appliance. For a better understanding of the process, please read the existing installation and cluster management guides.