Hive Integration - Retry to Install a cluster
In a case where the cluster installation has failed, the user may want to restart the installation. A failed install may happen for multiple reasons; make sure you understand the root cause before trying again.
To restart the installation, the user will need to:
1. Delete the cluster AgentClusterInstall resource.
2. Delete all the cluster BareMetalHost resources (a single resource in the case of SNO).
3. Recreate the AgentClusterInstall resource that was deleted in step 1
4. Recreate the BareMetalHost resources that were deleted in step 2
:warning: Note:
Recreating AgentClusterInstall will de-register and re-register the backend cluster, which will trigger a discovery image generation for InfraEnv.
Baremetal Agent Controller (a.k.a BMAC) is inspecting InfraEnv for any changes to status.isoDownloadURL, and will pick up the newly generated discovery image.
If you boot your machines in other methods (boot it yourself), make sure you use the new image for that.
This document will capture the changes before (failed install) and after (resources recreated) to demonstrate this change to the image, but you won't have to do that when you reattempt the installation.
Baseline
How may a failed installation look?
For that look at the AgentClusterInstall conditions:
$ kubectl -n test-namespace get agentclusterinstalls.extensions.hive.openshift.io test-agent-cluster-install -o=jsonpath="{.metadata.name}{'\n'}{range .status.conditions[*]}{.type}{'\t'}{.message}{'\n'}"
test-agent-cluster-install
SpecSynced The Spec has been successfully applied
Validated The cluster's validations are passing
RequirementsMet The cluster installation stopped
Completed The installation has failed: cluster has hosts in error
Failed The installation failed: cluster has hosts in error
Stopped The installation has stopped due to error
Capture current discovery image URL
Expect URLs to match.
InfraEnv
$ kubectl -n test-namespace get infraenvs.agent-install.openshift.io test-infraenv -o=jsonpath="{.status.createdTime}{'\n'}{.status.isoDownloadURL}{'\n'}"
2021-06-23T14:24:57Z
https://assisted-service-assisted-installer.apps.ostest.test.metalkube.org/api/assisted-install/v1/clusters/2748ddac-0ac9-489b-a38c-ce0d29d22b02/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiMjc0OGRkYWMtMGFjOS00ODliLWEzOGMtY2UwZDI5ZDIyYjAyIn0.L67oWuxClinXtCiqRcieOS4vAJCFNVztAE_A2TYnBYJawhAox6NfiuxUih2TKwZxbNVCOwLdQXt_5rjYL6Xn5g
BareMetalHost
$ kubectl -n test-namespace get baremetalhosts.metal3.io ostest-extraworker-3 -o=jsonpath="{.spec.image.url}{'\n'}"
https://assisted-service-assisted-installer.apps.ostest.test.metalkube.org/api/assisted-install/v1/clusters/2748ddac-0ac9-489b-a38c-ce0d29d22b02/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiMjc0OGRkYWMtMGFjOS00ODliLWEzOGMtY2UwZDI5ZDIyYjAyIn0.L67oWuxClinXtCiqRcieOS4vAJCFNVztAE_A2TYnBYJawhAox6NfiuxUih2TKwZxbNVCOwLdQXt_5rjYL6Xn5g
Delete and Recreate
- As mentioned above, Delete both
AgentClusterInstalland all the clusterBareMetalHostresources. - Create
AgentClusterInstall - The
InfraEnvcontroller:
3.1. Gets notified for a successful backend cluster registration.
3.2. Reconcile and send a request for the backend to generate a discovery image.
- Inspect
InfraEnvfor:
4.1. Changes to status.isoDownloadURL, cluster-id and token.
4.2. Notice that status.createdTime was updated.
$ kubectl -n test-namespace get infraenvs.agent-install.openshift.io test-infraenv -o=jsonpath="{.status.createdTime}{'\n'}{.status.isoDownloadURL}{'\n'}"
2021-06-24T10:31:16Z
https://assisted-service-assisted-installer.apps.ostest.test.metalkube.org/api/assisted-install/v1/clusters/21ade42e-1c78-48b0-bde7-e875632527c1/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiMjFhZGU0MmUtMWM3OC00OGIwLWJkZTctZTg3NTYzMjUyN2MxIn0.IbVClRJQm8nihs7N2B9hiJ523qioKKqymaxGWkQPCIdnMspx_pfWRUeieYyEVDUExLeBPuFlwb84mLPCuZCLzg
- Inspect AgentClusterInstall conditions
$ kubectl -n test-namespace get agentclusterinstalls.extensions.hive.openshift.io test-agent-cluster-install
-o=jsonpath="{.metadata.name}{'\n'}{range .status.conditions[*]}{.type}{'\t'}{.message}{'\n'}"
test-agent-cluster-install
SpecSynced The Spec has been successfully applied
RequirementsMet The cluster is not ready to begin the installation
Validated The cluster's validations are failing: Single-node clusters must have a single master node and no workers.
Completed The installation has not yet started
Failed The installation has not failed
Stopped The installation is waiting to start or in progress
- Create
BareMetalHostresource(s). Note thatBMACwill wait for theInfraEnvimage to be at least 1 minute old. - Check
BareMetalHostevents:
$ kubectl -n test-namespace describe baremetalhosts.metal3.io ostest-extraworker-3
<...>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Registered 61s metal3-baremetal-controller Registered new host
Normal BMCAccessValidated 50s metal3-baremetal-controller Verified access to BMC
Normal InspectionSkipped 50s metal3-baremetal-controller disabled by annotation
Normal ProfileSet 50s metal3-baremetal-controller Hardware profile set: unknown
Normal ProvisioningStarted 49s metal3-baremetal-controller Image provisioning started for https://assisted-service-assisted-installer.apps.ostest.test.metalkube.org/api/assisted-install/v1/clusters/21ade42e-1c78-48b0-bde7-e875632527c1/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiMjFhZGU0MmUtMWM3OC00OGIwLWJkZTctZTg3NTYzMjUyN2MxIn0.IbVClRJQm8nihs7N2B9hiJ523qioKKqymaxGWkQPCIdnMspx_pfWRUeieYyEVDUExLeBPuFlwb84mLPCuZCLzg
Normal ProvisioningComplete 39s metal3-baremetal-controller Image provisioning completed for https://assisted-service-assisted-installer.apps.ostest.test.metalkube.org/api/assisted-install/v1/clusters/21ade42e-1c78-48b0-bde7-e875632527c1/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiMjFhZGU0MmUtMWM3OC00OGIwLWJkZTctZTg3NTYzMjUyN2MxIn0.IbVClRJQm8nihs7N2B9hiJ523qioKKqymaxGWkQPCIdnMspx_pfWRUeieYyEVDUExLeBPuFlwb84mLPCuZCLzg
- Check backend cluster events:
$ curl -s -k $(kubectl -n test-namespace get agentclusterinstalls.extensions.hive.openshift.io test-agent-cluster-install -o=jsonpath="{.status.debugInfo.eventsURL}") | jq "."
First, expect:
{
"cluster_id": "21ade42e-1c78-48b0-bde7-e875632527c1",
"event_time": "2021-06-24T10:49:19.232Z",
"message": "Started image download (image type is \"minimal-iso\")",
"severity": "info"
}
Lastly, when installed:
{
"cluster_id": "21ade42e-1c78-48b0-bde7-e875632527c1",
"event_time": "2021-06-24T11:34:08.644Z",
"message": "Successfully finished installing cluster test-cluster",
"severity": "info"
}