Hi MOSIP team, I am trying to install MOSIP on my local machine, following the instructions in "On-Prem without DNS Installation Guidelines" (MOSIP Docs 1.2.0).
To set up the environment quickly, I skipped the WireGuard steps, since the README says they can be skipped when running only in a local environment. The Rancher steps ran with no errors. I then moved on to the MOSIP part: I created 6 VMs, one for the MOSIP nginx and 5 for the MOSIP nodes. After running rke config, I am now stuck at rke up in the mosip folder. It always fails with the message:
msg=“Failed to upgrade hosts: 192.168.2.116 with error [host 192.168.2.116 not ready]”
Could you help me resolve this error?
Below is the full output of rke up in the mosip folder:
time=“2023-10-23T11:31:31+08:00” level=info msg=“Running RKE version: v1.3.24”
time=“2023-10-23T11:31:31+08:00” level=info msg=“Initiating Kubernetes cluster”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[certificates] Generating admin certificates and kubeconfig”
time=“2023-10-23T11:31:31+08:00” level=info msg=“Successfully Deployed state file at [./cluster.rkestate]”
time=“2023-10-23T11:31:31+08:00” level=info msg=“Building Kubernetes cluster”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.180]”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.107]”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.172]”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.206]”
time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.116]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“[network] No hosts added existing cluster, skipping port check”
time=“2023-10-23T11:31:32+08:00” level=info msg=“[certificates] Deploying kubernetes certificates to Cluster nodes”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.180]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.206]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.107]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.172]”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.180], try #1”
time=“2023-10-23T11:31:32+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.107], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.206], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.172], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”
time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.180], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.107], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.206], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.172], try #1”
time=“2023-10-23T11:31:38+08:00” level=info msg=“[reconcile] Rebuilding and updating local kube config”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]”
time=“2023-10-23T11:31:38+08:00” level=info msg=“[reconcile] host [192.168.2.116] is a control plane node with reachable Kubernetes API endpoint in the cluster”
time=“2023-10-23T11:31:38+08:00” level=info msg=“[certificates] Successfully deployed kubernetes certificates to Cluster nodes”
time=“2023-10-23T11:31:38+08:00” level=info msg=“[file-deploy] Deploying file [/etc/kubernetes/audit-policy.yaml] to node [192.168.2.116]”
time=“2023-10-23T11:31:38+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:39+08:00” level=info msg=“Starting container [file-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:39+08:00” level=info msg=“Successfully started [file-deployer] container on host [192.168.2.116]”
time=“2023-10-23T11:31:39+08:00” level=info msg=“Waiting for [file-deployer] container to exit on host [192.168.2.116]”
time=“2023-10-23T11:31:39+08:00” level=info msg=“Waiting for [file-deployer] container to exit on host [192.168.2.116]”
time=“2023-10-23T11:31:39+08:00” level=info msg=“Container [file-deployer] is still running on host [192.168.2.116]: stderr: , stdout: ”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [file-deployer] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/file-deployer] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[/etc/kubernetes/audit-policy.yaml] Successfully deployed audit policy file to Cluster control nodes”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Reconciling cluster state”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Check etcd hosts to be deleted”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Check etcd hosts to be added”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Rebuilding and updating local kube config”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] host [192.168.2.116] is a control plane node with reachable Kubernetes API endpoint in the cluster”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Reconciled cluster state successfully”
time=“2023-10-23T11:31:40+08:00” level=info msg=“max_unavailable_worker got rounded down to 0, resetting to 1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Setting maxUnavailable for worker nodes to: 1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Setting maxUnavailable for controlplane nodes to: 1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Pre-pulling kubernetes images”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.180]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.172]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.107]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.206]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Kubernetes images pulled successfully”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Building up etcd plane…”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Starting container [etcd-fix-perm] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Successfully started [etcd-fix-perm] container on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Waiting for [etcd-fix-perm] container to exit on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Waiting for [etcd-fix-perm] container to exit on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [etcd-fix-perm] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/etcd-fix-perm] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [etcd-rolling-snapshots] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/etcd-rolling-snapshots] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:40+08:00” level=info msg=“Starting container [etcd-rolling-snapshots] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.2.116]”
time=“2023-10-23T11:31:45+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-bundle-cert] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:46+08:00” level=info msg=“[certificates] Successfully started [rke-bundle-cert] container on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Waiting for [rke-bundle-cert] container to exit on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“[certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Removing container [rke-bundle-cert] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:46+08:00” level=info msg=“[etcd] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:46+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:31:47+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:31:47+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] Successfully started etcd plane… Checking etcd cluster health”
time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] etcd host [192.168.2.116] reported healthy=true”
time=“2023-10-23T11:31:47+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”
time=“2023-10-23T11:31:52+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”
time=“2023-10-23T11:31:57+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”
time=“2023-10-23T11:32:02+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”
time=“2023-10-23T11:32:07+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Attempting upgrade of controlplane components on following hosts in NotReady status: 192.168.2.116”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[controlplane] Building up Controller Plane…”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Finding container [service-sidekick] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[sidekick] Sidekick container already created on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] service [kube-apiserver] on host [192.168.2.116] is healthy”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.2.116]”
time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] service [kube-controller-manager] on host [192.168.2.116] is healthy”
time=“2023-10-23T11:32:12+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] service [kube-scheduler] on host [192.168.2.116] is healthy”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started Controller Plane…”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[worker] Building up Worker Plane…”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Finding container [service-sidekick] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[sidekick] Sidekick container already created on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kubelet] on host [192.168.2.116]”
time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] service [kubelet] on host [192.168.2.116] is healthy”
time=“2023-10-23T11:32:13+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[healthcheck] service [kube-proxy] on host [192.168.2.116] is healthy”
time=“2023-10-23T11:32:14+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started [rke-log-linker] container on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started Worker Plane…”
time=“2023-10-23T11:32:14+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”
time=“2023-10-23T11:32:19+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”
time=“2023-10-23T11:32:24+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”
time=“2023-10-23T11:32:29+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”
time=“2023-10-23T11:32:34+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”
time=“2023-10-23T11:32:39+08:00” level=error msg=“Host 192.168.2.116 failed to report Ready status with error: host 192.168.2.116 not ready”
time=“2023-10-23T11:32:39+08:00” level=info msg=“[controlplane] Processing controlplane hosts for upgrade 1 at a time”
time=“2023-10-23T11:32:39+08:00” level=info msg=“Processing controlplane host 192.168.2.116”
time=“2023-10-23T11:32:39+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”
time=“2023-10-23T11:32:44+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”
time=“2023-10-23T11:32:49+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”
time=“2023-10-23T11:32:54+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”
time=“2023-10-23T11:32:59+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”
time=“2023-10-23T11:33:04+08:00” level=error msg=“Failed to upgrade hosts: 192.168.2.116 with error [host 192.168.2.116 not ready]”
time=“2023-10-23T11:33:04+08:00” level=fatal msg=“[controlPlane] Failed to upgrade Control Plane: [[host 192.168.2.116 not ready]]”
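
In case it helps anyone skimming the output above, here is a minimal Python sketch (not part of MOSIP or RKE) that keeps only the error and fatal lines of such a log. The function name problem_lines and the regular expression are my own assumptions about the time/level/msg format shown here; it accepts both straight and curly quotes, since the quotes in the pasted log appear as curly quotes.

```python
import re

# Assumed line shape, taken from the log above:
#   time="..." level=<level> msg="..."
# Straight and curly quotes are both accepted.
LINE_RE = re.compile(
    r'time=["“”](?P<time>[^"“”]+)["“”]\s+'
    r'level=(?P<level>\w+)\s+'
    r'msg=["“”](?P<msg>.*)["“”]\s*$'
)


def problem_lines(log_text, levels=("error", "fatal")):
    """Return (time, level, msg) tuples for log lines at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append(
                (match.group("time"), match.group("level"), match.group("msg"))
            )
    return hits


if __name__ == "__main__":
    # Three lines copied from the log above: one info, one error, one fatal.
    sample = (
        "time=“2023-10-23T11:32:34+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”\n"
        "time=“2023-10-23T11:32:39+08:00” level=error msg=“Host 192.168.2.116 failed to report Ready status with error: host 192.168.2.116 not ready”\n"
        "time=“2023-10-23T11:33:04+08:00” level=fatal msg=“[controlPlane] Failed to upgrade Control Plane: [[host 192.168.2.116 not ready]]”\n"
    )
    for time, level, msg in problem_lines(sample):
        print(time, level.upper(), msg)
```

Run against the full log, this should leave only the three error/fatal entries near the end, all of which report that 192.168.2.116 is not ready.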