Failed to install MOSIP

Hi, MOSIP team. I am trying to install MOSIP on my local machine, following the instructions in On-Prem without DNS Installation Guidelines - MOSIP Docs 1.2.0.

To set up the environment quickly, I skipped the WireGuard steps, since the README says they can be skipped for a purely local environment. The Rancher steps ran without errors. Then I moved on to the MOSIP part: I created 6 VMs, one for the MOSIP Nginx server and 5 for the MOSIP nodes. After running rke config, I am now stuck at rke up in the mosip folder. It always fails with the message
msg=“Failed to upgrade hosts: 192.168.2.116 with error [host 192.168.2.116 not ready]”

Could you help me deal with this error?

The full output of rke up in the mosip folder follows:

time=“2023-10-23T11:31:31+08:00” level=info msg=“Running RKE version: v1.3.24”

time=“2023-10-23T11:31:31+08:00” level=info msg=“Initiating Kubernetes cluster”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[certificates] Generating admin certificates and kubeconfig”

time=“2023-10-23T11:31:31+08:00” level=info msg=“Successfully Deployed state file at [./cluster.rkestate]”

time=“2023-10-23T11:31:31+08:00” level=info msg=“Building Kubernetes cluster”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.180]”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.107]”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.172]”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.206]”

time=“2023-10-23T11:31:31+08:00” level=info msg=“[dialer] Setup tunnel for host [192.168.2.116]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“[network] No hosts added existing cluster, skipping port check”

time=“2023-10-23T11:31:32+08:00” level=info msg=“[certificates] Deploying kubernetes certificates to Cluster nodes”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.180]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.206]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.107]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.172]”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.180], try #1”

time=“2023-10-23T11:31:32+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.107], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.206], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Starting container [cert-deployer] on host [192.168.2.172], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”

time=“2023-10-23T11:31:33+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.180], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.180], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.107], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.107], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.206], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.206], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Finding container [cert-deployer] on host [192.168.2.172], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Removing container [cert-deployer] on host [192.168.2.172], try #1”

time=“2023-10-23T11:31:38+08:00” level=info msg=“[reconcile] Rebuilding and updating local kube config”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]”

time=“2023-10-23T11:31:38+08:00” level=info msg=“[reconcile] host [192.168.2.116] is a control plane node with reachable Kubernetes API endpoint in the cluster”

time=“2023-10-23T11:31:38+08:00” level=info msg=“[certificates] Successfully deployed kubernetes certificates to Cluster nodes”

time=“2023-10-23T11:31:38+08:00” level=info msg=“[file-deploy] Deploying file [/etc/kubernetes/audit-policy.yaml] to node [192.168.2.116]”

time=“2023-10-23T11:31:38+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:39+08:00” level=info msg=“Starting container [file-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:39+08:00” level=info msg=“Successfully started [file-deployer] container on host [192.168.2.116]”

time=“2023-10-23T11:31:39+08:00” level=info msg=“Waiting for [file-deployer] container to exit on host [192.168.2.116]”

time=“2023-10-23T11:31:39+08:00” level=info msg=“Waiting for [file-deployer] container to exit on host [192.168.2.116]”

time=“2023-10-23T11:31:39+08:00” level=info msg=“Container [file-deployer] is still running on host [192.168.2.116]: stderr: , stdout: ”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [file-deployer] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/file-deployer] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[/etc/kubernetes/audit-policy.yaml] Successfully deployed audit policy file to Cluster control nodes”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Reconciling cluster state”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Check etcd hosts to be deleted”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Check etcd hosts to be added”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Rebuilding and updating local kube config”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] host [192.168.2.116] is a control plane node with reachable Kubernetes API endpoint in the cluster”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[reconcile] Reconciled cluster state successfully”

time=“2023-10-23T11:31:40+08:00” level=info msg=“max_unavailable_worker got rounded down to 0, resetting to 1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Setting maxUnavailable for worker nodes to: 1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Setting maxUnavailable for controlplane nodes to: 1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Pre-pulling kubernetes images”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.180]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.172]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.107]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/hyperkube:v1.24.17-rancher1] exists on host [192.168.2.206]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Kubernetes images pulled successfully”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Building up etcd plane…”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Starting container [etcd-fix-perm] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Successfully started [etcd-fix-perm] container on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Waiting for [etcd-fix-perm] container to exit on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Waiting for [etcd-fix-perm] container to exit on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [etcd-fix-perm] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/etcd-fix-perm] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Removing container [etcd-rolling-snapshots] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[remove/etcd-rolling-snapshots] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:40+08:00” level=info msg=“Starting container [etcd-rolling-snapshots] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:40+08:00” level=info msg=“[etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.2.116]”

time=“2023-10-23T11:31:45+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-bundle-cert] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:46+08:00” level=info msg=“[certificates] Successfully started [rke-bundle-cert] container on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Waiting for [rke-bundle-cert] container to exit on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“[certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Removing container [rke-bundle-cert] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:46+08:00” level=info msg=“[etcd] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:46+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:31:46+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:31:47+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:31:47+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] Successfully started etcd plane… Checking etcd cluster health”

time=“2023-10-23T11:31:47+08:00” level=info msg=“[etcd] etcd host [192.168.2.116] reported healthy=true”

time=“2023-10-23T11:31:47+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”

time=“2023-10-23T11:31:52+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”

time=“2023-10-23T11:31:57+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”

time=“2023-10-23T11:32:02+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”

time=“2023-10-23T11:32:07+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Attempting upgrade of controlplane components on following hosts in NotReady status: 192.168.2.116”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[controlplane] Building up Controller Plane…”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Finding container [service-sidekick] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[sidekick] Sidekick container already created on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] service [kube-apiserver] on host [192.168.2.116] is healthy”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.2.116]”

time=“2023-10-23T11:32:12+08:00” level=info msg=“[healthcheck] service [kube-controller-manager] on host [192.168.2.116] is healthy”

time=“2023-10-23T11:32:12+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] service [kube-scheduler] on host [192.168.2.116] is healthy”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[controlplane] Successfully started Controller Plane…”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[worker] Building up Worker Plane…”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Finding container [service-sidekick] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[sidekick] Sidekick container already created on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kubelet] on host [192.168.2.116]”

time=“2023-10-23T11:32:13+08:00” level=info msg=“[healthcheck] service [kubelet] on host [192.168.2.116] is healthy”

time=“2023-10-23T11:32:13+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[healthcheck] service [kube-proxy] on host [192.168.2.116] is healthy”

time=“2023-10-23T11:32:14+08:00” level=info msg=“Image [rancher/rke-tools:v0.1.90] exists on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“Starting container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started [rke-log-linker] container on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“Removing container [rke-log-linker] on host [192.168.2.116], try #1”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[remove/rke-log-linker] Successfully removed container on host [192.168.2.116]”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[worker] Successfully started Worker Plane…”

time=“2023-10-23T11:32:14+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”

time=“2023-10-23T11:32:19+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”

time=“2023-10-23T11:32:24+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”

time=“2023-10-23T11:32:29+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”

time=“2023-10-23T11:32:34+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”

time=“2023-10-23T11:32:39+08:00” level=error msg=“Host 192.168.2.116 failed to report Ready status with error: host 192.168.2.116 not ready”

time=“2023-10-23T11:32:39+08:00” level=info msg=“[controlplane] Processing controlplane hosts for upgrade 1 at a time”

time=“2023-10-23T11:32:39+08:00” level=info msg=“Processing controlplane host 192.168.2.116”

time=“2023-10-23T11:32:39+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #1”

time=“2023-10-23T11:32:44+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #2”

time=“2023-10-23T11:32:49+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #3”

time=“2023-10-23T11:32:54+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #4”

time=“2023-10-23T11:32:59+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”

time=“2023-10-23T11:33:04+08:00” level=error msg=“Failed to upgrade hosts: 192.168.2.116 with error [host 192.168.2.116 not ready]”

time=“2023-10-23T11:33:04+08:00” level=fatal msg=“[controlPlane] Failed to upgrade Control Plane: [[host 192.168.2.116 not ready]]”
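For anyone reading a long rke up log like the one above, here is a minimal sketch that pulls out only the error and fatal lines and the host IPs they mention. It assumes the log keeps the `time=… level=… msg=…` shape shown in this post. The function name `summarize` and the `sample` text are illustrative and not part of RKE. Curly quotes are accepted because the forum rendering replaced the straight ones.

```python
import re

# One log line: time="..." level=... msg="..."
# Straight and curly quotes are both accepted (forum rendering changes them).
LINE_RE = re.compile(
    r'time=["“”](?P<time>[^"“”]+)["“”]\s+'
    r'level=(?P<level>\w+)\s+'
    r'msg=["“”](?P<msg>.*)["“”]\s*$'
)
IP_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')


def summarize(log_text):
    """Return (time, level, msg, hosts) for every error or fatal line."""
    problems = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match or match.group("level") not in ("error", "fatal"):
            continue
        # Collect the distinct IPv4 addresses named in the message.
        hosts = sorted(set(IP_RE.findall(match.group("msg"))))
        problems.append(
            (match.group("time"), match.group("level"), match.group("msg"), hosts)
        )
    return problems


# Three lines copied from the end of the log in this post.
sample = """
time=“2023-10-23T11:32:59+08:00” level=info msg=“[controlplane] Now checking status of node 192.168.2.116, try #5”
time=“2023-10-23T11:33:04+08:00” level=error msg=“Failed to upgrade hosts: 192.168.2.116 with error [host 192.168.2.116 not ready]”
time=“2023-10-23T11:33:04+08:00” level=fatal msg=“[controlPlane] Failed to upgrade Control Plane: [[host 192.168.2.116 not ready]]”
"""

for when, level, msg, hosts in summarize(sample):
    print(when, level.upper(), hosts, msg)
```

In this log every error and fatal line points at the same host, 192.168.2.116, while all service health checks on it pass, so the node's own status is the place to look next.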

Hi @andymatos

I will ask my team to look into why rke up in the MOSIP folder fails with this error after you run rke config.

Best Regards,
Team MOSIP

Thanks for your response.

I will keep trying to bring it up.

Hello @andymatos. Can you tell us more about your development environment, particularly the hardware you are using and whether this is on-prem or AWS?

It is on-prem.
Following the without-DNS guide, I set up 9 VMs in VMware Workstation:
2 VMs for the Observation cluster nodes and 1 VM for the Observation Nginx server;
5 VMs for the MOSIP cluster nodes and 1 VM for the MOSIP Nginx server.
All VMs have their own static IP on my local network.
Since this is only used for local testing, I did not set up anything for WireGuard.
In the Observation K8s cluster setup steps, the rke up command ended with “Finished building Kubernetes cluster successfully”.
But in the MOSIP K8s cluster setup steps, it ended with the output I posted above.
I am new to k8s and MOSIP, so I hope I am not missing or misunderstanding something.
Actually, I am confused about the install order of the MOSIP parts. From the documentation and the GitHub README, it seems the external components of MOSIP are installed after Rancher and the MOSIP k8s cluster. But some posts here in the community suggest the external components should be installed first (MOSIP cluster - NGINX VMs, DNS Requirements, Istio ...using RKE/Rancher - #3 by rcsampang)?

Okay. If each VM has adequate vCPU, RAM, and storage, there shouldn’t be a problem. But one possible cause of the error is that host 192.168.2.116 is running out of resources.
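One way to check this is to look at the node's reported status and conditions (for example MemoryPressure or DiskPressure). Below is a minimal sketch that builds two read-only kubectl commands against the kubeconfig that rke up wrote to ./kube_config_cluster.yml, as shown in the log. It assumes kubectl is installed on the machine where rke was run. The helper names `diagnostic_commands` and `run_all` are illustrative only.

```python
import shlex
import subprocess

# Path reported by rke up in the log; adjust if your folder differs.
KUBECONFIG = "./kube_config_cluster.yml"


def diagnostic_commands(kubeconfig=KUBECONFIG):
    """Build read-only kubectl commands that show node status and conditions."""
    base = ["kubectl", "--kubeconfig", kubeconfig]
    return [
        base + ["get", "nodes", "-o", "wide"],  # Ready / NotReady per node
        base + ["describe", "nodes"],           # conditions, capacity, recent events
    ]


def run_all(commands):
    """Run each command and print its output. Requires kubectl on PATH."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        done = subprocess.run(cmd, capture_output=True, text=True)
        print(done.stdout or done.stderr)


# Print the commands so they can be reviewed or pasted into a shell.
# Call run_all(diagnostic_commands()) to execute them instead.
for cmd in diagnostic_commands():
    print(shlex.join(cmd))
```

On an RKE-built node the kubelet normally runs as a Docker container, so running docker logs kubelet on 192.168.2.116 itself is another place to look for the reason the node stays NotReady.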

Would you mind showing your rke config.yml? This should be different from the config.yml you used to create your Rancher cluster.

In my experience, installing the external components before the MOSIP modules is the correct sequence as per https://github.com/mosip/mosip-infra/tree/master/deployment/v3

External components are needed to run the MOSIP cluster.

However, in my case I installed Monitoring after the external components and before the MOSIP modules.

This is what I suspected too: maybe limited resources keep it from running. The Observation cluster nodes only need 8GB of RAM each, but the MOSIP cluster nodes need 32GB of RAM each.
My physical server only has 128GB of RAM, so I cannot set up the 5 nodes the guide recommends for the MOSIP cluster. I am now trying a MOSIP cluster with 2 nodes, each with the full 32GB of RAM. However, after setting up the 2 new VMs, I am stuck on the error
"Failed to fetch cluster certs from nodes, aborting upgrade: Certificate /etc/kubernetes/.tmp/kube-scheduler.pem is not found"
Do you have any idea about this? Do I need to run rke remove and start from scratch?
Another question: will the number of nodes affect the final deployment?
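To make the RAM numbers concrete, here is a rough worked calculation using only the figures from this post (128GB physical RAM, 32GB per MOSIP node, 8GB per Observation node, 2 Observation nodes). It leaves out the two Nginx VMs and any hypervisor overhead, because their sizes are not stated here, so the real requirement is somewhat higher. The function name `ram_needed` is just for illustration.

```python
HOST_RAM_GB = 128        # physical server, from the post
MOSIP_NODE_RAM_GB = 32   # per MOSIP cluster node, from the post
OBS_NODE_RAM_GB = 8      # per Observation cluster node, from the post
OBS_NODES = 2            # Observation cluster nodes, from the post


def ram_needed(mosip_nodes):
    """RAM in GB for the cluster nodes only (Nginx VMs and overhead excluded)."""
    return mosip_nodes * MOSIP_NODE_RAM_GB + OBS_NODES * OBS_NODE_RAM_GB


for nodes in (5, 2):
    need = ram_needed(nodes)
    fits = need <= HOST_RAM_GB
    print(f"{nodes} MOSIP nodes: {need} GB needed, fits in {HOST_RAM_GB} GB: {fits}")
```

With 5 MOSIP nodes the cluster nodes alone need 176GB, which exceeds the 128GB available. With 2 nodes they need 80GB. This only checks that the VMs fit in memory; it says nothing about whether 2 nodes are enough to run all MOSIP services.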

Unfortunately, without enough resources you would encounter intermittent errors.

I would advise securing enough resources before attempting to install again. Otherwise you would just be frustrated by random, hard-to-diagnose errors caused by resource limits.

You can try Collab, the MOSIP sandbox: you can test a MOSIP module in your local environment while its dependencies run on Collab, a platform provided by MOSIP.

https://collab.mosip.net/

Actually, I also want to try the Collab environment, but it seems that accessing it requires a WireGuard config from MOSIP. I have filled in the request form for the WireGuard information, but I have not received a response from MOSIP yet, so I cannot try the Collab environment for now.

Sorry, can’t help you there. Try to send a message to any of the MOSIP developers or to @sanchi-singh24 who seems to be a developer as well as moderator in this forum.

Best regards.

Hi @andymatos

Regarding the collab env: the services and modules in collab are open and do not require WireGuard access, except for Keycloak and the DB.

From your request, I can see that the services you mentioned do not require WireGuard to run in collab.

These are the open services in the collab env: PMP, ActiveMQ, Registration client, and e-signet device.

Let me know if you face any issues using the services in collab.

Best Regards,
Team MOSIP

Actually, our team is developing software to pass the MOSIP L0 device compliance test, so we want to start the official compliance test as soon as possible. As far as I understand, the steps are to first set up the sandbox in our local environment and then do development work based on something like MDS or reg-client. Is this workflow right? Or is there another way to run the compliance test without deploying all the MOSIP services in our local environment?
Our team has been working on deploying the MOSIP sandbox (version 1.2.0.1-B3) these days, but we still cannot bring everything up.
We ran into several problems. After solving the one mentioned in this thread, we hit the problem of the MOSIP cluster import into Rancher staying pending (as described in https://community.mosip.io/t/error-while-importing-cluster-to-rancher/597?u=andymatos). We tried the suggested solutions, but they did not work, so we could not install the monitoring app on this cluster. We then decided to skip the monitoring part and continue deploying the rest of the MOSIP services. However, the installation failed at the key manager with an error saying that Service Monitor was not found. So we guess the monitoring part is required for the rest?
We also tried the collab env. We could open https://pmp.collab.mosip.net, but it eventually ended with a connection timeout on URLs such as https://iam.collab.mosip.net/auth/realms/mosip/protocol/openid-connect/. We do not know where the problem is; we tried both a direct connection and a speed-up proxy, and in both cases we could reach the PMP part.
In summary, we want to start the compliance test ASAP, and we will also keep trying to bring the MOSIP sandbox up here. Any help on these two things would be greatly appreciated.

Hi @andymatos

I understand you want to run a compliance test while also bringing the MOSIP sandbox up. Our Compliance Tool Kit (CTK) is available in the Synergy env, where you can easily access it - https://synergy.mosip.net/

@mayuradesh, please look into this and guide @andymatos on the CTK.

Best Regards,
Team MOSIP

Thank you.
I have already registered a new user successfully in Compliance Toolkit App (mosip.net).
Now I will try to run the compliance test in the Synergy environment by following the guide here: How to add more test cases - Compliance Tool Kit (mosip.io).
Thanks for your help.

That’s great that you have registered a new user and started using CTK in Synergy. If any issue comes up, let us know.

@mayuradesh will guide you on the CTK services.

@andymatos - Greetings. For the Compliance Tool Kit (CTK) on MOSIP's Synergy environment, we will handhold you directly with onboarding, key exchange, etc. (as per our mail exchange). For the rest of the topics, please continue using this thread.

Hello

To use the hosted CTK application, please refer to our user guide:

and the other sections here: Overview - Compliance Tool Kit

The guide to “Add More Testcases in CTK” is meant for developers who will be adding new test cases and hosting the CTK solution in their own environment.

Hope this helps
~ Mayura

Okay, thank you. To pass the test, we are now developing our own mock service based on this sample (GitHub - mosip/mosip-mock-services). Once we finish integrating our own SDK into the mock service and pass the test cases in the Synergy environment, we can move to the official environment. Is this workflow right?

Hello Andy,

Below is the high-level process to be followed:

  1. Complete the SBI/SDK interface development at your end.

  2. Once the development is completed, please approach us so that we can onboard your partner details
    in the MOSIP Synergy environment, where the SBI/SDK interfaces can be validated using the Compliance Tool Kit.

  3. Once the interfaces are successfully validated via the Compliance Tool Kit, we will update your solution as “MOSIP COMPLIANT” on the MOSIP Market Place.

I hope the process is clear now; if not, email me with any further concerns.

Regards,
Suraj

Thank you, Suraj.

Now we’re clear about the process. Once we finish our development, we will get back to you through email.
