Recovery scenario on environment with ETCD
This document describes how to recover a node in an environment with etcd. It covers two scenarios:
How to recover an existing node in an environment with ETCD that has only two nodes
How to recover an existing node in an environment with ETCD that has three or more nodes
etcdctl client
etcdctl is a command line client for etcd. It can be used in scripts or by administrators to explore an etcd cluster. More information is available at https://github.com/coreos/etcd/tree/master/etcdctl
Download https://github.com/coreos/etcd/releases/download/v2.2.0/etcd-v2.2.0-windows-amd64.zip (or the 32-bit build)
How to check the health of the etcd cluster
Example:
We have a CML cluster with three nodes (CML1 - master, CML2, and CML3).
The master node CML1 has IP address 10.0.13.178; its etcd member id is 9059d4ef0f24cdac.
The node CML2 has IP address 10.0.13.205; its etcd member id is 826b699ddf5b294c.
The node CML3 has IP address 10.0.13.121; its etcd member id is d85191183ac893fd.

Find out the members of the etcd cluster by running the following command (throughout this whole procedure, the IP address must point to a functional ETCD member):
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health
Output:
member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member d85191183ac893fd is healthy: got healthy result from http://10.0.13.121:2379
cluster is healthy
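The health check above can also be scripted. Below is a minimal POSIX-shell sketch (e.g. for Git Bash on Windows) that parses `cluster-health` output and lists the ids of unhealthy members; the output format is assumed to match the samples in this document:

```shell
#!/bin/sh
# Sketch: parse `etcdctl cluster-health` output and list unhealthy member ids.
# In a real environment you would pipe the live command in, e.g.:
#   etcdctl --endpoint http://10.0.13.178:2379 cluster-health | unhealthy_members

unhealthy_members() {
    # Each member line looks like: "member <id> is unhealthy: got unhealthy result from <url>"
    awk '/is unhealthy/ { print $2 }'
}

# Demonstration using the sample output from step 4 of the three-node scenario:
unhealthy_members <<'EOF'
member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member d85191183ac893fd is unhealthy: got unhealthy result from http://10.0.13.121:2379
cluster is healthy
EOF
```

With the sample above, the sketch prints the single unhealthy member id, d85191183ac893fd.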
How to recover an existing node in an environment with ETCD that has only two nodes
Use this recovery scenario if the cluster has only two nodes.
If we have only two nodes and one of them has crashed (or has been reinstalled), we cannot add a new node to the cluster. We can only recreate the cluster (all data will be lost).
1. Stop Terminal Server on both nodes.
2. Back up the folder TS-XX.XX.XX.XX in "C:\SafeQ5\terminalserver\etcd\" on both nodes.
3. Delete the folder TS-XX.XX.XX.XX in "C:\SafeQ5\terminalserver\etcd\" on both nodes.
4. Start Terminal Server on both nodes.
5. Verify that "Offline storage refreshed" can be seen in the Terminal Server log (it may take up to ten minutes after Terminal Server starts).
6. Reinstall KM and Sharp devices.
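Steps 2 and 3 operate on the node's etcd data folder, which is named after the node's IP address (the TS-XX.XX.XX.XX pattern used throughout this document). A minimal POSIX-shell sketch of that part, assuming Git Bash on Windows; the forward-slash path spelling is a convenience, and the actual copy/delete commands are left commented out as illustration:

```shell
#!/bin/sh
# Sketch of steps 2-3 of the two-node reset, run on each node.

etcd_data_dir() {
    # Path of the embedded etcd data folder for a given node IP (TS-<IP>).
    printf 'C:/SafeQ5/terminalserver/etcd/TS-%s' "$1"
}

dir=$(etcd_data_dir 10.0.13.205)   # substitute this node's IP address
echo "would back up and then delete: $dir"
# cp -r "$dir" "$dir.bak"   # step 2: back up the folder before deleting it
# rm -rf "$dir"             # step 3: delete it so the cluster is recreated on start
```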
How to recover an existing node in an environment with ETCD that has three or more nodes
This recovery scenario works only for clusters with three or more nodes.
This chapter describes how to recover an existing node in an environment with ETCD.
The recovery procedure is described for three nodes:
The master node CML1 has IP address 10.0.13.178; its etcd member id is 9059d4ef0f24cdac.
The node CML2 has IP address 10.0.13.205; its etcd member id is 826b699ddf5b294c.
The node CML3 has IP address 10.0.13.121; its etcd member id is d85191183ac893fd. This node will be reinstalled and recovered.
1. Reinstall SafeQ on CML3.
2. Stop Terminal Server on the reinstalled node.
3. Delete "C:\SafeQ5\terminalserver\etcd\<folder>" on the reinstalled node (TS-XX.XX.XX.XX).
4. Check the health of the etcd cluster from the command line:
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health
Output:
member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member d85191183ac893fd is unhealthy: got unhealthy result from http://10.0.13.121:2379
cluster is healthy
5. Remove the reinstalled node from the cluster:
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 member remove d85191183ac893fd
Output:
Removed member d85191183ac893fd from cluster
6. Verify that the proper node has been removed:
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health
Output:
member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
7. Add the reinstalled node to the cluster again. In our sample, 10.0.13.121 is the reinstalled node:
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 member add TS-10.0.13.121 http://10.0.13.121:2380
Output:
Added member named TS-10.0.13.121 with ID bff50e06caebf570 to cluster
8. Verify that the proper node has been added:
Run command:
etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health
Output:
member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
9. Run etcd manually from the command line:
etcd64.exe -name TS-10.0.13.121 -data-dir "C:\SafeQ5\terminalserver\etcd\TS-10.0.13.121" -initial-advertise-peer-urls http://10.0.13.121:2380 -listen-peer-urls http://10.0.13.121:2380 -listen-client-urls http://10.0.13.121:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.0.13.121:2379 -initial-cluster-token safeq-cluster -initial-cluster TS-10.0.13.121=http://10.0.13.121:2380,TS-10.0.13.178=http://10.0.13.178:2380,TS-10.0.13.205=http://10.0.13.205:2380 -initial-cluster-state existing
10. Start Terminal Server.
11. Verify that "Offline storage refreshed" can be seen in the Terminal Server log after Terminal Server starts (it might take up to 10 minutes before this record appears).
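The long etcd64.exe command in step 9 is easier to adapt to other clusters if it is assembled from variables. Below is a POSIX-shell sketch (e.g. for Git Bash) that builds and prints the command for this document's example cluster; it only echoes the command rather than running it, and the IP addresses are the example values to substitute:

```shell
#!/bin/sh
# Sketch: assemble the step-9 etcd64.exe invocation for a recovered node.
NODE="10.0.13.121"                # the reinstalled node
PEERS="10.0.13.178 10.0.13.205"   # the surviving cluster members

# Build the -initial-cluster list: TS-<ip>=http://<ip>:2380 for every member.
initial_cluster="TS-$NODE=http://$NODE:2380"
for p in $PEERS; do
    initial_cluster="$initial_cluster,TS-$p=http://$p:2380"
done

# Print the full command; remove the leading `echo` to actually run it.
echo etcd64.exe -name "TS-$NODE" \
    -data-dir "C:/SafeQ5/terminalserver/etcd/TS-$NODE" \
    -initial-advertise-peer-urls "http://$NODE:2380" \
    -listen-peer-urls "http://$NODE:2380" \
    -listen-client-urls "http://$NODE:2379,http://127.0.0.1:2379" \
    -advertise-client-urls "http://$NODE:2379" \
    -initial-cluster-token safeq-cluster \
    -initial-cluster "$initial_cluster" \
    -initial-cluster-state existing
```

Note that -initial-cluster-state must be `existing`, since the node is joining a cluster that is already running.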