Recovery scenario in an environment with ETCD

This document describes how to recover a node in an etcd environment.

etcdctl client

etcdctl is a command line client for etcd. It can be used in scripts or by administrators to explore an etcd cluster. More information is available at https://github.com/coreos/etcd/tree/master/etcdctl

Download https://github.com/coreos/etcd/releases/download/v2.2.0/etcd-v2.2.0-windows-amd64.zip (or the 32-bit build for 32-bit systems).
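After unpacking the archive, you can verify that the client runs:

Run command:

etcdctl.exe --version

The output reports the client version, for example "etcdctl version 2.2.0".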

 

How to check the health of the etcd cluster

Example:

We have a CML cluster with three nodes: CML1 (master), CML2, and CML3.

The master node CML1 has IP address 10.0.13.178; its etcd member ID is 9059d4ef0f24cdac.

The node CML2 has IP address 10.0.13.205; its etcd member ID is 826b699ddf5b294c.

The node CML3 has IP address 10.0.13.121; its etcd member ID is d85191183ac893fd.

[Image: Cluster_OK.png]

Find out the members of the etcd cluster by running the following command (throughout this whole procedure, the IP address in --endpoint must point to a functional ETCD member).

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health

Output:

member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member d85191183ac893fd is healthy: got healthy result from http://10.0.13.121:2379
cluster is healthy
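The member IDs shown above are needed later when removing a member. To map member IDs to member names and peer/client URLs, etcdctl also provides the member list command:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 member list

The output lists one line per member with its ID, name, peerURLs, and clientURLs, similar to:

826b699ddf5b294c: name=TS-10.0.13.205 peerURLs=http://10.0.13.205:2380 clientURLs=http://10.0.13.205:2379
9059d4ef0f24cdac: name=TS-10.0.13.178 peerURLs=http://10.0.13.178:2380 clientURLs=http://10.0.13.178:2379
d85191183ac893fd: name=TS-10.0.13.121 peerURLs=http://10.0.13.121:2380 clientURLs=http://10.0.13.121:2379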

 

How to recover an existing node in an environment with ETCD that has only two nodes

Warning: Use this recovery scenario if you have only two nodes.

If we have only two nodes and one node has crashed (and was reinstalled), we cannot add a new node to the cluster: with one of the two members gone, the remaining member has no quorum and cannot commit membership changes. The only option is to recreate the cluster (warning: all data will be lost).

 

1. Stop Terminal Server on both nodes.
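If you prefer the command line, the service can be stopped with net stop (the service name below is an assumption; check the actual name in services.msc):

rem The service name is an assumption; verify it on your installation
net stop "YSoft SafeQ Terminal Server"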

2. Back up the folder TS-XX.XX.XX.XX in "C:\SafeQ5\terminalserver\etcd\" on both nodes.
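For example, from the command line (the backup destination path is just an example):

rem Destination path is an example; TS-XX.XX.XX.XX stays the node-specific folder name
robocopy "C:\SafeQ5\terminalserver\etcd\TS-XX.XX.XX.XX" "C:\Backup\etcd\TS-XX.XX.XX.XX" /E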

3. Delete the folder TS-XX.XX.XX.XX in "C:\SafeQ5\terminalserver\etcd\" on both nodes.

4. Start Terminal Server on both nodes.

5. Verify that "Offline storage refreshed" appears in the Terminal Server log within ten minutes of starting Terminal Server.
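The log can be searched from the command line, for example (the log file path is an assumption; adjust it to your installation):

rem Log path is an assumption; adjust to your installation
findstr /C:"Offline storage refreshed" "C:\SafeQ5\terminalserver\logs\terminalserver.log"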

6. Reinstall KM and Sharp devices.

 


How to recover an existing node in an environment with ETCD that has three or more nodes

Warning: This recovery scenario works only for three or more nodes in the cluster.

This chapter describes how to recover an existing node in an environment with ETCD.

The recovery procedure is described for three nodes:

The master node CML1 has IP address 10.0.13.178; its etcd member ID is 9059d4ef0f24cdac.

The node CML2 has IP address 10.0.13.205; its etcd member ID is 826b699ddf5b294c.

The node CML3 has IP address 10.0.13.121; its etcd member ID is d85191183ac893fd. This node will be reinstalled and recovered.

 

1. Reinstall SafeQ on CML3.

2. Stop Terminal Server on the reinstalled node.

3. Delete "C:\SafeQ5\terminalserver\etcd\<folder>" on the reinstalled node (the folder is named TS-XX.XX.XX.XX).
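For example (on CML3 the folder is TS-10.0.13.121, matching the data directory used in step 9):

rmdir /S /Q "C:\SafeQ5\terminalserver\etcd\TS-10.0.13.121"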

4. Check the health of the etcd cluster from the command line.

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health

Output:

member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member d85191183ac893fd is unhealthy: got unhealthy result from http://10.0.13.121:2379
cluster is healthy

The reinstalled node (d85191183ac893fd) is reported as unhealthy; this is expected.

5. Remove the reinstalled node from the cluster:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 member remove d85191183ac893fd

Output:

Removed member d85191183ac893fd from cluster

6. Verify that the proper node has been removed:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health

Output:

member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
cluster is healthy

7. Add the reinstalled node to the cluster again. In our example, 10.0.13.121 is the reinstalled node:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 member add TS-10.0.13.121 http://10.0.13.121:2380

Output:

Added member named TS-10.0.13.121 with ID bff50e06caebf570 to cluster

ETCD_NAME="TS-10.0.13.121"
ETCD_INITIAL_CLUSTER="TS-10.0.13.205=http://10.0.13.205:2380,TS-10.0.13.178=http://10.0.13.178:2380,TS-10.0.13.121=http://10.0.13.121:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

8. Verify that the proper node has been added:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health

Output:

member 826b699ddf5b294c is healthy: got healthy result from http://10.0.13.205:2379
member 9059d4ef0f24cdac is healthy: got healthy result from http://10.0.13.178:2379
member bff50e06caebf570 is unreachable: no available published client urls
cluster is healthy

The new member is reported as unreachable because etcd is not running on the reinstalled node yet; it is started in the next step.

9. Run etcd manually from the command line on the reinstalled node. Use the values printed by member add in step 7 (ETCD_NAME, ETCD_INITIAL_CLUSTER, ETCD_INITIAL_CLUSTER_STATE) for the -name, -initial-cluster, and -initial-cluster-state parameters:

etcd64.exe -name TS-10.0.13.121 ^
 -data-dir "C:\SafeQ5\terminalserver\etcd\TS-10.0.13.121" ^
 -initial-advertise-peer-urls http://10.0.13.121:2380 ^
 -listen-peer-urls http://10.0.13.121:2380 ^
 -listen-client-urls http://10.0.13.121:2379,http://127.0.0.1:2379 ^
 -advertise-client-urls http://10.0.13.121:2379 ^
 -initial-cluster-token safeq-cluster ^
 -initial-cluster TS-10.0.13.121=http://10.0.13.121:2380,TS-10.0.13.178=http://10.0.13.178:2380,TS-10.0.13.205=http://10.0.13.205:2380 ^
 -initial-cluster-state existing
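etcd now runs in the foreground of this console. Before starting Terminal Server, you can re-check the cluster health from another console to confirm the new member has joined:

Run command:

etcdctl.exe --endpoint http://10.0.13.178:2379 cluster-health

All three members, including the new member ID (bff50e06caebf570 in this example), should now be reported as healthy.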

10. Start Terminal Server on the reinstalled node.

11. Verify that "Offline storage refreshed" appears in the Terminal Server log after Terminal Server starts (it might take up to 10 minutes for this record to appear).