Advanced Health Check

Date and Time settings

Check that all your servers have the same time, date and time zone. Using NTP is recommended: make sure that NTP is enabled on all servers and that they all synchronize with the same NTP server.
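As a rough illustration of the check above, the following sketch compares clock readings that were collected from each server at the same moment. How the readings are collected is not covered here; the function name, the server names and the 1-second tolerance are assumptions made for this example, not values from SafeQ documentation.

```python
from datetime import datetime, timedelta, timezone


def check_clock_consistency(readings, max_skew=timedelta(seconds=1)):
    """Compare clock readings taken from several servers at the same moment.

    readings: dict of server name -> timezone-aware datetime.
    max_skew: tolerated difference between clocks (assumed value, adjust as needed).
    """
    # Servers in the same time zone report the same UTC offset.
    offsets = {ts.utcoffset() for ts in readings.values()}
    same_timezone = len(offsets) <= 1

    # Convert everything to UTC so that only the real clock difference remains.
    instants = [ts.astimezone(timezone.utc) for ts in readings.values()]
    skew = max(instants) - min(instants)

    return {
        "same_timezone": same_timezone,
        "skew": skew,
        "ok": same_timezone and skew <= max_skew,
    }
```

A result with "ok" set to False means that either the UTC offsets differ or the clocks are further apart than the tolerance, which usually points to NTP not running on one of the servers.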

Computer Performance

HDD space and fragmentation

  • Make sure that all partitions used by SafeQ have at least 10GB of free space.

  • Make sure that fragmentation is less than 80% on all partitions used by SafeQ.
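The free-space part of this check can be scripted. The sketch below uses Python's standard shutil.disk_usage call; the list of partitions is something you have to supply, and reading "10GB" as 10 GiB is an assumption. Fragmentation is not covered by this sketch.

```python
import shutil

# 10 GB threshold from the text above, interpreted here as 10 GiB (assumption).
MIN_FREE_BYTES = 10 * 1024**3


def has_enough_free_space(free_bytes, min_free_bytes=MIN_FREE_BYTES):
    """Return True when the free space meets the minimum."""
    return free_bytes >= min_free_bytes


def check_partitions(paths):
    """Return {path: (free_bytes, ok)} for every path given.

    paths: partitions used by SafeQ, for example ["C:\\", "D:\\"] (hypothetical).
    """
    results = {}
    for path in paths:
        free = shutil.disk_usage(path).free
        results[path] = (free, has_enough_free_space(free))
    return results
```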

HDD in Performance Monitor

Run the Perfmon (Performance Monitor) tool, which is a standard part of Microsoft Windows.

The counters discussed in this topic can be found under the PhysicalDisk object.

Perfmon can monitor multiple disks. Make sure you select only the disks used by SafeQ.

The Avg. Disk Queue Length counter represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your I/O system is overloaded, more read/write operations will be waiting. If the disk queue length frequently exceeds 1 during peak usage, you might have an I/O bottleneck.

The Avg. Disk Sec/Transfer counter is the average time, in seconds, of a disk transfer (disk latency). Perfmon reports this value in seconds, so 0.010 corresponds to 10 ms. Compare the measured value with the following ranges:

  • < 10 ms - very good performance

  • 10 - 20 ms - good performance, but might indicate minor performance problems

  • 20 - 50 ms - slow performance, needs attention

  • > 50 ms - indicates serious problems
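The ranges above can be applied to collected counter values with a small helper. This is a sketch only: the function names are made up for this example, the input is the Avg. Disk Sec/Transfer value in seconds as Perfmon reports it, and the treatment of values that fall exactly on a boundary (10, 20 or 50 ms) is an assumption because the list does not specify it.

```python
def classify_disk_latency(avg_disk_sec_per_transfer):
    """Map an Avg. Disk Sec/Transfer value (in seconds) to the ranges above."""
    ms = avg_disk_sec_per_transfer * 1000.0
    if ms < 10:
        return "very good"
    if ms <= 20:
        return "good"  # might indicate minor performance problems
    if ms <= 50:
        return "slow"  # needs attention
    return "serious"


def fraction_over_queue_limit(queue_length_samples, limit=1.0):
    """Share of Avg. Disk Queue Length samples that exceed the limit.

    The text only says "frequently exceeds 1"; what share counts as
    "frequently" is left to the reader.
    """
    samples = list(queue_length_samples)
    if not samples:
        return 0.0
    over = sum(1 for value in samples if value > limit)
    return over / len(samples)
```

For example, classify_disk_latency(0.035) falls into the "slow" range, and a queue-length series in which half of the peak-hour samples are above 1 gives a fraction of 0.5.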

CPU performance in Task Manager

Verify that the average CPU utilization over 1 minute stays below 90% during peak hours.
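If you log CPU utilization samples (for example from Perfmon) instead of watching Task Manager, the 1-minute average can be computed with a sliding window. The sketch below assumes evenly spaced samples in percent; the function name and the sampling interval parameter are illustrative and not part of any SafeQ tool.

```python
from collections import deque


def worst_one_minute_average(samples, interval_seconds=1, window_seconds=60):
    """Return the highest average over any full window, or None if too few samples.

    samples: CPU utilization readings in percent, evenly spaced.
    interval_seconds: time between two readings (assumed to be constant).
    """
    window_size = max(1, window_seconds // interval_seconds)
    window = deque(maxlen=window_size)
    running_total = 0.0
    worst = None
    for value in samples:
        if len(window) == window_size:
            # The oldest reading is dropped when the new one is appended.
            running_total -= window[0]
        window.append(value)
        running_total += value
        if len(window) == window_size:
            average = running_total / window_size
            worst = average if worst is None else max(worst, average)
    return worst
```

The check from the text then becomes a comparison of the returned value with 90.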

Memory performance in Task Manager

Using Task Manager, verify that the memory commit is not higher than 80% of the total physical memory. Otherwise the system may start swapping memory to disk and performance might be degraded.
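As a worked example of the 80% rule, the following sketch compares a committed-memory value with the installed physical memory. Both numbers have to be read from Task Manager or another tool; the function name is hypothetical.

```python
def memory_commit_ok(committed_bytes, physical_bytes, limit_percent=80):
    """Return True when committed memory is at most limit_percent of physical memory."""
    # Integer arithmetic avoids rounding surprises at the boundary.
    return committed_bytes * 100 <= physical_bytes * limit_percent
```

With 16 GB of physical memory, the limit is 12.8 GB of committed memory; a server showing 14 GB committed would fail the check.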

Network performance in Performance Monitor

The counter discussed in this topic can be found in the Perfmon tool under the Network Interface object.

There might be multiple network cards. Make sure you inspect the network interface that is used by the SafeQ server.

Verify that the network queue length (the Output Queue Length counter) is 0. If the measured value is higher than 0, the network card might be a bottleneck. Please consult your network administrator.

Database performance

Connect to the database (PostgreSQL Server or Microsoft SQL Server), run the following queries on all the nodes in the cluster and compare the outcome with the expected results:

Query | Expected result
SELECT count(*) FROM cluster_sync_update_10 | The query returns a result in less than 200 ms
SELECT count(*) FROM cluster_sync_update_10 WHERE server_flag<>0 | The result is less than 1000 and the query finishes in less than 200 ms
SELECT count(*) FROM cluster_sync_update_20 WHERE server_flag<>0 | The result is less than 50 000 and the query finishes in less than 1000 ms
SELECT count(*) FROM cluster_sync_update_30 WHERE server_flag<>0 | The result is less than 100 000 and the query finishes in less than 2000 ms
SELECT * FROM pg_stat_activity WHERE xact_start IS NOT NULL | The query returns less than 20 rows (PostgreSQL only)
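The count queries above can be timed from a script. The following is a minimal sketch that works with any Python DB-API connection (for example one opened with a PostgreSQL or SQL Server driver of your choice). The helper names are made up for this example, the measured time includes the client round trip, and the in-memory SQLite database in demo_connection is only a stand-in so that the sketch runs without a SafeQ database. The pg_stat_activity query is not included because it is PostgreSQL specific.

```python
import sqlite3
import time

# (query, maximum count or None, maximum duration in ms), taken from the list above.
CHECKS = [
    ("SELECT count(*) FROM cluster_sync_update_10", None, 200),
    ("SELECT count(*) FROM cluster_sync_update_10 WHERE server_flag<>0", 1000, 200),
    ("SELECT count(*) FROM cluster_sync_update_20 WHERE server_flag<>0", 50000, 1000),
    ("SELECT count(*) FROM cluster_sync_update_30 WHERE server_flag<>0", 100000, 2000),
]


def timed_count(connection, sql):
    """Run a count query and return (count, elapsed milliseconds)."""
    cursor = connection.cursor()
    start = time.perf_counter()
    cursor.execute(sql)
    value = cursor.fetchone()[0]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    cursor.close()
    return value, elapsed_ms


def run_checks(connection, checks=CHECKS):
    """Return a list of (sql, count, elapsed_ms, ok) tuples."""
    results = []
    for sql, max_count, max_ms in checks:
        count, elapsed_ms = timed_count(connection, sql)
        ok = elapsed_ms < max_ms and (max_count is None or count < max_count)
        results.append((sql, count, elapsed_ms, ok))
    return results


def demo_connection():
    """In-memory SQLite stand-in with the three table names (demo data only)."""
    connection = sqlite3.connect(":memory:")
    for suffix in ("10", "20", "30"):
        table = "cluster_sync_update_" + suffix
        connection.execute("CREATE TABLE " + table + " (server_flag INTEGER)")
        connection.executemany(
            "INSERT INTO " + table + " (server_flag) VALUES (?)", [(0,), (0,), (1,)]
        )
    return connection
```

Run the checks on every node in the cluster, because the values can differ from node to node.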

 

Virtual Machines

If the YSoft SafeQ server runs as a VMware virtual machine, also check the following metrics in the vSphere client:

  • There is no CPU or Memory limit set

    • vSphere Client -> tab Resource Allocation -> CPU shows under Resource Settings Limit: unlimited

    • vSphere Client -> tab Resource Allocation -> Memory shows under the Guest Memory\Resource Settings Limit: unlimited

  • Memory Swap in and Balloon metrics are zero

    • vSphere Client -> tab Performance -> Advanced -> Chart options -> Memory -> Real-time -> Counters: Balloon

    • vSphere Client -> tab Performance -> Advanced -> Chart options -> Memory -> Real-time -> Counters: Swap in

  • CPU ready time in the CPU real-time graph is less than 200 ms

    • vSphere Client -> tab Performance -> Advanced -> Chart options -> CPU -> Real-time -> Counters: Ready
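To put the 200 ms ready-time limit into perspective, it can be converted to a percentage of the chart's sampling interval. The sketch below assumes the 20-second interval of vSphere real-time charts; the function names and the dictionary keys are hypothetical and only mirror the checklist above.

```python
def cpu_ready_percent(ready_ms, interval_seconds=20):
    """Convert a CPU ready value in ms to a percentage of the sampling interval.

    interval_seconds: 20 for real-time charts (assumption stated above).
    """
    return ready_ms * 100 / (interval_seconds * 1000)


def evaluate_vm_metrics(metrics):
    """Return a list of findings for the checklist above (empty list = all good).

    metrics: dict with hypothetical keys; limits are None when set to unlimited.
    """
    findings = []
    if metrics.get("cpu_limit") is not None:
        findings.append("CPU limit is set")
    if metrics.get("memory_limit") is not None:
        findings.append("Memory limit is set")
    if metrics.get("balloon", 0) != 0:
        findings.append("Balloon is not zero")
    if metrics.get("swap_in", 0) != 0:
        findings.append("Swap in is not zero")
    if metrics.get("cpu_ready_ms", 0) >= 200:
        findings.append("CPU ready time is 200 ms or more")
    return findings
```

Under that assumption, 200 ms of ready time in a 20-second sample corresponds to 1% of the interval.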

If you do not have access to the vSphere client, you can gather some useful information about the host from within the Windows guest using the VMwareToolboxCmd tool:

  • "C:\Program Files\VMware\VMware Tools\VMwareToolboxCmd.exe" help stat
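The stat values can also be collected from a script running inside the guest. The sketch below calls the tool with the stat subcommands balloon, swap, memlimit and cpulimit; treat this list as an assumption and confirm it against the output of the help stat command above, since the available subcommands depend on the installed VMware Tools version. The helper names are made up for this example.

```python
import subprocess

# Path taken from the command above.
TOOLBOX = r"C:\Program Files\VMware\VMware Tools\VMwareToolboxCmd.exe"

# Subcommands relevant to the checks above (assumed; verify with "help stat").
STAT_SUBCOMMANDS = ["balloon", "swap", "memlimit", "cpulimit"]


def build_stat_command(subcommand, toolbox=TOOLBOX):
    """Return the argument list for one 'stat' call."""
    return [toolbox, "stat", subcommand]


def collect_stats(subcommands=STAT_SUBCOMMANDS, toolbox=TOOLBOX):
    """Run each stat subcommand and return {name: output text or None on failure}.

    Must be run inside the Windows guest where VMware Tools is installed.
    """
    results = {}
    for name in subcommands:
        completed = subprocess.run(
            build_stat_command(name, toolbox),
            capture_output=True,
            text=True,
            check=False,
        )
        results[name] = completed.stdout.strip() if completed.returncode == 0 else None
    return results
```

The returned text is kept as printed by the tool; compare the balloon and swap values with zero and check that no memory or CPU limit is reported.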