Upgrade Guide for High Availability (HA) Deployment

A comprehensive guide for upgrading ServiceOps applications in High Availability (HA) environments to ensure zero downtime and maintain cluster health during version transitions.

High Availability (HA) upgrades require special consideration to maintain cluster stability and prevent automatic failover during the upgrade process. This guide covers the upgrade process for HA environments, ensuring both master and standby instances are upgraded successfully while maintaining data synchronization.

Key Benefits of HA Upgrades

Zero Downtime: Maintains service availability during upgrades
Cluster Stability: Prevents automatic failover during upgrade process
Data Synchronization: Ensures database and file synchronization across nodes
Rollback Capability: Provides recovery options if upgrade fails
Health Monitoring: Comprehensive cluster health verification

Supported Operating Systems

Ubuntu: 22, 24
RedHat: 9.2, 9.4

Version Compatibility

Incremental upgrades are not required from v8.2 onward; you can upgrade directly from v8.2 to the latest available release. Starting with v8.6.0, a single common installer upgrades ServiceOps on all supported operating systems.

Pre-Upgrade Requirements

Before initiating the HA upgrade, complete the following checks and actions:

HA Cluster Status: Verify both master and standby nodes are healthy and synchronized.
Network Connectivity: Ensure a stable network connection exists between HA nodes.
Disk Space: Confirm there is sufficient space for upgrade packages and backups on both nodes.
ServiceOps Version: Ensure the current version is compatible with the HA environment.
Stop HA Observer Service: Before upgrading, it is critical to stop the HA Observer service to prevent automatic failover.

Critical Timing

Before stopping the Observer service, monitor the logs to ensure no swapping or synchronization process is in progress. Stopping the Observer service abruptly may lead to instability in the HA cluster.

1. Monitor HA Observer Logs

View the latest log activity to ensure no critical processes are running.

tail -f /opt/HA/logs/ha_<date>.log
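The log check above can also be scripted, so that the stop command runs only when recent entries look idle. The following is a minimal sketch under stated assumptions: the Observer log format is not documented in this guide, so the keywords you search for are placeholders, and `log_is_quiet` is a helper name invented for this example.

```shell
# log_is_quiet FILE PATTERN [LINES]
# Succeeds (exit 0) when none of the last LINES lines of FILE match PATTERN.
# PATTERN is a case-insensitive extended regex. The keywords you pass are an
# assumption, because the Observer log format is not specified in this guide.
log_is_quiet() {
    lq_file="$1"
    lq_pattern="$2"
    lq_lines="${3:-50}"

    if [ ! -r "$lq_file" ]; then
        echo "cannot read $lq_file" >&2
        return 2
    fi

    # grep -q exits 0 on a match, so a match means the log is "not quiet".
    if tail -n "$lq_lines" "$lq_file" | grep -Eiq "$lq_pattern"; then
        return 1
    fi
    return 0
}
```

For example, with `LOG` set to the current Observer log file, `log_is_quiet "$LOG" 'swap|sync' 100` succeeds only when the last 100 lines contain neither keyword. Treat the result as a hint and still read the log yourself before stopping the service.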

2. Stop the Observer Service

Once you confirm that no swap or synchronization process is running, stop the service, checking its status before and after.

sudo systemctl status ha
sudo systemctl stop ha
sudo systemctl status ha
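If the stop step is part of a script, the manual status check can be replaced by a short polling loop. This sketch assumes only the `ha` unit name used above; `wait_until_inactive` and the timeout value are illustrative choices, not part of the product.

```shell
# wait_until_inactive UNIT [TIMEOUT_SECONDS]
# Polls `systemctl is-active` once per second until UNIT is no longer active.
# Returns 0 when the unit has stopped, 1 if it is still active after the timeout.
wait_until_inactive() {
    wi_unit="$1"
    wi_timeout="${2:-30}"
    wi_elapsed=0

    while systemctl is-active --quiet "$wi_unit"; do
        if [ "$wi_elapsed" -ge "$wi_timeout" ]; then
            echo "$wi_unit is still active after ${wi_timeout}s" >&2
            return 1
        fi
        sleep 1
        wi_elapsed=$((wi_elapsed + 1))
    done
    return 0
}
```

A typical call would be `sudo systemctl stop ha && wait_until_inactive ha 60`, continuing with the upgrade only if the function returns successfully.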

Application Backup: Back up the application and filedb folder on the master node.
Database Backup: Take a complete database backup from the master node.
VM Snapshot: Take a snapshot of both master and standby VMs for recovery.
Network Connectivity: Test connectivity between HA nodes.
Maintenance Window: Schedule the upgrade during low-usage periods.
Team Notification: Inform stakeholders about the planned upgrade.
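The disk space check in the list above can be made repeatable on both nodes with a small helper. This is a sketch only: the guide does not state how much free space the upgrade needs, so any threshold you pass is your own estimate, and the function names are invented for the example.

```shell
# free_kb PATH
# Prints the free space, in 1 KiB blocks, of the filesystem that holds PATH.
free_kb() {
    # -P forces the single-line POSIX output format; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_space PATH MIN_KB
# Succeeds when the filesystem holding PATH has at least MIN_KB free.
has_free_space() {
    hs_free="$(free_kb "$1")"
    [ -n "$hs_free" ] && [ "$hs_free" -ge "$2" ]
}
```

For instance, `has_free_space /opt 10485760 || echo "less than 10 GiB free under /opt"` flags a node that falls below a 10 GiB placeholder threshold. Run it on the master and the standby, and on the filesystem where backups are written.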

Backup Requirements

Always perform complete backups before HA upgrades. Refer to the Backup Process for detailed backup procedures.
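As a rough illustration of the folder backup, the helper below archives one directory into a timestamped tarball. It does not replace the documented Backup Process; `backup_dir` and the destination path in the usage note are hypothetical names chosen for this sketch.

```shell
# backup_dir SOURCE_DIR DEST_DIR
# Writes DEST_DIR/<basename>_<timestamp>.tar.gz and prints the archive path.
backup_dir() {
    bd_src="$1"
    bd_dest="$2"
    bd_name="$(basename "$bd_src")_$(date +%Y%m%d_%H%M%S).tar.gz"

    mkdir -p "$bd_dest" || return 1
    # -C switches to the parent directory first so the archive stores a
    # relative path instead of an absolute one.
    tar -czf "$bd_dest/$bd_name" -C "$(dirname "$bd_src")" "$(basename "$bd_src")" || return 1
    echo "$bd_dest/$bd_name"
}
```

On the master node this could be called as `backup_dir /opt/flotomate/main-server/filedb /backup/serviceops`, where `/backup/serviceops` is a placeholder destination. Listing the archive afterwards with `tar -tzf` is a quick way to confirm it is readable.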

Application Upgrade Steps

Step 1: Download the Release Build

Download the latest Common Installer build from the Latest Download Links.

Step 2: Upgrade Master Application Instance

Prepare the Installer

Copy Installer: Copy the MotadataServiceOpsCommonUpgrade installer to the Master ServiceOps application instance.
Log in with Root Privileges:

sudo su

Stop Services: Stop the main server and analytics server services:

sudo systemctl stop ft-main-server.service ft-analytics-server.service

Verify Service Status:

sudo systemctl status ft-main-server.service ft-analytics-server.service

Grant Execute Permissions:

chmod +x MotadataServiceOpsCommonUpgrade_V860

Run the Installer

Execute Upgrade: Run the upgrade installer:
```
./MotadataServiceOpsCommonUpgrade_V860
```
Monitor Upgrade Process: The upgrade begins automatically once the installer starts.
Verify Completion: When the upgrade completes successfully, a completion screen appears.

Step 3: Upgrade Standby Application Instance

To upgrade the ServiceOps standby instance, repeat the same steps used for the master instance:

Prepare the Installer (same as master procedure)
Run the Installer (same as master procedure)

Service Management on Standby Node

After the upgrade on the standby node is complete, ensure all application services (e.g., ft-main-server, ft-analytics-server) remain in a stopped state. The standby node's services should not be running; they will be managed by the HA service once it's restarted.

Database Upgrade Steps (Optional)

Pre-Upgrade Steps

Stop HA Observer Service: If the HA Observer service is still running, stop it using the following command (the same `ha` unit used in the Pre-Upgrade Requirements):
```
sudo systemctl stop ha
```

Step 1: Upgrade Master Node

Refer to the Standalone PostgreSQL Upgrade Guide and upgrade the master node first.

Step 2: Upgrade Slave Node

In HA, the Slave node is in read-only mode. To upgrade it, you must first promote it to a temporary master.

Promote Slave to Master:
- On the Slave node, navigate to the HA directory:
```
cd /opt/HA
```
- Run the master promotion script:
```
sh master.sh
```
  This converts the Slave node to temporary master mode.
Upgrade the Node: Once the node is in master mode, follow the same Standalone PostgreSQL Upgrade Guide to upgrade it.
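Before starting the upgrade on the promoted node, it can help to confirm that the promotion took effect. The sketch below assumes the HA setup uses PostgreSQL physical replication, which this guide does not state explicitly; under that assumption, the built-in `pg_is_in_recovery()` function returns true on a read-only standby and false on a primary. `node_role` is a helper name invented for the example.

```shell
# node_role FLAG
# Maps the output of `SELECT pg_is_in_recovery();` (as printed by psql -tA)
# to a readable role: "t" means the node is still a read-only standby,
# "f" means it is accepting writes as a primary.
node_role() {
    case "$1" in
        t) echo "standby" ;;
        f) echo "primary" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Calling `node_role "$(sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();')"` on the Slave node after running `master.sh` would be expected to print `primary` if the assumption about replication holds. If it prints `standby`, review the output of the promotion script before proceeding.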

Step 3: Reconfigure HA

Once both database nodes are upgraded, you must re-establish the HA replication. Refer to the High Availability (HA) Deployment Setup guide and follow the steps to reconfigure the cluster.

Post-Upgrade Verification

Once the upgrade is completed on all nodes, follow these steps to finalize the process and verify the health of the HA cluster.

Step 1: Start Observer Service

After both Master and Standby servers and their databases are upgraded, start the Observer Service to re-enable the HA functionality.

sudo systemctl start ha
sudo systemctl status ha

Step 2: Check High Availability Cluster Health Status

Ensure the database and the filedb folder are properly synchronized.

Database Sync Check

Check whether the Master and Standby databases are in sync by running the following query on each node and comparing the request counts:

sudo -u postgres psql -d flotoitsmdb -tAc "SELECT COUNT(*) FROM apolo.request;"

Test Ticket Validation

Create a test ticket on the Master server and confirm that the same record appears on the Standby to validate synchronization.
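The request-count comparison can be wrapped in a small helper so the result is unambiguous in a script or runbook. This is a sketch: `compare_counts` is a hypothetical name, and how you collect the Standby count (for example over SSH) depends on your environment.

```shell
# compare_counts MASTER_COUNT STANDBY_COUNT
# Returns 0 when both values are non-negative integers and equal,
# 1 when they differ, and 2 when either value is not a number.
compare_counts() {
    for cc_value in "$1" "$2"; do
        case "$cc_value" in
            ''|*[!0-9]*) echo "invalid count: '$cc_value'" >&2; return 2 ;;
        esac
    done

    if [ "$1" -eq "$2" ]; then
        echo "in sync: $1 rows on both nodes"
        return 0
    fi
    echo "out of sync: master=$1 standby=$2"
    return 1
}
```

Capture the output of the count query on each node and pass both values, for example `compare_counts "$master_count" "$standby_count"`. If requests are created between the two queries the counts can differ briefly, so repeat the check before concluding that replication is broken.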

FileDB Sync Check

Verify File-Sync Service: Ensure the file-sync.service is running on the Master and stopped on the Standby.
```
sudo systemctl status file-sync
```
Compare File Counts: Compare the number of files in the filedb directory on both Master and Standby:
```
find /opt/flotomate/main-server/filedb -type f | wc -l
```
The file counts should be identical.
Test File Synchronization: Create a test file in the Master's filedb folder and confirm it appears on the Standby node.
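The file count and test-file checks above can be sketched as two helpers. The function names and the marker file naming scheme are illustrative assumptions; only the `find` command and the filedb path come from this guide.

```shell
# count_files DIR
# Prints the number of regular files under DIR, using the same find command
# as the manual check. tr strips the padding some wc implementations add.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# make_sync_marker DIR
# Creates an empty, uniquely named marker file in DIR and prints its path.
make_sync_marker() {
    ms_path="$1/sync_test_$(date +%Y%m%d_%H%M%S)_$$"
    : > "$ms_path" || return 1
    echo "$ms_path"
}
```

On the Master, `make_sync_marker /opt/flotomate/main-server/filedb` prints the path of a new marker file; on the Standby, `[ -f <that path> ]` shows whether it arrived. Remove the marker on the Master afterwards so it does not remain in the application data.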

Step 3: Service Health Verification

Check All Services: Verify the status of all services on the master node:

sudo systemctl status ft-main-server.service ft-analytics-server.service elasticsearch.service nginx.service postgresql.service file-sync.service
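When the full `status` output is too verbose, a one-line-per-service summary can be easier to scan. The helper below is a sketch that relies only on `systemctl is-active`; `report_units` is a name invented for this example.

```shell
# report_units UNIT...
# Prints "<unit>: <state>" for each unit and returns the number of units
# that are not in the "active" state (0 means everything is running).
report_units() {
    ru_failed=0
    for ru_unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...) on stdout
        # and exits non-zero for anything other than active.
        ru_state="$(systemctl is-active "$ru_unit" 2>/dev/null || true)"
        echo "$ru_unit: ${ru_state:-unknown}"
        [ "$ru_state" = "active" ] || ru_failed=$((ru_failed + 1))
    done
    return "$ru_failed"
}
```

On the master node, pass the same unit names as in the command above, for example `report_units ft-main-server.service ft-analytics-server.service nginx.service`. On the standby node the application services are expected to be stopped, so a non-zero return there is normal.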

Application Access: Log in to the ServiceOps Portal and verify the application version from Admin > Organization > Account > License Details.
Sanity Checks: Test functionality and verify all features are working as expected.

Step 4: Performance Monitoring

System Resources: Monitor CPU, memory, and disk usage on both nodes.
Network Performance: Verify network connectivity between HA nodes.
Service Response Times: Monitor application response times.
Error Rates: Watch for any increased error rates in logs.

Troubleshooting

Common HA Upgrade Issues

Observer Service Issues

Symptoms: Observer service fails to start or stop properly

Resolution:

Check observer service status: sudo systemctl status ha
Review HA logs: tail -f /opt/HA/logs/ha_<date>.log
Verify network connectivity between nodes
Restart observer service if needed: sudo systemctl restart ha

Database Synchronization Problems

Symptoms: Database sync issues between master and standby

Resolution:

Check PostgreSQL service status on both nodes
Verify network connectivity between database nodes
Check database replication logs
Restart PostgreSQL services if needed

File Synchronization Issues

Symptoms: FileDB not synchronizing between nodes

Resolution:

Check file-sync service status: sudo systemctl status file-sync
Verify rsync connectivity between nodes
Check file permissions and ownership
Restart file-sync service if needed: sudo systemctl restart file-sync

Service Startup Failures

Symptoms: Services fail to start after upgrade

Resolution:

Check service status: sudo systemctl status <service-name>
Review service logs: sudo journalctl -u <service-name>
Verify dependencies are installed
Restart services in proper order

Recovery Procedures

Rollback to Previous Version

If the upgrade fails and you need to roll back:

Stop Observer Service: Stop the HA observer service
Restore from VM Snapshot: If available, restore from VM snapshot
Restore from Backup: Restore ServiceOps from backup if needed
Reinstall Previous Version: Install the previous ServiceOps version
Restart Services: Restart all services in proper order
Start Observer Service: Start the HA observer service

Cluster Recovery

If the HA cluster becomes unstable:

Stop All Services: Stop all ServiceOps services on both nodes
Restore from Backup: Restore ServiceOps from backup
Verify Dependencies: Ensure all required packages are installed
Restart Services: Restart services in proper order
Restart Observer: Start the HA observer service

Log Analysis

HA Observer Logs

Check HA observer logs for errors:

tail -f /opt/HA/logs/ha_<date>.log

Service Logs

Monitor ServiceOps service logs:

sudo tail -f /opt/flotomate/main-server/logs/common/error

File Sync Logs

Check file synchronization logs:

sudo journalctl -u file-sync.service

Performance Optimization

Post-Upgrade Optimization

After a successful HA upgrade, consider these optimizations:

Service Optimization: Optimize service configurations for HA environment
Network Optimization: Ensure optimal network connectivity between nodes
Database Tuning: Optimize PostgreSQL configuration for HA
File Sync Optimization: Configure optimal file synchronization settings

Monitoring and Maintenance

Regular Health Checks: Schedule regular HA cluster health checks
Performance Monitoring: Monitor system performance on both nodes
Backup Scheduling: Ensure regular backups are maintained
Log Monitoring: Monitor logs for any issues or warnings

Security Considerations

HA Security

Network Security: Ensure secure communication between HA nodes
Access Control: Verify proper access controls on both nodes
Audit Logging: Enable and monitor audit logs
Firewall Configuration: Configure firewalls appropriately for HA

Post-Upgrade Security

Service Permissions: Verify service permissions are correct
Database Security: Check database security settings
File Permissions: Verify file permissions and ownership
Network Security: Test network security between nodes

Related Topics

Backward Compatibility - Understand compatibility aspects with previous versions of ServiceOps.
PostgreSQL Upgrade Guide - Detailed instructions for upgrading your PostgreSQL database.
Ubuntu Version Upgrade Guide - Guide for upgrading the underlying Ubuntu OS.
High Availability Deployment Guide - Comprehensive guide for setting up a new HA environment.
DC-DR Upgrade Guide - Step-by-step instructions for upgrading Data Center-Disaster Recovery setups.
