SAS Viya Multi Availability Zone Deployment on Azure
This reference architecture provides an overview of how a Viya environment can be deployed to an Azure AKS cluster with multiple Availability Zones to minimize downtime in case an Availability Zone goes down.

Scenario
The reference architecture for multi Availability Zone deployments in Azure provides the recommended approach to deploy Viya in these environments.
This architecture provides enhanced recovery in case of the following disruptions:
- Single Pod failures: by running multiple instances of all services, the system is protected against pod failures.
- Single Node failures: by spreading multiple instances over multiple nodes, the system is protected against node failures.
- Availability Zone failures: by ensuring all required persisted data is available in a secondary Availability Zone, the Viya environment can be quickly restarted in case of an Availability Zone failure.
Note that protection against single pod and node failures can also be achieved in single Availability Zone setups.
This reference architecture can be combined with other reference architectures to provide additional resilience in the form of Backup / Restore and Disaster Recovery functionalities.
End-User experience
When an Availability Zone goes down, users will experience a service disruption while the SAS Viya environment is restarted in the secondary Availability Zone. After the service has resumed, users can continue working as normal. They should, however, be aware that:
- Compute sessions will have terminated, and any work that was in progress at the time of the disruption will have to be restarted.
- CAS data will have to be reloaded into memory before it can be used again.
Considerations for cross Availability Zone deployments
Although AKS provides the ability to run workloads across Availability Zones, this approach is not recommended. The main reasons for this are:
- Performance Cost: Although cross-AZ latency is lower than cross-region latency, the increase compared to same-zone deployments can have a negative impact on the performance of high-performance analytical platforms like SAS Viya.
- Infrastructure Cost: To maintain the same level of performance as a single-AZ deployment, additional infrastructure needs to be deployed so that the remaining zones can handle the full application load when an Availability Zone goes down.
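The infrastructure cost argument can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the workload is spread evenly over the zones and that exactly one zone fails at a time; the function names and the node counts in the usage note are illustrative and not taken from SAS or Azure documentation.

```python
import math


def nodes_per_zone(required_nodes: int, zones: int) -> int:
    """Nodes each zone must run so that the surviving zones still provide
    `required_nodes` after one zone is lost (assumes an even spread)."""
    if zones < 2:
        raise ValueError("at least two zones are needed to survive a zone failure")
    if required_nodes < 0:
        raise ValueError("required_nodes must not be negative")
    # After one zone fails, (zones - 1) zones must carry the full load.
    return math.ceil(required_nodes / (zones - 1))


def total_nodes(required_nodes: int, zones: int) -> int:
    """Total nodes that have to be running across all zones."""
    return nodes_per_zone(required_nodes, zones) * zones
```

Under these assumptions, a workload that needs 12 nodes requires 24 running nodes when spread over two zones and 18 when spread over three, compared to 12 in a single-zone deployment.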
Solution overview
Assumption
Networking infrastructure has been set up so that end users can reach the SAS Viya platform and the platform can reach its data sources, regardless of the Availability Zone in which the application is running.
Components
The following key components make up the reference architecture:
- AKS Node Pools: Separate AKS Node Pools are deployed in at least two Availability Zones. All node pools are labeled and tainted according to the SAS documentation. If following the recommended workload placement strategy, this means at least 10 node pools will be created:
  - 2 default node pools
  - 2 stateless node pools
  - 2 stateful node pools
  - 2 compute node pools
  - 2 CAS node pools
  Five of these node pools will be scaled down to zero nodes in normal operation. In case of an Availability Zone failure, these node pools can be scaled up to the required number of nodes.
- Azure DB for PostgreSQL: A zone-redundant Azure DB for PostgreSQL database is deployed. In case of an Availability Zone failure, the database will automatically fail over to a secondary Availability Zone, allowing the SAS Viya platform to be restarted with minimal delay.
- Azure NetApp Files: Azure NetApp Files is deployed with cross-zone replication of volumes. SAS Viya requires both RWO block storage and RWX shared storage. Azure NetApp Files provides a resilient RWX shared storage platform. This again ensures the SAS Viya platform can be restarted with minimal delay in case of an Availability Zone failure.
- Azure Disk Storage: For RWO block storage, Azure Disks are used. When Azure Disks are provisioned with Zone-redundant storage (ZRS), Azure synchronously replicates your Azure managed disk across three Azure Availability Zones in the region you select. This again ensures the SAS Viya platform can be restarted with minimal delay in case of an Availability Zone failure.
- Azure Container Registry: Although not strictly required, removing the dependency on upstream container image repositories reduces the time needed to restart your environment in a different Availability Zone. Using an Azure Container Registry (ACR) removes this dependency. The ACR should mirror not only the SAS container registry, but also any other images required to run the supporting services in the AKS cluster, such as the Ingress controller and CSI providers.
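As an illustration of the node pool layout described above, the following minimal sketch plans ten node pools across an active and a standby zone and builds the `az aks nodepool scale` calls that would bring the standby pools up to size after a zone failure. The pool naming scheme (`<class>z<zone>`), the resource group and cluster names, and the node counts are all hypothetical. The sketch also assumes that the pools are scaled manually rather than by the cluster autoscaler, and it does not cover the database or storage components.

```python
from dataclasses import dataclass

# Workload classes from the recommended workload placement strategy.
WORKLOAD_CLASSES = ("default", "stateless", "stateful", "compute", "cas")


@dataclass(frozen=True)
class NodePool:
    name: str        # hypothetical naming scheme: <class>z<zone>
    workload: str
    zone: int
    node_count: int


def plan_node_pools(active_zone, standby_zone, counts):
    """Return one pool per workload class and zone; standby pools start at zero."""
    pools = []
    for workload in WORKLOAD_CLASSES:
        pools.append(
            NodePool(f"{workload}z{active_zone}", workload, active_zone, counts[workload])
        )
        pools.append(NodePool(f"{workload}z{standby_zone}", workload, standby_zone, 0))
    return pools


def scale_up_commands(pools, standby_zone, counts, resource_group, cluster_name):
    """Build the az CLI calls that scale the standby zone's pools to full size."""
    commands = []
    for pool in pools:
        if pool.zone != standby_zone:
            continue  # only the standby zone's pools are scaled up
        commands.append(
            "az aks nodepool scale"
            f" --resource-group {resource_group}"
            f" --cluster-name {cluster_name}"
            f" --name {pool.name}"
            f" --node-count {counts[pool.workload]}"
        )
    return commands


# Illustrative node counts per workload class (assumed values).
example_counts = {"default": 1, "stateless": 2, "stateful": 2, "compute": 3, "cas": 4}
example_pools = plan_node_pools(active_zone=1, standby_zone=2, counts=example_counts)
for command in scale_up_commands(example_pools, 2, example_counts, "rg-viya", "aks-viya"):
    print(command)
```

Generating the commands from a single table of node counts keeps the standby zone sized identically to the active zone, which is what the restart in the secondary Availability Zone relies on.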
Additional Resources
Please also have a look at the related resources for this reference architecture: