High Availability Options for Contact Expert
Contact Expert v6.1 for Skype for Business Server
Introduction
Contact Expert inherently supports some level of high availability in the form of CE domains. CE domains refer to the physical architecture of the deployed solution, whereby a single domain represents an isolated operational environment using a dedicated application server. In other words, one CE domain means one CE Core Host server computer. If a domain (that is, its CE Core Host) goes down, only the resources (agents, campaigns, etc.) assigned to that specific domain are affected. Agents and campaigns in other domains remain unaffected. From a survivability and administration perspective, however, this approach has a number of problems:
- Agents and campaigns must explicitly be assigned to domains
- Agents cannot work and campaigns cannot run until the application servers hosting their domain are up and running
- Therefore, when a CE application server fails, labour-intensive administration work is needed to reassign all of its resources to another server
- Even if there is an additional application server, it might not have been scaled to withstand the increased load that two domains' worth of resources would bring
While the domains approach provides some level of load balancing capability, it is not considered a highly available or fault tolerant solution in itself.
For further information on how CE operates with OnCall IVR in case of failover please read the Behavior of Contact Expert and OnCall IVR in High-Availability cases article.
Automatic Failover
Contact Expert supports the concepts of Primary and Secondary CE Core Host servers in a domain. In terms of hardware parameters the secondary server should be identical to the primary as both host the same software components including CE licenses and should be able to provide the full function set to the whole range of users.
In a nominal situation, however, the secondary server runs in passive mode and the primary provides all functions. Only one of the servers is active at any one time, though the CE services are up and running on both computers.
The Switchover Process
Normally, when the primary server is the active one, the two servers exchange heartbeat information and both of them register the result of this to the CE system database periodically. The system database plays the witness role in this context. See the diagram for an overview.
The switchover process is initiated by the witness party – the database.
Automatic switchover is initiated only if all of the following are true:
- The previously active server did not register status to the database for a while...
- ...AND the previously passive server registers status to the database...
- ...AND the previously passive server indicates that sending heartbeats to the previously active server fails.
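The three conditions above can be read as a single boolean test. The following PowerShell function is only an illustrative sketch, not CE's actual implementation: the function name, its parameters and the staleness limit are hypothetical, and reading "registers status" as "registered status recently" is an assumption. The real decision is made inside CE from the status records in the system database.

```powershell
# Illustrative sketch only: Test-SwitchoverCondition and its parameters are
# hypothetical names, not part of Contact Expert.
function Test-SwitchoverCondition {
    param(
        [datetime]$Now,
        [datetime]$ActiveLastStatus,     # last status the active server wrote to the database
        [datetime]$PassiveLastStatus,    # last status the passive server wrote to the database
        [bool]$PassiveHeartbeatFailing,  # passive server reports failed heartbeats to the active one
        [timespan]$StaleAfter            # how long "a while" is; an assumed value
    )
    $activeSilent = ($Now - $ActiveLastStatus) -gt $StaleAfter
    $passiveAlive = ($Now - $PassiveLastStatus) -le $StaleAfter
    # All three conditions must hold before a switchover is initiated.
    return ($activeSilent -and $passiveAlive -and $PassiveHeartbeatFailing)
}

$now = Get-Date
Test-SwitchoverCondition -Now $now `
    -ActiveLastStatus $now.AddMinutes(-5) `
    -PassiveLastStatus $now.AddSeconds(-10) `
    -PassiveHeartbeatFailing $true `
    -StaleAfter ([timespan]::FromMinutes(3))
```

Because all three terms are combined with a logical AND, a passive server that merely loses its own database connection, or an active server that is slow to report while still answering heartbeats, does not trigger a switchover in this model.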
When a failover occurs, the secondary server takes up the role of the primary server. During a failback it is the other way around. The server that became active at the end of such a switchover process starts running the queues and campaigns assigned to the domain and starts accepting connections from agents.
Automatic Failback
After a successful failover process, all CE services are provided by the secondary server, which is now the active one. The criteria triggering the failback and the process itself are exactly the same as with the failover; the only difference between the two is the direction of the switchover.
Mean Recovery Time Objective
The mean recovery time objective (RTO) is the following:
- automatic failover or failback (aka. primary server is lost or recovered): 3 minutes decision threshold + 3 minutes switchover process = 6 minutes
- manual failover or failback: 3 minutes switchover process
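The arithmetic behind these figures can be sketched as follows. The helper name is hypothetical, and the 3-minute defaults are simply the values quoted above. The decision threshold can be configured differently, and the switchover time is a typical rather than a guaranteed value.

```powershell
# Hypothetical helper; the 3-minute defaults are the figures quoted above.
function Get-EstimatedRecoveryMinutes {
    param(
        [ValidateSet('Automatic', 'Manual')]
        [string]$Mode,
        [double]$DecisionThresholdMinutes = 3,
        [double]$SwitchoverMinutes = 3
    )
    if ($Mode -eq 'Automatic') {
        # Automatic: wait out the decision threshold, then switch over.
        return $DecisionThresholdMinutes + $SwitchoverMinutes
    }
    # Manual: the administrator has already decided, so only the switchover counts.
    return $SwitchoverMinutes
}

Get-EstimatedRecoveryMinutes -Mode Automatic                               # 6
Get-EstimatedRecoveryMinutes -Mode Manual                                  # 3
Get-EstimatedRecoveryMinutes -Mode Automatic -DecisionThresholdMinutes 5   # 8
```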
Automatic Failover and Failback Decision Threshold
This is the time CE waits from the moment it observes that the currently active server is inoperable until it actually initiates the switchover to the other server.
Read the Application Servers article for more information on configuring this setting.
Failover and Failback Process Time
This is the time CE needs to perform the switchover of control from one server to the other, either in a failover scenario (primary to secondary) or failback (secondary to primary).
This time period cannot be changed, and it assumes a system environment at or above the minimum requirements (IT equipment performance, network, etc.).
The switchover process time is not a guaranteed maximum value as – on top of the IT environment performance – it also depends on the number of business resources configured in CE.
Important Notice
Highlights of important notices
- HA works in a single datacenter only.
- No HA for multiple datacenters (aka. there is no disaster recovery).
- Maximum component startup/shutdown timeframe is 5 seconds by default. Make sure the physical resources (CPU, IO) dedicated to the primary and secondary application servers guarantee this startup/shutdown requirement. On slower machines you need to increase the maximum component startup/shutdown timeframe to a higher value (e.g. to 10 seconds) in ServerAgent.Config.xml.
- Make sure the CE Storages – specified on the portal – are *always available*. Unavailable storages might result in CE startup failure.
- Make sure the callback list cache file (CallbackList.xml) is stored in a network share which is *always available*. Otherwise, callback requests might not be performed at all.
- Make sure the custom presence state file (ace_presence.xml) is stored on a web server which is *always available*. Otherwise, agents might not be able to sign in.
- In a virtualized environment, never host the primary and secondary application servers on the same physical host. Otherwise, HA functionality might become completely useless.
- CE databases should be *always available*. Otherwise, CE might not work at all.
- Never restart the “CE Server Agent” Windows service manually: by design, it stops and starts components one by one, which results in service disruption. Use the CE Server Manager tool for this.
- Make sure the startup mode of “CE Server Agent” is set to “Automatic (Delayed)”.
“Always available” means that the referred objects are
- stored neither on the primary nor on the secondary application server (so a HW failure would not render them inaccessible)
- highly available on their own (do not pose a single point of failure themselves)
Supported Failover Scenarios
The failover mechanism provides high availability for the following scenarios:
- Active server is 'HW-down', meaning:
  - Server is powered off
  - Server suffers power outage or power supply failure
  - Server suffers any HW problem rendering it out of order
- Local networking issue affecting only the active server:
  - Network card(s) failure
  - Router or other network component configuration issue
  - Subnet networking issue
- CE services are completely stopped on the active server, meaning that even the components responsible for the heartbeat synchronization are off.
Scenarios not supported
This HA solution does not address vis maior situations such as global networking outages, issues with the database or the unified communications (UC) infrastructure. You can implement database server high availability (e.g. SQL AlwaysOn) and UC level high availability (e.g. SfB server pools) to defend against the latter two.
Switchover will also not occur when some of the server components are affected by internal (software bug) or external (network, prerequisite component issues or one of many other potential factors) issues, but the particular software components responsible for the HA connectivity and heartbeat protocol are NOT affected and continue to work OK. Such situations can result in an erroneous overall CE operation on the active server, but the automatic failover will still not occur.
The CE High Availability features do not replace active human supervision by IT administration staff.
Manual Failover and Failback
On top of the automatic process described above, CE also supports initiating both the failover and failback features manually via the administration portal.
Microsoft Skype for Business Specific Details
In a Microsoft SfB environment, both the primary and the secondary server should be provisioned as trusted application servers (New-CsTrustedApplicationComputer) hosting the same applications and the same application endpoints (CE campaigns and recorder). Additionally, the primary and the secondary servers should be organized into a trusted application pool (New-CsTrustedApplicationPool). Since only the active server registers application endpoints, SfB Front End server(s) will route incoming calls to the active server automatically.
Associated User Interfaces
Administrator Perspective
Using the CE Portal, administrators can get an overview of the connectivity of the CE application servers, including the active and passive server, the Skype for Business front-end pool and a status report.
Navigate to Infrastructure→Application Servers.
Click Edit of the preferred domain.
Go to the bottom of the page and click MORE ACTIONS.
Choose Manage High Availability.
The active application server is indicated in a blue frame.
For further details on how to identify the state of the failover system, read the How to verify the Contact Expert high availability status article.
The same screen can be used by administrators to initiate a manual failover or failback procedure. The GUI provides real-time feedback indicating the expected remaining time to finish the manually initiated failover or failback procedure.
Agent Perspective
Identifying the Active Server
Among many other details, the Home→Diagnostics menu of the CE client software reveals which CE server the client is connected to. See the highlighted parts in the screenshot.
Automatic Recovery
When a failover/failback procedure is initiated, either manually or automatically, the agent application presents a recovery dialog to give the agent feedback on the situation, and the system attempts to reconnect to an active CE server component. This might take a couple of minutes, provided a CE server is available.
If the agent is in the middle of administrative work when the dialog shows up, clicking the Close button makes it disappear. However, this also stops any further attempts to automatically reconnect to the server components.
If both CE servers in a HA pair fail (or a network glitch blocks access to both of them), or the failover/failback procedure takes considerably more time than configured, all automatic reconnection attempts will fail and the system will eventually give up trying.
Whether the failover procedure succeeds or fails, the recovery dialog informs the agent of the outcome. The reason for a failed recovery attempt is also presented to the agent in a temporary "toast" message that disappears after a while.
SfB User Perspective
Any SfB user – even employees not associated with Contact Expert operations – can take a look at the campaign endpoint contact in their SfB client software to reveal which CE server the given campaign endpoint is currently registered from. See the highlighted part in the screenshot.
Networking, SfB Server and CE Server Configuration by Example
This chapter describes how to configure networking, the Skype for Business Front-end servers and CE application servers for high availability by going through an example system setup. The following configuration is for a single CE domain assuming that CE application servers host both core and recording components.
We are using the following FQDN and IP addresses in our example setup:
Role | FQDN | IPv4 address |
---|---|---|
CE server pool | ce-pool.msvoip.dev | n/a |
Primary CE server | ce-dev.msvoip.dev | 10.168.3.101 |
Secondary CE server | ce-ins.msvoip.dev | 10.168.3.102 |
SfB Front-end pool | sfb-pool.msvoip.dev | n/a |
DNS Server Configuration
Create DNS A records in the forward lookup zone msvoip.dev for the CE application servers:
Name | Type | Mapped to |
---|---|---|
ce-pool | Host (A) | 10.168.3.101 |
ce-pool | Host (A) | 10.168.3.102 |
Execute the "nslookup ce-pool.msvoip.dev" command at least 3 times in a command prompt window on both the SfB Front-end and CE servers. Make sure the DNS server returns the IP addresses belonging to the CE servers, and that the order of these IP addresses changes each time the nslookup command is issued.
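The round-robin check can also be expressed as a small script. The helper below is a hypothetical sketch, not part of CE or SfB. It takes the address lists returned by several lookups and reports whether every expected address was present each time and whether the order varied.

```powershell
# Hypothetical helper: checks that every lookup returned all expected addresses
# and that the order of the answers was not identical in every lookup.
function Test-RoundRobinAnswers {
    param(
        [string[]]$Expected,
        [string[][]]$AnswerSets   # one array of IP addresses per lookup
    )
    $orders = @()
    foreach ($answers in $AnswerSets) {
        $missing = @($Expected | Where-Object { $answers -notcontains $_ })
        if ($missing.Count -gt 0) { return $false }
        $orders += ($answers -join ',')
    }
    # Rotation is visible when at least two distinct orderings were seen.
    return (@($orders | Select-Object -Unique).Count -gt 1)
}

# Example with answers transcribed from three nslookup runs.
$lookups = @(
    @('10.168.3.101', '10.168.3.102'),
    @('10.168.3.102', '10.168.3.101'),
    @('10.168.3.101', '10.168.3.102')
)
Test-RoundRobinAnswers -Expected '10.168.3.101', '10.168.3.102' -AnswerSets $lookups

# On a Windows server the answers could be collected directly, for example:
#   (Resolve-DnsName -Name 'ce-pool.msvoip.dev' -Type A -DnsOnly).IPAddress
# Answers served from the local resolver cache may hide the rotation (an
# assumption worth verifying), so nslookup remains the reference check.
```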
Computer Certificates on the CE Servers
Assign a computer certificate with the following attributes to each CE application server (Computer account > Personal folder):
Attribute name | Attribute value |
---|---|
Certificate Template Name | WebServer |
Friendly Name | CEPoolCert |
Subject (CN) | ce-pool.msvoip.dev |
Subject Alternative Name (DNS Name) | ce-pool.msvoip.dev, ce-dev.msvoip.dev, ce-ins.msvoip.dev |
Key Usage | Digital Signature, Key Encipherment |
Enhanced Key Usage | Server Authentication |
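A quick way to catch a certificate that lacks one of the required DNS names is to compare the expected list against the names actually present. The helper below is a hypothetical sketch. The commented lines show one way the names might be read from the local computer store, and they assume the DnsNameList property that Windows PowerShell exposes on certificate objects.

```powershell
# Hypothetical helper: returns the required DNS names that are not present.
function Get-MissingDnsName {
    param(
        [string[]]$Required,
        [string[]]$Present
    )
    # -notcontains compares case-insensitively, which suits DNS names.
    return @($Required | Where-Object { $Present -notcontains $_ })
}

$required = 'ce-pool.msvoip.dev', 'ce-dev.msvoip.dev', 'ce-ins.msvoip.dev'

# Example: a certificate that was issued without the secondary server's name.
Get-MissingDnsName -Required $required -Present 'ce-pool.msvoip.dev', 'ce-dev.msvoip.dev'

# On a CE server the present names could be read like this (assumption:
# the certificate is found by the friendly name used in this example setup):
#   $cert = Get-ChildItem Cert:\LocalMachine\My |
#       Where-Object { $_.FriendlyName -eq 'CEPoolCert' } | Select-Object -First 1
#   Get-MissingDnsName -Required $required -Present $cert.DnsNameList.Unicode
```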
SfB Server Configuration
Use the "Skype for Business Server Management Shell" to run the following cmdlets (required permission is SfB Server CsAdministrator).
Data in these cmdlets are examples – do not use these in your environment!
Create an application pool using the CE pool FQDN and the 1st CE application server FQDN:
New-CsTrustedApplicationPool -Identity "ce-pool.msvoip.dev" -ComputerFqdn "ce-dev.msvoip.dev" -Registrar "sfb-pool.msvoip.dev" -Site 1
Add the 2nd CE application server to the pool:
New-CsTrustedApplicationComputer -Identity "ce-ins.msvoip.dev" -Pool "ce-pool.msvoip.dev"
Create a trusted application for CE core:
New-CsTrustedApplication -ApplicationId "ACE" -Port "9000" -TrustedApplicationPoolFqdn "ce-pool.msvoip.dev"
Create at least one campaign endpoint:
New-CsTrustedApplicationEndpoint -ApplicationId "ACE" -TrustedApplicationPoolFqdn "ce-pool.msvoip.dev" -SipAddress "sip:campaign_1@msvoip.dev" -DisplayName "CE: Campaign 1" -LineURI "tel:+XXXXXXXXX"
Create a trusted application for CE recording:
New-CsTrustedApplication -ApplicationId "ACE_Recorder" -Port "9100" -TrustedApplicationPoolFqdn "ce-pool.msvoip.dev"
Create at least one recorder endpoint:
New-CsTrustedApplicationEndpoint -ApplicationId "ACE_Recorder" -TrustedApplicationPoolFqdn "ce-pool.msvoip.dev" -SipAddress "sip:recorder_1@msvoip.dev" -DisplayName "CE: Recorder 1"
Publish the SfB topology:
Enable-CsTopology
CE Server Configuration
Data in these cmdlets are examples – do not use these in your environment!
Launch "CE PowerShell" on the Primary CE Core Host – in our example it is ce-dev.msvoip.dev – and execute the following cmdlet with local Windows Administrator role to configure the primary CE Core Host to work with the designated Trusted Application in Microsoft telephony:
Set-CESfbConnectorProperties -ApplicationName "ACE" -ApplicationFqdn "ce-dev.msvoip.dev" -ApplicationPort "9000" -CertificateFriendlyName "CEPoolCert" -ApplicationGruu "[ACE trusted app's ComputerGruu for ce-dev.msvoip.dev]"
Execute the following cmdlet to configure the CE recording services – in our example residing on the same computer as the primary CE Core Host – to work with the designated Trusted Application in Microsoft Telephony:
Set-CESfbRecorderProperties -ApplicationName "ACE_Recorder" -ApplicationFqdn "ce-dev.msvoip.dev" -ApplicationPort "9100" -CertificateFriendlyName "CEPoolCert" -ApplicationGruu "[ACE_Recorder trusted app's ComputerGruu for ce-dev.msvoip.dev]"
Execute the following cmdlet to establish the required firewall rules on the primary CE Core host:
Add-CEFirewallRules
Launch "CE PowerShell" on the Secondary CE Core Host – in our example it is ce-ins.msvoip.dev – and execute the following cmdlet with local Windows Administrator role to configure the secondary CE Core Host to work with the designated Trusted Application in Microsoft telephony:
Set-CESfbConnectorProperties -ApplicationName "ACE" -ApplicationFqdn "ce-ins.msvoip.dev" -ApplicationPort "9000" -CertificateFriendlyName "CEPoolCert" -ApplicationGruu "[ACE trusted app's ComputerGruu for ce-ins.msvoip.dev]"
Execute the following cmdlet to configure the CE recording services – in our example residing on the same computer as the secondary CE Core Host – to work with the designated Trusted Application in Microsoft Telephony:
Set-CESfbRecorderProperties -ApplicationName "ACE_Recorder" -ApplicationFqdn "ce-ins.msvoip.dev" -ApplicationPort "9100" -CertificateFriendlyName "CEPoolCert" -ApplicationGruu "[ACE_Recorder trusted app's ComputerGruu for ce-ins.msvoip.dev]"
Execute the following cmdlet to establish the required firewall rules on the secondary CE Core host:
Add-CEFirewallRules
Callback List Cache File in HA Environment
Callback requests are managed by the RuleServer which maintains a local cache file (CallbackList.xml) for them in order to handle them efficiently. By default this file is stored locally on the application server in the C:\Geomant\CE\Backup directory. In a HA environment this file should be stored on a network share in order to avoid loss of callback requests in case of a server failure.
The RuleServer on both the primary and secondary servers has to be configured to read/write the callback list cache file from/to this directory.
Create the folder (e.g.: "cefiles") where the callback list cache file is to be stored on a file server (e.g.: "filesrv").
Grant both the primary and secondary server computer accounts (e.g.: "cesrv1$", "cesrv2$") full permission on this folder. Also make sure these permissions are inherited by all child objects within this folder.
Share this folder on the network (e.g.: "\\filesrv\cefiles").
Please note that '\\filesrv\cefiles' is only an example! Provide the exact path name of the network share where you store the callback list cache file.
Stop Contact Expert services using the CE Server Manager tool.
Stop the CE Server Agent service in the Windows Services administration tool.
Configure the RuleServer on both the primary and secondary server to store the callback list cache file (CallbackList.xml) in the shared folder ({-backupdir \\filesrv\cefiles}).
Navigate to the ServerAgent.Config.xml config file located at C:\Geomant\CE\Services\ServerAgent\ by default.
Find the argument on the location of the backup directory among the RuleServer parameters:
<Argument name="backupdir" value="C:\Geomant\CE\Backup"></Argument>
Replace the default backup path with the location of the network share where the callback list cache file is stored.
Start CE Server Agent in Windows Services admin tool.
Start Contact Expert services using the CE Server Manager tool.
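The configuration file edit in the steps above can also be scripted. The following is a minimal sketch with a hypothetical helper name. It assumes that the only Argument element named "backupdir" in the file is the RuleServer one. Check the file before relying on this, and run it only while the CE services and the CE Server Agent are stopped, as described above.

```powershell
# Hypothetical helper: points every <Argument name="backupdir"> at a new path
# and returns the number of elements it changed.
function Set-BackupDirArgument {
    param(
        [xml]$Config,
        [string]$NewPath
    )
    $nodes = $Config.SelectNodes("//Argument[@name='backupdir']")
    foreach ($node in $nodes) {
        $node.SetAttribute('value', $NewPath)
    }
    return $nodes.Count
}

# Sample fragment; the <RuleServer> wrapper element is an assumption, only the
# <Argument> line is taken from the real file.
[xml]$sample = @'
<RuleServer>
  <Argument name="backupdir" value="C:\Geomant\CE\Backup"></Argument>
</RuleServer>
'@
Set-BackupDirArgument -Config $sample -NewPath '\\filesrv\cefiles'
$sample.RuleServer.Argument.value

# Against the real file (default location shown; keep a backup copy first):
#   $path = 'C:\Geomant\CE\Services\ServerAgent\ServerAgent.Config.xml'
#   [xml]$cfg = Get-Content -Raw -Path $path
#   Set-BackupDirArgument -Config $cfg -NewPath '\\filesrv\cefiles'
#   $cfg.Save($path)
```

The same change has to be made on both the primary and the secondary server so that both RuleServer instances use the same share.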
Notes on the ApplicationGruu parameter in a HA CE environment
The CE PowerShell cmdlets in this document require ApplicationGruu parameters. Their contents might be a bit counter-intuitive, because the Microsoft objects refer either to a Service Gruu or to a Computer Gruu, but not to an Application Gruu. So which one is needed here? Follow these steps to acquire the correct data:
Execute the following SfB PowerShell cmdlet to list all the available trusted applications:
Get-CsTrustedApplication
Find the trusted application entry dealing with the CE pool you created earlier – in our example it is "ce-pool.msvoip.dev".
Find the Computer Gruu parameter of this trusted application and extract the part that deals with the actual CE Core Host you currently deploy.
Example ApplicationGruu for a HA CE environment, copied from the Computer Gruu of the trusted application:
sip:ce-dev.msvoip.dev@msvoip.dev;gruu;opaque=srvr:ce-pool:bJ-p03O411CzeiHbxYOOhQAA
In this value, the parts derived from FQDNs and hostnames (ce-dev.msvoip.dev, msvoip.dev and ce-pool) are implementation-specific and are merely example values, the trailing token (bJ-p03O411CzeiHbxYOOhQAA) is auto-generated, and the remaining parts (sip:, ;gruu;opaque=srvr:) are fixed.
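Picking the right entry out of the Computer Gruu list can be sketched as a small filter. The helper name is hypothetical. It also assumes that each entry contains a "sip:&lt;host FQDN&gt;@..." value, possibly preceded by other text, and that the list is available through the ComputerGruus property of the Get-CsTrustedApplication output. Verify both in your environment.

```powershell
# Hypothetical helper: returns the Computer Gruu that belongs to one CE host.
function Select-ComputerGruu {
    param(
        [string[]]$ComputerGruus,
        [string]$HostFqdn
    )
    # Match "sip:<host FQDN>@" followed by everything up to the next whitespace.
    $pattern = 'sip:' + [regex]::Escape($HostFqdn) + '@\S+'
    foreach ($entry in $ComputerGruus) {
        if ($entry -match $pattern) { return $Matches[0] }
    }
}

# The example value from this article.
$gruus = @(
    'sip:ce-dev.msvoip.dev@msvoip.dev;gruu;opaque=srvr:ce-pool:bJ-p03O411CzeiHbxYOOhQAA'
)
Select-ComputerGruu -ComputerGruus $gruus -HostFqdn 'ce-dev.msvoip.dev'

# In the Skype for Business Server Management Shell the list could be taken
# from the trusted application object (property name is an assumption):
#   $app = Get-CsTrustedApplication | Where-Object { $_.Identity -like '*ce-pool.msvoip.dev*' } | Select-Object -First 1
#   Select-ComputerGruu -ComputerGruus $app.ComputerGruus -HostFqdn 'ce-dev.msvoip.dev'
```

The returned string is what goes into the -ApplicationGruu parameter on the matching CE Core Host. Repeat with the secondary server's FQDN for the other host.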
Reporting Data Collection During HA Failover
Contact Expert collects statistical data from all aspects of the ongoing operation of the contact centre in real-time. This is accomplished by server components installed on the application server of the first domain.
In a HA deployment where two CE servers form a single domain, and assuming this is the first domain, only the active server of domain 1 collects data. In case of a failover, the passive server takes over the data collection duties automatically.
The reporting data collection service is deployed to domain 1 only; it is not installed on any other domain. In a complex system where more than one domain is deployed, if domain 1 is disconnected or inoperable (or, if it is a HA pair, both its active and passive servers are), reporting data collection stops for all other servers too. This can be remedied, but requires manual intervention.
Additional Requirements
SQL Server
The high availability solution described here assumes that both the CE system and reporting databases are hosted by a dedicated SQL Server, preferably using some of the offered redundancy features such as a flavour of AlwaysOn High Availability Groups.
The SQL server must be running on a separate host (most probably running on an SQL cluster).
CE databases must not be hosted on a CE application server.
SfB Custom Presence States
SfB custom presence states required by the CE agent application are defined in the following XML file normally stored on the CE Core Host:
Custom Lync presence state file
ace_presence.xml
In order to force the Microsoft telephony client to download and use these custom presence states, the installer of the CE Agent application sets the Windows registry key CustomStateURL on each agent PC. The location of this registry key depends on the version of the SfB client:
SfB Custom Presence states must be downloaded from a separate host.
Microsoft Telephony Client | Windows Registry path | Key | Value |
---|---|---|---|
Lync 2013 Client & Skype for Business 2015 Client | HKLM\SOFTWARE\Policies\Microsoft\Office\15.0\Lync | CustomStateURL | http://[CE Core Host FQDN]:8080/ClientAccessServer/ace_presence.xml |
Skype for Business 2016 Client | HKLM\SOFTWARE\Policies\Microsoft\Office\16.0\Lync | CustomStateURL | http://[CE Core Host FQDN]:8080/ClientAccessServer/ace_presence.xml |
This means that the xml file is downloaded from one of the CE application servers by default. Obviously, this is not a good solution for high availability. Instead, the ace_presence.xml file should be stored on a separate web server and the CustomStateURL Windows registry key should be set accordingly. You do not need to change the Windows registry key on each agent PC one by one. CustomStateURL can be specified from SfB client policy as well. Use the Set-CsClientPolicy cmdlet in the Skype for Business Server Management Shell.
The XML file containing the SfB custom presence states must not be stored on a CE application server.
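Setting the URL through the client policy might look like the sketch below, run in the Skype for Business Server Management Shell. The URL is a placeholder for wherever ace_presence.xml is actually hosted. Using the Global policy is also only an assumption, and a site-level or per-user policy may suit your deployment better.

```powershell
# Placeholder URL: replace with the address of the web server hosting ace_presence.xml.
Set-CsClientPolicy -Identity Global -CustomStateUrl 'https://web.msvoip.dev/presence/ace_presence.xml'

# Confirm the value now stored in the policy.
Get-CsClientPolicy -Identity Global | Select-Object Identity, CustomStateUrl
```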
Network Shares Configured
CE uses shared folders to play back recorded audio files and to store exported contact lists regardless of which server of a HA pair is active. While these functions work just fine in a non-HA environment even if these shares are not configured, it is essential to set them up for HA:
- Add-CEContactManagementShare
- Add-CEMediaReplayShare
For more information on these and other CE PowerShell cmdlets please read the PowerShell Commands article.
Recorded Conversations Stored Outside of CE
Media files should be stored on external (network) drives. CE recording rules should be configured accordingly.
Recordings must not be stored on a CE application server.
The following article contains information on how to set up external file shares: CE Recording
Each URL must be set in Agent Policy
The script capability and the customer history option use web-based data served by a web server on the CE Core Hosts. If these features are active and in use, their URLs should be set from the agent policy. Each URL is a template by default. You do not need to change these templates; you just need to check the checkbox in the Set column. These URL templates guarantee that tasks and histories are always downloaded through the active CE application server.
Each agent policy URL must be set.