High Availability Deployment

Network

Both HA instances are connected by a separate network connection called the "replication channel". The HA Controller Swiftlets of both instances use this channel to replicate state and, most importantly, to exchange heartbeat messages that detect whether the other instance is down.
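The heartbeat idea can be illustrated with a minimal sketch. This is not the HA Controller Swiftlet's actual implementation: the class name, the timeout value and the injectable clock are all hypothetical and only show how a missing heartbeat turns into a "peer is down" decision.

```python
import time


class HeartbeatMonitor:
    """Hypothetical failure detector: remembers when the peer was last heard from."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed value, not a product default
        self.clock = clock                      # injectable so the logic can be tested
        self.last_seen = clock()

    def on_heartbeat(self):
        # Call this whenever a heartbeat message arrives on the replication channel.
        self.last_seen = self.clock()

    def peer_is_down(self):
        # The peer is considered down once no heartbeat arrived within the timeout.
        return self.clock() - self.last_seen > self.timeout_seconds
```

A monotonic clock is used so that wall-clock adjustments on a host cannot trigger a false failover decision.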

This network connection should be a private network segment, connected via dedicated network cards on both hosts and a network switch. The required network speed depends on the type of persistent store and on the message load. If the Replicated File Store is used and the message load is high, the load on the replication channel is high as well; in this case we recommend a Gigabit link for the replication channel. The Shared File Store and the Shared JDBC Store don't require replication, so a 10 or 100 Mbit/s link is sufficient.
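To judge whether a link is fast enough for the Replicated File Store, a rough back-of-the-envelope estimate can help. The sketch below is only an approximation: the function names are hypothetical, and the overhead factor of 1.25 for transaction log and protocol framing is an assumption, not a measured value.

```python
def replication_mbit_per_s(messages_per_second, avg_message_bytes, overhead_factor=1.25):
    """Rough replication channel load in Mbit/s.

    overhead_factor is an assumed allowance for log and protocol framing.
    """
    bytes_per_second = messages_per_second * avg_message_bytes * overhead_factor
    return bytes_per_second * 8 / 1_000_000


def link_utilization(required_mbit_per_s, link_mbit_per_s):
    """Fraction of the link that the replication traffic would occupy."""
    return required_mbit_per_s / link_mbit_per_s


# Example: 10,000 persistent messages per second of 2,000 bytes each.
required = replication_mbit_per_s(10_000, 2_000)  # about 200 Mbit/s
print(link_utilization(required, 100))    # above 1.0: a 100 Mbit/s link is too slow
print(link_utilization(required, 1000))   # about 0.2: a Gigabit link has headroom
```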

Persistent Store

You can choose between Replicated File Store, Shared File Store and Shared JDBC Store.

The Replicated File Store is the default. An image of the store is transferred to the STANDBY when the two HA instances connect. Thereafter the transaction log is synchronously replicated to the STANDBY to ensure consistency. This degrades performance and puts load on the replication channel. However, the advantage of this store is that you don't need disk sync of the transaction log, because you have two copies of the store. Even if you hard kill the ACTIVE instance, the STANDBY remains consistent. You don't need expensive hardware either; standard SCSI disks are sufficient.
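The meaning of "synchronously replicated" can be shown with a small in-memory sketch. It is a simplified model under stated assumptions, not the store's real code: the class and method names are hypothetical, the network is replaced by a direct method call, and failure handling is left out.

```python
class StandbyLog:
    """Hypothetical STANDBY side: stores each replicated record and acknowledges it."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return True  # acknowledgement sent back to the ACTIVE


class ActiveLog:
    """Hypothetical ACTIVE side: a commit completes only after the STANDBY has acknowledged."""

    def __init__(self, standby):
        self.records = []
        self.standby = standby

    def commit(self, record):
        self.records.append(record)
        # Synchronous step: the caller waits here for the STANDBY's acknowledgement.
        # This wait is the performance cost of replication.
        self.standby.append(record)
        return len(self.records)  # number of committed records so far
```

Because every commit returns only after the STANDBY holds the same record, both copies are identical at any point where the ACTIVE could be killed.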

The Shared File Store doesn't require replication, so there is no waiting for the transaction log to be committed at the STANDBY, as there is with the Replicated File Store. However, a Shared File Store requires disk sync in an HA environment. If disk sync is disabled and the ACTIVE is hard killed, the transaction log might be inconsistent and the STANDBY will fail on startup. In that case both instances are down and you have no HA at all. Disk sync on standard disks is very slow, in the range of 10 to 50 milliseconds or more per operation. To get acceptable performance and to avoid a single point of failure, you need a high-speed RAID array or a SAN with a high-speed fibre network, which is very expensive.
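The effect of the disk sync latency can be worked out directly: if every operation waits for its own sync, 10 to 50 milliseconds per operation limits a strictly serial writer to between 100 and 20 operations per second. The sketch below does this arithmetic and also measures the sync latency of a given directory. The function names are hypothetical, and the measurement is only a rough indicator; it says nothing about how the store itself batches its writes.

```python
import os
import tempfile
import time


def max_synced_ops_per_second(sync_latency_ms):
    """Upper bound for strictly serial operations that each wait for one disk sync."""
    return 1000.0 / sync_latency_ms


def measure_sync_latency_ms(directory=None, iterations=20, record=b"x" * 512):
    """Average milliseconds for one small write followed by a disk sync.

    directory should be on the disk that will hold the store; the default
    temporary directory may be memory-backed and then reports misleading numbers.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(iterations):
            f.write(record)
            f.flush()             # hand Python's buffer to the operating system
            os.fsync(f.fileno())  # ask the operating system to flush to the device
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


print(max_synced_ops_per_second(10))  # 100.0
print(max_synced_ops_per_second(50))  # 20.0
```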

The Shared JDBC Store doesn't require replication but requires a clustered database to avoid a single point of failure.