Monday, August 31, 2015

EFM with Elastic Load Balancing on AWS

EnterpriseDB Failover Manager (EFM) has built-in support for virtual IP address use, but a VIP isn't appropriate for every environment, particularly for cloud environments where your nodes may be spread across different regions/zones/networks. When using Amazon Web Services, an Elastic IP Address (EIP) can be used instead through EFM's fencing script ability. An EIP isn't always the best choice, however, if you don't want a public IP address. This blog describes using Elastic Load Balancing (ELB) as an alternative. An ELB has the advantage that it can be completely internal to a VPC -- no public IP address will be involved -- while still being able to span multiple availability zones.

The ELB getting started guide is here, but the steps are fairly simple. Starting one from the AWS console involves:
  1. Giving the new balancer a name and picking your VPC (or using "EC2 Classic").
  2. If using a VPC, choosing whether to make the load balancer internal or available to the Internet. If internal, you will also select the subnets and security groups to use later.
  3. Choose the protocols/ports to listen for traffic to forward.
  4. Configure security settings if a secure protocol (SSL, HTTPS) was picked in step 3.
  5. Configure health checks.
  6. Add instances.
  7. Optionally, add tags for organizing your AWS resources.
In step 5 above, you don't have to worry about the health checks in great detail. If using EFM to manage failover, you just want the ELB to send traffic to the current master, not be in charge of figuring out whether the master is alive or not. I use TCP pings to my database's port with rather large values for timeout, interval, and unhealthy threshold. You can set the "healthy" threshold as low as 2 checks, but this shouldn't normally come into play.
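For anyone who prefers the command line to the console, the health check described above can be sketched with the AWS CLI's `aws elb configure-health-check` command. This is only an illustration: the balancer name `my-efm-elb`, the port 5432, and the specific interval, timeout, and threshold numbers are placeholder assumptions, so adjust them to your environment.

```shell
#!/bin/bash
# Sketch only: build the --health-check value for the load balancer.
# The numbers are illustrative assumptions, chosen to be generous so the
# ELB rarely marks the master unhealthy on its own.

health_check_spec() {
    local port="$1"   # database port to check with a TCP ping
    echo "Target=TCP:${port},Interval=60,Timeout=30,UnhealthyThreshold=10,HealthyThreshold=2"
}

# Example invocation (hypothetical balancer name, not run here):
#   aws elb configure-health-check \
#       --load-balancer-name my-efm-elb \
#       --health-check "$(health_check_spec 5432)"
```

Keeping the check this relaxed matches the intent above: EFM decides whether the master is alive, and the ELB only forwards traffic.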

For step 6 above, we'll use imaginary instances "i-aaaaaaaa" for the master, and "i-bbbbbbbb" and "i-cccccccc" for two replica nodes. (For more information on setting up EFM with three database nodes, you can see a video example here or follow the user's guide.) Initially, you only want to add the master database node to the ELB. In effect, we're using the ELB as a private EIP. After the ELB has been created, you will have options for the address to use on the balancer's description tab. Note that your instances can be spread across availability zones when using this feature.
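The console steps above could also be sketched with the AWS CLI. The example below is a rough outline rather than the exact setup used here: the balancer name, port, subnet ID, and security group ID are all hypothetical placeholders. By default it only prints the commands (dry run), so it can be reviewed before anything is changed.

```shell
#!/bin/bash
# Sketch only: create an internal load balancer and register just the master.
# All names and IDs below are hypothetical placeholders.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # display only; quoting is not preserved
    else
        "$@"
    fi
}

create_internal_elb() {
    local name="$1" port="$2" subnet="$3" sg="$4"
    # --scheme internal keeps the balancer private to the VPC.
    # More than one subnet can be listed to span availability zones.
    run aws elb create-load-balancer \
        --load-balancer-name "$name" \
        --scheme internal \
        --listeners "Protocol=TCP,LoadBalancerPort=${port},InstanceProtocol=TCP,InstancePort=${port}" \
        --subnets "$subnet" \
        --security-groups "$sg"
}

register_master() {
    local name="$1" instance="$2"
    # Only the current master is registered; the replicas stay out.
    run aws elb register-instances-with-load-balancer \
        --load-balancer-name "$name" --instances "$instance"
}

create_internal_elb my-efm-elb 5432 subnet-11111111 sg-22222222
register_master my-efm-elb i-aaaaaaaa
```

Running it with `DRY_RUN=0` would execute the commands for real, assuming the CLI is installed and configured.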

From the EFM perspective, this cluster is no different from any other 3-node cluster in which EFM is not managing a virtual IP address. To have EFM switch the load balancer after a failover, we need to add a script to call when a standby is promoted. EFM 2.0 provides two hooks for user-supplied scripts: a fencing script that is used to fence off the old master (e.g. from a load balancer) and a post-promotion script for any work that needs to be done once promotion has completed. In this case, I recommend doing all of the work in the fencing script. The load balancer changes take a couple of seconds, and even if promotion hasn't finished yet, the load balancer will see that the newly added instance is in service. See chapter 3 of the EFM user's guide for more information on the fencing script property.

Our script will be using the AWS command line interface to alter the ELB's instances. More information on installing the CLI can be found here. These are the steps I took on the database nodes, though there are several ways to install the tools:
  1. curl "" -o ""
  2. yum install -y unzip (if needed)
  3. unzip
  4. ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
  5. aws configure
The first four steps I ran as root, or they could be done through sudo. Because the fencing script will be run as the 'efm' user, I ran the final configuration step as 'efm'. If you don't want this user to have that much access to your AWS controls, then the fencing script can simply be "sudo the_actual_script", with the appropriate sudoers permissions in place, where the_actual_script is the script we define below.

With the above initial steps completed, the fencing script is quite simple. On each node, the script will remove either/both of the two other nodes from the ELB and then add itself. Here is the example script on the i-aaaaaaaa node. You may only want these on the standby nodes, but it could be useful on the master in case you ever bring it up as a standby, or simply to reconfigure the balancer if a failed master is brought back online again.
[root@ONE ]}> cat /var/efm/

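# Name of the load balancer, as shown in the AWS console
export ELB_NAME="<load balancer name>"
# The other two nodes, to be removed from the balancer (space-separated)
export OLD="i-bbbbbbbb i-cccccccc"
# This node, to be added to the balancer
export NEW=i-aaaaaaaa

# ${OLD} is left unquoted on purpose so each instance ID is a separate argument
aws elb deregister-instances-from-load-balancer --load-balancer-name "${ELB_NAME}" --instances ${OLD}
aws elb register-instances-with-load-balancer --load-balancer-name "${ELB_NAME}" --instances "${NEW}"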
That's all there is to the script. Be sure to test that the 'efm' user can run it on each node: run the script, then refresh the AWS console to see the changes take effect. More information on the 'aws elb' command is on this page.
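Besides refreshing the console, the result can be checked from the command line with `aws elb describe-instance-health`. The helper below is a minimal sketch, not part of the fencing script: the function name `only_registered` is made up for this example. It reads "instance-id state" lines and succeeds only if the balancer lists exactly one instance and it is the expected one.

```shell
#!/bin/bash
# Sketch only: confirm the balancer lists exactly one instance, the expected
# one, and print its state. Reads "instance-id state" lines on stdin, which is
# the shape produced by:
#   aws elb describe-instance-health --load-balancer-name <name> \
#       --query 'InstanceStates[].[InstanceId,State]' --output text

only_registered() {
    local expected="$1" count=0 id state last_state=""
    while read -r id state; do
        [ -z "$id" ] && continue             # skip blank lines
        count=$((count + 1))
        [ "$id" = "$expected" ] || return 1  # some other node is still registered
        last_state="$state"
    done
    [ "$count" -eq 1 ] && echo "$last_state"
}
```

On the i-aaaaaaaa node, for example, the describe-instance-health output could be piped into `only_registered i-aaaaaaaa` after running the fencing script. The state may briefly show as out of service until the health checks pass.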

The final step is to specify this script in your file. It will then be run right before the trigger file is created on a standby that is being promoted. In this example, the property would be (along with comments in the file):
# Absolute path to fencing script run during promotion
# This is an optional user-supplied script that will be run during
# failover on the standby database node.  If left blank, no action will
# be taken.  If specified, EFM will execute this script before promoting
# the standby. The script is run as the efm user.
The script should only take a couple seconds to run, and the ELB will take another couple seconds to decide that the newly-added instance is in service and available for traffic to be sent to it. After that, traffic will be sent to your new master database.
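If you want to confirm that the new master has gone into service rather than just waiting a few seconds, a small polling loop is one option. This is a sketch under assumptions: `wait_in_service` is a hypothetical helper, and the state command passed to it is whatever prints the instance's current state in your setup.

```shell
#!/bin/bash
# Sketch only: poll until an instance is reported InService, or give up.
# The state command is supplied by the caller. Against a real balancer it
# could be something like:
#   aws elb describe-instance-health --load-balancer-name <name> \
#       --instances i-aaaaaaaa --query 'InstanceStates[0].State' --output text

wait_in_service() {
    local tries="$1" delay="$2"
    shift 2                       # remaining arguments: the state command
    local i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$("$@")" = "InService" ]; then
            return 0              # instance is receiving traffic
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                      # never came into service within the limit
}
```

For example, `wait_in_service 10 2 <state command>` would check up to ten times, two seconds apart.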


  1. Thanks for such a nice article Bobby ...I m sure it will very helpful for us

    1. Note that I don't know if this is something you *should* do, just that it's possible. I think AWS recommends using EIP for this purpose.
