Monday, August 31, 2015

EFM with Elastic Load Balancing on AWS

EnterpriseDB Failover Manager (EFM) has built-in support for virtual IP address use, but a VIP isn't appropriate for every environment, particularly for cloud environments where your nodes may be spread across different regions/zones/networks. When using Amazon Web Services, an Elastic IP Address (EIP) can be used instead through EFM's fencing script ability. An EIP isn't always the best choice, however, if you don't want a public IP address. This blog describes using Elastic Load Balancing (ELB) as an alternative. An ELB has the advantage that it can be completely internal to a VPC -- no public IP address will be involved -- while still being able to span multiple availability zones.

The ELB getting started guide is here, but the steps are fairly simple. Starting one from the AWS console involves:
  1. Giving the new balancer a name and picking your VPC (or using "EC2 Classic").
  2. If using a VPC, choose whether you want an internal load balancer or one available to the Internet. If internal, you will also select the subnets and security groups to use in later steps.
  3. Choose the protocols/ports to listen for traffic to forward.
  4. Configure security settings if a secure protocol (ssl, https) was picked in step 3.
  5. Configure health checks.
  6. Add instances.
  7. Optionally, add tags for organizing your AWS resources.
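The console walkthrough above maps directly onto the AWS CLI. Here is a minimal sketch of steps 1-3 for an internal balancer forwarding the Postgres port (5432). The balancer name, subnet ID, and security group ID are placeholders, and the `aws` shell function at the top is a dry-run stub that only prints each command -- delete it to run the commands against a real account:

```shell
# Dry-run stub: print each aws command instead of executing it.
# Delete this function to run against your real AWS account.
aws() { echo "aws $*"; }

# Placeholder names/IDs for illustration only.
ELB_NAME=efm-internal-elb
SUBNET_ID=subnet-11111111
SG_ID=sg-22222222

# Steps 1-3 above: name the balancer, make it internal to the VPC,
# and listen on the database port, forwarding to the same port on
# the registered instances.
CREATED=$(aws elb create-load-balancer \
    --load-balancer-name "$ELB_NAME" \
    --listeners "Protocol=TCP,LoadBalancerPort=5432,InstancePort=5432" \
    --subnets "$SUBNET_ID" \
    --security-groups "$SG_ID" \
    --scheme internal)
echo "$CREATED"
```
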
In step 5 above, you don't have to worry about the health checks in great detail. If using EFM to manage failover, you just want the ELB to send traffic to the current master, not be in charge of figuring out whether the master is alive or not. I use TCP pings to my database's port with rather large values for timeout, interval, and unhealthy threshold. You can set the "healthy" threshold as low as 2 checks, but this shouldn't normally come into play.
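The health check itself can also be set from the CLI. The values below mirror the description above -- a TCP check on the database port, generous interval/timeout/unhealthy values (since EFM, not the ELB, decides whether the master is alive), and a healthy threshold of 2. The balancer name is a placeholder, and `aws` is again stubbed so the command is printed rather than executed:

```shell
# Dry-run stub: print the aws command instead of executing it.
aws() { echo "aws $*"; }

ELB_NAME=efm-internal-elb   # placeholder name

# TCP ping to the database port; large interval/timeout/unhealthy
# values because EFM, not the ELB, is in charge of failover.
CHECK=$(aws elb configure-health-check \
    --load-balancer-name "$ELB_NAME" \
    --health-check "Target=TCP:5432,Interval=60,Timeout=30,UnhealthyThreshold=5,HealthyThreshold=2")
echo "$CHECK"
```
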

For step 6 above, we'll use imaginary instances "i-aaaaaaaa" for the master, and "i-bbbbbbbb" and "i-cccccccc" for two replica nodes. (For more information on setting up EFM with three database nodes, you can see a video example here or follow the user's guide.) Initially, you only want to add the master database node to the ELB. In effect, we're using the ELB as a private EIP. After the ELB has been created, you will have options for the address to use on the balancer's description tab. Note that your instances can be spread across availability zones when using this feature.

From the EFM perspective, this cluster is no different from any other 3-node cluster in which EFM is not managing a virtual IP address internally. To have EFM switch the load balancer after a failover, we need to add a script that is called when a standby is promoted. EFM 2.0 provides two hooks for user-supplied scripts: a fencing script that is used to fence off the old master (e.g. from a load balancer) and a post-promotion script for any work that needs to be done once promotion has completed. In this case, I recommend using only the fencing script. The load balancer changes take a couple of seconds, and even if promotion hasn't finished yet, the load balancer will see that the newly added instance is in service. See chapter 3 of the EFM user's guide for more information on the fencing script property.

Our script will be using the AWS command line interface to alter the ELB's instances. More information on installing the CLI can be found here. These are the steps I took on the database nodes, though there are several ways to install the tools:
  1. curl "" -o ""
  2. yum install -y unzip (if needed)
  3. unzip
  4. ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
  5. aws configure
I ran the first four steps as root; they could also be done through sudo. Because the fencing script will be run as the 'efm' user, I ran the final configuration step as 'efm.' If you don't want this user to have that much access to your AWS controls, then the fencing script can simply be "sudo the_actual_script", with the appropriate sudoers permissions in place, where the_actual_script is what we define below.
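If you take the sudo route, the sudoers entry could look like the following; the absolute path is an assumption for illustration:

```
# e.g. in /etc/sudoers.d/efm: allow the efm user to run only the
# fencing script as root, without a password
efm ALL=(root) NOPASSWD: /var/efm/the_actual_script
```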

With the above initial steps completed, the fencing script is quite simple. On each node, the script will remove either or both of the two other nodes from the ELB and then add itself. Here is the example script on the i-aaaaaaaa node. You may only want this script on the standby nodes, but it can be useful on the master as well, in case you ever bring it back up as a standby or simply need to reconfigure the balancer after a failed master is brought back online.
[root@ONE ]}> cat /var/efm/

export ELB_NAME=<load balancer name>
export OLD="i-bbbbbbbb i-cccccccc"
export NEW=i-aaaaaaaa

aws elb deregister-instances-from-load-balancer --load-balancer-name ${ELB_NAME} --instances ${OLD}
aws elb register-instances-with-load-balancer --load-balancer-name ${ELB_NAME} --instances ${NEW}
That's all there is to the script. You should test that the 'efm' user can run it on each node. You can run the script and refresh the AWS console to see the changes take effect. More information on the 'aws elb' command is on this page.
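You can also verify the registration changes from the command line instead of the console: the describe-instance-health subcommand lists each registered instance and its state. As before, the balancer name is a placeholder and `aws` is stubbed so the command is printed rather than run:

```shell
# Dry-run stub: print the aws command instead of executing it.
aws() { echo "aws $*"; }

ELB_NAME=efm-internal-elb   # placeholder name

# Lists each registered instance with its state (InService, etc.).
HEALTH=$(aws elb describe-instance-health --load-balancer-name "$ELB_NAME")
echo "$HEALTH"
```
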

The final step is to specify this script in your properties file. It will then be run right before the trigger file is created on a standby that is being promoted. The property's comments in the file describe the behavior:
# Absolute path to fencing script run during promotion
# This is an optional user-supplied script that will be run during
# failover on the standby database node.  If left blank, no action will
# be taken.  If specified, EFM will execute this script before promoting
# the standby. The script is run as the efm user.
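The property itself points at the script's absolute path. I believe the property name is script.fence, but verify it against the fencing script property described in chapter 3 of the user's guide; the path here is just an example:

```
script.fence=/var/efm/fence_elb.sh
```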
The script should only take a couple seconds to run, and the ELB will take another couple seconds to decide that the newly-added instance is in service and available for traffic to be sent to it. After that, traffic will be sent to your new master database.

Wednesday, August 26, 2015

Video: EFM 2.0 Installation and Startup

We've posted two new videos about installing EnterpriseDB Failover Manager and starting a new EFM cluster.

The first video shows the installation and setup of EFM.

The second shows the steps involved in starting a failover manager cluster. After the initial cluster has started, we show more information about the new .nodes file as nodes are added to the cluster.

For more information, see the EFM 2.0 documentation.

Wednesday, July 29, 2015

Changes for EDB Failover Manager 2.0

EDB Failover Manager 2.0 includes several changes from 1.X. The user's guide contains information on upgrading and a full description of the properties. This blog gives a little more detail about just some of the new features.

New 'efm' command

The service interface now contains only the standard commands, such as start, stop, and status. Ditto for systemctl on RHEL 7. For everything else, there is a new 'efm' script that is installed in /usr/efm-2.0/bin. This script is now used for commands such as cluster-status, stop-cluster, encrypt, etc., in addition to new commands in version 2.0. A full description is here.
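As a sketch of the new interface, each command takes the cluster name as an argument. The cluster name 'efm' below is just an example, and the `efm` shell function is a dry-run stub that prints each command instead of running the real /usr/efm-2.0/bin/efm:

```shell
# Dry-run stub: print each efm command instead of executing it.
# On a real node, the script lives in /usr/efm-2.0/bin.
efm() { echo "efm $*"; }

CLUSTER=efm   # example cluster name

STATUS_CMD=$(efm cluster-status "$CLUSTER")   # status of the running cluster
STOP_CMD=$(efm stop-cluster "$CLUSTER")       # stop agents on all nodes
echo "$STATUS_CMD"
echo "$STOP_CMD"
```
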

Cluster name simplification

Every failover manager cluster running in the same network should have a unique cluster name. In 1.X, there were two separate places that a cluster name was specified: the service script (so that it could find the properties file location) and in the properties file itself (to define the name used by jgroups for clustering).

Version 2.0 simplifies this by using the convention that your cluster name is the same as the .properties and .nodes file names, and the files are expected in the /etc/efm-2.0 directory. Thus, a single parameter in the service script tells it what file information needs to be passed into the agent at startup. Likewise, passing the cluster name into the 'efm' script tells the script where to find the needed files in order to connect to a running agent (for instance, when running the 'cluster-status' command). There is no more parameter in the properties file.
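A concrete sketch of the convention, using a scratch directory in place of /etc/efm-2.0 and a made-up cluster name:

```shell
# Stand-in for /etc/efm-2.0 so this can run anywhere.
CONF_DIR=$(mktemp -d)
CLUSTER=acctg   # hypothetical cluster name

# The cluster name *is* the file name -- there is no name property
# inside the properties file in 2.0.
touch "$CONF_DIR/$CLUSTER.properties"   # agent/database settings
touch "$CONF_DIR/$CLUSTER.nodes"        # current cluster members

ls "$CONF_DIR"
```
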

This makes it even harder for you to accidentally run two clusters that cause interference with each other, and cuts down on the information needed to run failover manager. Section 4.9 of the user's guide has full information on how to run more than one cluster at a time, using separate cluster names. The change also simplifies the password text encryption, because you don't need to save the cluster name in a properties file first before running the encrypt utility.

Specifying initial cluster addresses

In 1.X, a cluster always had exactly 3 nodes, with the addresses never changing. You specified the addresses for these in each properties file. This accomplished two things:
  1. The cluster knew which node addresses were allowed to join.
  2. An agent, at startup, knew which addresses to contact to find the other cluster members.

EFM 2.0 supports an arbitrary number of standby (or, for that matter, witness) nodes. You may not know all of the addresses when starting the initial members -- you might, for instance, add another standby months after the cluster was started. So now the properties file doesn't contain agent/witness addresses. Each properties file records only that node's binding address (which was inferred from agents/witness properties in 1.X) and whether or not a node is a witness node.

After starting the first member, the two steps above are more explicit. For step 1, the 'efm' utility is used to add a new node's information to the list of allowed addresses. For step 2, you now start an agent with a list of existing cluster members in a .nodes file, kept in the same directory as the properties file.

After an agent joins the cluster, EFM will keep this file up-to-date for you as other nodes join or leave the cluster. Section 4.2 of the user's guide walks you through these steps.

Wednesday, June 3, 2015

IPv6 and Centos 6.6 -- SocketException: Permission denied

What I don't know about IPv6 would fill a very long web page.

When verifying that EnterpriseDB Failover Manager (EFM) would work with IPv6 on Centos 6.6, my connections (using JGroups) would fail at the socket level with Permission denied.

A short Java app reproduces the problem:

[root@FOUR ~]}> cat IPv6Test.java
import java.net.InetAddress;
import java.net.Socket;

public class IPv6Test {
    public static void main(String[] args) {
        try {
            InetAddress ia = InetAddress.getByName("fe80::20c:29ff:feb0:ba66");
            System.err.println("Opening socket for: " + ia);
            Socket socket = new Socket(ia, 22);
            System.err.println("We have: " + socket);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
[root@FOUR ~]}> javac IPv6Test.java && java IPv6Test
Opening socket for: /fe80::20c:29ff:feb0:ba66
java.net.SocketException: Permission denied
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at [....]

I thought this was a problem at the OS level creating the socket in the first place, but now I understand that the error message could be coming back from the remote node and this is how it's displayed by Linux back to the client. In case others run into this, here are the configuration changes needed to get a proper IPv6 setup working. First, disable the usual suspects (if you know how to properly use NetworkManager and ip6tables, feel free to do so instead of killing them):

[root@FOUR ~]}> grep disabled /etc/selinux/config
#     disabled - No SELinux policy is loaded.
[root@FOUR ~]}> service NetworkManager stop
Stopping NetworkManager daemon:                            [  OK  ]
[root@FOUR ~]}> chkconfig NetworkManager off
[root@FOUR ~]}> service ip6tables stop
ip6tables: Setting chains to policy ACCEPT: filter         [  OK  ]
ip6tables: Flushing firewall rules:                        [  OK  ]
ip6tables: Unloading modules:                              [  OK  ]
[root@FOUR ~]}> chkconfig ip6tables off

At this point, the problem is my link-scoped IPv6 address on both nodes:

[root@THREE ~]}> ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:B0:BA:66  
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20c:29ff:feb0:ba66/64 Scope:Link

What we want is a globally-scoped address for each node. After a little editing, this is my current eth0 config:

[root@FOUR ~]}> cat /etc/sysconfig/network-scripts/ifcfg-eth0

The IPv6 address was found using this link, and then using ::1, ::2, etc., for the various virtual machines. After the above changes and a reboot, I now have a proper global IPv6 address on my nodes:
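On CentOS 6, giving eth0 a static global IPv6 address comes down to a couple of additions to the interface config. The keys below are the standard RHEL 6 ifcfg options; the address follows the ::1, ::2 numbering described above:

```
# added to /etc/sysconfig/network-scripts/ifcfg-eth0
IPV6INIT=yes
IPV6ADDR=fdd9:6fe6:6a5b:3835::1/64
```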

[root@THREE ~]}> ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:B0:BA:66  
          inet addr:  Bcast:  Mask:
          inet6 addr: fdd9:6fe6:6a5b:3835::3/64 Scope:Global
          inet6 addr: fe80::20c:29ff:feb0:ba66/64 Scope:Link

…and no more SocketException with the new address:

[root@FOUR ~]}> javac IPv6Test.java && java IPv6Test
Opening socket for: /fdd9:6fe6:6a5b:3835:0:0:0:3
We have: Socket[addr=/fdd9:6fe6:6a5b:3835:0:0:0:3,port=22,localport=49971]

My thanks to the JGroups users forum (and Bela), the OpenJDK list, and Dave Page for their help getting me back on track with IPv6.

Tuesday, May 21, 2013

Vaadin 7 Cookbook

I finally have a chance to look at Vaadin version 7 now, which is already up to v7.0.6 (nice work team!). There is a wealth of information from Vaadin about the changes and migration path, but I'm skipping all that and jumping right to examples as an experiment. Thus, my first blog ever about a book.

I haven't always liked "cookbook" style technical books ("Python Cookbook" is a great one), but I'm enjoying Packt Pub's Vaadin 7 Cookbook a lot. It's not trying to be a replacement for the Book of Vaadin, which is still a required reference. Instead, it's a very easy-to-follow tour of things you're either going to do, or will want to do once you know you can.

When I say it's easy to follow, I mean the code examples are very concise and get to the point without becoming overwhelming. I could probably learn a lesson there in my own blog examples, heh. It helps that Vaadin applications are easy to construct, but it's still some nice writing to get a fully-working, useful example written so that it can be explained in small chunks and works on its own. Each "recipe" introduces some end goal, walks you through the code, and then has an explanation of why it works. The formula works well. Then links are included to the Vaadin API or Book of Vaadin for more information. The links alone make this a great reference source.

As a fun example, the very last recipe in the book is, well, the opposite of the first web app we've all written (I don't want to give it away). The recipe very succinctly illustrates how easy it is to include JavaScript code in your application. This was the first time I've seen that in Vaadin 7, and it's much easier than I expected. A somewhat larger example is the drag-and-drop uploader, which is something I think I'd be able to come up with eventually, but now I don't have to!

Overall, this is a great book of examples that cover a lot of common and not-so-common tasks in writing a Vaadin application. For the Vaadin newcomer, this book illustrates the power of the Vaadin framework very quickly. Reading it reminded me of how I felt when I first learned about Vaadin a thousand years ago. For the veteran developer, there will be things you haven't tried yet, especially if you're making the switch now from version 6 to 7.

If I have a complaint at all, it's that some of the downloaded examples are Maven projects (so there's no setup to run), and others are just the source files. Still, it's pretty simple to change to an example directory and run "rm -rf $dir/* && cp -R . $dir/" where $dir is the package in an already-made project. This works for most of them -- don't forget to move .js files to the 'resources' dir in a Maven project. So it's a very minor nit. Having all this example code in one place is easily worth buying the book.

Thursday, October 18, 2012

Bookmarklet: Restarting Vaadin Applications

This is a quick one....

When working on a Vaadin application, there are several ways to update the app's class files while sessions are still active: deploy to a server that supports session preservation, make a change to a properties file on the running server, use JRebel to push incremental changes, etc. Because the state of the UI is stored in the user's session, the changes may not be visible unless a new session is started.

You could always clear out your cookies, restart your browser, etc., but Vaadin offers a simple way to re-initialize the application so that you see the changes right away. Adding "?restartApplication" to the URL in the browser performs this function. See Debugging Vaadin Applications for more information.

If you're like me (lazy), typing "?restartApplication" more than once means there's got to be a better way. The bookmarklet below will handle this for you. Just drag it to your bookmarks bar and click it to restart the app. It takes everything in your current URL up to a question mark and then appends the magic phrase.


Happy reloading!

Friday, June 8, 2012

Session Timeouts with Vaadin's Refresher Add-on

The Refresher add-on for Vaadin refreshes the UI without user input, allowing it to display information that has changed asynchronously on the server. It adds an invisible component to the client web page which polls the server for changes. A matching server-side handler is called when the server is polled.

Refresher is among the most popular Vaadin add-ons, but adding a poll (or push) mechanism to a Java EE application can be dangerous in terms of user sessions. The servlet container doesn't distinguish automatic poll requests from any user-initiated request, so the session will never time out on its own.

This blog shows an example of manually tracking user requests in order to time out sessions properly. The application is contained in this one class (linked here for reference). You can get the whole application in this zip file that includes pom.xml for building and a session listener to see when sessions start and end.

Note the README file: you have to build with the -Pcompile-widgetset option the first time because we're using a Vaadin add-on. Thank you to this blog for showing me how to put the widget compilation into its own profile!

The code is heavily commented to explain how user activity is tracked, but here is a summary:

  1. Using the HttpServletRequestListener interface we can note when a request comes in and compare the time of the request to the time of the previous one. If it's been X minutes, end the session.
  2. But we don't want every request to count as user activity. So in step 1 we note the current time, but save the previous one (like pushing the time onto a stack). If the Refresher.RefreshListener is called, we know this request wasn't user input, so we can discard the current request time and consider only the older one (popping the stack).
  3. At this point, we can now compare the current request time to the older one and make a decision about ending the session.
In the above scenario, you can see that the session-timeout decision is only ever made in the refresher listener. It can't be in the onRequestStart() method because we don't know what kind of request it is (unless you want to parse the request stream!). We also can't end the session in that method because it causes an error within Vaadin. You could make this decision in every listener in your application if you wanted to, but I think the refresher is fine. Since it runs every N seconds, at the most a session will be N seconds longer than normal.

This description of the application is also contained in the source code:

 Simple app with a label and start button. When a user first loads the
 page, there are no automatic refresher calls happening and the container
 will timeout the session normally.
 Clicking the button starts the refresher, mimicking a user logging into
 the 'real' application UI. Whenever the UI is refreshed, it updates
 the label with a new value. This represents the UI reading state from
 another resource such as a database, which is constantly updated by
 other threads. For our simple case, we just show how much time we
 have left before ending the session.

The application was deployed and tested on a GlassFish 3.1.2 server (the Java EE reference implementation), but it should work on any Servlet 3.X container.

While the application works as-is, there is another aspect to consider that I will (hopefully) cover later. Some components will reload their data when a page is refreshed, for instance a Table that only shows a subset of its rows. In this case, the refresh call will force tables to make subsequent calls back to the server, extending the session. I'll add tables and their request handling to this sample application in a separate blog.