Thursday 30 June 2016

How to configure failover and high availability network bonding on Linux

This tutorial explains how to configure network bonding on a Linux server. Before I start, let me explain what network bonding is and what it does. In a Windows environment, network bonding is called network teaming. It is a feature that helps any server architecture provide high availability and failover in scenarios where one of the main ethernet cables malfunctions or is misconfigured.
Normally, it is a best practice and a must-have feature when you set up a server for production use. Although this feature can be configured in a Linux environment, you first have to confirm with your network admin that the switches linked to your server support network bonding. There are several bonding modes that can be implemented in your server environment. Below is a list of the available modes and what they do (a quick reference to their numeric values follows the list):
  • Balance-rr
    This mode provides load balancing and fault tolerance (failover) via a round-robin policy: packets are transmitted in sequential order from the first available slave through the last.
  • Active-Backup
    This mode provides fault tolerance via an active-backup policy. Once the bonding interface is up, only one of the ethernet slaves is active. The other slave only becomes active if the currently active slave fails. If you choose this mode, you will notice that the bond's MAC address is externally visible on only one network adapter, in order to avoid confusing the switch.
  • Balance-xor
    This mode provides load balancing and fault tolerance. It transmits based on the selected transmit hash policy. Alternate transmit policies may be selected via the xmit_hash_policy option.
  • Broadcast
    This mode provides fault tolerance only. It transmits everything on all slave ethernet interfaces.
  • 802.3ad
    This mode provides load balancing and fault tolerance. It creates an aggregation group whose members share the same speed and duplex settings and utilizes all slave ethernet interfaces in the active aggregator, based on the 802.3ad specification. To implement this mode, the base drivers must support ethtool for retrieving the speed and duplex of each slave, and the switch must support dynamic link aggregation. Normally, this requires network engineer intervention for detailed switch configuration.
  • Balance-TLB
    This mode provides load balancing capabilities; TLB stands for transmit load balancing. If tlb_dynamic_lb=1, the outgoing traffic is distributed according to the current load on each slave. If tlb_dynamic_lb=0, dynamic load balancing is disabled and the load is distributed only via the hash distribution. For this mode, the base drivers must support ethtool for retrieving the speed of each slave.
  • Balance-ALB
    This mode provides load balancing capabilities; ALB stands for adaptive load balancing. It is similar to balance-tlb, except that receive traffic is load balanced as well. Receive load balancing is achieved via ARP negotiation: the bonding driver intercepts the ARP replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond. For this mode, the base drivers must support ethtool for retrieving the speed of each slave.
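For reference, the mode names above correspond to the numeric values used by the bonding driver: balance-rr is mode 0, active-backup is mode 1, balance-xor is mode 2, broadcast is mode 3, 802.3ad is mode 4, balance-tlb is mode 5 and balance-alb is mode 6. The driver normally accepts either form, so the two lines below should be equivalent (treat this as a sketch and double-check against the bonding documentation shipped with your kernel):
BONDING_OPTS="mode=1 miimon=100"
BONDING_OPTS="mode=active-backup miimon=100"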


1. Preliminary Note

For this tutorial, I am using Oracle Linux 6.4 in the 32-bit version. Please note that even though the configuration is done under Oracle Linux, the steps also apply to CentOS and Red Hat distributions, and to 64-bit systems as well. The end result of our example setup will show that the connection to our bonding server stays up even after I disable one of the ethernet interfaces. In this example, I'll show how to apply network bonding using mode 1, which is the active-backup policy.


2. Installation Phase

For this process, there's no installation needed. A default Linux server installation includes all the packages required for a network bonding configuration.
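If you want to double-check that the bonding driver is actually available in your kernel before going further, an optional check (assuming the modinfo utility is installed) is:
modinfo bonding | head -n 5
If the module is present, this prints its filename and a few descriptive fields; if modinfo reports that the module cannot be found, consult your distribution's documentation before continuing.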


3. Configuration Phase

Before we start the configuration, we first need to ensure that the server has at least two ethernet interfaces configured. To check this, go to the network configuration folder and list the available ethernet interfaces. Below are the steps:
cd /etc/sysconfig/network-scripts/
ls *ifcfg*eth*
The result is:
ifcfg-eth0 ifcfg-eth1
Notice that we currently have two ethernet interfaces set up on our server, eth0 and eth1.
Now let's configure a bonding interface called bond0. This interface will be a virtual ethernet interface that contains the physical ethernet interfaces eth0 and eth1. Below are the steps:
vi ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
MASTER=yes
IPADDR=172.20.43.110
NETMASK=255.255.255.0
GATEWAY=172.20.43.1
BONDING_OPTS="mode=1 miimon=100"
TYPE=Ethernet 
Then run:
ls *ifcfg*bon*
The result is:
ifcfg-bond0

That's all. Please notice that I've included an IP address inside the bond0 interface configuration. This will be the only IP address assigned to our server. To proceed, we need to modify the physical ethernet interfaces that belong to the bond0 interface. Below are the steps:

vi ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes 
vi ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes 
Done. We've modified the interfaces eth0 and eth1. Notice that we've removed the IP addresses from both interfaces and added MASTER=bond0 and SLAVE=yes. This makes both interfaces slave interfaces dedicated to the bond0 interface.
To proceed with the configuration, let's create a bonding configuration file named bonding.conf under /etc/modprobe.d. Below are the steps:
vi /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bond0 mode=1 miimon=100 
Then load the bonding module:
modprobe bonding
Based on the above config, we've configured the bonding module for the interface bond0. We also set the bonding configuration to mode=1, which is the active-backup policy. The option miimon=100 is the link-monitoring interval in milliseconds, i.e. how often the bonding driver checks the status of each slave interface. As described above, this mode provides fault tolerance in the server's network configuration.
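Before restarting the network, an optional sanity check (assuming a standard toolset) is to confirm that the bonding module has been loaded; the command below should print a line starting with "bonding":
lsmod | grep bonding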
As everything is set up, let's restart the network service in order to load the new configuration. Below are the steps:
service network restart
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: [ OK ]

Excellent, now we have loaded the new configuration that we made above. You'll notice that the new interface bond0 shows up in the network list. You'll also notice that no IP address is assigned to the eth0 and eth1 interfaces; only the bond0 interface shows the IP.

ifconfig
bond0 Link encap:Ethernet HWaddr 08:00:27:61:E4:88
inet addr:172.20.43.110 Bcast:172.20.43.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe61:e488/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:1723 errors:0 dropped:0 overruns:0 frame:0
TX packets:1110 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:147913 (144.4 KiB) TX bytes:108429 (105.8 KiB)
eth0 Link encap:Ethernet HWaddr 08:00:27:61:E4:88
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:1092 errors:0 dropped:0 overruns:0 frame:0
TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:103486 (101.0 KiB) TX bytes:105439 (102.9 KiB)
eth1 Link encap:Ethernet HWaddr 08:00:27:61:E4:88
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:632 errors:0 dropped:0 overruns:0 frame:0
TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:44487 (43.4 KiB) TX bytes:3288 (3.2 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:208 errors:0 dropped:0 overruns:0 frame:0
TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:18080 (17.6 KiB) TX bytes:18080 (17.6 KiB)
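If you prefer the iproute2 toolset (assuming the ip command is installed, as it is on most modern distributions), an equivalent way to confirm that bond0 carries the IP address is:
ip addr show bond0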

You can also check the bonding status via this command:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:61:e4:88
Slave queue ID: 0
Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:c8:46:40
Slave queue ID: 0
Notice in the output above that we've successfully converted the interfaces eth0 and eth1 into a bonding configuration using active-backup mode. The status also shows that the server is currently using interface eth0, with eth1 acting as the backup interface.
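One optional refinement that is not used in this tutorial: the status above shows "Primary Slave: None", which means the bond simply keeps whichever slave is currently active. If you always want eth0 to carry the traffic while it is healthy, you could declare it as the primary slave, for example by extending the bonding options (a sketch; check the bonding documentation for your kernel for details):
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
With a primary slave set, the bond should fail back to eth0 automatically once it recovers.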


4. Testing Phase

Now that everything is configured as expected, let's run a simple test to ensure the configuration we made is correct. For this test, we will log in to another server (or Linux desktop) and start pinging our bonding server to see whether the connection drops at any point during the test. Below are the steps:
login as: root
root@172.20.43.120's password:
Last login: Wed Sep 14 12:50:15 2016 from 172.20.43.80
ping 172.20.43.110
PING 172.20.43.110 (172.20.43.110) 56(84) bytes of data.
64 bytes from 172.20.43.110: icmp_seq=1 ttl=64 time=0.408 ms
64 bytes from 172.20.43.110: icmp_seq=2 ttl=64 time=0.424 ms
64 bytes from 172.20.43.110: icmp_seq=3 ttl=64 time=0.415 ms
64 bytes from 172.20.43.110: icmp_seq=4 ttl=64 time=0.427 ms
64 bytes from 172.20.43.110: icmp_seq=5 ttl=64 time=0.554 ms
64 bytes from 172.20.43.110: icmp_seq=6 ttl=64 time=0.443 ms
64 bytes from 172.20.43.110: icmp_seq=7 ttl=64 time=0.663 ms
64 bytes from 172.20.43.110: icmp_seq=8 ttl=64 time=0.961 ms
64 bytes from 172.20.43.110: icmp_seq=9 ttl=64 time=0.461 ms
64 bytes from 172.20.43.110: icmp_seq=10 ttl=64 time=0.544 ms
64 bytes from 172.20.43.110: icmp_seq=11 ttl=64 time=0.412 ms
64 bytes from 172.20.43.110: icmp_seq=12 ttl=64 time=0.464 ms
64 bytes from 172.20.43.110: icmp_seq=13 ttl=64 time=0.432 ms
While the ping is running, let's go back to our bonding server and turn off the ethernet interface eth0. Below are the steps:
ifconfig eth0
eth0 Link encap:Ethernet HWaddr 08:00:27:61:E4:88
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:1092 errors:0 dropped:0 overruns:0 frame:0
TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:103486 (101.0 KiB) TX bytes:105439 (102.9 KiB)
ifdown eth0
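As a side note, if you prefer the iproute2 tools, taking the link down with the command below should trigger the same failover (this assumes the ip utility is installed):
ip link set eth0 down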
Now that we have brought down the eth0 interface, let's check the bonding status. Below are the steps:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:c8:46:40
Slave queue ID: 0
You will notice that the eth0 interface no longer appears in the bonding status. Meanwhile, let's go back to the test server and check the continuous ping to our bonding server.
64 bytes from 172.20.43.110: icmp_seq=22 ttl=64 time=0.408 ms
64 bytes from 172.20.43.110: icmp_seq=23 ttl=64 time=0.402 ms
64 bytes from 172.20.43.110: icmp_seq=24 ttl=64 time=0.437 ms
64 bytes from 172.20.43.110: icmp_seq=25 ttl=64 time=0.504 ms
64 bytes from 172.20.43.110: icmp_seq=26 ttl=64 time=0.401 ms
64 bytes from 172.20.43.110: icmp_seq=27 ttl=64 time=0.454 ms
64 bytes from 172.20.43.110: icmp_seq=28 ttl=64 time=0.432 ms
64 bytes from 172.20.43.110: icmp_seq=29 ttl=64 time=0.434 ms
64 bytes from 172.20.43.110: icmp_seq=30 ttl=64 time=0.411 ms
64 bytes from 172.20.43.110: icmp_seq=31 ttl=64 time=0.554 ms
64 bytes from 172.20.43.110: icmp_seq=32 ttl=64 time=0.452 ms
64 bytes from 172.20.43.110: icmp_seq=33 ttl=64 time=0.408 ms
64 bytes from 172.20.43.110: icmp_seq=34 ttl=64 time=0.491 ms

Great, now you'll see that even though we have shut down the eth0 interface, we are still able to ping and reach our bonding server. Now let's do one more test: bring the eth0 interface back up and turn off the eth1 interface.
ifup eth0
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:c8:46:40
Slave queue ID: 0
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:61:e4:88
Slave queue ID: 0
Notice that even though eth0 is back up, eth1 remains the active slave; in active-backup mode the bond does not fail back automatically unless a primary slave is configured. Now let's shut down the eth1 interface.
ifdown eth1
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 08:00:27:61:e4:88
Slave queue ID: 0

Now let's go back to the test server and check what happens with the continuous ping to our bonding server:

64 bytes from 172.20.43.110: icmp_seq=84 ttl=64 time=0.437 ms
64 bytes from 172.20.43.110: icmp_seq=85 ttl=64 time=0.504 ms
64 bytes from 172.20.43.110: icmp_seq=86 ttl=64 time=0.401 ms
64 bytes from 172.20.43.110: icmp_seq=87 ttl=64 time=0.454 ms
64 bytes from 172.20.43.110: icmp_seq=88 ttl=64 time=0.432 ms
64 bytes from 172.20.43.110: icmp_seq=89 ttl=64 time=0.434 ms
64 bytes from 172.20.43.110: icmp_seq=90 ttl=64 time=0.411 ms
64 bytes from 172.20.43.110: icmp_seq=91 ttl=64 time=0.420 ms
64 bytes from 172.20.43.110: icmp_seq=92 ttl=64 time=0.487 ms
64 bytes from 172.20.43.110: icmp_seq=93 ttl=64 time=0.551 ms
64 bytes from 172.20.43.110: icmp_seq=94 ttl=64 time=0.523 ms
64 bytes from 172.20.43.110: icmp_seq=95 ttl=64 time=0.479 ms

Thumbs up! We've successfully configured network bonding and proven that our server keeps its connection through a network failover.
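A small convenience for tests like these: assuming the watch utility is available on the bonding server, you can keep the bonding status on screen and see the active slave switch in real time:
watch -n 1 cat /proc/net/bonding/bond0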


Thanks to: https://www.howtoforge.com/tutorial/how-to-configure-high-availability-and-network-bonding-on-linux/