Exchange 2010 CAS Array NLB not working on VMWare ESX

Issue:

I ran into an issue in one of our customer environments where the CAS server loses connectivity when the second CAS node is rebooted.

Configuration: 

Two CAS/HUB servers run as virtual machines on VMware ESX and are configured as a CAS array using Windows NLB (WNLB).

Diagnosis:

  1. When either node is rebooted, Outlook users are disconnected until the server comes back up.
  2. Windows NLB is configured in unicast mode.
  3. The CAS servers are virtualised on VMware ESX.
  4. NLB traffic is being directed to only one server.
  5. The binding order of the NICs was configured incorrectly.
  6. IPv6 was enabled.

Resolution:

  • Disabled IPv6.
  • Corrected the binding order of the NICs.
  • According to VMware, the best way to configure WNLB is in multicast mode; however, the network switch must then be configured for multicast traffic.
  • To run NLB in unicast mode in a VMware environment, it has to be configured on a separate NIC from the production NIC.
  • The sample configuration below shows NLB configured in unicast mode on virtualised servers.

Sample Configuration

In unicast mode, all the NICs assigned to a Microsoft NLB cluster share a common MAC address. This requires that all the network traffic on the switches be port-flooded to all the NLB nodes. Normally, port flooding is avoided in switched environments when a switch learns the MAC addresses of the hosts sending network traffic through it.
The Microsoft NLB cluster masks the cluster’s MAC address for all outgoing traffic to prevent the switch from learning the MAC address.
In the ESXi/ESX host, the VMkernel sends a RARP packet each time certain actions occur; for example, when a virtual machine is powered on, experiences teaming failover, performs certain vMotion operations, and so forth. The RARP packet informs the switch of the MAC address of that virtual machine. In an NLB cluster environment, this exposes the MAC address of the cluster NIC as soon as an NLB node is powered on. This can cause all inbound traffic to pass through a single switch port to a single node of the NLB cluster.
To resolve this issue, you must configure the ESXi/ESX host to not send RARP packets when any of its virtual machines is powered on.
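
The effect of the RARP packet can be illustrated with a toy model of a learning switch. This is only a sketch for intuition, not VMware or Microsoft code: the class, the port names, and the cluster MAC value are all hypothetical.

```python
"""Toy model of a learning Ethernet switch (illustration only)."""


class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def learn(self, src_mac, in_port):
        """Record a source MAC seen on in_port (this is what a RARP frame triggers)."""
        self.mac_table[src_mac] = in_port

    def deliver(self, dst_mac, in_port):
        """Return the list of ports a frame addressed to dst_mac is sent out of."""
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Unknown destination: flood to every port except the one it arrived on.
        return [p for p in self.ports if p != in_port]


CLUSTER_MAC = "02-bf-0a-00-00-64"  # hypothetical unicast cluster MAC

switch = LearningSwitch(ports=["client", "node1", "node2"])

# NLB masks the cluster MAC on outgoing frames, so the switch never learns it
# and inbound cluster traffic is flooded to both nodes.
before_rarp = switch.deliver(CLUSTER_MAC, in_port="client")

# Powering on node1 with Notify Switches enabled makes the host send a RARP
# frame carrying the cluster MAC, so the switch learns it on node1's port.
switch.learn(CLUSTER_MAC, in_port="node1")
after_rarp = switch.deliver(CLUSTER_MAC, in_port="client")

print(before_rarp)  # ['node1', 'node2']
print(after_rarp)   # ['node1']
```

Once the switch has learned the cluster MAC on one port, only that node receives inbound traffic, which matches the symptom of all NLB traffic landing on a single server.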
Notes:

  • VMware recommends configuring the cluster to use NLB multicast mode even though NLB unicast mode should function correctly if you complete these steps. This recommendation is based on the possibility that the settings described in these steps might affect vMotion operations on virtual machines. Also, unicast mode forces the physical switches on the LAN to broadcast all NLB cluster traffic to every machine on the LAN. If you plan to use NLB unicast mode, ensure that:
    • All members of the NLB cluster are running on the same ESXi/ESX host.
    • All members of the NLB cluster are connected to a single port group on the virtual switch.
    • vMotion is not used for the unicast NLB virtual machines, because it is not supported.
    • The Forged Transmits security policy on the port group is set to Accept.
    • The transmission of RARP packets is prevented on the port group or virtual switch, as explained later in this article.
  • VMware recommends having two NICs on the NLB server.
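
The unicast prerequisites above can also be written down as a small checklist function, which may help when reviewing several nodes. This is a minimal sketch under assumptions: the `NlbNode` fields and the example host, port group, and server names are hypothetical and would have to be filled in by hand from the vSphere Client, since the code does not query VMware.

```python
from dataclasses import dataclass


@dataclass
class NlbNode:
    # All fields are entered manually; nothing here talks to ESXi/ESX.
    name: str
    esx_host: str
    portgroup: str
    forged_transmits_accept: bool  # Forged Transmits set to Accept
    notify_switches: bool          # Notify Switches set to Yes


def unicast_nlb_problems(nodes):
    """Return a list of checklist violations for unicast-mode NLB nodes."""
    problems = []
    if len({n.esx_host for n in nodes}) > 1:
        problems.append("nodes run on different ESXi/ESX hosts")
    if len({n.portgroup for n in nodes}) > 1:
        problems.append("nodes are connected to different port groups")
    for n in nodes:
        if not n.forged_transmits_accept:
            problems.append(f"{n.name}: Forged Transmits is not set to Accept")
        if n.notify_switches:
            problems.append(f"{n.name}: Notify Switches is not set to No")
    return problems


good = [
    NlbNode("CAS1", "esx01", "NLB-PG", True, False),
    NlbNode("CAS2", "esx01", "NLB-PG", True, False),
]
bad = [
    NlbNode("CAS1", "esx01", "NLB-PG", True, False),
    NlbNode("CAS2", "esx02", "NLB-PG", True, True),
]

print(unicast_nlb_problems(good))
print(unicast_nlb_problems(bad))
```

The vMotion restriction is left out of the sketch because it is an operational rule rather than a setting that can be recorded per node.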

ESXi/ESX 3.x, 4.x, and 5.x

You can prevent the ESXi/ESX host from sending RARP packets upon virtual machine power up, teaming failover, and so forth using the Virtual Infrastructure (VI) Client or vSphere Client. You can control this setting at the virtual switch level or at the port group level.
To prevent RARP packet transmission for a virtual switch:
Note: This setting affects all the port groups using the switch. You can override this setting for individual port groups by configuring RARP packet transmission for a port group.

  1. Log in to the VI Client/vSphere Client and select the ESXi/ESX host.
  2. Click the Configuration tab.
  3. Click Networking under Hardware.
  4. Click Properties for the vSwitch. The vSwitch Properties dialog appears.
  5. Click the Ports tab.
  6. Click vSwitch and click Edit.
  7. Click the NIC Teaming tab.
  8. Select No from the Notify Switches dropdown.
  9. Click OK and close the vSwitch Properties dialog box.

To prevent RARP packet transmission for a port group:
Note: This setting overrides the setting you make for the virtual switch as a whole.

  1. Log in to the VI Client or vSphere Client and select the ESXi/ESX host.
  2. Click the Configuration tab.
  3. Click Networking under Hardware.
  4. Click Properties for the vSwitch. The vSwitch Properties dialog appears.
  5. Click the Ports tab.
  6. Click the port group you want to edit and click Edit.
  7. Click the NIC Teaming tab.
  8. Select No from the Notify Switches dropdown.
  9. Click OK to close the vSwitch Properties dialog.

ESX 2.x

  1. Log in to the Management Interface and click Options > Advanced Settings.
  2. Set the value for Net.NotifySwitch to 0.
    Note: Net.NotifySwitch is a global setting that impacts all virtual machines.

For more information on NLB, see the Microsoft TechNet article Network Load Balancing Technical Overview.

For related information, see Microsoft Network Load Balancing Multicast and Unicast operation modes (1006580).
Windows Server 2008 introduced a strong host model, which does not allow different NICs to send or receive traffic on each other's behalf. For example, if a request comes in on the second NIC and that NIC has no default gateway configured, the server does not use the first NIC to reply, even though a default gateway is configured on the first NIC.
To change that behavior and return to the Windows Server 2003 model, run these commands from the command prompt:
netsh interface ipv4 set interface "Local Area Connection" weakhostreceive=enabled
netsh interface ipv4 set interface "Local Area Connection" weakhostsend=enabled
Where Local Area Connection is the name of the network interface.
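
If several interfaces or servers need the same change, the two commands can be generated from a short script. The following is a sketch, not a tested deployment tool: the function names are hypothetical, the value is spelled `enabled` as in the netsh documentation, and the script defaults to a dry run that only prints the commands. Actually running them requires an elevated prompt on the Windows server itself.

```python
import subprocess


def weak_host_commands(interface_name):
    """Build the two netsh commands as argument lists (no manual quoting needed)."""
    base = ["netsh", "interface", "ipv4", "set", "interface", interface_name]
    return [base + ["weakhostreceive=enabled"], base + ["weakhostsend=enabled"]]


def enable_weak_host(interface_names, dry_run=True):
    """Print the commands (dry run) or execute them for each named interface."""
    for name in interface_names:
        for cmd in weak_host_commands(name):
            if dry_run:
                # list2cmdline shows the command as it would be typed in cmd.exe
                print(subprocess.list2cmdline(cmd))
            else:
                subprocess.run(cmd, check=True)


# Dry run for the interface name used in this article.
enable_weak_host(["Local Area Connection"])
```

Passing the interface name as a separate list element avoids quoting mistakes with names that contain spaces.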
For more information, see the Microsoft TechNet Magazine article on Strong and Weak Host Models.