Part 1 is an explanation of the UCS infrastructure and its built-in redundancy:
Things were so much simpler when we just bought a physical server with four physical network cards and built our virtual environment on top of it. We controlled network failover with VMware’s built-in features, so we could withstand a NIC blinking out. The virtual switch would even notify the physical switch of the change, so we didn’t get any MAC confusion. In some cases, we added a second physical switch, with the alternate physical NIC attached to it, so we could also survive a full switch failure.
The Cisco UCS changed all that. The UCS is a bleeding-edge blade server chassis with its own layer of virtualization and failover. What I’m presenting today isn’t a solution to a problem but a choice to make: do we use Cisco or VMware to keep our virtual machines online during a physical failure, or in this case, a failure in Cisco’s virtualization layer?
Redundancy in the Cisco UCS rests on three components: the blades, the I/O cards in the chassis (also called Fabric Extenders), and the separate Fabric Interconnect switches.
Let’s start at the blade level. The UCS uses a VIC (Virtual Interface Card) to present vNICs to the VMware host, which treats them as physical NICs. Each blade in the UCS has a physical network card that serves as the VIC, and each VIC can present up to 128 vNICs. So if we configure 4 vNICs on one VIC, the VMware host installed on that blade will think it has 4 physical network cards to work with, exactly the situation of the four-NIC server described at the start.
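To make the vNIC idea concrete, here is a small Python model of one VIC. The class and method names are hypothetical, and the `vmnic` numbering only mimics how ESXi names the NICs it detects. The 128 limit is the figure quoted above; in practice the usable count may depend on the adapter model and on other virtual interfaces (such as vHBAs) configured on it.

```python
from dataclasses import dataclass, field

# Assumption: 128 is the per-VIC figure quoted in the text; real limits
# can vary by adapter model and by how many vHBAs are also defined.
MAX_VNICS_PER_VIC = 128


@dataclass
class Vic:
    """Hypothetical model of one blade's Virtual Interface Card."""
    blade: str
    vnics: list[str] = field(default_factory=list)

    def add_vnic(self, name: str) -> None:
        # Refuse to define more vNICs than the VIC can present.
        if len(self.vnics) >= MAX_VNICS_PER_VIC:
            raise ValueError(f"{self.blade}: VIC already presents {MAX_VNICS_PER_VIC} vNICs")
        self.vnics.append(name)

    def nics_seen_by_host(self) -> list[str]:
        # The hypervisor cannot tell a vNIC from a physical card, so each
        # vNIC shows up as its own "physical" NIC (naming simplified here).
        return [f"vmnic{i}" for i in range(len(self.vnics))]


vic = Vic(blade="chassis1/blade1")
for name in ("eth0", "eth1", "eth2", "eth3"):
    vic.add_vnic(name)

print(vic.nics_seen_by_host())  # ['vmnic0', 'vmnic1', 'vmnic2', 'vmnic3']
```

From the host’s point of view, this is indistinguishable from the four-NIC server we started with.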
Ok, so each blade has a physical NIC serving as the VIC, and that’s where the VMware host gets its NICs from. The next question should be, “What does the blade, and therefore the VIC, connect to?” The answer is the I/O card, also called the Fabric Extender. These are the same thing; which name you hear depends on whom you are talking to. I’m going to call it the Fabric Extender, or FE for short. There are two FEs per chassis, and the blade’s NIC connects to each of them, a sort of East-West connection. You won’t see these connections, as they are part of the chassis’ internal wiring. That is the first level of redundancy: if one FE fails, traffic is re-routed through the other, since the blade’s NIC is connected to both.
The FEs simply pass traffic; they are not switches. They “extend” the capabilities of the chassis by providing more ports. FEs are usually added to the chassis as it fills with blades, although that isn’t required.
In the illustration, you can see the FEs, labeled 1 and 2. They are located in the back of the UCS chassis, and every blade’s NIC has an internal connection to each of them.
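The FE failover described above can be sketched in a few lines of Python. This is a deliberately simplified model: the FE slot numbers and the preference for FE 1 are assumptions for illustration, and it ignores which layer actually performs the switch (UCS fabric failover or VMware NIC teaming), which is the question Part 2 takes up.

```python
# Hypothetical model: each blade's NIC has an internal trace to both
# Fabric Extenders, identified here by slot number 1 and 2.
BLADE_FE_LINKS = (1, 2)


def egress_fe(healthy_fes: set[int]) -> int:
    """Return the FE a blade's traffic leaves through (FE 1 preferred, by assumption)."""
    for fe in BLADE_FE_LINKS:
        if fe in healthy_fes:
            return fe
    raise RuntimeError("both Fabric Extenders are down; the blade is isolated")


print(egress_fe({1, 2}))  # 1  (normal operation)
print(egress_fe({2}))     # 2  (FE 1 has failed)
```

The point of the model is the fallback loop: as long as one of the two internal links leads to a healthy FE, the blade still has a way out of the chassis.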
Next up are the Fabric Interconnects (FIs). The FEs connect to the FIs, and the FIs are connected to each other. In the illustration above, they are labeled Fabric Interconnect 1 and 2 (although the proper labels are A and B, with A shown as “1”). They are switches, but they don’t behave like a pair of linked switches in the usual sense, because they don’t pass live Ethernet traffic to each other. The main purpose of the Fabric Interconnects is to connect the chassis to the LAN. What the diagram doesn’t show is that next northbound connection, the LAN itself.
The FIs do connect to each other, but only to synchronize network information so one can take over for the other in case of failure. This is the second layer of failover: the Fabric Interconnects share the same connection information, so the entire chassis can run on one FI. Notice that each FE is connected to one FI and not both. Cross-connects between the FEs and FIs are not supported and will cripple the UCS.
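The cabling rule in the last two sentences is easy to check mechanically. The Python sketch below assumes a hypothetical cable list of `(fe_slot, fabric)` pairs, with FE 1 belonging to Fabric A and FE 2 to Fabric B; it is not a UCS Manager API, just a way to express the rule that each FE uplinks to exactly one FI.

```python
# Hypothetical rule table: the only Fabric Interconnect each FE slot may be cabled to.
EXPECTED_FI = {1: "A", 2: "B"}


def check_cabling(links: list[tuple[int, str]]) -> list[str]:
    """Return cabling problems; an empty list means the FE-to-FI rule is followed.

    `links` holds one (fe_slot, fabric) pair per physical cable, so an FE
    with several uplinks to the same FI simply appears several times.
    """
    problems = []
    for fe, fi in links:
        expected = EXPECTED_FI.get(fe)
        if expected is None:
            problems.append(f"unknown FE slot {fe}")
        elif fi != expected:
            problems.append(
                f"FE {fe} cabled to Fabric {fi} (expected Fabric {expected}): "
                "cross-connect, not supported"
            )
    # Every FE also needs at least one cable to its own FI.
    for fe, fi in EXPECTED_FI.items():
        if (fe, fi) not in links:
            problems.append(f"FE {fe} has no uplink to Fabric {fi}")
    return problems


# Two uplinks per FE, each FE going only to its own fabric: valid.
print(check_cabling([(1, "A"), (1, "A"), (2, "B"), (2, "B")]))  # []

# FE 1 cabled to both fabrics: flagged as a cross-connect.
print(check_cabling([(1, "A"), (1, "B"), (2, "B")]))
```

Multiple cables from one FE are fine as long as they all land on the same FI; the moment one FE reaches both FIs, the check reports it.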
The illustration above shows how the Fabric Interconnects are attached, this time with their proper labels, Fabric A and B.
Now we have a good picture of the UCS redundancy design:
– Each blade (and therefore each VIC and each VMware host) has two connections, one to each Fabric Extender.
– Each Fabric Extender connects to one Fabric Interconnect. Since there are two FEs, there are two ways to reach the northbound Fabric Interconnect level.
– The Fabric Interconnects are synchronized. In case of failure, one FI can pass all the traffic in and out of the chassis.
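These three points can be checked together. The Python sketch below models one blade’s data path as a small graph (node names such as `FE1` and `FI-A` are hypothetical labels, not UCS Manager identifiers) and uses a breadth-first search to confirm that the blade can still reach the LAN after any single FE or FI fails. It is a simplified model under stated assumptions: the FI-to-FI link is left out because it carries synchronization rather than traffic, and the LAN is treated as a single always-up node.

```python
from collections import deque

# Hypothetical data-path graph for one blade. The FI-to-FI link is omitted
# on purpose: it only synchronizes state and does not carry traffic.
DATA_PATH = {
    "blade": ["FE1", "FE2"],  # internal chassis wiring to both FEs
    "FE1": ["FI-A"],          # each FE cabled to exactly one FI
    "FE2": ["FI-B"],
    "FI-A": ["LAN"],          # northbound uplinks
    "FI-B": ["LAN"],
    "LAN": [],
}


def reaches_lan(graph: dict[str, list[str]], failed: set[str]) -> bool:
    """Breadth-first search from the blade to the LAN, skipping failed nodes."""
    if "blade" in failed:
        return False
    seen = {"blade"}
    queue = deque(["blade"])
    while queue:
        node = queue.popleft()
        if node == "LAN":
            return True
        for nxt in graph[node]:
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Fail each redundant component, one at a time.
for part in ("FE1", "FE2", "FI-A", "FI-B"):
    print(part, "down ->", "OK" if reaches_lan(DATA_PATH, {part}) else "ISOLATED")

# Losing one FE and the FI on the other side leaves no path.
print(reaches_lan(DATA_PATH, {"FE1", "FI-B"}))  # False
```

Every single failure prints `OK`, which is the redundancy the list describes. The last line shows the limit of the design: one failure on each side at the same time isolates the blade.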
Part 2 will show which software to use, VMware or Cisco, when planning network failover.