The days of VMware RDM(s) are over. That’s what the interweb tells me anyway, and I’m cool with that. I’m really cool with it after I discovered that RDM mappings on a VMware host slow its boot time by about three minutes per disk: VMware KB. I tried the workarounds briefly, and none of them worked the way I wanted in vSphere 6 (45-minute boot times), so it was on to Plan B: re-engineer.
At the time of original planning, RDM disks were the only answer. I was unable to get NPIV working to a Windows VM housed in a VMware host installed in a Cisco blade that is controlled by Cisco’s UCSM. I mean, c’mon. I now have the answer to the question “How many layers of virtual networking does it take to break the fiber channel NPIV feature set?” Four.
I’ll walk through the switchover using Microsoft’s Failover Cluster Manager in Windows Server 2012 R2.
Note: The Failover Cluster Manager has a dangerous acronym, FCM or FOCM. Either way, be careful how you pronounce these as one word in meetings, especially if you incorporate a hard “C” with either option.
The scenario tested and ultimately proven here is a Windows file server cluster. I’ll be using the same technique for the legacy Microsoft SQL Server clusters.
Prerequisites: All of these are in place before the Planning and Prepping stage.
- All storage arrays have iSCSI cards installed and configured
- The network has a VLAN specifically for iSCSI traffic
- The VMware infrastructure has a port group assigned to the iSCSI VLAN. VMware will tag the traffic, so give the VLAN the correct number in the port group.
- If Cisco’s UCS is in the mix, add a Common/Global VLAN to the LAN Cloud–>VLANs section in the UCSM.
Planning and Prepping it:
- Schedule downtime. A reboot is necessary for the MPIO configuration but not for iSCSI, and the cluster will be unavailable during the transfer.
- Add a vNIC to each server and assign it to the iSCSI port group in VMware.
- Configure the IP settings of the vNIC in Windows (don’t add a gateway)
- In the Advanced IP section, make sure to uncheck the box for automatically registering the IP with DNS. Note: Don’t skip this step. The iSCSI VLAN is likely unreachable from any other VLAN (dedicated switch ports and no trunks carrying the iSCSI VLAN), so the server name would resolve to its iSCSI IP address but never answer a ping.
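The two vNIC settings above can also be applied from an elevated PowerShell prompt. This is a sketch; the interface alias and addresses are example values for your environment:

```powershell
# Assign a static IP to the new iSCSI vNIC -- note there is no
# default gateway parameter, matching the step above.
# "Ethernet 2" and 10.10.50.21/24 are examples; substitute your own.
New-NetIPAddress -InterfaceAlias "Ethernet 2" -IPAddress 10.10.50.21 -PrefixLength 24

# Keep the iSCSI interface out of DNS so the server name never
# resolves to the unreachable iSCSI address
Set-DnsClient -InterfaceAlias "Ethernet 2" -RegisterThisConnection $false
```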
- Install the iSCSI Initiator on the Windows servers. On Windows 2012 this is simple: search for iSCSI from the Start screen and choose the initiator. The server will ask if you want the service to start automatically; answer yes.
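Answering yes to that prompt just configures the MSiSCSI service, which can also be done from PowerShell. The last line pulls the initiator name (IQN) you’ll need for the arrays in the next step:

```powershell
# Set the Microsoft iSCSI Initiator service to start automatically
# and start it now -- equivalent to answering "Yes" in the GUI prompt
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Show the initiator name (IQN) for this server -- the same value
# displayed on the Configuration tab of the iSCSI Initiator applet
(Get-InitiatorPort | Where-Object ConnectionType -eq iSCSI).NodeAddress
```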
- Add the server to the storage arrays.
- On a Dell Compellent, right-click the Servers icon or the folder you want the server in and select Create Server. Choose Manually Define HBAs. In the next window choose iSCSI and add the iSCSI name of the server. The initiator name can be found on the Configuration tab of the iSCSI Initiator.
- For the two arrays I have (Dell and Pure), iSCSI servers will appear offline or be unable to connect unless a LUN is presented to them over the array’s iSCSI connection.
- Create a placeholder drive (something small, like 5GB) and map it to the server.
- On a Pure array, go to the Storage tab and click the plus sign next to Hosts. Choose Create Host and give it a name. Select the new host and click the Host Ports tab. To configure the ports click the gear to the right and enter the initiator name.
Before moving on, make sure there are active iSCSI connections to each server. If not, check the iSCSI network connection by pinging the arrays’ iSCSI cards from the Windows servers. Also double-check that a small iSCSI volume is mapped to each server; you can use one drive and change the mapping to each server to verify connectivity. An active iSCSI connection is required to configure Windows MPIO.
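These connectivity checks can be scripted with the built-in iSCSI cmdlets. The portal address below is an example standing in for one of the array’s iSCSI cards:

```powershell
# Ping one of the array's iSCSI ports from the server
# (10.10.50.10 is an example portal address)
Test-NetConnection -ComputerName 10.10.50.10

# Register the array's portal, connect to the discovered target,
# and list the active connections
New-IscsiTargetPortal -TargetPortalAddress 10.10.50.10
Get-IscsiTarget | Connect-IscsiTarget
Get-IscsiConnection
```

If `Get-IscsiConnection` returns nothing, there is no active session, and the MPIO step below won’t have anything to claim.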
Configure MPIO on the Windows Servers:
MPIO is a Windows feature, so it has to be installed, but a server reboot is not necessary until it is configured to claim iSCSI devices. This blog post has pictures and stuff; the first section goes through installing the feature and enabling iSCSI devices. Check the storage vendor’s documentation for the best way to configure MPIO for the attached disks.
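The two MPIO steps, installing the feature and claiming iSCSI devices, boil down to two commands:

```powershell
# Install the Multipath I/O feature -- no reboot needed for this part
Install-WindowsFeature -Name Multipath-IO

# Tell the Microsoft DSM to claim iSCSI-attached disks -- this is
# the change that requires a reboot to take effect
Enable-MSDSMAutomaticClaim -BusType iSCSI
```

Vendor-specific path policies (round robin, least queue depth, etc.) still come from the storage vendor’s documentation.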
With everything needed in place, start swapping disk connections. A reboot of the Windows servers should not be necessary, but services will be interrupted. The VMs can be kept online and running. Here are the steps for making the swap as it worked for me.
In Windows FCM:
- Set the hardware cluster to offline and leave the cluster service running.
- Take the resources offline.
- Remove the resources from the cluster.
- Delete the resource disks.
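The FCM steps above can also be run from the FailoverClusters PowerShell module. The role and disk names here are examples; substitute your own:

```powershell
# Take the clustered role offline -- the cluster service itself
# keeps running ("FileServerRole" is an example role name)
Stop-ClusterGroup -Name "FileServerRole"

# Take a disk resource offline and remove it from the cluster
# ("Cluster Disk 1" is an example resource name)
Stop-ClusterResource -Name "Cluster Disk 1"
Remove-ClusterResource -Name "Cluster Disk 1"
```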
In VMware vCenter:
- Delete all the RDM disks from all members of the cluster. In the deletion options choose delete from disk. The placeholder vmdk files need to be removed from the datastore; they’re just placeholders anyway since the data resides on a remote volume. Note: The virtual SCSI controllers will remain in the VM after the RDMs that were attached to them get deleted. They can be removed anytime after.
In the Storage Array:
- Make sure your iSCSI based servers are in a cluster or group depending upon what the array uses.
- Remove the disk assignments from the VMware host or cluster.
- Assign the disks to the iSCSI cluster group.
Back to Windows FCM:
On the owner node of the cluster, re-scan the disks in Computer Management–>Disk Management. The drives will reappear with the same drive letters as assigned before. Open FCM to re-add everything:
- Add the disks as cluster resources if they aren’t back in there already
- Assign the disks to the cluster role so the owner node has control of them
- Add the iSCSI Initiator Service as a disk dependency to each resource disk. Note: This is a must because of the way Windows handles the iSCSI service in a cluster. Failover will not be successful without this step. Right click each disk and go to Properties to add the dependency.
Once this happens the iSCSI service on the standby node will be turned off by Windows.
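The rescan and re-add steps can be sketched in PowerShell as well. The dependency itself is still added per the note above, through each disk’s Properties in FCM:

```powershell
# Rescan on the owner node -- same effect as the Disk Management rescan
Update-HostStorageCache

# Re-add any disks that didn't come back as cluster resources
Get-ClusterAvailableDisk | Add-ClusterDisk

# Verify the disk resources are present and online before adding
# the iSCSI service dependency in FCM
Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"
```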
Prepare for a Successful Failover:
On the standby node, re-scan for disks in Disk Management, then test the failover scenario. If failover from the original owner node fails, re-scan on the standby once more so the disks appear; after that, failover will succeed. That’s it: everything should be back online and RDM-free.
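The failover test itself is two commands run on the standby node; the role name is an example:

```powershell
# Rescan on the standby node so the iSCSI disks appear
Update-HostStorageCache

# Move the clustered role over and confirm it came online
# ("FileServerRole" is an example role name)
Move-ClusterGroup -Name "FileServerRole"
Get-ClusterGroup -Name "FileServerRole"
```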
Changes to the Environment:
Keep these things in mind after switching to iSCSI disks in a virtual environment using a Microsoft Windows 2012 R2 cluster.
- The iSCSI service will be off on the Windows standby node. In some storage arrays this will cause an alert.
- VM snapshots will not capture data on an iSCSI disk
- VMware is not aware of iSCSI disks in this scenario as this is an OS level connection
- The VM will need to be backed up as a physical machine, or the storage array will need to be used.
- The VMs can be put back in vCenter controlled HA and DRS groups