These notes are based on the limitations of RDM disks in a SQL 2012 cluster in conjunction with Microsoft's Cluster Services (MSCS). If you are lucky enough to be using (read: afford) the newer features in SQL 2012 with vSphere 5.5, then your cluster isn't as much of a, well, "cluster" in the worst sense.
In this case, we're dealing with a very basic two-server cluster. The VMs are hardware version 8 with the latest VMware Tools (as of this writing) in a vSphere 5.5 environment. Each VM's OS is Windows Server 2012 R2 with SQL 2012. The OS disk and separate data disks are VMDKs local to each VM. The clustered disks are all RDM mappings to volumes on a Dell Compellent storage array.
***When I mention “cluster” in the following paragraphs I’m referring to the SQL or MSCS cluster, not a vSphere host cluster***
There are many limitations when setting up clusters using a combination of RDM disks and MSCS, which are detailed in my last post, here. If MSCS is using shared disks, then most of VMware's best features, like DRS (both the vMotion and Storage vMotion varieties), are off-limits. The caveat to the shared-disk implementation is how the secondary server in the cluster references the shared disks. In the initial setup, the primary server, i.e. the first server set up in the cluster, receives the direct RDM mappings via VMware.
The secondary server gets attached to the shared disks by using "an existing disk" of the primary server. The existing disk is actually an RDM, but it looks like a VMDK in the primary server's VM folder. So we have this linear thing going…
Raw volume on storage array <—> Primary server's RDM disk <—> Secondary server attached to the existing disk
Connecting a VM to an existing disk requires pointing to the direct path of that disk. So, if the primary server lives in, say, datastore 1 in VMware, then the secondary server has a direct link to datastore 1 for the shared-disk connection. Thus, we can't move the primary server's location. If we do, the secondary server cannot connect to the shared disks and the cluster will fail.
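For illustration, here's roughly what the shared-disk entries look like in the secondary server's .vmx file. The VM name "PrimarySQL" and the datastore path are hypothetical; the scsi1.* keys themselves are standard vmx entries for a physical-mode shared SCSI controller:

```
scsi1.present = "TRUE"
scsi1.virtualDev = "lsisas1068"
scsi1.sharedBus = "physical"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "/vmfs/volumes/datastore1/PrimarySQL/PrimarySQL_1.vmdk"
```

That fileName path is the thing that breaks: it points into the primary's folder on datastore 1, so moving the primary leaves it dangling.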
Of course we’ll have to move the VM at some point, right? Here’s how:
1. Schedule downtime. The amount will depend on the size of the non-RDM disks. We’re not moving those, since they’re on the storage array.
2. Shut down all the servers in the cluster. I would also add that the first server built in the cluster should still be the cluster master/owner at this point. Make sure all the shared disks are owned by the primary server. If the shared disks are owned by a secondary server, change the owner/fail them back to the primary.
3. Verify which RDMs are attached to which SCSI controllers. This is important because we have to destroy and rebuild the RDM mappings on all the servers except the primary. The secondary servers should mimic the SCSI-controller-to-disk connections that the primary has. So if vmdk_3 is currently connected to SCSI controller 3 on the primary, the secondary needs to match that same configuration.
4. Move the primary server to the new datastore.
5. Verify the move of the primary and make sure all the RDM disks are in the folder. You could power it on at this point to make sure Windows sees all the disks, but that's optional.
6. On the secondary server, edit the settings of the VM. Remove and delete the RDM disks from the machine. Delete? Yeah, that freaked me out too, but here's why: once the primary has been moved, browse to the VMware datastore where it used to be. You'll see a folder with the primary server's name. Inside that folder are "ghost" RDM disks with no other files. These folders and files are created once the primary has moved, so the secondary still has a chance at booting up. If you don't delete the RDM disks from the secondary, you'll be left with ghost folders and disks in the datastores.
7. On the secondary server, once the RDM disks have been removed and deleted, re-add them. This is where you'll need the SCSI-controller-to-disk connections that mimic the primary. Add the "existing disks" by browsing to the primary's new location and choosing the disks one by one, connecting them to separate SCSI controllers (e.g. 1:0, 2:0).
8. On the secondary server, verify that all the new SCSI controllers have their SCSI Bus Sharing set to "Physical" and are the LSI Logic SAS type.
9. At this point, the secondary servers can be powered on and the MSCS cluster verified.
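Step 3 (and the re-add in step 7) is easy to get wrong by hand. As a sanity check, here's a rough sketch in Python that compares the `scsiX:Y.fileName` entries pulled from the two VMs' .vmx files. The `scsiX:Y.fileName` key is a standard vmx entry; the assumption that shared RDMs sit on controllers 1 and up (with bus 0 holding each VM's own local disks) comes from this setup, so adjust to taste:

```python
import re

def scsi_disk_map(vmx_text):
    """Map 'bus:unit' -> disk file name for shared-disk controllers (bus 1+)."""
    mapping = {}
    for line in vmx_text.splitlines():
        m = re.match(r'\s*scsi(\d+):(\d+)\.fileName\s*=\s*"(.+)"', line)
        if m:
            bus, unit, path = m.groups()
            if int(bus) == 0:
                continue  # bus 0 holds each VM's own OS/data disks, not shared RDMs
            # Compare only the file name: the secondary references the same
            # pointer VMDK as the primary, just via the primary's folder path.
            mapping[f"{bus}:{unit}"] = path.rsplit("/", 1)[-1]
    return mapping

def mismatches(primary_vmx, secondary_vmx):
    """Return the SCSI slots where the secondary doesn't mirror the primary."""
    p = scsi_disk_map(primary_vmx)
    s = scsi_disk_map(secondary_vmx)
    return {slot for slot in p if p[slot] != s.get(slot)}
```

Feed it the text of each .vmx (downloaded from the datastore browser): an empty result from `mismatches()` means the secondary's controller-to-disk layout mirrors the primary's.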