Wow, that’s a boring headline. I should be a little more creative with those. Anyway, I ran into an interesting situation on a Friday before a long weekend. I needed to clone a live production virtual server for some testing. Normally, this wouldn’t be a big deal but our VMware environment ended up posing some challenges to an otherwise standard task.
The production side has eight hosts in one cluster with 150+ VMs. We’re running vSphere 5.1 with HA and DRS configured. All the hosts tie into the virtual network via a vDS (vSphere Distributed Switch) with static binding. Static binding is the default setting for a vDS port group. It assigns a port on the vDS to the virtual machine and keeps it assigned no matter the power state of the VM. As VMware explains it, static binding works like a physical switch. When you plug a network cable into a switch, the port used is not available for another cable, just as a port assigned by static binding on a vDS is not available for another vNIC. However, there was no warning about live cloning in this particular configuration.
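To make the physical-switch analogy concrete, here is a minimal Python sketch of how static binding behaves. It is a toy model only and does not call any VMware API; the class name `StaticPortGroup`, the group name "Dept-A", and the VM names are all made up for illustration.

```python
class StaticPortGroup:
    """Toy model of a port group with static binding (not a VMware API)."""

    def __init__(self, name, port_ids):
        self.name = name
        # port id -> name of the vNIC bound to it, or None while the port is free
        self.ports = {port_id: None for port_id in port_ids}

    def bind(self, vnic, port_id):
        """Bind a vNIC to a specific port, refusing ports that are already taken."""
        if port_id not in self.ports:
            raise KeyError(f"port {port_id} does not exist on {self.name}")
        owner = self.ports[port_id]
        if owner is not None:
            raise RuntimeError(
                f"port {port_id} on {self.name} is already in use by {owner}"
            )
        self.ports[port_id] = vnic

    def power_off(self, vnic):
        """Under static binding, powering off a VM leaves its port assigned."""
        # Deliberately a no-op: the binding survives power state changes.

    def free_ports(self):
        """Return the ids of ports that no vNIC currently holds."""
        return sorted(p for p, owner in self.ports.items() if owner is None)


group = StaticPortGroup("Dept-A", range(1400, 1408))
group.bind("prod-vm", 1404)
group.power_off("prod-vm")  # port 1404 stays bound to prod-vm

try:
    # A second vNIC asking for the same port is rejected.
    group.bind("prod-vm-clone", 1404)
except RuntimeError as err:
    print(err)
```

The point of the sketch is the `power_off` method doing nothing: the port stays claimed until the vNIC is removed or moved, which is why a second vNIC can never land on it.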
On my first attempt to clone the live production VM, I got an instant error. There was a resource conflict. The error stack named the “Guest vDS” and reported that “1404” was already in use. Our Guest vDS is divided up into six different VLANs, on six different port groups, segmenting departments. I didn’t know what the 1404 was referencing and Google didn’t know either (a bad sign). Since the Guest vDS was involved in the error, I started there.
In the properties of the port group that the production server was on, 1404 was the static port number assigned to the VM by the vDS. The clone couldn’t be created because it can’t be assigned to the same static port number. I was going to have to remove the conflict, and I didn’t want to power off the VM and add some after-hours work. So, I had some choices:
1- Change the binding in the port group to Dynamic or Ephemeral (a VMware KB explains the differences). This would remove the binding for every machine on the port group. I decided against this as it was approaching closing time on a Friday. The odds of this adjustment screwing anything up were small. It can be done on the fly and is immediately reversible. Still, I didn’t want that to be the answer.
2- In the Clone Wizard, edit the static port assignment of the vNIC in the Edit Virtual Hardware section of the clone’s properties. I thought this idea was clever: I would keep the clone on the same port group but assign a port number myself. Since vCenter does the port assignment on a vDS, this plan also failed. Because I was trying to do vCenter’s job on a machine that had not been fully created, the unique port I entered caused the same error. Don’t try to cross vCenter is the basic message I’m trying to convey.
3- Remove the clone’s vNIC in the Edit Virtual Hardware settings. This caused the cloning to fail with a “device ‘0’ error”. This error is documented as a time mismatch between hosts in the cluster. All of the hosts had the correct time and NTP settings, so this must have been unique to what I was trying to do. Since editing virtual hardware in an unborn clone is experimental, I expect the errors to be general and otherwise uninformative. vCenter might as well have given me the finger. I would have got the same info out of it.
4- Staying in the Edit Virtual Hardware settings, I changed the port group the clone would attach to upon creation. I also confirmed that the specific port number assigned was free on the vDS port group I was attaching it to. Essentially, I was going to plug it into a different switch. The clone was created without a problem.
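The decision behind option 4 can be sketched in a few lines of Python. This is only a model of the reasoning, not of what vCenter does internally, and it touches no VMware API. The function `plan_clone_nic`, the dictionary layout, and the names "Dept-A" and "Test-Net" are assumptions invented for the example; only port 1404 comes from the actual error.

```python
def plan_clone_nic(source_nic, port_groups, fallback_group):
    """Pick NIC settings for a clone without reusing a bound static port.

    port_groups maps a port group name to {port id: owner or None}.
    """
    group = source_nic["port_group"]
    port = source_nic["port"]

    # If the source's port is somehow free, the copied settings are safe as-is.
    if port_groups[group].get(port) is None:
        return dict(source_nic)

    # Otherwise move the clone to a different port group and take a free port.
    free = sorted(
        p for p, owner in port_groups[fallback_group].items() if owner is None
    )
    if not free:
        raise RuntimeError(f"no free ports on {fallback_group}")
    return {"port_group": fallback_group, "port": free[0]}


# Hypothetical inventory: the production VM holds port 1404 on its group.
port_groups = {
    "Dept-A": {1403: "other-vm", 1404: "prod-vm", 1405: None},
    "Test-Net": {2000: "lab-vm", 2001: None, 2002: None},
}

clone_nic = plan_clone_nic(
    {"port_group": "Dept-A", "port": 1404}, port_groups, "Test-Net"
)
print(clone_nic)  # {'port_group': 'Test-Net', 'port': 2001}
```

Note that the sketch deliberately skips the free port 1405 on the original group. Picking another port on the same port group is what option 2 tried, and that failed in the wizard.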
The Edit Virtual Hardware settings are “experimental,” as it says in the Clone Wizard, but they allowed me to have a three-day weekend, so there’s that.