PowerShell 1.0
Major Minor Build Revision
----- ----- ----- --------
1 0 0 0
PowerShell 2.0 (Windows 7 - 2008)
Major Minor Build Revision
----- ----- ----- --------
2 0 -1 -1
I have a VM with snapshots exported from Lab Manager, so I am using vmware-vdiskmanager to consolidate my 14 or so linked clones/snapshots into one flat file. vmware-vdiskmanager can be found in Server or Workstation installations; it's not in the PATH, so you have to go into the install directory to find it.
vmware-vdiskmanager -r 015495-2008R2.vmdk -t 2 2008R2.vmdk
What I found was that ntpd has a default maximum correction (panic threshold) of about 1000 seconds. I had this same issue, but if I set the clocks only 3-4 minutes off, they would auto-correct within about 15 minutes. On ESXi 4.1, almost none of the ntpd commands gave me any output, but I was able to see the corrections happening in /var/log/messages.
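A minimal sketch of that threshold logic (the numbers are illustrative; 1000 seconds is ntpd's documented default panic threshold):

```shell
# ntpd will slew/step offsets below its panic threshold (~1000 s by
# default) but gives up on anything larger, so a clock a few minutes
# off self-corrects while a clock hours off must be set by hand.
offset=240    # seconds of drift, e.g. a clock 4 minutes off
panic=1000
if [ "$offset" -lt "$panic" ]; then
  echo "within threshold: ntpd will correct this automatically"
else
  echo "beyond threshold: set the clock manually first"
fi
```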
1) Firmware Upgrades, make sure VT & DEP are enabled.
2) If Blade Chassis
a. Make sure networking is ready, i.e. virtual connect
b. Rename blade in chassis/iLo
3) If using shared storage (and you had better be)
a. Fiber Channel
i. Set up VSANs and zoning properly
b. Present LUN(s) to the server
4) Install ESX or ESXi
5) ESXi only
a. Change Root Password
b. Change IP to static
c. Change hostname
6) Allow SSH (if you want)
7) Install hardware providers/agents, e.g. the HP ESX Agent
8) Add to vCenter
9) Apply vCenter templates (this should do the following automatically)
a. Setup Networking vNics, vMotion & Management
b. NFS storage
10) Patch with VUM
11) Setup Monitoring, SiteScope, vCM, etc..
12) Add to Lab Manager
That's ProgramDATA, not Program Files..
I wanted a template I could add to Lab Manager so it would auto-start with a new hostname and IP address every time. I used CentOS 5.5 because there is already a RedHat 5 64-bit tools package available for guest customization. I created a VM with a 10GB system drive and a 40GB data drive, based off of RH5 x64.
There are many good guides to getting TGT installed and working; this was probably the best one.
I still had quite a few issues, but here is some good info that helped me sort it all out.
If you're like me, doing something one time is nice but ultimately not useful; you need this service to auto-start or it's pretty much useless. Therefore I think the tgtadm and tgt-admin commands are only good for testing/setup.
Most of the guides have you set up a test file that you mount; that's great for testing, but I wanted to map a whole physical (VM) drive, which for me was /dev/sdb. For a while I thought my problem was that I needed to partition that drive (i.e. /dev/sdb1), but luckily that isn't necessary; I tested it both ways.
All I really needed to do was this:
1) Install TGT – yum install scsi-target-utils
2) make the tgtd service auto start - chkconfig tgtd on
3) make the firewall auto-stop (this is a lab application) - chkconfig iptables off
4) modify the /etc/tgt/targets.conf file to look something like this:
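A minimal targets.conf along those lines; the IQN is an example name, and the backing store is the whole /dev/sdb drive as described above:

```
<target iqn.2010-09.lab.example:sdb>
    backing-store /dev/sdb
    initiator-address ALL
</target>
```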
5) reboot and you're done.
Here are some useful commands for testing:
Show what your current setup looks like (this is the most useful of the tgtadm commands):
tgtadm --lld iscsi --op show --mode target
The second most useful is tgt-admin --update ALL; it reloads your TGT setup from the targets.conf file so you can see if you have it working right.
The rest are mildly useful.
Setup a test target = tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2001-04.com.1234:example
Setup a test LUN on above target = tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
Allow all IPs to use your target = tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL (theoretically this is not needed per the documentation, but I found otherwise)
There was a mistake made on the initial setup of an ESX host, and the Clariion grabbed that info and would not let go. I spent quite a while looking for a "delete" option, to no avail. I knew the ESX hosts were correct, but I couldn't get the Clariion to refresh. The solution was to go to each SP (http://SPAipaddress/setup and http://SPBipaddress/setup) and "Restart Management Server"; upon reboot they rescanned and found the correct info.
After the reboot, everything looks good and I was able to add the correct hosts to the storage groups.
If you're ever stuck trying to use /nasmcd/sbin/clariion_mgmt and the command just won't go through, try the -skip_rules option.
So I read a lot of Scott Lowe's blog, and again he has the answer, but I needed just a bit more detail to make this work. Since I'm using his blog as a starting point, I'll reference it here. I also used this from EMC.
First, change the IP of the Celerra Manager, also known as the "control station".
Then change the IP of the SP’s with this command
/nasmcd/sbin/clariion_mgmt -start -spa_ip 128.221.xxx.xxx -spb_ip 128.221.xxx.xxx -use_proxy_arp
So, a few things I'll add:
1) when using the clariion_mgmt tool, make sure to SSH in as nasadmin and then su to root; if you SSH directly as root, the command might fail as it did for me, with a "NAS_DB environment variable is not defined" message.
2) default passwords for nasadmin and root are “nasadmin”
3) when you setup a PPP connection to the Celerra or Clariion, do it this way:
Use the special EMC serial cable and the maintenance port on the back of the SP. Create a PPP dial up connection on your Windows laptop 115200 baud, HW flow control. Point a web browser to http://192.168.1.1/setup .
After you P2V something, there are a lot of old hardware devices that no longer exist, but you can't see them by default; this article tells you how to show them so you can remove them.
log into HP OBA with SSH
update ilo all tftp://x.x.x.x/ilo2_200.bin
wait 10 minutes
This is pretty simple, but for some reason hard to google.
1) download the latest Kickstart and System image from cisco.com
2) tftp (or whatever method you like) the image onto the Cisco
3) Run this on cisco console in “config” mode
a) boot kickstart bootflash:/m9100-s2ek9-kickstart-mz.5.0.1a.bin
b) boot system bootflash:/m9100-s2ek9-mz.5.0.1a.bin
4) Save your config, reboot the switch, and you're done.
You probably don't want to skip versions, i.e. go from 3 to 5 directly.
You can probably only hold two versions at a time, so if you're on 3: upload 4, upgrade to 4, delete 3, then upload 5, upgrade to 5; rinse, wash, repeat.
I must like to learn things the hard way, like when I ran out of space on a Lab Manager LUN. The default is 5GB of free space for a "yellow" alert and 3GB for a "red" alert. For my application, we burn through 50GB in a single deployment (yes, even with linked clones) because we spin up a large number of machines simultaneously. These alerts are configured on a PER-LUN basis and not as a global setting (unfortunately), so you have to go into Resources/Datastores/LUN Properties and set your limits higher. You must do that for every LUN you want to be alerted on.
So, other than good old ADUC, let's assume you just have DNS (nslookup) access:
> set type=SRV
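From there, a query for the domain controller SRV records might look like this (the domain name is a placeholder); the answer lists each DC's hostname and port:

```
> _ldap._tcp.dc._msdcs.example.com
```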
I used this article and was able to get the certificate generated and installed
But after the reboot I could SSH to my ESX host, yet the certificate was not working: the web page was dead and I couldn't add the ESX host back to vCenter.
I found the problem: at the end, the article has you put the key and cert in the wrong place:
cp rui.key /host/ssl_key
cp esx.cer /host/ssl_cert
but that doesn't work.
you need to do this instead:
cp rui.key /etc/vmware/ssl/
cp esx.cer /etc/vmware/ssl/rui.crt (the cert must be renamed to rui.crt)
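A quick way to confirm the key and cert you just copied are actually a pair (a sketch assuming the openssl CLI is available on the host): a certificate matches a private key when their RSA modulus hashes are identical.

```shell
# Prints "match" when the certificate was generated from the key.
check_pair() {
  cert_mod=$(openssl x509 -noout -modulus -in "$1" | openssl md5)
  key_mod=$(openssl rsa -noout -modulus -in "$2" | openssl md5)
  [ "$cert_mod" = "$key_mod" ] && echo "match" || echo "MISMATCH"
}
# On the ESX host:
#   check_pair /etc/vmware/ssl/rui.crt /etc/vmware/ssl/rui.key
```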
Often the media store is not available and it appears you can't add a media store to an organization. To fix this, go into Organizations/OrgName Properties: unless you add the Media Datastore under "Datastores", it won't let you add the Media Store in the "Media Stores" section.
1) Disable Deployments to that server by going to Resources/Hosts click disable.
2) Migrate live VMs to other blades with access to the same datastores: in vCenter, choose the server, then the Virtual Machines tab. Select everything but the VMwareLM-ServiceVM and migrate to another host. Wait for this step to complete.
3) Unprepare the host in LabManager, wait for complete
4) In vC Put the host in Maintenance Mode, wait for complete, then Disconnect, then Remove from the cluster.
5) SSH or local access to the ESX host, then run esxcfg-advcfg -s <HostName> /Misc/Hostname
Wait for host to come back online
6) Add host back to vC cluster
7) Apply Host Profiles if applicable (you’ll need this for vDistributed Switches) Steps Below:
a) attach profile to host
b) check compliance
c) apply profile
d) Exit server from Maintenance Mode
8) Prepare host inside of LBM
You're done, but just to make sure everything worked as planned, migrate a VM from another host to this host; if that works, test some LBM deployments and see if any deploy to this host.
The job gets to 2% and hangs: you have a VM on that blade called 000000-VMwareLM-ServiceVM-x00-x00 without VMware Tools installed.
The simple version is that this VM is there to help with "host spanning networks". So inside the Lab Manager UI, under Resources/Hosts, go into the properties of the host and disable the "Host Spanning Enabled" checkbox; the VM will then power down and you will be able to enter maintenance mode gracefully.
To fix this, copy a working svchost.exe from a good XP 32-bit SP3 machine, reboot, and you're good to go.
I have a copy if you want it. www.bsmith9999.com/dl/mcafee_fix.zip
I’ve had a few remote people ask how to fix this on their own.
Log in to your computer and abort the reboot:
Start/Run, and in that box type "cmd" (no quotes).
This should bring up a black command window.
In that window type "shutdown /a", which aborts the shutdown.
Once that is done, Go to this website.
Download the sdat5957.exe to a known location, such as c:\ (root of your c drive)
in the black command window you opened earlier, type "c:\sdat5957.exe /F"; this runs the patch with a "force downgrade" to the old version of the McAfee DAT.
The bad version is 5958, so once 5959 is released it will probably be a better option than installing 5957, but again, it isn't released yet.
If you can't reach that website on your PC due to network issues, use a second computer to download the update to a USB key or burn it to a blank CD, then use that on your affected PC to repair it.
I hope they publish the SuperDat for 5959 soon.
It's really nice that Windows has a 20-character username limit, but when you create a longer one it doesn't warn you; it just happily accepts a username that will never work. Yes, there is a registry hack to allow Windows to accept longer names, but c'mon, put some intelligence into ADUC.
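The limit is on the pre-Windows 2000 logon name (sAMAccountName), which is capped at 20 characters; a trivial pre-check, sketched in shell since ADUC won't do it for you (the candidate name is an example):

```shell
# Flags a proposed logon name that exceeds the 20-character
# sAMAccountName limit before you create the account.
name="firstname.lastname-contractor"   # example candidate username
if [ "${#name}" -gt 20 ]; then
  echo "too long: ${#name} characters (sAMAccountName max is 20)"
else
  echo "ok"
fi
```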
Since you can't modify unattend.xml or really do anything else to keep it enabled through sysprep (which auto-disables it):
Create SetupComplete.cmd (make sure it has no hidden extension like SetupComplete.cmd.txt)
Edit that file, put this in there:
net user administrator /active:yes
(optional change password)
net user administrator new_password
After Windows is installed, but before the logon screen appears, Windows Setup searches for the SetupComplete.cmd file in the %WINDIR%\Setup\Scripts\ directory. If a SetupComplete.cmd file is found, the file is executed. Otherwise, installation continues normally. Windows Setup logs the action in the Setupact.log file.
That should do it, now that Template should keep local admin enabled even after sysprep.
So when the Lab Manager documentation people say don’t modify the unattend.xml, they mean it. No matter how much you modify it, the changes never make it into sysprep.
After creating templates for XP 64 SP1 (base) and SP2 in our Lab Manager install, some percentage of the time after you deploy these templates you cannot log into the VMs: you type in your password and are simply re-prompted for it. If you type the wrong password it tells you, but the right password just causes a re-prompt. After much troubleshooting, I found we were not following the best practice of having machines ready to be sysprepped with a blank Administrator password. After making that change, everything works again. To get the standard password back into the VM, we modified "c:\Program Files\VMware\VMware vCenter Lab Manager\Tools\CustomizeGuest\Windows\Sysprep\WinXP_64\sysprep.inf" (which you're never supposed to modify according to VMware, but I'm a rebel) from AdminPassword="*" (* means blank) to AdminPassword="password".
“The connection cannot be completed because the remote computer that was reached is not the one you specified. This could be caused by an outdated entry in the DNS cache. Try using the IP address of the computer instead of the name.”
I saw this error connecting from Windows 7 Desktops to a specific 2008 Server in the same domain. The problem was that the clock on the server was wrong. Once I updated it, it all works again.
Lab Manager is one of my favorite technologies in the market today, but before you install, beware of the limitations!
i. 8 ESX hosts max connected to a VMFS3 datastore (each LUN); you can use NFS to get around this, but for our use case it is not performant enough.
ii. 2TB VMFS3 size limit, and don't start there; we started with 1.2TB LUNs so we could expand as needed. Avoid SSMOVE if at all possible (it is slow and painful, but works well); if you fill up your LUN, create the extent and/or disable templates and move them to a new datastore.
iii. The only available backups are SAN Snapshots (well, the only realistic option for us), and for this to be useful, see #1 below
1. It is recommended to put the vCenter & Lab Manager servers on VMs inside the cluster, on the SAN with the guests
iv. vCenter limits
1. 32 bit has max of 2000 deployed machines and 3000 registered
2. 64 bit has max of 3000 deployed machines and 4500 registered
Best Practices & What we’ve learned
i. Make LUNs 10x the size of your template(s)
ii. Shared Storage is generally the first bottleneck.
1. I used all RAID 1+0, since we observed this on our first LBM deployment and our application is database-driven (disk I/O intensive)
iii. We have averaged between 80-100 VMs per blade, so our LBM 4.0 environment should top out at approximately 32 hosts (two full HP c7000s) in one cluster; otherwise you lose the advantages of Host Spanning Transport Networks.
iv. LOTS of IP addresses; I recommend at least a /20 for LBM installs (4096 IPs). You do not want to have to re-IP Lab Manager guests; we've done that before.
v. Create Gold Master Libraries for your users, helps prevent the 30 disk chain limit from being hit as often.
vi. Encourage Libraries, not snapshots
vii. Do not allow users to create/import templates; export only.
viii. Do not allow users to modify disk, CPU, or memory on VMs.
ix. Storage and deployment leases are the best thing since sliced bread; I recommend between 14-28 days for both.
x. Train your users. We even went as far as having two access levels, one for trained and one for untrained users, so the untrained are less dangerous; if they want the advanced features, it forces them to get training.
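The /20 sizing in item iv is just host-bit arithmetic: 32 - 20 = 12 host bits, so 2^12 addresses.

```shell
# Address count for a given prefix length.
prefix=20
echo "$(( 1 << (32 - prefix) )) addresses"   # prints "4096 addresses"
```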
If using SQL 2005/2008, set up two databases (VirtCent & VUM) before installing vCenter. They can share a DB, but VMware recommends against it for performance reasons.
When installing, your ODBC setup may ask for a 32-bit DSN; use this to configure that "Data Source".
After a lot of troubleshooting and a very good suggestion in an email from someone reading my blog, I have a fix and a better understanding of the issue. When you update from Firmware CD 8.6 to 8.7, the Broadcom NIC firmware goes from 2.2.2 to 2.2.4. More specifically, there are two subversions of code that make up 2.2.2 and 2.2.4: "bootcode" and "iSCSI". When you update from 2.2.2 to 2.2.4, you only update bootcode from 4.8.0 to 5.0.11, not iSCSI, since that version theoretically didn't change (3.1.5 to 3.1.5). This seems to be the issue. If, while performing the FW update, you manually rewrite/reapply the iSCSI code as well, the firmware update applies and everything works properly. If you are already on 2.2.4, you can reapply just the iSCSI or both; either works.
In order to successfully update your iSCSI firmware and end up with working NICs, do the following:
1) Boot from the HP 8.7 firmware CD:
Choose “Installation Options”, then click “Click here to enable install options”, then click “Allow Rewrites”, then of course “OK”.
It will warn you that you must manually select NICS and options when doing NIC firmware rewrites and downgrades. Click “OK”.
Now, under "HP NC-Series Broadcom Online Firmware Upgrade", choose "Select Devices". When you see the screen with your NICs in it, choose "Device Details" for the first NIC in the list.
You will then see a screen like the one below; select the "iSCSI" box to reapply 3.1.5.
Click "OK", then select your 2nd device, and repeat 8 times.
After that, reboot and everything should be upgraded and working properly.
The exact error is: “The RAID group being selected (RAID 1/0, RAID 1) includes disks that are located both in Bus 0 Enclosure 0 as well as some other enclosure(s). This configuration could experience Rebuilds of some disk drives following a power failure event, because Bus 0 Enclosure 0 runs longer than other enclosures during a power failure event. An alternate configuration where the disks in the RAID group are not split between enclosures in this manner, is recommended.”
The fix was to update my Flare version and everything now works normally.
1) Plug into the HP OBA Serial port, 9600 8/N/1.
2) connect interconnect <bay number>
2b) reset the switch; during boot press "CTRL+C"
3) dir bootflash:
4) boot “kickstart_image” i.e. m9100-s2ek9-kickstart-mz.3.1.2a.bin
5) config t
6) (in config) admin-password <new complex password>
8) dir bootflash:
9) load bootflash:"system image" i.e. m9100-s2ek9-mz…. (not kickstart again, like the broken HP instructions tell you) #note: they have fixed this document in the last week.
10) login (fyi, the default username is “admin” not “admin123” like cisco.com states)
11) config t
12) (in config) snmp user admin auth md5 <new complex password>
(you need step 12 to use Cisco Device Manager to complete setup)
13) int mgmt 0
14) ip address <ip><mask>
16) ip default-gateway <dg-ip>
18) copy run start
19) (bonus info) if you want to leave the current bay to setup the redundant switch, hit CTRL+SHIFT+_ to exit back to the OBA so you can start over by connecting to the redundant bay.
My BL490c G6 blades were unable to get network connectivity to our chassis. I had setup all the networking in HP Virtual Connect, but 6 blades worked, and 9 did not (we only purchased 15, not 16, don’t ask)
The error I received was different depending on the OS I was loading.
ESX 4.0 Update 1 gave this error: “The script 32.networking-drivers failed to execute and the installation can not continue.”
ESX 4.0 (no update 1) gave this error: Network ports Disconnected.
I was able to isolate this to a blade-specific issue by swapping blades 4 and 6; the issue followed the blades, not the bay. There are 4 types of firmware on these HP blades that I can see: BIOS, iLO, QLogic (HBA), and BC (NIC). The UI shows everything but the NIC firmware version. All blades with the latest NIC BIOS have no active network ports. If I downgrade them from 2.2.4 to 2.2.2 (more specifically, the bootcode from 5.0.11 to 4.8.0), then they work (problem solved). It's a real pain to see the HP NIC BIOS, since it only shows up when booting from the HP firmware CD. I do have the latest firmware on my c7000 chassis and Flex-10 switches. Just to verify this was a blade-specific issue, I also swapped hard drives from a non-working blade to a working blade; the issue followed the blade again.
I think I have this solved; I've "fixed" 5 blades so far by downgrading them. The only way I can find to downgrade the NIC BIOS is from the HP firmware boot CD: 8.7 has 2.2.4 and 8.6 has 2.2.2. There is a version 2.2.3, but as best I can tell it's not easy to install: it must be done from inside a guest OS (no bootable CD), and ESX 4 Update 1 does not install because of this error, so I'd have to install another OS, change the FW version, then reload the blade.
I have an open HP case to get this resolved, hopefully we can get an updated firmware soon.
I was unable to define a network set, or more accurately, unable to define uplinks; the option simply wasn't there. Turns out when they say IE8 isn't supported, they aren't kidding. I jumped on my ancient XP box running IE6 and suddenly it works.
After months of waiting to get all the parts and pieces together, I believe I finally have them all after getting my HP GBICs.
Here is the basic Setup
HP c7000 with 16 BL490c Blades, Dual Quads with 72GB Ram each
Dual Flex-10 and Dual Cisco MDS 9124 blades
EMC CLARiiON CX4-960 with 15k 450GB drives
And here is the part I've been waiting so long for…