Wednesday, September 24, 2008

ESX HA Errors

When trying to set up HA, I received the error "An error occurred during configuration of the HA Agent on the host." Looking deeper under "Tasks & Events," I could see the real error: "configuration of the host IP address is inconsistent on host ESX1.local.com: address resolved to 192.168.x.x and 172.168.x.x"

I looked in
/etc/hosts
/etc/FT_HOSTS (didn't exist for me)
/etc/sysconfig/network
/etc/vmware/esx.conf
checked all my DNS entries, and even made sure they were case sensitive (someone else's blog said that mattered).
I did a hostname -i and a hostname -s; both returned good results.

I also read that your /etc/hosts file must list the IP address (x.x.x.x), then the hostname, then the FQDN, in that order, and contain all ESX hosts. I tried that; it didn't help.
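For reference, here's roughly what that would look like (the IPs and hostnames are made-up examples):

192.168.1.11 esx1 esx1.local.com
192.168.1.12 esx2 esx2.local.com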

I ran ft_gethostbyname from /opt/vmware/aam/bin, and it gave me an incorrect result: a service console that no longer existed.

Then, after hours of pulling my hair out, I looked in /etc/opt/vmware/aam/FT_HOSTS, and it had old, bad information.

I deleted that file, disabled HA on the cluster, then re-enabled it, and FINALLY HA installed properly.

Tips for Common ESX HA Errors

Check to make sure DNS is configured properly
Check to see if you can resolve DNS
Check DNS records
Make sure you are using FQDNs
Check your /etc/hosts file
Make sure you're using lower case
Check that Service Consoles have the same names and networks
Disable and re-enable HA
Select "Reconfigure for HA" on the ESX host

Out of these troubleshooting tips, the most common problem with HA is a DNS issue, so it is best to start by troubleshooting DNS.
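For the DNS checks above, a few quick commands from the service console (the hostnames and IPs here are examples):

nslookup esx1.local.com (forward lookup, should return exactly one address)
nslookup 192.168.1.11 (reverse lookup, should return the FQDN)
hostname -f (should print the FQDN and match what DNS says)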

I got this from here:
http://www.holy-vm.com/2008/09/17/troubleshooting-tips-for-common-ha-errors/

Thanks for the info... it helped me.

Monday, September 22, 2008

Notes to make the openfiler 2.2 instructions below work (the "simple stuff" he left out)

First off, I am using openfiler 2.3, not 2.2

Let's start with some notes:

1) If you use two NICs, I like to use Balance lb.
2) Jumbo frames, for some reason, make my NFS a little slower... not sure why yet.
3) iostat -x is a great way to find out your I/O performance (quick example after this list).
4) Make sure to open NFS Client on your ESX firewall.
5) Set up your "host access" at the bottom of the Network Access Configuration page BEFORE you set up the cluster; do it early, when you set up the network, otherwise it doesn't work later (permissions).
6) If you skipped 5, just chmod the /cluster_metadata/opt/openfiler/etc directory and the files in it so the openfiler GUI still works for all pages AFTER the cluster setup (see the sketch after this list).
7) The guy who wrote this help file is a genius; however, he uses the volume group names vg0drbd and vg0_drbd interchangeably, and you need to pick one or the other, because as written it doesn't work.
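A quick sketch of notes 3 and 6 as commands (the chmod mode is my assumption; adjust to whatever your GUI user actually needs):

iostat -x 5 (extended I/O stats every 5 seconds; needs the sysstat package)
chmod -R 755 /cluster_metadata/opt/openfiler/etc (assumed mode; the point is the web GUI must be able to read these files)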


Added Instructions
1) Before you begin, you must set up the two new drives/partitions that will hold the "cluster_metadata" and "vg0_drbd" volumes.
To do this, use fdisk /dev/sda (or whatever your device is). You will end up with this layout:
1 = Boot (whatever size it suggests, I go with)
2 = OS (2GB min)
3 = Swap (twice RAM, minimum)
4 = the one you must set up, 300MB+ required
5 = Data, as much space as you can get

Since fdisk only allows 4 primary partitions, the 4th partition needs to be an "extended" partition; then create the two partitions inside that extended partition (a sketch follows).
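If you'd rather script it, here's a rough sketch using parted instead of interactive fdisk (the device and sizes are made up, and parted may nudge the boundaries slightly):

parted /dev/sda mkpart extended 20GB 100% (partition 4: extended, rest of the disk)
parted /dev/sda mkpart logical 20GB 20.3GB (partition 5: 300MB+ for cluster_metadata)
parted /dev/sda mkpart logical 20.3GB 100% (partition 6: data, for vg0_drbd)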

2) You need to create the metadata for your volumes. Do this after the ha.cf stuff and before you start the drbd service:
(on both nodes)
drbdadm create-md vg0_drbd
drbdadm create-md cluster_metadata
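Once you do start the drbd service, you can watch both resources connect and sync from either node:

cat /proc/drbd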

3) After you set up your cluster.xml, make sure you fix the following links:
rm /cluster_metadata/opt/openfiler/sbin/openfiler
rm /cluster_metadata/opt/openfiler/etc/httpd/modules
ln -s /usr/sbin/httpd /cluster_metadata/opt/openfiler/sbin/openfiler
ln -s /etc/httpd/modules/ /cluster_metadata/opt/openfiler/etc/httpd/modules

4) Before you can start heartbeat, you need to create your volume inside your volume group; I also set up my share at this time (a minimal sketch below).
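A minimal sketch of that step (the logical volume name and size are my assumptions):

lvcreate -L 10G -n filer vg0_drbd (creates /dev/vg0_drbd/filer)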

A troubleshooting note: sometimes the openfiler GUI (web page) is dead after a node reboot. Make sure one of your nodes mounts the data, i.e. somebody is the primary node, and then the GUI should work again.
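If neither node picked the data up, you can promote one by hand; a hedged sketch (the drbd device number is an assumption from my setup):

drbdadm primary cluster_metadata
mount /dev/drbd0 /cluster_metadata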

Openfiler/software linux raid misc commands and pieces of info

Openfiler/software raid:

If you have a software RAID /dev/md0, you can't use its partitions for openfiler, even if you sub-partition it; software RAIDs can't be partitioned (and still work).
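To see what software RAID devices you have:

cat /proc/mdstat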

Also, if you rebuild your openfiler and for some reason your disks are now type "gpt", you may not be able to add them to a RAID. Use these commands:

parted /dev/sdb
then type
mklabel msdos

Do that for each of your drives to get them back to the msdos type so you can re-RAID them. Also, be VERY CAREFUL with that command, as it will nuke/wipe any drive you use it on; by default, if you don't give 'parted' a drive, it defaults to your first/boot drive... and blamo, you're back with a CD reinstalling the OS. Not that I know... from experience...
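Before running mklabel on anything, it's worth confirming which disk you're on and what label it currently has (device is an example):

parted /dev/sdb print (shows the label type and partitions without changing anything)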

To zero out a partition, here is the command:
dd if=/dev/zero of=/dev/hdc3
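If that's slow, a bigger block size helps (same example device):

dd if=/dev/zero of=/dev/hdc3 bs=1M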

Friday, September 19, 2008

Setting up Jumbo Packets with ESX 3.5

First off, thanks to another blog

http://blog.scottlowe.org/2008/04/22/esx-server-ip-storage-and-jumbo-frames/

But there is one thing left out; I'll put it at the end.

Configuring ESX Server
There is no GUI in VirtualCenter for configuring jumbo frames; all of the configuration must be done from a command line on the ESX server itself. There are two basic steps:
Configure the MTU on the vSwitch.
Create a VMkernel interface with the correct MTU.
First, we need to set the MTU for the vSwitch. This is pretty easily accomplished using esxcfg-vswitch:
esxcfg-vswitch -m 9000 vSwitch1
A quick run of “esxcfg-vswitch -l” (that’s a lowercase L) will show the vSwitch’s MTU is now 9000; in addition, “esxcfg-nics -l” (again, a lowercase L) will show the MTU for the NICs linked to that vSwitch are now set to 9000 as well.
Second, we need to create a VMkernel interface. This step is a bit more complicated, because we need to have a port group in place already, and that port group needs to be on the vSwitch whose MTU we set previously:
esxcfg-vmknic -a -i 172.16.1.1 -n 255.255.0.0 -m 9000 IPStorage
This creates a port group called IPStorage on vSwitch1—the vSwitch whose MTU was previously set to 9000—and then creates a VMkernel port with an MTU of 9000 on that port group. Be sure to use an IP address that is appropriate for your network when creating the VMkernel interface.
To test that everything is working so far, use the vmkping command:
vmkping -s 9000 172.16.1.200
Clearly, you’ll want to substitute the IP address of your storage system in that command.
That’s it! From here you should be able to easily add an NFS datastore or connect to an iSCSI LUN using jumbo frames from the ESX server.

When doing the esxcfg-vmknic step, I got the following error:
"Error performing operation: A vmkernel nic for that portgroup already exists: PortGroupName"
I did an esxcfg-vmknic -d PortGroupName to delete the old one.
Then you can follow the instructions as written.

========THE SHORT VERSION=========
The setup...
esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vmknic -d PORTGROUPNAME
esxcfg-vmknic -a -i 192.168.1.10 -n 255.255.255.0 -m 9000 PORTGROUPNAME (sets your ESX IP on a port group)

To make sure everything took...
esxcfg-vswitch -l
esxcfg-vmknic -l
vmkping -s 9000 192.168.1.20 (another host running jumbo frames)

Tuesday, September 16, 2008

Free ESX / Compliance Checker Tool

Configuresoft announced a free ESX compliance tool that checks against the ESX hardening guidelines and the CIS benchmarks for ESX.

You can download the tool here; the only limitation is that you can only scan 5 ESX hosts at a time, but you can print and save the results.

http://www.configuresoft.com/compliance-checker.aspx

Sunday, September 14, 2008

2008 boot.ini or lack thereof

I was trying to add PAE so I could use my 8GB of RAM with my 32-bit procs and run 2008 in 32-bit mode with all my RAM, but there is no boot.ini in 2008.

The solution: use bcdedit.

1. BCDEdit /set nx AlwaysOff (kills DEP)
2. BCDEdit /set PAE forceenable (Enables PAE)
3. Reboot
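To double-check that both settings took after the reboot, you can dump the current boot entry:

BCDEdit /enum {current}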