Tuesday, January 26, 2010

Helped HP trace down a firmware issue today

My BL490c G6 blades were unable to get network connectivity to our chassis.  I had set up all the networking in HP Virtual Connect, but 6 blades worked and 9 did not (we only purchased 15, not 16; don’t ask).

The error I received was different depending on the OS I was loading.

ESX 4.0 Update 1 gave this error: “The script 32.networking-drivers failed to execute and the installation can not continue.”

ESX 4.0 (no update 1) gave this error: Network ports Disconnected.

I was able to isolate this to a blade-specific issue by swapping blades 4 and 6: the issue followed the blades, not the bays.  There are 4 types of firmware on these HP blades that I can see: BIOS, iLO, QLogic (HBA), and BC (NIC).  The UI shows everything but the NIC firmware version.  All the blades with the latest NIC firmware have no active network ports.  If I downgrade them from 2.2.4 to 2.2.2 (more specifically, the bootcode from 5.0.11 to 4.8.0), then they work (problem solved).  It’s a real pain to check the NIC firmware version, since it only shows up when booting from the HP firmware CD.  I do have the latest firmware on my c7000 chassis and Flex-10 switches.  Just to verify this was a blade-specific issue, I also swapped hard drives from a non-working blade to a working blade, and the issue followed the blade again.
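The swap test is easy to get backwards when many blades are involved, so here is a minimal Python sketch of the reasoning. The blade labels and the pass/fail results are hypothetical (the post does not record which of blades 4 and 6 was the failing one); only the logic of "does the fault stay with the blade or with the bay" is taken from the test described above.

```python
from typing import Dict, Tuple

# bay number -> (blade label, did the network ports come up?)
Observation = Dict[int, Tuple[str, bool]]


def fault_follows(before: Observation, after: Observation) -> str:
    """Classify a two-bay swap test as 'blade', 'bay', or 'inconclusive'."""
    # Did every bay give the same result no matter which blade sat in it?
    bay_unchanged = all(before[bay][1] == after[bay][1] for bay in before)
    # Did every blade give the same result no matter which bay it sat in?
    result_before = {blade: ok for blade, ok in before.values()}
    result_after = {blade: ok for blade, ok in after.values()}
    blade_unchanged = all(
        result_before[blade] == result_after[blade] for blade in result_before
    )

    if blade_unchanged and not bay_unchanged:
        return "blade"
    if bay_unchanged and not blade_unchanged:
        return "bay"
    return "inconclusive"


# Hypothetical results: blade "A" fails in bay 4 and still fails in bay 6.
before_swap = {4: ("A", False), 6: ("B", True)}
after_swap = {4: ("B", True), 6: ("A", False)}
print(fault_follows(before_swap, after_swap))  # prints "blade"
```

If both swapped blades fail (or both work), the function returns "inconclusive", which is why the swap needs one good blade and one bad one.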

I think I have this solved. I’ve “fixed” 5 blades so far by downgrading them. The only way I can find to downgrade the NIC firmware is from the HP firmware boot CD: 8.7 has 2.2.4 and 8.6 has 2.2.2.  There is a version 2.2.3, but as best I can tell it’s not easy to install: it must be done from inside a running OS (there is no bootable CD), and ESX 4 Update 1 does not install because of this error, so I’d have to install another OS, change the FW version, and then reload the blade.

I have an open HP case to get this resolved; hopefully we can get updated firmware soon.
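For reference, the version pairs mentioned in this post can be kept in a small lookup. This is only a sketch in Python: the table holds what was observed here (CD 8.6 with NIC firmware 2.2.2 / bootcode 4.8.0, CD 8.7 with 2.2.4 / 5.0.11) and is not taken from HP documentation; the names `FIRMWARE_CDS` and `cds_for_downgrade` are made up for the example.

```python
# Firmware CD -> NIC versions, as reported in this post (not HP data).
FIRMWARE_CDS = {
    "8.6": {"nic_firmware": "2.2.2", "bootcode": "4.8.0"},
    "8.7": {"nic_firmware": "2.2.4", "bootcode": "5.0.11"},
}

# Bootcode that left the blades in this post without network ports.
BAD_BOOTCODE = "5.0.11"


def version_key(version: str) -> tuple:
    """Turn '5.0.11' into (5, 0, 11) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def cds_for_downgrade() -> list:
    """Return the firmware CDs whose bootcode is older than the bad one."""
    return sorted(
        cd
        for cd, versions in FIRMWARE_CDS.items()
        if version_key(versions["bootcode"]) < version_key(BAD_BOOTCODE)
    )


print(cds_for_downgrade())  # ['8.6']
```

Comparing the versions as integer tuples rather than strings matters once components reach two digits (for example, "5.0.11" versus "5.0.9").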

25 comments:

Kurt Johannessen said...

Hi Brian

1. The FW 4.0.8 for Broadcoms is on HP FW CD 8.50. Note that before you apply VC profiles to the 10 Gb NICs you only have 2 NICs to downgrade. After I made a full profile with 8 NICs, HPSUM on FW CD 8.50 showed 8. I only had to roll back the first 2 NICs in the list to get all 8 back to the old FW version.


2. There is an updated driver for the Broadcom NICs at VMware (I am still testing it on HP ESXi 4.0 Update 1a); see http://downloads.vmware.com/d/details/esx_40_broadcom_bnx2xu1_dt/ZHcqYmRqcGhiZGVqdA==

Note that it is documented in the VMware HCL; see
http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1&advancedORbasic=advanced&maxDisplayRows=50&key=NC532i&release%5B%5D=34&datePosted=-1&partnerId%5B%5D=41&ioTypeId%5B%5D=6&manufacturer%5B%5D=11&vid=&did=&svid=&ssid=&rorre=0


So far I am monitoring my test system BL460c G5 as I had 1 ASR entry just after doing the update.

3. The HP Firmware Update bundle ver 1.70 for blades contains FW 5.0.11 for the Broadcoms, so be aware!

4. You can only update drivers on ESXi once the system is installed, so expect to roll back some NICs until a new ESXi image is available from HP or VMware.

5. The issue also happens on Citrix XenServer 5.5. I have located no updates and will probably have to compile a new driver myself for this. As far as I can deduce, the new FW changes the PCI IDs, so the issue is universal: if you have an old driver installed that does not know the new IDs, the driver does not load. Pure PnP (aka Plug and Pray).

Keep me updated on how your support case with HP pans out.

Cheers

Kurt Johannessen
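Kurt's last point, that a driver will not load for a device whose PCI ID it does not recognise, can be illustrated with a small Python sketch. This only models the general mechanism he describes; whether the firmware update really changes the IDs is his hypothesis and is not confirmed in this thread. 0x14e4 is Broadcom's PCI vendor ID, but the device IDs and driver names below are invented for the example.

```python
from typing import List, NamedTuple, Optional


class PciId(NamedTuple):
    vendor: int
    device: int


# Illustrative ID tables only: the device IDs and driver names are made up.
DRIVER_ID_TABLES = {
    "old_nic_driver": {PciId(0x14E4, 0x0001)},
    "new_nic_driver": {PciId(0x14E4, 0x0001), PciId(0x14E4, 0x0002)},
}


def find_driver(device: PciId, installed: List[str]) -> Optional[str]:
    """Return the first installed driver whose ID table lists the device."""
    for name in installed:
        if device in DRIVER_ID_TABLES[name]:
            return name
    return None  # no driver claims the device, so the NIC never appears


# A NIC that reports a new (hypothetical) device ID after a firmware update.
nic_after_update = PciId(0x14E4, 0x0002)
print(find_driver(nic_after_update, ["old_nic_driver"]))  # None
print(find_driver(nic_after_update, ["new_nic_driver"]))  # new_nic_driver
```

Under this model the hardware is healthy in both cases; the only difference is whether the installed driver's table contains the ID the card now reports.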

Sameer Elayodath said...

Hi Brian

I faced this issue last week, and I have narrowed down the issue a little more.

The firmware updates of the blades were done before applying the VC Profiles. So for the server, it has two NICs during this time. I applied the firmware, then applied the VC profile and installed ESX. It shows the same problem during installation.

I found that, if you reinstall the same firmware after the VC profile is applied, it solves the problem.

So, to resolve the issue, I set the rewrite option to 'enabled' in the firmware update window and applied the same version of firmware again. (Mine was Broadcom 2.2.4 on FW CD 8.70.) It worked perfectly, and I did not have to revert to an older firmware.

Cheers,
Sameer

rehnmark said...

We are having the same problem with our BL460c G6 servers.
Like Sameer said, it works if you reflash the firmware after you have applied the VC profile. But I noticed that if you remove the blade server from the chassis and put it back in, you get the same problem again, and the only thing to do is to flash the firmware once more. This means you have to reflash every time you remove the blade from the chassis. Pretty annoying :/

Ben said...

This worked for me too on 2 blades that had their system board replaced. HP BL 460 G6.
Thank you very much for the tip.

AspirantVDX said...

Thanks for posting this. I was already pulling my hair out.

Same problem for BL495c G6 and virtual connect.

I followed Sameer's instructions to rewrite the current firmware, 2.2.4.

Rebooted with the ESX installer CD and...PROBLEM SOLVED :-)


You might have guessed, I'm very happy with this.

thanks

Sameer Elayodath said...

Guys!

I finally found a permanent resolution for the issue.

Download the driver update from the following URL and install it!

http://www.vmware.com/support/vsphere4/doc/drivercd/esx40-net-bnx2x_400.1.48.107-1.0.4.html

You can install the update during ESX installation or you can update the same after the ESX is installed.

Cheers and enjoy!
Sameer

Brian Smith said...

Very cool, I'll have to give your fix a try. I also know new firmware was posted a week or two ago; I haven't tried that yet either.

shukae said...

This worked for me, used this driver:

Download VMware ESX/ESXi 4.0 Driver CD for Broadcom NetXtreme II Ethernet Network Controllers
Description: This driver CD release includes support for version 1.52.12.v40.3 of the bnx2x driver on ESX/ESXi 4.0. The bnx2x driver supports products based on the Broadcom NetXtreme II BCM57710/BCM57711/BCM57711E 10/100/1000/2500/10000 Mbps PCIE Ethernet Network Controllers.
Notes: While the 1.52.12.v40.3 version of the driver for Broadcom NetXtreme II Ethernet Network Controllers supports Flex-10 capable devices, this version of the driver is not supported in an HP Virtual Connect environment with Flex-10 enabled. Testing of this feature is underway and, once complete, we will update this download location with further details. In the meantime, customers should continue to use the driver shipped with ESX/ESXi, which does not support the DCC/SmartLink functionality.
Version: 1.52.12.v40.3
Build Number: 223054
Release Date: 2010/01/21
Type: Drivers & Tools
Compatible with: ESX/ESXi 4.0
Language Support: English
Components: VMware ESX/ESXi 4.0 Driver CD for Broadcom NetXtreme II Ethernet Network Controllers (file size: 4.6 MB)

Caleb said...

I went ahead and did two test cases to isolate the issue to the VMware driver on our BL490c G6 blades.

Blade 1: Updated to newest Firmware from HP Smart Update Firmware. This updated the NIC Bootcode from 5.0.11 to 5.2.7 but left the iSCSI at 3.1.5. After this the machine will boot most of the time after the blade is removed from the chassis, however after several reboots there are seemingly random times when ESX will not load the NIC and you get the "No compatible network adapter found" error when attempting to boot ESX. After installing the new drivers found at "http://www.vmware.com/support/vsphere4/doc/drivercd/esx40-net-bnx2x_400.1.48.107-1.0.4.html" using the esxupdate command it works fine. I have rebooted many times and removed the blade and am no longer able to get the error.

Blade 2: I left the firmware alone and just updated the VMware driver to the latest from the above URL. After several reboots and physically removing the blade from the cabinet twice I am unable to get the error.

Conclusion: Looks like a VMware driver issue; however, updating the HP firmware on the NIC from 5.0.11 to 5.2.7 does help, taking the issue from happening every time to approximately 50% of the time.

Mike said...

Interesting thread; I have the same issues.
Having updated firmware with HPSUM v9 and set up FlexNICs in VC, I experienced this problem.
I therefore forced a re-flash of the 8 FlexNICs, and on reboot, got a horrible red screen of death with "illegal Opcode".

Mike

Kalle said...

Hi

We've stumbled on something similar but on HP BL685c G6 in c7000 G2 enclosures, with VC modules.

We are trying to install ESXi 4 U1 but get an error message stating that "No compatible network adapter found".

The error appears just after the installation says that the NIC drivers were loaded successfully and before we get the chance to choose between
(ESC) Cancel (R) Repair (Enter) Install


We have cases open with both VMware and HP.

Regards,
Hackim

AspirantVDX said...

New patches released yesterday contain an update for the ESX 4.0 bnx2x device driver.

version: 1.45.20-2vmw

Has anyone tested this one? Does it solve the problem described here?

regards
K.

Jay Rogers said...

Guys, thanks so much for this info. HP and VMware support were no help on this. Question is, is this an HP hardware/Virtual Connect problem, or is it fixed with vSphere 4.1?

I just re-ran the firmware update CD and it fixed my issues. I'm new to HP, but I don't see options for downloading these older releases of the firmware update CDs on the HP site.

OneDay said...

Hi Everyone,

I have a very unique issue I have been struggling with all week in regards to iSCSI. There may be a very simple fix but I am stumped and HP has offered zero support.

We have three 685 blades in a C7000 enclosure. Bay 1 and Bay 2 have the Flex 10 Virtual Connect switches in them. All three hosts are running ESXi v4. The NICs within the 685 series are the Broadcom BCM57711. All NICs can be seen within the ESXi hosts perfectly.

We bought a HP LeftHand storage (iSCSI) to complete our VM solution.

The issue we are faced with is none of the 8 NICs being presented are seen as a storage adapter. Although HP has stated within Windows and Linux, the driver will allow the NIC to act as an iSCSI initiator, we have not seen this within ESXi. We are left with the VMWare iSCSI software adapter only and 8 NICs.

Obviously our goal is to segment off the storage traffic and dedicate redundant paths.

I have accessed the Broadcom BIOS and made every adjustment humanly possible to make this work. I also am not attempting to boot from iSCSI as I know that is not supported with Virtual Connect in an ESXi implementation.

Any guidance would be greatly appreciated.

Respectfully,
Mike

Brian Smith said...

We have not seen the issue yet with ESX 4.1, but that doesn't mean it's completely gone. It was intermittent before, but so far, so good.

Reddrinker said...

Hi,
I need to downgrade the driver in ESX 4.1 from 1.54 to the 1.48 version. Does anyone know how to do this? esxupdate only updates, it does not downgrade.

I would be grateful for any replies.

Thanks

Casper42 said...

Just FYI, if your NICs disappear after a reboot with the 460/490 G6 and ESX 4.0u1/u2, you can get them to come back at least temporarily by booting to the HP Firmware Maintenance CD, going into Interactive mode and then simply exiting back out. This causes a reset at a lower level which brings the NIC out of its hung state.

Then you can get the machine up and running long enough to upgrade the NIC Drivers to 1.52 which resolves this problem.

When 4.0u3 comes out, it should have the fixed driver inside.

However, 4.1 used a different driver codebase and this fix was lost in 1.54? that comes included with 4.1. In that case you need to wait for 1.60 which is supposed to be out very soon.

PDMeat said...

I'm having a similar issue with an HP BL460c G7 and VMware ESXi 4.1: the "no compatible network adapter found" message on trying to install ESXi to local storage.

Everything is on the VMware 4.1 HCL.

A call with HP is going nowhere. After reading this, I'm thinking I'm stuck with some sort of driver problem.

Argh.

Brian Smith said...

PD, are you sure you presented NICs to the server in Virtual Connect?

Brian's Rants and Raves said...

@PDMeat I'm having the exact same issue at a client site today. Was wondering if you've resolved this issue yet. bl490c g7 with esxi 4.1 with integrated HP agents.

@brian I liked where you were heading with the question about whether any profiles were assigned yet in Virtual Connect. My answer is yes, they were, and I'm still having the issue. I've tried both ways...with no server profiles assigned and with, and both yield the same results.

Tomorrow I'll try the latest firmware DVD (9.10c) and see if I get lucky.

Brian's Rants and Raves said...

The resolution is to go grab the September 2010 version of the installer from HP's SW Depot. I had the July 2010 one, and that didn't contain the ServerEngine 10g NIC drivers.

Jay Rogers said...

I would highly suggest the new 1.60 NIC driver on the VMware site for those with Virtual Connect. It fixed failover and missing network stats for me.

An ESXi image for 4.1 is also available now.
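The bnx2x driver versions mentioned in the comments so far can be gathered into a small lookup. The Python sketch below parses the `version:` line that `ethtool -i <nic>` prints (assuming ethtool is available on the host) and reports what commenters said about that release. The sample output and its values are hypothetical, and the notes only paraphrase reports from this thread; they are not VMware or HP guidance.

```python
import re
from typing import Optional, Tuple

# What commenters in this thread reported for each bnx2x release (major, minor).
REPORTED = {
    (1, 45): "shipped in an ESX 4.0 patch; nobody reported testing it",
    (1, 48): "reported to fix the problem on ESX 4.0",
    (1, 52): "reported to fix the problem on ESX 4.0",
    (1, 54): "ships with 4.1; the fix is reportedly missing",
    (1, 60): "recommended for Virtual Connect on 4.1",
}


def parse_driver_version(ethtool_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'version:' line of `ethtool -i` output."""
    match = re.search(r"^version:\s*(\d+)\.(\d+)", ethtool_output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def thread_status(ethtool_output: str) -> str:
    """Look up what this thread says about the driver version in the output."""
    version = parse_driver_version(ethtool_output)
    if version is None:
        return "driver version not found"
    return REPORTED.get(version, "not discussed in this thread")


# Hypothetical `ethtool -i vmnic0` output; the values are made up.
SAMPLE = """driver: bnx2x
version: 1.54.1.v41.1
firmware-version: BC:5.2.7
bus-info: 0000:02:00.0"""

print(thread_status(SAMPLE))
```

Only the major and minor numbers are compared, since the comments refer to releases as 1.48, 1.52, 1.54 and 1.60 without the full build suffix.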

PDMeat said...

Brian, I tried the latest VMWare esxi 4.1 installable (9/10, 583772-007.iso) and it made no difference.

HP VM support claims they can't reproduce the problem yet I'm 100% on the VMWare HCL for hardware and firmware and using the latest 4.1 installer from HP.

I found the below tidbit on VMware's site. It references a slightly different server model (same *relative* hardware release) but details what I think the problem is: the NIC driver.

Perhaps more to the point, I just threw a pair of drives in a DL380 G7, and VMware ESXi 4.1 installed fine. I took the pair of drives and threw them in my BL460c G7, and it boots up fine and just says no compatible NIC found.

So as a workaround, I may just use this installed ESXi OS and try to load the NIC drivers via USB etc. and see if I can fix it (or drop the installer on a USB stick and add the drivers there).

regards,

Pete

Resolution
The NIC drivers that are shipped with the affected servers are not included in the current build of ESXi 4.1.

HP has an advisory about this issue. For more information, see http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02476622&jumpid=reg_R1002_USEN.

PDMeat said...

We finally fixed this problem! HP wasn't able to reproduce it and we had a tough time ultimately because the problem was the BIOS.

Another tech of mine ran SmartStart and did the "delete server configuration" to clear CMOS, array settings, etc., and ultimately only that worked.

Even though the NICs worked fine in Windows Server 2008, which we installed twice, somehow clearing the server configuration with SmartStart did the trick.

CC said...

I'm also struggling to get Bl460c G7 servers in c3000 enclosures to talk to P4500 using the iSCSI HBA.

PDMeat, I've booted off the SmartStart 8.7 DVD but can't find anything to reset the CMOS. Can you ask your tech for exactly what he did?

Thanks

CC