Monday, April 27, 2009

OK, I think I finally have a workaround for the BL460c ASR reboots and iLO error 57s

After reading a couple of articles, it appears that with the latest HP ProLiant Support Pack (8.2), they have screwed the pooch and broken iLO to the point that it causes random reboots, ASRs, and other problems. This seems to affect only 64-bit Windows systems.

Here is what I've found...

HP ProLiant Integrated Lights-Out Management Interface Driver for Windows Server 2003/2008 x64 Editions

Latest = 1.14.0.0
Stable = 1.13.0.0

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=1842750&swItem=MTX-b0749333be7a4336a9957e40eb&prodNameId=3288156&swEnvOID=1113&swLang=8&taskId=135&mode=5

and

HP ProLiant iLO 2 Management Controller Driver for Windows Server 2003 x64 Editions

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=1842750&swItem=MTX-b016a4092d95486b88c4ebe86d&prodNameId=3288156&swEnvOID=1113&swLang=8&taskId=135&mode=5

Latest = 1.11.0.0
Stable = 1.8.0.0
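
If you want to check a pile of servers against the versions above, here is a rough Python sketch of the comparison. The function names and dictionary keys are my own made-up labels, and how you collect the installed version from each box is left out. The main point is that dotted versions have to be compared number by number, since a plain string compare gets 1.11 vs 1.8 wrong.

```python
def parse_version(text):
    """Turn a dotted version such as '1.13.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer_than_stable(installed, stable):
    """True when the installed driver is newer than the known-stable one."""
    return parse_version(installed) > parse_version(stable)


# Stable versions quoted in this post; the keys are hypothetical labels.
STABLE = {
    "ilo_management_interface": "1.13.0.0",
    "ilo2_management_controller": "1.8.0.0",
}

print(is_newer_than_stable("1.14.0.0", STABLE["ilo_management_interface"]))    # True
print(is_newer_than_stable("1.11.0.0", STABLE["ilo2_management_controller"]))  # True
print("1.11.0.0" > "1.8.0.0")  # False: naive string compare is wrong here
```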

Here is where I found it:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1240871128552+28353475&threadId=1323879
and
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1332008

and the fun keeps going

Well, my previous post didn't solve the issue. It looks like the problem is with the
"HP iLO Management Channel Interface Driver"

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=1842750&swItem=MTX-b016a4092d95486b88c4ebe86d&prodNameId=3288156&swEnvOID=1113&swLang=8&taskId=135&mode=5

and I'm not the only one with an issue

http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1240868410367+28353475&threadId=1323879

I am going to try rolling the driver back to version 1.13 and see if that stops the random reboots and the Event ID 57 errors.

Friday, April 24, 2009

HP Blade Server rebooting for no apparent reason

Just under a week after applying an HP ProLiant Support Pack to our BL460c G5 blades, one of them running Windows 2003 began rebooting randomly. The only real errors I could find were in the System log, from Event Source: hpqilo2 with Event ID: 57. They pointed to a timeout causing an ASR (reboot).

Description: The system has rebooted from a Automatic Server Recovery (ASR) event.
ProbableCause: 111 0x6f (Timeout)
ProbableCauseDescription: "ASR Reboot Occurred"

I did a full hardware swap, since everything I found on Google pointed to hardware. After that didn't resolve anything, I found that there is a new version of the HP ProLiant iLO 2 Management Controller Driver available. One of the known fixes is
"Resolved a problem where system could spontaneously reboot (ASR) if all CPU's were under continuous 100% load, and iLO 2 was reset (e.g. due to firmware update, changes to network settings, etc.)."

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=3808910&swItem=MTX-b016a4092d95486b88c4ebe86d&prodNameId=3808911&swEnvOID=1113&swLang=13&taskId=135&mode=4&idx=2

This looks like it could be our issue. Only time will tell, but if history is any guide, I should know within 48 hours whether this machine will be stable or keep blowing up.

Thursday, April 16, 2009

MAC ARP Poisoning, The case of the missing response packets ISA

Yesterday my network was hit by one of the most difficult-to-trace network problems I've seen. At 12:45 PM the "internet died". I rebooted our ISA server and it worked again. About 20 minutes later it died again, and after this happened two or three more times I knew we had a major issue.

Microsoft ISA Server is one of Microsoft's best products, and ISA 2006 SP1 in particular is very reliable. I put a packet trace outside my ISA firewall, and it showed packets leaving my network AND returning. However, my ISA server reported that the packets were leaving and NEVER returning. VERY ODD; ISA doesn't lie. We did the normal hardware replacement routine, even swapping out a router, switches, and the ISA server hardware, to no avail. This was very perplexing.

Finally, after 14 hours of wanting to tear my hair out, we found something in a packet trace. I had captured a ping of google.com both when things worked and when they didn't. The capture was taken just outside our ISA server (which is a back firewall, not a front one) but on the other side of our router (this router doesn't do any packet filtering, so I had ignored it *bad idea*). The response packets LOOKED the same, but when we dug deeper we saw that the MAC address on the response packets differed while the IP was the same, correct one. The MAC was the right one when things worked and a different one when they didn't.

Someone had set up a machine with the same IP address as our router outside our ISA firewall. This was causing return packets from the internet to be misdirected to that server instead of our router. So ISA wasn't lying, and some MAC table lookups showed me which switches and ports to chase until I found the rogue machine. After powering it down (and disconnecting the Ethernet cable), all the problems stopped and I could finally go home at 3 AM.
God I love IT.
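
For next time, the check that finally cracked this (same IP, different MAC) is easy to automate once you have (IP, MAC) pairs out of a capture. Below is a minimal Python sketch. The function name and the sample addresses are made up for illustration, and getting the pairs out of your capture tool is not shown.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same MAC written two ways is not a false hit.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up sample data: (source IP, source MAC) pairs as they might be
# exported from a packet capture.
sample = [
    ("192.0.2.1", "00:11:22:33:44:55"),
    ("192.0.2.10", "00:AA:BB:CC:DD:01"),
    ("192.0.2.1", "00:11:22:33:44:55"),
    ("192.0.2.1", "00:de:ad:be:ef:99"),
]

for ip, macs in find_ip_conflicts(sample).items():
    print(ip, "answers from", ", ".join(macs))
```

Any IP that shows up here with two MACs is worth chasing through the switch MAC tables, which is exactly what led me to the rogue machine.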

Tuesday, April 7, 2009

How to Query Machine owner (registered windows) from a command prompt

REG QUERY "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v RegisteredOwner

REG QUERY "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v RegisteredOrganization
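
If you need these values inside a script, one option is to run the same REG QUERY command and pick the data out of its text output. Here is a hedged Python sketch: the helper names are my own, and the sample text is a made-up example of the layout I am assuming REG QUERY prints (key path on one line, then an indented "name, type, data" line). The subprocess call only works on Windows, so it is wrapped in a function that the demo does not call.

```python
import re
import subprocess

KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion"


def parse_reg_value(output, name):
    """Pull one value's data out of REG QUERY text output, or None if absent."""
    # Matches an indented "<name>  REG_<type>  <data>" line (spaces or tabs).
    pattern = re.compile(r"^\s+" + re.escape(name) + r"\s+REG_\w+\s*(.*)$")
    for line in output.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1).strip()
    return None


def query_registered(name):
    """Run REG QUERY for one value (Windows only) and return its data."""
    result = subprocess.run(
        ["reg", "query", KEY, "/v", name],
        capture_output=True, text=True, check=True,
    )
    return parse_reg_value(result.stdout, name)


# Made-up sample in the layout REG QUERY is assumed to print.
sample = (
    "\n"
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\n"
    "    RegisteredOwner    REG_SZ    Example Owner\n"
)
print(parse_reg_value(sample, "RegisteredOwner"))  # Example Owner
```

On a Windows box you would call query_registered("RegisteredOwner") or query_registered("RegisteredOrganization") instead of feeding in the sample text.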