Friday, June 10, 2011

If you're running HP Virtual Connect, you'd better upgrade to 3.17

It’s hard for me to believe that DNS settings on your Flex10 can cause HP Virtual Connect Manager to die, but apparently it’s true. We hit this issue: one of our Flex10 adapters went offline, which sent our ESX hosts into isolation mode, which in turn powered down all of our guest VMs. I am not a happy camper with HP right now.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02720395&lang=en&cc=us&taskId=101&prodSeriesId=3540808&prodTypeId=329290

 

SUPPORT COMMUNICATION - CUSTOMER ADVISORY

Document ID: c02720395

Version: 5

Advisory: (Revision) HP Virtual Connect - Virtual Connect Manager May Be Unable to Communicate (NO_COMM) if DNS Is Enabled for Virtual Connect Ethernet Modules

NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

Release Date: 2011-04-14

Last Updated: 2011-04-14


DESCRIPTION

Revision history (Document Version, Release Date, Details):

  • 5, 04/14/2011: Added VC firmware v3.17 availability, VCEM clarification and an OA Customer Advisory reference to the Resolution section. Also, added clearer guidance to customers on when to perform the resolution and explanation of VC network stability in an intermittent DNS environment.
  • 4, 04/07/2011: Updated Description to include an error message that may be seen when this issue occurs.
  • 3, 03/07/2011: Added clarifications to the three scenarios described in the Resolution section to ensure the full sequence of steps is followed.
  • 2, 03/04/2011: Added additional details regarding the circumstances in which the issue may occur. Also, added three different workaround scenarios depending on whether Enclosure Bay IP Addressing (EBIPA) or external DHCP is being used and the version of OA firmware that is in use.
  • 1, 02/14/2011: Original Document Release.

The HP Virtual Connect Manager (VCM) may not be able to communicate (NO_COMM) with Virtual Connect (VC) Ethernet modules in an HP BladeSystem c-Class enclosure or multiple enclosures that are part of the same Virtual Connect Domain.

IMPORTANT: Due to the possibility of a VC network outage, HP recommends that the customer follow the Resolution below as soon as possible.

The NO_COMM state may occur in a new or an existing environment when a VCM Administrator attempts to perform any of the following tasks:

  • Firmware Update
  • Add/remove/reset server blades or Onboard Administrator (OA) modules
  • Retrieve any VC Ethernet module status and state information (e.g. stacking links, port statistics, etc.)
  • Add/edit/copy/delete/assign Server Profile
  • Add/edit/delete VC Network
  • Configure Port Mirroring
  • Restore Domain Configuration
  • Change SNMP Settings
  • Change Advanced Ethernet Settings
  • Executing the "Complete VC Domain Maintenance" command in Virtual Connect Enterprise Manager (VCEM)

IMPORTANT: Attempting to execute any of the above tasks during NO_COMM adds additional risk of a network outage during the recovery steps described below.

Customers particularly susceptible to this issue have VC Modules with management IP Addresses configured in the 10.x.x.x range and configured for DNS. When this problem occurs, the VC Manager will still be accessible, but all VC Ethernet modules in the domain will be displayed with an Overall Status of "No Communication." The Virtual Connect Domain will show a "failed" status, stacking links will show "failed" and Profiles and Networks will show a status of "Unknown." In addition, the following error messages may be displayed when clicking on Domain Status from the Virtual Connect Manager Web Interface or when issuing the VC CLI command "show status":

"The domain is incapable of managing its contained VC components"

AND

"The Virtual Connect Manager is unable to communicate with the module or the Onboard Administrator. Please ensure that the module has an IP address"

This occurs if DNS is enabled for the primary VC module. The VCM may initiate a DNS reverse lookup for a very limited scope of incorrect IP addresses for the VC Ethernet modules. If this reverse lookup fails (i.e., it is not answered by the DNS infrastructure), the primary VC module will be able to communicate correctly with the VC Ethernet modules.

If the DNS infrastructure responds to this incorrect DNS reverse lookup, then VCM attempts to communicate with the VC Ethernet modules on this incorrect IP Address and fails, triggering a NO_COMM condition. Recently, the global DNS infrastructure began responding to these limited DNS reverse lookups.
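To make the mechanism above concrete, here is a minimal Python sketch of what a reverse lookup is and how one could check whether the local DNS infrastructure answers it. The advisory does not say which incorrect addresses VCM queries, so the address in the usage note is only a placeholder, and the function names are illustrative rather than anything from HP tooling.

```python
import ipaddress
import socket


def reverse_lookup_name(ip):
    """Return the PTR query name that a reverse lookup for `ip` uses."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup_is_answered(ip, resolver=socket.gethostbyaddr):
    """Return True if the resolver returns a name for `ip`, False if the lookup fails.

    `resolver` is injectable so the check can be exercised without a live DNS server.
    """
    try:
        resolver(ip)
        return True
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return False
```

For example, `reverse_lookup_name("10.1.2.3")` gives `3.2.1.10.in-addr.arpa`, the name a DNS server would be asked to resolve. Per the advisory, it is an answered lookup for one of the incorrect addresses, not a failed one, that leads to NO_COMM.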

While in the NO_COMM state due to the DNS issue, the customer will not experience a VC network outage and will still be able to pass traffic. However, if DNS environment changes cause the system to regain communication, the VC network may experience a temporary outage of a few minutes. Subsequently, if the system loses communication, the customer may experience a persistent VC network outage until communication returns.

SCOPE

Any HP Virtual Connect Ethernet Modules in a c-Class BladeSystem enclosure running VC Firmware Version 1.x, 2.x or 3.x (up to and including 3.15).
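As a rough illustration of the scope statement, the sketch below classifies a VC firmware version string against the ranges the advisory names. It assumes plain "major.minor" strings such as "3.15"; builds with letter suffixes or extra components are not handled, and the function names are made up for this example.

```python
def parse_version(text):
    """Parse a 'major.minor' string such as '3.15' into a tuple of ints."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def classify_vc_firmware(version):
    """Classify a VC firmware version against the advisory's SCOPE and RESOLUTION text."""
    v = parse_version(version)
    if (1, 0) <= v <= (3, 15):
        return "in scope"   # 1.x, 2.x or 3.x up to and including 3.15
    if v >= (3, 17):
        return "resolved"   # fixed in 3.17 or later
    return "not covered by the advisory text"
```

Anything the advisory does not mention explicitly, such as a 3.16 build, is deliberately reported as not covered rather than guessed at.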

RESOLUTION

This issue is resolved with Virtual Connect Firmware version 3.17 (or later). VC 3.17 is available as follows:

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02774957/c02774957.pdf

As a workaround, disable DNS for the Virtual Connect Ethernet Modules in Enclosure Bay IP Addressing (EBIPA) or external DHCP. Removing DNS from VC Modules can potentially impact the following Virtual Connect features, if configured to use DNS names:

  • Directory Server Settings - If a DNS name is configured for the Directory Server Address then it will no longer be resolved. The IP address will need to be configured as the Directory Server Address.
  • SNMP Trap Destination - If a DNS name is configured for the SNMP Trap Destination then it will no longer be resolved. The IP address will need to be configured as the SNMP Trap Destination.
  • From the VCM CLI - Any URL targets provided to save backup configuration or support dump will need to use an IP address and not a DNS name.
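Before disabling DNS, it helps to know which configured targets are DNS names rather than IP addresses. The following is a small Python sketch of that check, assuming you have already collected the configured values (directory server address, SNMP trap destination, backup or support dump URLs) as strings. The helper names and the example hosts in the usage note are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def host_of(target):
    """Extract the host part from a URL, or return a bare address unchanged."""
    if "://" in target:
        return urlsplit(target).hostname or ""
    return target.strip()


def uses_dns_name(target):
    """Return True if the target's host is a name rather than an IP address literal."""
    host = host_of(target)
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # already an IP address, unaffected by removing DNS
    except ValueError:
        return True   # a DNS name that will stop resolving once DNS is removed
```

For instance, `uses_dns_name("ftp://backup.example.com/vc-config")` is true, while `uses_dns_name("ftp://10.0.0.5/vc-config")` is false. Any target flagged true would need to be switched to an IP address.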

The following three workaround scenarios depend on whether Enclosure Bay IP Addressing (EBIPA) or external DHCP is being used and the version of Onboard Administrator (OA) firmware that is in use:

IMPORTANT: In all three scenarios, use the default "Administrator" account when logging into the Onboard Administrator to make EBIPA changes. Otherwise, OA network configuration changes made by a non-Administrator user account may not persist, as described in OA Customer Advisory c02639172.

Scenario 1 - Enclosure Bay IP Addressing is being used to provide IP Addresses to the VC Ethernet Modules and the OA firmware version is 3.00 (or higher):

  1. Using the default Administrator account, log into the Onboard Administrator and select Enclosure Settings > Enclosure Bay IP Addressing.
  2. Select the "Interconnect Bays" tab and remove DNS server IP address entries from the bays that include VC Ethernet Modules and click Apply.
  3. Within 5 minutes, the DNS settings for the modules should update and normal module communication will be restored.
  4. It is important that no VC domain changes are made until the following steps are fully completed.
  5. If the Virtual Connect Domain is managed by Virtual Connect Enterprise Manager (VCEM):
    a) If any VC Domain from the impacted VC Domain group is currently in maintenance mode go to the VCEM user interface and click "Cancel VC Domain Maintenance". Note that cancelling maintenance mode will roll back any VC Domain changes that were made while in maintenance mode. Verify that all running and pended jobs are allowed to complete before proceeding to step b).
    b) Click the "VC Domains" tab, select the impacted VC domain and click "VC Domain Maintenance"
    c) Click "Make Changes via VC Manager". This will release control to the VCM.
  6. From the VCM GUI, select "Tools" => "Reset Virtual Connect Manager." This will force resynchronization of the modules if not synchronized in Step 3 above.
  7. If the Virtual Connect Domain is managed by Virtual Connect Enterprise Manager (VCEM): Go to the VCEM user interface and click "Cancel VC Domain Maintenance". Wait for the job to complete. Cancelling maintenance mode prevents unnecessary propagation of changes to other members of the VC Domain Group.

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization in either Step 3 or Step 6 above (but not both). Outage time will vary depending on the size of the VC domain.

Scenario 2 - Enclosure Bay IP Addressing is being used to provide IP Addresses to the VC Ethernet Modules and the OA firmware version is 2.60 (or earlier). If iLO DNS name registrations are statically assigned in the DNS infrastructure, move to Step 2 below:

  1. If relying on Dynamic DNS updates for iLO, the OA firmware version must be updated to at least OA FW 3.11 before proceeding with the next step, otherwise iLO will only be reachable by IP address and there may be other ramifications to iLO LDAP Authentication.
  2. Using the default Administrator account, log into the OA, then in Enclosure Bay IP Addressing, Select the "Interconnect Bays" tab and remove the DNS server IP address entries from the "Shared Interconnect Settings." Click Apply.
  3. In the OA, in Enclosure Bay IP Addressing, Select the "Device Bays" tab and remove DNS server IP address entries from the "Shared Interconnect Settings" and click Apply.
  4. Within 5 minutes, the DNS settings for the modules should update and normal module communication will be restored.
  5. It is important that no VC domain changes are made until the following steps are fully completed.
  6. If the Virtual Connect Domain is managed by Virtual Connect Enterprise Manager (VCEM):
    a) If any VC Domain from the impacted VC Domain group is currently in maintenance mode go to the VCEM user interface and click "Cancel VC Domain Maintenance". Note that cancelling maintenance mode will roll back any VC Domain changes that were made while in maintenance mode. Verify that all running and pended jobs are allowed to complete before proceeding to step b).
    b) Click the "VC Domains" tab, select the impacted VC domain and click "VC Domain Maintenance"
    c) Click "Make Changes via VC Manager". This will release control to the VCM.
  7. From the VCM GUI, select "Tools" => "Reset Virtual Connect Manager." This will force resynchronization of the modules if not synchronized in Step 4 above.
  8. If the Virtual Connect Domain is managed by Virtual Connect Enterprise Manager (VCEM): Go to the VCEM user interface and click "Cancel VC Domain Maintenance". Wait for the job to complete. Cancelling maintenance mode prevents unnecessary propagation of changes to other members of the VC Domain Group.

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization in either Step 4 or Step 7 above (but not both). Outage time will vary depending on the size of the VC domain.

Scenario 3 - External DHCP is being used to provide IP Addresses to the VC Ethernet Modules with any version of OA firmware:

  1. On the External DHCP Scope, create an exclusion range of IP addresses (preferably only the VC Ethernet module addresses). This exclusion range needs to be configured within EBIPA on the OA.
  2. Using the default Administrator account, log into the OA, then in Enclosure Bay IP Addressing, select the "Interconnect Bays" tab and configure the IP Addresses that were excluded in Step 1 above for bays that contain VC Ethernet modules. Do not configure DNS Server entries. Click Apply.
  3. It is important that no VC domain changes are made until the following steps are fully completed.
  4. Reboot the standby and primary VC modules to force them to use the new EBIPA lease. In a redundant design, the modules should be rebooted serially to mitigate downtime.
    a) Reset the standby VC Module from OA.
    b) Wait 15 minutes for the standby module to recover.
    c) Reset the primary VC Module from OA.
    d) Within 5 minutes, normal module communication will be restored.
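The ordering and timing of the serial reset can be sketched as below. This is only an outline of the sequence: the advisory performs the resets from the OA, and `reset_module` here is a hypothetical callback standing in for however you trigger that reset, not a real OA command or API.

```python
import time

STANDBY_RECOVERY_SECONDS = 15 * 60  # wait 15 minutes for the standby module to recover
PRIMARY_RECOVERY_SECONDS = 5 * 60   # communication is expected back within 5 minutes


def reset_modules_serially(reset_module, wait=time.sleep):
    """Reset the standby module first, then the primary, waiting after each reset.

    `reset_module` is a caller-supplied function taking "standby" or "primary".
    `wait` is injectable so the sequence can be checked without real delays.
    """
    reset_module("standby")
    wait(STANDBY_RECOVERY_SECONDS)
    reset_module("primary")
    wait(PRIMARY_RECOVERY_SECONDS)
```

Resetting the standby first and letting it recover fully is what keeps one module available in a redundant design.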

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization. Outage time will vary depending on the size of the VC domain.
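The choice between the three scenarios above can be sketched as a small decision function. This is a hedged reading of the scenario headings only: the `"ebipa"` and `"external-dhcp"` labels are made up for the example, and OA versions between 2.60 and 3.00 return `None` because the advisory does not say which scenario applies to them.

```python
def parse_oa_version(text):
    """Parse an OA firmware string such as '3.00' or '2.60' into a tuple of ints."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def choose_scenario(addressing, oa_firmware):
    """Return the workaround scenario number (1, 2 or 3), or None if the text is silent."""
    if addressing == "external-dhcp":
        return 3  # Scenario 3 applies with any version of OA firmware
    if addressing != "ebipa":
        raise ValueError("addressing must be 'ebipa' or 'external-dhcp'")
    version = parse_oa_version(oa_firmware)
    if version >= (3, 0):
        return 1  # EBIPA with OA firmware 3.00 or higher
    if version <= (2, 60):
        return 2  # EBIPA with OA firmware 2.60 or earlier
    return None   # between 2.60 and 3.00: not addressed by the advisory
```

For example, `choose_scenario("ebipa", "3.11")` selects Scenario 1 and `choose_scenario("ebipa", "2.25")` selects Scenario 2.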

This advisory will be updated if additional information becomes available.
