Monday, July 30, 2012

User’s AD account being locked out by vCenter 5.0 server, how I found our culprit

My vCenter is running Windows 2008R2.  I tried Running the Microsoft sysinternals tools to find out which host is locking out this user and it kept pointing me to vCenter.  Of course to me it didn’t make any sense why vCenter would be locking this user out every 60 minutes.  I also checked the standard stuff, no services running as the user, nothing under the “Credential Manager” on vCenter.  There are also several VMware KB’s for dealing with the issue, they want you to use TCPVIEW, and some other tools, but none of them helped me on this one.

I’ll start my troubleshooting steps below after the normal sysInternals AD “find out which machine is locking the user out” steps.

1) I logged into vCenter and opened “Server_Manager/Diagnostics/Event_Viewer/Windows Logs/Security”

I saw the following 3 Audit Failures happening ever 60:16 seconds (not sure if that interval is important to the issue, but it gave me an ability to predict the next occurrence).  Since the last event was at 9:31:30am, I can predict the next occurrence at 10:31:46am.   You can see the “Account for Which Logon Failed:” “Account Name:” is the usersloginname, this is key information we’ll use later.


2) Logging into vCenter I see this under “Tasks & Events”, the timestamp is an exact match to the Event Log Event above.  I see nothing that correlates to this timestamp under vCenter Tasks, which makes me think this is being initiated from an external (to vCenter Service) source.


3) Before I realized this was happening every 60:16 I grabbed a large packet capture on my vCenter Server.  Now that I could predict the exact second this would happen again, I decided to capture a smaller time window with a script.

I setup a single scheduled wireshark dumpcap by utilizing Windows Task Scheduler.


The dumpcap –a parameter specifies a 4 second duration of packet capture, the –w specifies what to name the output file, lo.pcap in this example.  I dumped the files into a directory called c:\output .  I didn’t use the default location because when I ran the script manually (from the command line to test) I received access denied, and since I’m too lazy to troubleshoot it, but I’m pretty security minded, I setup a temp directory called output that I gave everyone full control to.  I setup the schedule to start the capture 2 seconds before the planned event and it did successfully gather the data I needed.

4) Now to analyze what I captured.  Because the user swore that he was not manually doing anything to the systems, I though if I did a search within the file for the users name, it may give me some good data.  I did a Wireshark Filter of tcp contains usersloginname.  What I saw was 4 packets containing his name, the first going from vCenter to DC, then the response from DC to vCenter, then the same repeated.


By itself, this isn’t overall useful, but IF this problem wasn’t coming FROM virtual center (or a locally installed app), it must be coming over the network.  And if vCenter is talking to AD, most likely something is happening right before this event to cause it.  You see this communication started with packet 316 @ 2.74 seconds after I began capturing.  This is a good correlation since I started the capture 2 seconds early, I was right on time.  I was also able to verify that the issue had happened again  right on time in the Windows Event log.  So what is happening on vCenter’s network RIGHT BEFORE packet 316?  I removed my filter and scrolled up a bit, most of the traffic was from vShield Manager, the SQL DB, misc vCloud items, but one IP stood out, it was not from my vCloud farm of servers.  This Mystery IP seemed like my most likely candidate.  I also did a ping –a 10.x.x.x hoping for a strike on reverse DNS to get a hostname without luck.

To confirm my suspicions that the mystery IP was somehow triggering the lockouts, I switched to analyzing my older bigger packet capture and did a Filter by ip.src==10.x.x.x (the mystery IP). This came up with traffic starting at packet 67701 @ 305.4 seconds into the capture.  Now if my hypothesis is correct and this mystery IP is the culprit, then there should be some talk back and forth with my DC immediately following.  I did another search in this old capture with tcp contains usersloginname and found this traffic on packet 67714 @ 305.5 seconds into the capture!!!  This means the Mystery IP was contacting vCenter, and then locking out my users AD account, but since it was vCenter talking directly to AD, vCenter was getting credit for locking the user out.

5) I located the Mystery IP VM, it was a VMware vCenter Operations VM using the users old password, we fixed it and then waited until 60:16 more seconds had passed, no new logs in vCenter, no new failures in AD, the issue is resolved!

1 comment:

EerFoolWVU said...

This was very helpful in hunting down something locking out an AD account. Thank you for the excellent post!