In today's post we'll discuss VCP6-DCV Objective 7.1 – Troubleshoot vCenter Server, ESXi Hosts, and Virtual Machines. You can check the VCP6-DCV Study Guide page for all the other topics. You can also check the vSphere 6 page, where you'll find many how-tos, videos, and tutorials about vSphere 6.
Another troubleshooting chapter today. We have already cracked the troubleshooting of vSphere upgrades and, in another chapter, storage and network issues. Today we'll hit the troubleshooting of vCenter, ESXi and VMs.
When something goes wrong with vCenter, only the things that rely on vCenter suffer. HA and FT continue to work, but DRS stops balancing the cluster and you can't manually vMotion a VM if you don't have access to vCenter. It may be that one of the vCenter services went down, or something similar. Today we'll have a look at the different things that can happen.
vSphere Knowledge
- Identify general ESXi host troubleshooting Guidelines
- Identify general vCenter troubleshooting Guidelines
- Troubleshoot Platform Services Controller (PSC) issues
- Troubleshoot common installation issues
- Monitor ESXi system health
- Locate and analyze vCenter and ESXi logs
- Export diagnostic information
- Identify common Command Line Interface (CLI) commands
- Troubleshoot common virtual machine issues
- Troubleshoot virtual machine resource contention issues
- Identify Fault Tolerant network latency issues
- Troubleshoot VMware Tools installation issues
- Identify/Troubleshoot virtual machines various states (e.g. orphaned, unknown, etc.)
- Identify virtual machine constraints
- Identify the root cause of a storage issue based on troubleshooting information
- Identify common virtual machine boot disk errors
- Identify and detect common knowledge base article solutions
—————————————————————————————————–
Identify general ESXi host troubleshooting Guidelines
When starting troubleshooting, you should first:
- Identify symptoms – what exactly is going on?
- Define problem space – software? Hardware? What is causing the problem? What's excluded?
- Test solutions – once you know the symptoms and the problem space, you can test solutions one by one until the problem is resolved.
Check the vSphere 6 Troubleshooting Guide, p.7 and onward…
Identify general vCenter troubleshooting Guidelines
A few good troubleshooting scenarios are in the vSphere 6 Troubleshooting Guide, p.33.
You'll find problems (and their resolutions) like the ones below:
- vCenter Server Upgrade Fails When Unable to Stop Tomcat Service
- Microsoft SQL Database Set to Unsupported Compatibility Mode Causes vCenter Server Installation or Upgrade to Fail
- Error When You Change vCenter Server Appliance Host Name
- vCenter Server System Does Not Appear in vSphere Web Client Inventory
- Unable to Start the Virtual Machine Console
- Unable to View the Alarm Definitions Tab of a Data Center
- vCenter Server Cannot Connect to the Database
- vCenter Server Cannot Connect to Managed Hosts
Troubleshoot Platform Services Controller (PSC) issues
PSC logs location and names:
- cis-license – VMware Licensing Service
- SSO – VMware Secure Token Service
- VMCA – VMware Certificate Service
- vmdird – VMware Directory Service
For Platform Services Controller node deployments, additional runtime logs are located at
C:\ProgramData\VMware\CIS\runtime\VMwareSTSService\logs
including logs for these services:
- VMware Secure Token Service
- VMware Identity Management Service
Troubleshoot common installation issues
- Recursive panic might occur when using ESXi Dump Collector – PSOD. Check release notes.
vSphere installation guide p.245
vCenter server on Windows
- Collect Installation Logs by Using the Installation Wizard – You can use the Setup Interrupted page of the installation wizard to browse to the generated .zip file of the vCenter Server for Windows installation log files. If the installation fails, the Setup Interrupted page appears with the log collection check boxes selected by default.
The installation files are collected in a .zip file on your desktop, for example, VMware-VCS-logs-time-of-installation-attempt.zip
You can then unzip the log bundle located on your desktop and start checking what's wrong.
Manual retrieval of logs:
C:\ProgramData\VMware\vCenterServer\logs
C:\Users\username\AppData\Local\Temp
The files in the %TEMP% directory include vminst.log, pkgmgr.log, pkgmgr-comp-msi.log, and vim-vcs-msi.log
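As a rough first pass over those files, the sketch below scans a directory for the four log names listed above and prints every line that mentions an error. The function name `find_error_lines` and the plain "error" keyword filter are assumptions made for illustration; the installer may flag failures with other wording, so treat the output as a starting point only.

```python
import os
import tempfile
from pathlib import Path

# Log names taken from the text above.
INSTALL_LOGS = ["vminst.log", "pkgmgr.log", "pkgmgr-comp-msi.log", "vim-vcs-msi.log"]


def find_error_lines(log_dir, names=INSTALL_LOGS, keyword="error"):
    """Return (file name, line number, text) for each line containing keyword.

    The keyword match is case-insensitive and is only an assumed heuristic.
    """
    hits = []
    for name in names:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # not every log exists on every system
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if keyword.lower() in line.lower():
                    hits.append((name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # On the Windows machine this is the %TEMP% directory mentioned above.
    log_dir = os.environ.get("TEMP", tempfile.gettempdir())
    for name, number, text in find_error_lines(log_dir):
        print(f"{name}:{number}: {text}")
```

Run it on the machine where the installation failed, or point `find_error_lines` at the folder where you unzipped the log bundle.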
vCenter Appliance
The full path to the log files is displayed in the vCenter Server Appliance deployment wizard.
1. Log in to the Windows host machine on which you want to download the bundle.
2. Open a Web browser and enter the URL to the support bundle displayed in the DCUI.
https://appliance-fully-qualified-domain-name:443/appliance/support-bundle
3. Enter the user name and password of the root user.
4. Press Enter. The support bundle is downloaded as a .tgz file on your Windows machine.
5. (Optional) To determine which firstboot script failed, examine the firstbootStatus.json file.
If you ran the vc-support.sh script in the vCenter Server Appliance Bash shell, to examine the firstbootStatus.json file, run
cat /var/log/firstboot/firstbootStatus.json
- Attempt to Install a Platform Services Controller After a Prior Installation Failure
- Collect Installation Logs by Using the Installation Wizard.
Monitor ESXi system health
Hardware Monitoring on ESXi – ESXi uses the Common Information Model (CIM) for hardware monitoring instead of hardware agents installed in a Service Console. CIM providers are available for the different hardware installed in the server (HBAs, network cards, RAID controllers, etc.).
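If connected through vCenter: select the host in the vSphere Web Client, then go to the Monitor tab and click Hardware Status.
Or, if connected directly to the ESXi host with the vSphere Client: go to the Configuration tab and click Health Status.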
Locate and analyze vCenter and ESXi logs
VMware KB – Location of log files for VMware products (1021806)
Export diagnostic information
Create a Log Bundle (via Web client)
Locate/Analyze VMware Log Bundles
- Start the vSphere Web Client and log in to the vCenter Server system.
- Under Inventory Lists, select vCenter Servers.
- Click the vCenter Server that contains the ESX/ESXi hosts from which you want to export logs.
- Click the Monitor tab and click System Logs.
- Click Export System Logs.
- Select the ESX/ESXi hosts from which you want to export logs.
- Select the Include vCenter Server and vSphere Web Client logs option. This step is optional.
- Click Next.
- Select the system logs that are to be exported.
- Select Gather performance data to include performance data information in the log files. Note: You can update the duration and interval time between which you want to collect the data.
- Click Next.
- Click Generate Log Bundle. The Download Log Bundles dialog appears when the Generating Diagnostic Bundle task completes.
- Click Download Log Bundle to save it to your local computer. Note: The host or vCenter Server generates .zip bundles containing the log files. The Recent Tasks panel shows the Generate diagnostic bundles task in progress.
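Once the bundle is on your computer, a quick inventory of what it contains helps you decide where to look first. The sketch below assumes the download is a plain .zip, as described in the note above, and simply lists the members that look like log files. The function name `list_log_members` and the `.log` suffix filter are illustrative assumptions; nested archives inside the bundle are not unpacked.

```python
import zipfile


def list_log_members(bundle_path, suffixes=(".log",)):
    """Return the sorted names of archive members ending in one of suffixes."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return sorted(
            name for name in bundle.namelist()
            if name.lower().endswith(suffixes)
        )


if __name__ == "__main__":
    import sys

    # Usage: python list_bundle.py <path-to-downloaded-bundle.zip>
    if len(sys.argv) == 2:
        for member in list_log_members(sys.argv[1]):
            print(member)
```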
To export the events log:
- Select an inventory object.
- Click the Monitor tab, and click Events.
- Click the Export icon.
- In the Export Events window, specify what types of event information you want to export.
- Click Generate CSV Report, and click Save.
The same topic is covered in VCP6-DCV Objective 7.3 – Troubleshoot vSphere Upgrades.
Identify common Command Line Interface (CLI) commands
CLI commands. Which one you use depends on what you want to do and which part of the infrastructure you're targeting:
- vmkping – simple ping via a VMkernel interface (e.g. How-to troubleshoot iSCSI connection to your SAN)
- vmkfstools – works with VMFS volumes, VMDKs … (e.g. Recreate a missing VMDK header file)
- esxcli network <namespace> – (e.g. How to create custom ESXi Firewall rule)
- esxcli storage <namespace> – (e.g. How to tag disk as SSD VMware esxi 5.x and 6.0)
- esxtop – performance monitoring (e.g. How-to check Queue Depth Of Storage Adapter or Storage Device)
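To make the list above a bit more concrete, here is a small cheat sheet with one typical invocation per tool, kept as a Python table so it can be printed or pasted into notes. The interface name, IP address and datastore path are hypothetical placeholders, and the chosen sub-commands are just common examples, not the only useful ones. The commands themselves are meant to be run in the ESXi Shell or over SSH, not from Python.

```python
import shlex

# Hypothetical values for illustration only; replace with your own.
VMK_INTERFACE = "vmk1"
TARGET_IP = "192.168.1.50"
DATASTORE = "/vmfs/volumes/datastore1"

# One example invocation per tool from the list above.
EXAMPLES = {
    # Ping the target through a specific VMkernel interface.
    "vmkping": ["vmkping", "-I", VMK_INTERFACE, TARGET_IP],
    # Query file system attributes of a VMFS volume.
    "vmkfstools": ["vmkfstools", "-P", DATASTORE],
    # List the VMkernel network interfaces.
    "esxcli network": ["esxcli", "network", "ip", "interface", "list"],
    # List the storage devices seen by the host.
    "esxcli storage": ["esxcli", "storage", "core", "device", "list"],
    # Batch mode: 12 samples, 5 seconds apart, written to stdout as CSV.
    "esxtop": ["esxtop", "-b", "-d", "5", "-n", "12"],
}


def render(examples=EXAMPLES):
    """Return each example as a single shell-quoted command line."""
    return [shlex.join(argv) for argv in examples.values()]


if __name__ == "__main__":
    for line in render():
        print(line)
```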
Troubleshoot common virtual machine issues
Troubleshoot virtual machine resource contention issues
Identify Fault Tolerant network latency issues
For FT you'll need a 10GbE pipe for FT logging. That's a fact. See vSphere 6 Features – New Config Maximums, Long Distance vMotion and FT for 4vCPUs.
Troubleshoot VMware Tools installation issues
- VMware KB Article 1003908 – Troubleshooting a Failed VMware Tools Installation in a Guest Operating System.
- How to remove VMware Tools manually if uninstall or upgrade finishes with an error
- Manual Download of VMware Tools from VMware Website
Identify/Troubleshoot virtual machines various states (e.g. orphaned, unknown, etc.)
A virtual machine is deleted outside of vCenter Server – A user can delete a virtual machine through the VMware Management Interface while vCenter Server is down, through the vSphere Client directly connected to an ESX/ESXi host, or by deleting the virtual machine's configuration file through the service console. These virtual machines can be removed from vCenter Server by right-clicking the virtual machine and selecting Delete.
- Virtual machines appear as invalid or orphaned in vCenter Server (1003742)
Identify virtual machine constraints
- VMware KB Article 1008360 – Troubleshooting Virtual Machine Performance Issues
- Troubleshooting a virtual machine that has stopped responding: VMM and Guest CPU usage comparison (1017926)
- VMware KB Article 2001003 – Troubleshooting ESX/ESXi Virtual Machine Performance Issues
Identify the root cause of a storage issue based on troubleshooting information
Often the root cause is storage. We all know that spinning media are slowly being replaced by SSDs, but they still have some years ahead of them. Storage contention happens when the hosts' demand for I/O exceeds what the storage and the HBA(s) can deliver. The contention can happen at the VM level, the HBA level or the array level.
ESXTOP:
DAVG/cmd – average response time, in milliseconds, for a command sent to the device.
KAVG/cmd – average time, in milliseconds, a command spends in the VMkernel.
GAVG/cmd – response time as it appears to the VM (DAVG + KAVG).
CMDS/s – number of commands per second sent to the device or issued by the VM.
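The relationship between these counters is easy to put into a few lines of code. The sketch below computes the guest latency as DAVG + KAVG and flags which side looks suspicious. The thresholds (25 ms for the device, 2 ms for the kernel) are commonly quoted rules of thumb and are assumptions here, not hard limits; the names `LatencySample` and `diagnose` are invented for this example.

```python
from dataclasses import dataclass

# Rule-of-thumb thresholds in milliseconds; assumptions, tune for your array.
DAVG_WARN_MS = 25.0
KAVG_WARN_MS = 2.0


@dataclass
class LatencySample:
    """One esxtop reading for a device or adapter, in milliseconds."""
    davg_ms: float  # latency at the device (driver, HBA, fabric, array)
    kavg_ms: float  # time spent in the VMkernel

    @property
    def gavg_ms(self):
        # Latency as seen by the guest: device time plus kernel time.
        return self.davg_ms + self.kavg_ms


def diagnose(sample):
    """Return a list of findings based on the assumed thresholds."""
    findings = []
    if sample.davg_ms > DAVG_WARN_MS:
        findings.append("high device latency: look at the HBA, fabric or array")
    if sample.kavg_ms > KAVG_WARN_MS:
        findings.append("high kernel latency: look at queueing on the host")
    return findings


if __name__ == "__main__":
    sample = LatencySample(davg_ms=30.0, kavg_ms=0.5)
    print(sample.gavg_ms, diagnose(sample))
```

A high DAVG with a low KAVG points outside the host, while a high KAVG usually means commands are waiting in a queue on the host itself.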
Identify common virtual machine boot disk errors
- kb.vmware.com/kb/1006296 – Cannot boot or start a virtual machine converted by VMware vCenter Converter 4.x/5.x (1006296)
- Identifying critical Guest OS failures within virtual machines
Identify and detect common knowledge base article solutions
- KB 2000988 – Troubleshooting vSphere Auto Deploy
- KB 653 – Collecting Diagnostic Information for VMware ESX/ESXi
- KB 1008360 – Troubleshooting Virtual Machine Performance Issues
- KB 2001003 – Troubleshooting ESX/ESXi Virtual Machine Performance Issues
- KB 1003908 – Troubleshooting a Failed VMware Tools Installation in a Guest Operating System
- KB 1003999 – Identifying Critical Guest OS Failures Within Virtual Machines.
Tools used for this Objective
- vSphere Installation and Setup Guide
- vSphere Troubleshooting Guide
- vSphere Virtual Machine Administration Guide
- vCenter Server and Host Management Guide
- vSphere Monitoring and Performance Guide
- vSphere Security Guide
- vSphere Client / vSphere Web Client