We hope you're having a great time this holiday season. Some of you, though, might be studying right now and polishing your skills to become VMware Certified Professionals, so we're adding another chapter today. When you need to find and fix problems in vCenter Server and ESXi, it's important to know where to look, or at least where to start. We continue to fill our VCP6.5-DCV Study Guide page, where we usually cover one objective per day. Today's chapter is VCP6.5-DCV Objective 7.1 – Troubleshoot vCenter Server and ESXi Hosts.
You can still choose between the latest VCP6.5-DCV exam and the older VCP6-DCV, which seems less demanding.
The exam has 70 questions (single and multiple choice), the passing score is 300, and you have 105 minutes to complete it. We wish everyone good luck with the exam.
Check our VCP6.5-DCV Study Guide Page.
You can download your free copy via this link – Download Free VCP6.5-DCV Study Guide at Nakivo.
VCP6.5-DCV Objective 7.1 – Troubleshoot vCenter Server and ESXi Hosts
- Understand VCSA monitoring tool
- Monitor status of the vCenter Server services
- Perform basic maintenance of a vCenter Server database
- Monitor status of ESXi management agents
- Determine ESXi host stability issues and gather diagnostics information
- Monitor ESXi system health
- Locate and analyze vCenter Server and ESXi logs
- Determine appropriate commands for troubleshooting
- Troubleshoot common issues, including:
  - vCenter Server services
  - Identity Sources
  - vCenter Server connectivity
  - Virtual machine resource contention, configuration and operation
  - Platform Services Controller (PSC)
  - Problems with installation
  - VMware Tools installation
  - Fault-Tolerant network latency
  - KMS connectivity
  - vCenter Certification Authority
Understand VCSA monitoring tool
You can monitor the VCSA through the VAMI user interface, which listens on port 5480. To connect, use this format:
https://IP_or_FQDN:5480
You'll need to log in with the appliance root password. This UI is separate from the vSphere Web Client and uses different credentials. In some cases you may have problems accessing the Web Client UI while the VAMI still works as usual.
That makes the VAMI a good place to look when you have problems: it shows the status of the services and lets you monitor CPU, network and disk space usage.
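Before opening a browser, it can help to confirm that port 5480 answers at all. The Python sketch below uses only the standard library: it builds the URL in the `https://IP_or_FQDN:5480` format shown above and tries a plain TCP connection. The function names are our own, and a successful connection only shows that something listens on the port, not that the VAMI login works.

```python
import socket


def vami_url(host: str, port: int = 5480) -> str:
    """Build the VAMI URL in the https://IP_or_FQDN:5480 format."""
    # Bracket IPv6 literals so the ":port" suffix stays unambiguous.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}"


def port_is_open(host: str, port: int = 5480, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False
```

For example, with a placeholder appliance name, `vami_url("vcsa.lab.local")` gives `https://vcsa.lab.local:5480`, and if `port_is_open("vcsa.lab.local")` returns False, check name resolution and firewalls before suspecting the appliance services.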
After logging in you can:
- Reboot and/or shut down the appliance
- Upgrade/patch the appliance
- Create a vCenter Support Bundle
- Start a file-based VCSA backup, which saves the configuration data
- Check the Health Status widget
Monitor status of the vCenter Server services
You can view a badge showing the status of the most important vCenter Server services. To check the status of vCenter services, you must be a member of the SystemConfiguration.Administrators group in the vCenter Single Sign-On domain.
Home > System Configuration icon
This opens a view where you can drill down into Nodes and Services.
Click Nodes > select the node > on the right side, click the Related Objects tab.
Services can show different statuses, such as warning (yellow) or critical (red). You can start or restart a service manually and configure a service to start automatically with the system. Click an individual service to see its details.
Perform basic maintenance of a vCenter Server database
The vCenter Server database instance needs some regular attention, and backing up the vCenter Server database before any upgrade is a necessary step.
Usual database maintenance tasks include:
- Back up the database on a regular basis (check your database vendor's documentation).
- Monitor the growth of the database log file and compact it, if necessary.
- Back up the database before any vCenter Server upgrade.
For the VCSA, there is not much to do: the PostgreSQL database is managed automatically by the appliance, and no specific database tasks are necessary. You can monitor the PostgreSQL database through the VAMI.
Monitor status of ESXi management agents
From the vSphere documentation:
The vCenter Solutions Manager displays the vSphere ESX Agent Manager agents that you use to deploy and manage related agents on ESX hosts.
An administrator uses the solutions manager to keep track of whether a solution's agents are working as expected. Outstanding issues are reflected by the solution's ESX Agent Manager status and a list of issues.
When a solution's state changes, the solutions manager updates the ESX Agent Manager's summary status and state. Administrators use this status to track whether the goal state is reached.
The agency's health status is indicated by a specific color:
Red – The solution must intervene for the ESX Agent Manager to proceed. For example, if a virtual machine agent is powered off manually on a compute resource, the ESX Agent Manager does not attempt to power on the agent. The ESX Agent Manager reports this action to the solution, and the solution alerts the administrator to power on the agent.
Yellow – The ESX Agent Manager is actively working to reach a goal state. The goal state can be enabled, disabled, or uninstalled. For example, when a solution is registered, its status is yellow until the ESX Agent Manager deploys the solution's agents to all the specified compute resources. A solution does not need to intervene when the ESX Agent Manager reports its ESX Agent Manager health status as yellow.
Green – A solution and all its agents reached the goal state.
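If you track agency statuses for several solutions (for example, from a report you keep by hand), the color rules above reduce to a small lookup. This is a hypothetical Python sketch: the class and function names are ours and are not part of any VMware SDK, and the input is assumed to be plain color names such as "Red".

```python
from enum import Enum


class EamHealth(Enum):
    """ESX Agent Manager health colors, as described in the vSphere documentation."""
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


# Meaning of each color and whether the solution (administrator) must act.
MEANING = {
    EamHealth.RED: ("The solution must intervene for EAM to proceed", True),
    EamHealth.YELLOW: ("EAM is actively working to reach the goal state", False),
    EamHealth.GREEN: ("The solution and all its agents reached the goal state", False),
}


def needs_intervention(status: str) -> bool:
    """Map a color string such as 'Red' to whether intervention is required."""
    return MEANING[EamHealth(status.strip().lower())][1]


def worst(statuses):
    """Return the most severe color in a non-empty list of status strings."""
    order = [EamHealth.GREEN, EamHealth.YELLOW, EamHealth.RED]
    return max((EamHealth(s.strip().lower()) for s in statuses), key=order.index)
```

Only red calls for action; yellow means the ESX Agent Manager is still converging, so waiting is the right response. `worst()` raises ValueError on an empty list, which is deliberate: no data is not the same as green.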
Determine ESXi host stability issues and gather diagnostics information
In this section, we'll look at troubleshooting vSphere HA host states.
vCenter Server reports error messages for vSphere HA host states, and each message indicates a different problem. If there is an error, vSphere HA cannot engage and protect the VMs on that host, so those VMs cannot be restarted on other hosts in the cluster.
TIP: Fix 3 Warning Messages when deploying ESXi hosts in a lab
These errors might occur when you enable or disable vSphere HA on the cluster. When this happens, determine how to resolve the error so that vSphere HA is fully operational.
Troubleshooting vSphere Auto Deploy – The vSphere Auto Deploy troubleshooting topics offer solutions for situations when provisioning hosts with vSphere Auto Deploy does not work as expected.
- Authentication Token Manipulation Error – Creating a password that does not meet the authentication requirements of the host causes an error.
- Active Directory Rule Set Error Causes Host Profile Compliance Failure – Applying a host profile that specifies an Active Directory domain to join causes a compliance failure.
- Unable to Download VIBs When Using vCenter Server Reverse Proxy – You are unable to download VIBs if vCenter Server is using a custom port for the reverse proxy.
VMware support bundles
VMware Technical Support routinely requests diagnostic information from you when a support request is handled. This diagnostic information contains product-specific logs, configuration files, and data appropriate to the situation. The information is gathered using a specific script or tool for each product and can include a host support bundle from the ESXi host and a vCenter Server support bundle. Data collected in a host support bundle may be considered sensitive. Additionally, as of vSphere 6.5, support bundles can include encrypted information from an ESXi host.
How to collect ESX/ESXi and vCenter Server diagnostic data
Open the vSphere Web Client and log in to the vCenter Server system > Inventory Lists, select vCenter Servers > click the vCenter Server that manages the ESX/ESXi hosts from which you want to export logs > Monitor tab > System Logs > click Export System Logs > select the ESX/ESXi hosts > optionally select the Include vCenter Server and vSphere Web Client logs option > click Next.
Select Gather performance data to include performance data information in the log files. You can change the duration and the interval at which you want to collect the data > Next.
Click Generate Log Bundle. The Download Log Bundles dialog appears when the Generating Diagnostic Bundle task completes.
Click Download Log Bundle to save it to your local computer. The host or vCenter Server generates .zip bundles containing the log files. The Recent Tasks panel shows the Generate diagnostic bundles task in progress.
After the download completes, click Finish or generate another log bundle.
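Once the .zip is on your computer, you usually want a few specific logs rather than the whole archive. The Python sketch below lists matching files without extracting everything. It assumes the .zip may contain per-host compressed tar archives (.tgz) one level down; the exact layout and file names vary by product version, so treat the default pattern (vpxa, hostd and vmkernel logs) as a starting point. Nested archives are read into memory, which is fine for a lab bundle but not for very large ones.

```python
import io
import re
import tarfile
import zipfile


def find_logs(bundle, pattern=r"(vpxa|hostd|vmkernel)\.log$"):
    """List log files in a .zip diagnostic bundle whose names match pattern.

    `bundle` is a path or binary file object. Members of nested .tgz/.tar.gz
    archives are reported as "<archive>/<member>".
    """
    rx = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(bundle) as zf:
        for name in zf.namelist():
            if name.endswith((".tgz", ".tar.gz")):
                # Open the nested archive in memory and scan its member names.
                data = io.BytesIO(zf.read(name))
                with tarfile.open(fileobj=data, mode="r:gz") as tf:
                    hits += [f"{name}/{m}" for m in tf.getnames() if rx.search(m)]
            elif rx.search(name):
                hits.append(name)
    return hits
```

Passing a different regex, for example `r"vpxd.*\.log$"`, narrows the search to other components without changing the code.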
To export the events log:
Select an inventory object > Click the Monitor tab, and click Tasks and Events > Events > Click the Export icon > In the Export Events window, specify what types of event information you want to export. Click Generate CSV Report, and click Save. Specify a file name and location and save the file.
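The exported CSV is easier to triage when you count events by one column, such as the event type. The Python sketch below does that with the standard csv module. The column names depend on what you selected in the Export Events window, so the column is a parameter and the "Type" name used in the example is an assumption; the utf-8-sig encoding simply tolerates a byte order mark if one is present.

```python
import csv
from collections import Counter


def count_by(stream, column):
    """Count rows of an exported events CSV (an open text stream) by one column."""
    reader = csv.DictReader(stream)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not in {reader.fieldnames}")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # "events.csv" and the "Type" column are placeholders for your own export.
    with open("events.csv", newline="", encoding="utf-8-sig") as fh:
        for value, count in count_by(fh, "Type").most_common(5):
            print(f"{count:6d}  {value}")
```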
Monitor ESXi system health
On ESXi, vSphere uses the Common Information Model (CIM) instead of hardware agents installed in the Service Console (there has been no Linux-based Service Console for ages now). Different CIM providers are available for the different hardware installed in the server (HBAs, network cards, RAID controllers, etc.).
You can view this hardware health information either when connected through vCenter Server or when connected directly to the ESXi host.
Locate and analyze vCenter Server and ESXi logs
There are many ways to view ESXi system logs. To view the system logs on an ESXi host, you can:
- Use the direct console interface to view the system logs on an ESXi host. These logs provide information about system operational events. From the console, select View System Logs, then press the corresponding number key to view a log.
vCenter Server agent (vpxa) logs appear if the host is managed by vCenter Server.
Press Enter or the spacebar to scroll through the messages. Optionally, to run a regular expression search, press the slash key (/), type the text to find, and press Enter. The matching text is highlighted on the screen.
Press q to exit and return to the direct console.
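The same regular expression search is handy offline, once a log has been copied to your workstation (for example, out of a support bundle). This Python sketch prints matching lines with their line numbers. The path in the example is an assumption: on ESXi 6.x the vpxa log is typically /var/log/vpxa.log, but point it at wherever your copy lives.

```python
import re


def grep_log(lines, pattern, ignore_case=True):
    """Yield (line_number, line) for log lines matching a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for number, line in enumerate(lines, start=1):
        if rx.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Placeholder path to a copied log; errors="replace" skips undecodable bytes.
    with open("vpxa.log", errors="replace") as fh:
        for number, line in grep_log(fh, r"error|warning"):
            print(number, line)
```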
View vCenter System Log Entries
In the vSphere Web Client, navigate to a vCenter Server > From the Monitor tab, click System Logs > From the drop-down menu, select the log and entry you want to view > Common Logs.
Determine appropriate commands for troubleshooting
CLI commands are a vast topic.
Which commands you use depends on what you want to do and which part of the infrastructure you are targeting:
- vmkping – simple ping through a VMkernel interface (e.g., How-to troubleshoot iSCSI connection to your SAN)
- vmkfstools – works with VMFS volumes, VMDKs, and so on (e.g., Recreate a missing VMDK header file)
- esxcli network <namespace> – (e.g., How to create custom ESXi Firewall rule)
- esxcli storage <namespace> – (e.g., How to tag disk as SSD VMware esxi 5.x and 6.0)
- esxtop – performance monitoring (e.g., How-to check Queue Depth Of Storage Adapter or Storage Device)
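A common vmkping check on iSCSI or vMotion networks is whether jumbo frames pass end to end. With the don't-fragment flag (-d), the payload size (-s) must be the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header, so 9000 becomes 8972. The Python sketch below only builds the command line for you to run in the ESXi Shell; the interface name vmk1 and the target address are placeholders, and the arithmetic assumes IPv4.

```python
def vmkping_args(target, vmk="vmk1", mtu=9000):
    """Build a vmkping command that tests an end-to-end IPv4 MTU."""
    payload = mtu - 20 - 8  # subtract IPv4 header (20 B) and ICMP header (8 B)
    # -I: VMkernel interface to send from, -d: don't fragment, -s: payload size
    return ["vmkping", "-I", vmk, "-d", "-s", str(payload), target]


if __name__ == "__main__":
    print(" ".join(vmkping_args("10.0.0.5")))  # vmkping -I vmk1 -d -s 8972 10.0.0.5
```

If the 8972-byte ping fails while a plain vmkping succeeds, some hop in the path is not configured for an MTU of 9000.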
We have blogged previously about ESXi CLI commands, but the topic is really vast. Check our posts:
- ESXi Commands List – Getting started
- ESXi Commands List – networking commands
- ESXi Commands List – networking commands [Part 2]
- ESXi Commands List – Storage
Troubleshoot common issues, including:
  - vCenter Server services
  - Identity Sources
  - vCenter Server connectivity
  - Virtual machine resource contention, configuration and operation
  - Platform Services Controller (PSC)
  - Problems with installation
  - VMware Tools installation
  - Fault-Tolerant network latency
  - KMS connectivity
  - vCenter Certification Authority
VMware publishes an excellent PDF called “vSphere Troubleshooting”. Search for it to get the latest release. The section below is partly based on this document.
There are many troubleshooting scenarios in general, and it would be very time-consuming (in fact impossible) to identify and walk through even a small part of them. That's why I'll try to give general guidance on vSphere troubleshooting.
First, you need to identify the symptoms and know what is happening. If something is failing, there must be a specific cause.
Some questions to ask when troubleshooting:
- What is the task or expected behavior that is not occurring?
- Can the affected task be divided into subtasks that you can evaluate separately?
- Is the task ending in an error? Is an error message associated with it?
- Is the task completing but in an unacceptably long time?
- Is the failure consistent or sporadic?
- What has changed recently in the software or hardware that might be related to the failure?
Defining the Problem Space – After you identify the symptoms of the problem, determine which components in your setup are affected, which components might be causing the problem, and which components are not involved.
To define the problem space in an implementation of vSphere, be aware of the components present. In addition to VMware software, consider third-party software in use and which hardware is being used with the VMware virtual hardware.
Once you recognize the characteristics of the software and hardware elements and how they can affect the problem, you can explore general problems that might be causing the symptoms:
- Misconfiguration of software settings
- Failure of physical hardware
- Incompatibility of components
Break down the process and consider each piece and the likelihood of its involvement separately. For example, a case that is related to a virtual disk on local storage is probably unrelated to third-party router configuration. However, a local disk controller setting might be contributing to the problem. If a component is unrelated to the specific symptoms, you can probably eliminate it as a candidate for solution testing.
Think about what changed in the configuration recently before the problems started. Look for what is common in the problem. If several problems started at the same time, you can probably trace all the problems to the same cause.
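Finding what changed is much easier if you keep snapshots of the relevant settings. Given two flat snapshots (a dict of setting name to value, taken before and after the problems started), the comparison is mechanical. This Python sketch assumes you already collected the settings yourself; the keys in the example are hypothetical. The same comparison also works for a working system versus a broken one.

```python
def config_diff(before, after):
    """Compare two flat config snapshots (dicts) and report what changed."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical settings captured before and after the problems started.
    before = {"vSwitch0.mtu": 1500, "ntp.server": "pool.ntp.org", "dns": "10.0.0.1"}
    after = {"vSwitch0.mtu": 9000, "ntp.server": "pool.ntp.org", "syslog": "udp://10.0.0.9:514"}
    for kind, items in config_diff(before, after).items():
        print(kind, items)
```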
Testing Possible Solutions – After you know the problem’s symptoms and which software or hardware components are most likely involved, you can systematically test solutions until you resolve the problem.
With the information that you have gained about the symptoms and affected components, you can design tests for pinpointing and resolving the problem. These tips might make this process more effective.
- Generate ideas for as many potential solutions as you can.
- Verify that each solution determines unequivocally whether the problem is fixed. Test each potential solution but move on promptly if the fix does not resolve the problem.
- Develop and pursue a hierarchy of potential solutions based on likelihood. Systematically eliminate each potential problem from the most likely to the least likely until the symptoms disappear.
- When testing potential solutions, change only one thing at a time. If your setup works after many things are changed at once, you might not be able to discern which of those things made a difference.
- If the changes that you made for a solution do not help resolve the problem, return the implementation to its previous status. If you do not return the implementation to its previous status, new errors might be introduced.
- Find a similar implementation that is working and test it in parallel with the implementation that is not working properly. Make changes on both systems at the same time until few differences or only one difference remains between them.
Today's topic, VCP6.5-DCV Objective 7.1 – Troubleshoot vCenter Server and ESXi Hosts, was a very large and sometimes “painful” one to learn. Try installing a few ESXi hosts in a nested environment and do some hands-on testing in the UI. If you don't have spare hardware at home, you can also use VMware Hands-on Labs or Ravello.
Happy troubleshooting and good luck with your exam. Check our VCP6.5-DCV Study Guide Page.
More from ESX Virtualization
- VCP6.5-DCV Study Guide
- ESXi Lab
- What Is VMware ESXi Lockdown Mode?
- How to Configure Statistics Collection Intervals in vCenter
- How to Install latest ESXi VMware Patch – [Guide]