VMware Snapshot Size PowerShell Nagios Script

VMware snapshots are a fantastic feature, and they are easy to create. The problem arises when they have been forgotten about: not only do they consume disk space, they can also take a very long time to remove. The check_vm_snap_size.ps1 plugin for Nagios/Icinga was written to notify you when any snapshot grows beyond a defined size. While other methods exist for checking snapshot file sizes (like running a check via the service console), this plugin uses the PowerCLI interface released by VMware to retrieve that information. When used along with NSClient++, it can easily report the size of your snapshots back to Nagios. Combine that with your favorite performance graphing utility (e.g. Nagiosgraph) and you can show the growth of your snapshot sizes over time.

While the plugin itself is fairly simple (I am no PowerShell guru), the steps to get it to operate securely with NSClient++ and to minimize load are somewhat involved.

Prerequisites

Installation and Configuration

The installation and configuration of the script itself is fairly straightforward. The difficult parts are related to optimizing your PowerCLI environment to reduce load time. Because the script is reloaded at every check interval, without optimization it can put extra load on your host. The other piece is generating a credential file so that you are not needlessly passing usernames and passwords across the network.

PowerCLI Optimization

Because PowerCLI is loaded every time the script runs, we want to minimize its impact on the system. One way to accomplish this is to manually compile (with ngen.exe) the .NET PowerCLI XmlSerializers; doing this dramatically reduces the CPU load and startup time of the snap-in. You only need to do this once per computer per version of PowerCLI. A big thanks goes out to vElemental and vNugglets for the commands to do this.

The following commands are what I ran on my host; they compile all known versions that might be present (I was lazy and didn’t really want to figure out which version I actually had). Note that they need to be run as Administrator (right-click and select “Run As Administrator”).

For 64-bit Operating Systems

C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService55.XmlSerializers, Version=5.5.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService51.XmlSerializers, Version=5.1.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService50.XmlSerializers, Version=5.0.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService41.XmlSerializers, Version=4.1.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService40.XmlSerializers, Version=4.0.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\ngen.exe install "VimService25.XmlSerializers, Version=2.5.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"

If you have a 32-bit OS, use:

C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService55.XmlSerializers, Version=5.5.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService51.XmlSerializers, Version=5.1.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService50.XmlSerializers, Version=5.0.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService41.XmlSerializers, Version=4.1.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService40.XmlSerializers, Version=4.0.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe install "VimService25.XmlSerializers, Version=2.5.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f" 
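The twelve ngen.exe commands above share one pattern: only the Framework folder (Framework64 for 64-bit, Framework for 32-bit) and the VimService version change. If you would rather generate the list than copy it, a small Python sketch can print it. The ngen_commands helper is hypothetical, and its version list simply mirrors the one above.

```python
# Hypothetical helper that rebuilds the ngen.exe command lines listed above.
VERSIONS = ["55", "51", "50", "41", "40", "25"]  # VimService versions from the list above
TOKEN = "10980b081e887e9f"                        # PublicKeyToken shared by every entry


def ngen_commands(is_64bit):
    """Return the ngen.exe install commands for a 64-bit or 32-bit OS."""
    framework = "Framework64" if is_64bit else "Framework"
    ngen = "C:\\Windows\\Microsoft.NET\\" + framework + "\\v2.0.50727\\ngen.exe"
    commands = []
    for v in VERSIONS:
        assembly_version = f"{v[0]}.{v[1]}.0.0"  # "55" -> "5.5.0.0"
        commands.append(
            f'{ngen} install "VimService{v}.XmlSerializers, '
            f'Version={assembly_version}, Culture=neutral, PublicKeyToken={TOKEN}"'
        )
    return commands


if __name__ == "__main__":
    for line in ngen_commands(is_64bit=True):
        print(line)
```

Paste the printed lines into an elevated command prompt, exactly as you would the hand-written list.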

UPDATE – Apr 4, 2017
If you are using PowerShell 3.0+, you need to run a different set of commands in order to make the optimizations. Thanks to a comment on a VMware blog post, the command you need to run for a 64-bit OS is:

c:\Windows\Microsoft.NET\Framework64\v4.0.30319\ngen.exe install "VimService55.XmlSerializers, Version=5.5.0.0, Culture=neutral, PublicKeyToken=10980b081e887e9f"  /ExeConfig:%windir%\system32\WindowsPowerShell\v1.0\PowerShell_ISE.exe

Also, according to another comment on the same post, “This optimization is not possible and not needed any more with version 6.5+ of PowerCLI.”

Credential Store Creation

The second challenge with running this script in an automated fashion via NSClient++ is related to authentication and user rights. The script is designed to use the VI Credential Store feature to securely save a credential file on the machine and then pass only that file's location in the command string, so that you are neither storing raw usernames and passwords nor passing them across the network. The New-VICredentialStoreItem and Get-VICredentialStoreItem cmdlets allow the file to be created and read; however, the resulting file can only be used by the user account that created it. Check out the PowerShell Article of the Week from Professional VMware for more information on secure credential storage.

By default the NSClient++ service runs as the Local System account, so we need to launch a PowerCLI session as the System account if we want to use this feature. Thanks to a post on Ben Parker’s blog called “How do I run Powershell.exe/command prompt as the LocalSystem Account on Windows 7?” we have the answer: the trick is to use PsExec from Microsoft/Sysinternals. Even though Ben’s blog post is specific to Windows 7, the process works just fine on Windows Server 2008 R2.

  1. Download PsExec from Microsoft.
  2. From an elevated command prompt, run psexec -i -s powershell.exe; this will open a new window.
  3. In the new PowerShell console window, type whoami; it should respond with NT AUTHORITY\SYSTEM.
  4. Create the XML credential file by running New-VICredentialStoreItem -host 'host.example.com' -user 'username' -password 'pword' -file c:\hostname.xml, substituting the correct server, user, password, and file location. If the PowerCLI cmdlets are not recognized in this plain PowerShell window, first load them with Add-PSSnapin VMware.VimAutomation.Core (the same snap-in the script loads). Note that the location you choose should have the necessary security rights applied.

NSClient++ Configuration

In order to make this script work with NSClient++, you must first make sure that your nsclient.ini is configured for external scripts and NRPE. Additionally, you need to enable argument passing and allow “nasty” meta characters (this last step may not be needed) for the scripts. Assuming you have placed the check_vm_snap_size.ps1 script in the NSClient++ scripts folder, add the following to the [/settings/external scripts/scripts] section of the config file.

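Two things to note: first, this should all be on one line (it may appear wrapped for easier reading); second, the final dash (-) is required, because it tells powershell.exe to read its command from standard input, which is where the echo output is piped.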

check_vm_snap_size = cmd /c echo scripts\\check_vm_snap_size.ps1 -hostname $ARG1$ -crit $ARG2$ -warn $ARG3$ -credfile $ARG4$ -hostexclude $ARG5$ -guestexclude $ARG6$; exit($lastexitcode) | powershell.exe -command -
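To see what actually reaches powershell.exe, you can expand the alias by hand. The Python sketch below is a simplified model, not NSClient++ code: it replaces each $ARGn$ placeholder with the corresponding argument and leaves everything else, including PowerShell's own $lastexitcode, untouched. The expand helper and the example arguments are hypothetical.

```python
import re

# The alias value from nsclient.ini, exactly as written above.
TEMPLATE = (
    "cmd /c echo scripts\\\\check_vm_snap_size.ps1 -hostname $ARG1$ "
    "-crit $ARG2$ -warn $ARG3$ -credfile $ARG4$ -hostexclude $ARG5$ "
    "-guestexclude $ARG6$; exit($lastexitcode) | powershell.exe -command -"
)


def expand(template, args):
    """Substitute $ARG1$..$ARGn$ with the arguments passed by check_nrpe."""

    def replace(match):
        index = int(match.group(1)) - 1
        # This model leaves a placeholder alone if no argument was supplied
        return args[index] if index < len(args) else match.group(0)

    return re.sub(r"\$ARG(\d+)\$", replace, template)


print(expand(TEMPLATE, ["vc.example.com", "1024", "512",
                        "c:\\credfile.xml", "esx1.example.com", "vm1"]))
```

Everything before the pipe is echoed by cmd and fed to powershell.exe on standard input, which is why the trailing dash matters.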

Nagios Configuration

The Nagios configuration is pretty straightforward. The check uses the check_nrpe command to pass the request to the host. The following is an example of the configuration for the checkcommands.cfg portion:

define command {
command_name  check_vm_snap_size
command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -c check_vm_snap_size -a $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$
}
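When a service uses this command, Nagios splits the service's check_command on the ! character and substitutes the pieces for $ARG1$, $ARG2$, and so on. The Python sketch below models just that step (it ignores Nagios's other macro and escaping rules); the expand_check_command helper, the $USER1$ path, and the host address are hypothetical.

```python
import re

COMMAND_LINE = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -c check_vm_snap_size "
                "-a $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$")


def expand_check_command(check_command, macros):
    """Return the command name and the expanded command line (simplified model)."""
    name, *args = check_command.split("!")

    def arg(match):
        index = int(match.group(1)) - 1
        # ARG macros with no matching value expand to an empty string here
        return args[index] if index < len(args) else ""

    line = re.sub(r"\$ARG(\d+)\$", arg, COMMAND_LINE)
    # Then fill in the named macros such as $USER1$ and $HOSTADDRESS$
    for key, value in macros.items():
        line = line.replace("$" + key + "$", value)
    return name, line


name, line = expand_check_command(
    "check_vm_snap_size!vc.example.com!1024!512!c:\\credfile.xml",
    {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"},
)
print(name, line)
```

The optional fifth and sixth arguments are simply left empty when the service omits them.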

For the service check you will need to create something like the following. The last two arguments are optional; they refer to hosts and guests that you might want to exclude from the results. Also note that both the check_nrpe command and the NSClient++ configuration treat backslashes as special characters that must be escaped, so for each single backslash in the path to your credential file you must enter four backslashes in the service check config.

define service {
service_description  VMware Snapshot Size
host_name            hostname
check_command        check_vm_snap_size!vcenterserver.example.com!1024!512!c:\\\\credfile.xml!excludehost.example.com!excludeguest
use                  generic-service
contact_groups       vm-admins
}
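The four-backslash rule can be checked with a simplified model. Assuming each of the two layers described above (check_nrpe and NSClient++) turns every pair of backslashes into a single backslash, this Python sketch shows four backslashes arriving at PowerShell as one. It models the rule of thumb only, not either program's actual parser, and the unescape_once name is hypothetical.

```python
def unescape_once(text):
    """One escaping layer: every pair of backslashes becomes a single backslash."""
    return text.replace("\\\\", "\\")


# Python literal for c:\\\\credfile.xml (four backslashes, as in the service definition)
configured = "c:\\\\\\\\credfile.xml"
after_first_layer = unescape_once(configured)          # two backslashes remain
after_second_layer = unescape_once(after_first_layer)  # one backslash remains
print(after_second_layer)
```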

Known Issues/Limitations

  • While the check_vm_snap_size.ps1 script supports passing an array for both the hostexclude and guestexclude parameters, that functionality does not yet work when sending via check_nrpe. You can specify a single host and a single guest, but not multiple.

Misc Notes

You may need to set the execution policy (Set-ExecutionPolicy) for the 64-bit and/or 32-bit PowerShell, depending upon which version of NSClient++ you have installed.

To Do

The script still needs internal documentation written, and I am still hoping to find a solution to all of the known issues.


The Script

Save the following as check_vm_snap_size.ps1

param (
    [string]   $Hostname = "",
    [double]   $crit = 100,
    [double]   $warn = 50,
    [string]   $CredFile = "",
    # string[] (not string) so that more than one host or guest can be excluded
    [string[]] $HostExclude = @(),
    [string[]] $GuestExclude = @(),
    [switch]   $help
)
$critcount = 0
$warncount = 0
$snapcount = 0
$crittsize = 0
$warntsize = 0
$snaptsize = 0
$critSnapNames = ""
$warnSnapNames = ""

# parameter error checking
if ($warn -ge $crit) {
    Write-Host "Error - crit value must be larger than warn value" -ForegroundColor "red"
    exit 3
}
if ($Hostname -eq "") {
    Write-Host "Error - Hostname must be specified" -ForegroundColor "red"
    exit 3
}

# load VMware PowerCLI
Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue

# If no credential file is specified, use the account permissions of the user
# running the script; otherwise use the credential file to get the host, user,
# and password strings
if ([string]::IsNullOrEmpty($CredFile)) {
    Connect-VIServer -Server $Hostname -WarningAction SilentlyContinue > $null
}
else {
    $creds = Get-VICredentialStoreItem -File $CredFile
    # check that the hostname specified matches the hostname in the credential file
    if ($Hostname -eq $creds.Host) {
        Connect-VIServer -Server $creds.Host -User $creds.User -Password $creds.Password -WarningAction SilentlyContinue > $null
    }
    else {
        Write-Host "Unknown - Hostname specified does not match hostname in credentials file" -ForegroundColor "red"
        exit 3
    }
}

if ($global:DefaultVIServers.Count -lt 1) {
    Write-Host "Unknown - Connection to host failed!"
    exit 3
}

# Get the list of snapshots to evaluate from the host, excluding hosts and
# guests if defined
$snapshots = Get-VMHost | ?{ $HostExclude -notcontains $_.Name } | Get-VM | ?{ $GuestExclude -notcontains $_.Name } | Get-Snapshot

# Log off explicitly so that every check interval does not leave an idle
# session behind on the server; the snapshot objects are already in memory
Disconnect-VIServer -Server * -Confirm:$false

# Loop through each snapshot and see if any sizes exceed the warning or critical
# thresholds. If so, store their names and sizes.
foreach ($snap in $snapshots) {
    $snapcount++
    $snaptsize += $snap.SizeMB
    if ($snap.SizeMB -ge $crit) {
        $critcount++
        $crittsize += $snap.SizeMB
        $critSnapNames += "$($snap.VM.Name):$($snap.SizeMB)MB "
    }
    elseif ($snap.SizeMB -ge $warn) {
        $warncount++
        $warntsize += $snap.SizeMB
        $warnSnapNames += "$($snap.VM.Name):$($snap.SizeMB)MB "
    }
}

# Nagios performance data shared by all three result messages
$perfdata = "|snaps=$snapcount;$warncount;$critcount;; ssize=${snaptsize}MB;$warn;$crit;;"

if ($critcount -gt 0) {
    Write-Host "Critical -" $critcount "VMs with snapshots larger than" $crit "MB :" $critSnapNames $perfdata
    exit 2
}
elseif ($warncount -gt 0) {
    Write-Host "Warning -" $warncount "VMs with snapshots larger than" $warn "MB :" $warnSnapNames $perfdata
    exit 1
}
else {
    Write-Host "OK - No VMs with snapshots larger than" $warn "MB or" $crit "MB" $perfdata
    exit 0
}

Sparklining Excel

According to Wikipedia, sparklines are

“small, high resolution graphics embedded in a context of words, numbers, images … Whereas the typical chart is designed to show as much data as possible, and is set off from the flow of text, sparklines are intended to be succinct, memorable, and located where they are discussed. Their use inline usually means that they are about the same height as the surrounding text.”
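To make the definition concrete, here is a tiny text-mode sparkline in Python. It only illustrates the idea, using Unicode block characters, and has nothing to do with how TinyGraphs draws its graphics.

```python
# Map each value in a series onto one of eight Unicode block heights.
TICKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    steps = len(TICKS) - 1
    return "".join(TICKS[round((v - lo) / span * steps)] for v in values)


print(sparkline([3, 5, 2, 8, 6, 9, 4]))  # prints ▂▄▁▇▅█▃
```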

A group has released an open source add-in for Microsoft Excel called TinyGraphs “that creates sparklines, mini column graphs, and area graphs from a row of data. It generates beautiful tiny graphics that are as small as a cell and is useful for visualizing large quantities of data, such as stock prices and exchange rates.”

While I have yet to actually use the add-in, it is something that I can definitely see using in a number of reports that I generate at the office.

Graphing Sonicwall VPN Tunnel Usage

I need to track the network usage between each of our offices. We currently use IPSec-based tunnels across the Internet for connectivity between all of our offices (in a full mesh configuration). I looked around for a way to monitor and graph the data for these tunnels from our Sonicwall firewalls, but found no good solution.

So I created the following templates and scripts for monitoring our Sonicwall firewalls via my favorite network monitoring application, Cacti. The template includes graphs for CPU Utilization, Memory Usage, Current Connections Cache, and, most importantly, VPN utilization on a tunnel-by-tunnel basis.

The script portion (written in Perl) queries the firewall and returns the list of currently active tunnels (by the IP address of the peer gateway) as well as each tunnel's name, decrypted (received) bytes, and encrypted (transmitted) bytes. Because the tunnels are renegotiated (by default every 8 hours), you will experience spikes in your graph unless you follow the installation instructions.
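The spikes are most likely a side effect of how RRDtool, which Cacti uses for storage, handles COUNTER data sources: a counter that drops, as a tunnel's byte counters presumably do when it is renegotiated, is treated as having wrapped around. A simplified Python model of that arithmetic, assuming 32-bit counters and hypothetical readings, shows the effect, and how a maximum on the data source turns the bogus value into an unknown one instead.

```python
# Simplified model of an RRDtool COUNTER data source turning two successive
# byte-counter readings into a per-second rate (32-bit counters assumed).
COUNTER32_WRAP = 2 ** 32


def counter_rate(previous, current, seconds, max_rate=None):
    delta = current - previous
    if delta < 0:
        # A drop is interpreted as the counter wrapping past 2**32, so a
        # counter that restarted from zero looks like a huge burst of traffic.
        delta += COUNTER32_WRAP
    rate = delta / seconds
    # With a maximum configured, an implausible rate is stored as unknown (None here)
    if max_rate is not None and rate > max_rate:
        return None
    return rate


print(counter_rate(1_000_000, 1_300_000, 300))  # steady traffic: 1000.0 bytes/s
print(counter_rate(5_000_000, 20_000, 300))     # counter restarted: a bogus spike
```

Setting a realistic maximum on the tunnel data sources is one common way to keep such spikes out of the graphs.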

Also, because the firewall does not always return the VPN tunnel name, you must renegotiate each tunnel before creating the graphs for the first time so that the name is pulled in correctly. You may need to do this a couple of times, pressing the green reload button in Cacti each time, before they will all show up.

Installation Instructions: Visit my post on the Cacti forums for installing the software.

If you are running SonicOS Enhanced, you will be able to graph everything. If you are running SonicOS Standard or the older 6.x firmware, you will only get the VPN monitoring, as the other stats are unavailable via SNMP.

The following is the usage syntax if you would like to run the script by itself.

query_sonicwall_vpn.pl host community index
query_sonicwall_vpn.pl host community query {peergateway, vpnname, decryptbytes, encryptbytes}
query_sonicwall_vpn.pl host community get {peergateway, vpnname, decryptbytes, encryptbytes} DEVICE

DEVICE is the IP address of the peer gateway of the tunnel you want.
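The three invocation forms above follow the usual Cacti script query pattern: index lists the keys, query prints every key with one field's value, and get prints one field for one key. A Python sketch of that dispatch might look like the following. The in-memory tunnel data is hypothetical and stands in for the SNMP lookups the real Perl script performs, and a colon is assumed as the output delimiter.

```python
import sys

# Hypothetical sample data keyed by peer gateway IP (documentation address ranges)
TUNNELS = {
    "192.0.2.1": {"vpnname": "Office-A", "decryptbytes": 1200, "encryptbytes": 3400},
    "198.51.100.7": {"vpnname": "Office-B", "decryptbytes": 560, "encryptbytes": 780},
}


def run(argv):
    """Handle: host community index | query FIELD | get FIELD DEVICE."""
    host, community, action, *rest = argv  # host/community unused in this offline sketch

    if action == "index":
        return list(TUNNELS)

    field = rest[0]

    def value(gateway):
        # The peer gateway is the index itself; other fields come from the record
        if field == "peergateway":
            return gateway
        return str(TUNNELS[gateway][field])

    if action == "query":
        return [f"{gateway}:{value(gateway)}" for gateway in TUNNELS]
    if action == "get":
        return [value(rest[1])]
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__" and len(sys.argv) >= 4:
    print("\n".join(run(sys.argv[1:])))
```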

I know the script is less than optimal, but then I’m not really a programmer, so I’d appreciate any feedback. Additionally, the basis for the script came from Dan Brummer in this post.

Managing Your FLEXlm Licenses with Cacti and phpLicenseWatcher

So you are tasked with managing multiple FLEXlm-based software license managers, but you want more than a dump of the current license information into a text file or some horribly written and truly user-unfriendly Windows GUI? Then I have a couple of web-based open source products for you. While the installs are not the easiest, they do offer some great insight into your license usage.

Managing Your License Servers

The first product is phpLicenseWatcher. It presents, in a nice web interface, all of the information that the command-line and Windows GUI tools provide, and then some. Quoting from their website:

  • Shows the health of a license server or a group of them
  • Check which licenses are being used and who is currently using them
  • Get a listing of licenses, their expiration dates and number of days to expiration
  • E-mail alert of licenses that will expire within certain time period (i.e. within next 10 days)
  • Monitors license utilization

One of the biggest advantages of the product is that it allows you to monitor and manage multiple servers at once. It even includes the ability to graph your license use, but instead I would recommend the following:

Graphing Your FLEXlm License Usage

As you know, I’m a great fan of the Cacti graphing system. Thanks to the work of a user named pvenezia on the Cacti forums, there is a fantastic template and script for graphing FLEXlm usage. The script allows you to monitor multiple license servers and graph the usage of every application on those servers.
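Graphing FLEXlm usage typically comes down to reading the per-feature counts that lmutil lmstat -a reports. As a rough Python sketch (not the referenced Cacti script), assuming the usual “Users of FEATURE: (Total of N licenses issued; Total of M licenses in use)” lines and hypothetical feature names:

```python
import re

# Matches lines such as:
# Users of FEATURE:  (Total of 10 licenses issued;  Total of 3 licenses in use)
USERS_OF = re.compile(
    r"Users of (?P<feature>[^:]+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<in_use>\d+) licenses? in use\)"
)


def parse_lmstat(text):
    """Return {feature: (issued, in_use)} for every counted feature in the output."""
    usage = {}
    for match in USERS_OF.finditer(text):
        usage[match["feature"]] = (int(match["issued"]), int(match["in_use"]))
    return usage


# Hypothetical lmstat output fragment
SAMPLE = """Users of ACAD:  (Total of 10 licenses issued;  Total of 3 licenses in use)
Users of PLOT:  (Total of 1 license issued;  Total of 0 licenses in use)"""
print(parse_lmstat(SAMPLE))
```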

What makes the Cacti-based graphs of the FLEXlm servers so nice is that you can combine the data from multiple servers to give you an overall picture of your license use. For example, we have multiple FLEXlm servers in different offices handling our AutoCAD use. The following graph shows how I’ve combined the usage data from these servers to show how each office is using the licenses (note that this is a custom graph created from the data collected by the referenced script and template).
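In Cacti the combining is typically done on the graph itself, by stacking or summing the per-server data sources. Conceptually, the overall picture is just a per-feature sum across servers, as this Python sketch with hypothetical office names and counts shows.

```python
from collections import Counter


def combine(per_server):
    """Sum the in-use license counts for each feature across servers."""
    total = Counter()
    for usage in per_server.values():
        total.update(usage)  # adds counts feature by feature
    return dict(total)


# Hypothetical in-use counts collected from two offices' license servers
offices = {
    "office-a": {"ACAD": 3, "PLOT": 1},
    "office-b": {"ACAD": 5},
}
print(combine(offices))  # {'ACAD': 8, 'PLOT': 1}
```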

I highly recommend these two open source solutions for helping you monitor and manage your FLEXlm license servers and their associated products. They will help you control your costs (e.g. knowing whether you have too many licenses) and better manage who is using your software.

Graphing Motorola Surfboard SB5101 Cable Modem Stats with Cacti

So you’ve got a cable modem, and you’re having problems (or you just like to track everything). You’ve already been to the management page of your cable modem (in most cases it is reachable at http://192.168.100.1/), but now you want more, or at least to be able to track changes over time. What can you do?

Use Cacti to graph the stats on your cable modem. Man, that would sure be easy if only you had SNMP access to your DOCSIS cable modem. If you do, then check out this post. But if you’re like me and you have good old Comcast, which disables client-side SNMP access, then you’re going to need a script to scrape your modem’s web-based interface.

Here is my script for a Motorola Surfboard SB5101 cable modem (based off this post) which will display the following two graphs within Cacti:

Cable Modem Power Graphs

This graph displays your power levels and signal to noise ratio.

Cable Modem Frequency Graphs

This graph displays the frequencies on which your modem is operating. These should almost never change.

In order to use these, you will need to download two items. The first is my Motorola Surfboard SB5101 Perl script, which can be downloaded from my Cablemodem Template post on the Cacti Forum and then uploaded to your Cacti scripts directory.

The second piece to download is my Cacti XML host template also from my Cablemodem Template post on the Cacti Forum. Once downloaded you can import it and then add your devices.

This is my first custom template. As I create additional ones, I will add them. If you use this template, please let me know how it works for you.
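The general idea behind a screen-scraping data input script is simple: fetch the status page, pull out the numbers, and print them as space-separated name:value pairs, which is the format Cacti expects from a script that returns several fields. The Python sketch below works on a hypothetical HTML fragment; the real SB5101 page layout, the field names, and the regular expression are all assumptions, so it will not work against a modem unmodified.

```python
import re

# Hypothetical fragment of a modem status page; the real markup will differ.
SAMPLE_PAGE = """
<tr><td>Power Level</td><td>-2 dBmV</td></tr>
<tr><td>Signal to Noise Ratio</td><td>36 dB</td></tr>
<tr><td>Upstream Power</td><td>45 dBmV</td></tr>
"""

# Hypothetical mapping of Cacti field names to row labels on the page.
FIELDS = {
    "dspower": "Power Level",
    "snr": "Signal to Noise Ratio",
    "uspower": "Upstream Power",
}


def scrape(page):
    """Return space-separated 'name:value' pairs for a Cacti data input method."""
    pairs = []
    for name, label in FIELDS.items():
        match = re.search(re.escape(label) + r"</td><td>(-?[\d.]+)", page)
        # "U" is RRDtool's notation for an unknown value
        pairs.append(f"{name}:{match.group(1) if match else 'U'}")
    return " ".join(pairs)


print(scrape(SAMPLE_PAGE))  # dspower:-2 snr:36 uspower:45
```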

Monitoring NetApp with Nagios and Nagiosgraph

With the installation of our new Network Appliance (NetApp) filers, I needed to be able to monitor them. Yes, I know that they have an AutoSupport feature that emails you as well as NetApp whenever anything happens, but I still like to do my own monitoring.

The first thing I did was check the Nagios Exchange to see if it had any plugins for NetApp filers. There were actually two different plugins: the first worked, and the second, which offered a failed disk check, didn’t. So I modified the first to add the failed disk check, and because I knew I was going to be using Nagiosgraph, I corrected the performance data output for two of the checks to be compliant with the Nagios Plugin Development Guidelines.
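For reference, the Nagios Plugin Development Guidelines describe performance data as space-separated items of the form 'label'=value[UOM];[warn];[crit];[min];[max], with the quotes required only when the label contains spaces. A small Python helper that emits that format might look like this; the perfdata function is illustrative and not part of check_netapp.

```python
def perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one performance data item as described in the plugin guidelines."""
    # Labels containing spaces must be wrapped in single quotes
    if " " in label:
        label = f"'{label}'"
    fields = [f"{value}{uom}", warn, crit, minimum, maximum]
    return f"{label}=" + ";".join(str(f) for f in fields)


print(perfdata("cpu", 35, "%", 80, 90, 0, 100))  # cpu=35%;80;90;0;100
print(perfdata("vol0 used", 120, "MB"))          # 'vol0 used'=120MB;;;;
```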

Download my modified check_netapp Nagios plugin.

In order to graph the NetApp data with Nagiosgraph you will need to use my modified check_netapp plugin so please download and test before proceeding.

The following nagiosgraph map entries will allow you to graph both CPU Load and Disk Space Used per volume (by name):


Cacti’s Painless Network Monitoring


For the past week I’ve immersed myself in the world of Cacti and have been having a lot of fun making cool graphs. As my staff will attest, I’m really big into monitoring anything and everything on our network. I find it very helpful to be able to track usage, capacity, growth, and a bunch of other things. Without some kind of baseline, how do you know if things are operating as they should?

Oh, so you’re wondering what Cacti is? Well, here is the developer’s description:

Cacti is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.

Anyway, I’ve been using MRTG for the last 8+ years to graph utilization, etc. It was a great product, and I’ve built up a number of useful scripts and hacks to monitor all kinds of things, from Windows boxes to printers to email queues. I even built a neat menu system, but it was a real hack: it was hard to manage, add devices, or even make changes. I’ve followed the RRDTool world for a while (and even moved my MRTG configs over to using RRD), but never found a solution that was easy to use and had the flexibility I wanted/needed. That was until I stumbled across Cacti.

Cacti has a templating system that makes adding new devices easy, and it has an active user community that shares templates for graphs and device monitoring. It is really powerful and actually quite easy to use. It even integrates with Nagios, although I have yet to accomplish that integration. In the coming weeks I’ll be sharing my adventures with the installation and configuration, as well as some of the templates that I have used and created/modified. So stay tuned for further posts about Cacti.

Nagiosgraph with Windows support

After reviewing the four main tools for graphing performance with Nagios (APAN, Nagiosgraph, Nagiostat, and PerfParse), I decided that Nagiosgraph was the easiest for me to get up and running. Out of the box it worked great for my Linux systems and my network tests, but I needed to add support for monitoring my Windows servers.

I have used APAN in the past, but it was really tough to configure. I also tried PerfParse and liked it. However, it required a lot more resources for the database than I was prepared to handle, and I could probably only keep 30 days of data. But it worked great.

To make things easier, I installed the latest CVS nightly of the 1.4.0alpha Nagios Plugins. As of the 20040817 nightly, these plugins supported performance data output for the check_nt plugin (the one that works with the NSClient service). Once these plugins were compiled and installed, I updated the Nagiosgraph map file, which is what Nagiosgraph uses to parse the plugin output when generating the stats.
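Graphing tools such as Nagiosgraph work by pulling the numbers back out of the plugin's performance data. A minimal Python parser for the standard 'label'=value[UOM];warn;crit;min;max items might look like this; it is an illustration of the format only, not Nagiosgraph's own map-file logic, and parse_perfdata is a hypothetical name.

```python
import re

ITEM = re.compile(r"(?:'(?P<quoted>[^']+)'|(?P<plain>[^\s=']+))=(?P<data>\S+)")
VALUE = re.compile(r"(?P<num>-?[\d.]+)(?P<uom>[a-zA-Z%]*)")


def parse_perfdata(text):
    """Return {label: {'value', 'uom', 'warn', 'crit', 'min', 'max'}}."""
    result = {}
    for item in ITEM.finditer(text):
        label = item["quoted"] or item["plain"]
        parts = item["data"].split(";") + [""] * 4  # pad missing optional fields
        value = VALUE.match(parts[0])
        if value is None:
            continue  # skip items whose value is not numeric (e.g. "U")
        result[label] = {
            "value": float(value["num"]),
            "uom": value["uom"],
            "warn": parts[1], "crit": parts[2], "min": parts[3], "max": parts[4],
        }
    return result


print(parse_perfdata("'C: Used'=10.5GB;15;18;0;20 load=0.5"))
```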
