Monitoring NetApp with Nagios and Nagiosgraph

With the installation of our new Network Appliance (NetApp) filers, I needed to be able to monitor them. Yes, I know that they have an AutoSupport feature that emails both you and NetApp whenever anything happens, but I still like to do my own monitoring.

The first thing I did was check the Nagios Exchange for NetApp plugins. There were two. The first worked and the second didn't, although the second did offer a failed disk check. So I modified the first to add that check, and because I knew I was going to be using Nagiosgraph, I corrected the performance data output of two of the checks (CPULOAD and DISKUSED) to comply with the Nagios Plugin Development Guidelines.

Download my modified check_netapp Nagios plugin.

To graph the NetApp data with Nagiosgraph you will need to use my modified check_netapp plugin, so please download and test it before proceeding.
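The modified plugin matters because of the performance data format. The Nagios plugin guidelines describe each perfdata item as label=value[UOM];warn;crit;min;max, and for DISKUSED the percentage thresholds are converted to KB so they share a unit with the value. Here is a minimal sketch of that formatting; the helper names perfdata_item and disk_perfdata are made up for illustration and do not appear in the plugin, and the sample numbers are taken from the example output further down.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one perfdata item: label=value[UOM];warn;crit;min;max
# (hypothetical helper, not part of check_netapp)
sub perfdata_item {
    my ( $label, $value, $uom, $warn, $crit, $min, $max ) = @_;
    return sprintf( "%s=%d%s;%d;%d;%d;%d",
        $label, $value, $uom, $warn, $crit, $min, $max );
}

# DISKUSED thresholds are given as percentages on the command line,
# but are reported in KB so they match the unit of the value.
sub disk_perfdata {
    my ( $volume, $used_kb, $total_kb, $warn_pct, $crit_pct ) = @_;
    return perfdata_item(
        "NetApp $volume Used Space", $used_kb, 'KB',
        $total_kb * $warn_pct / 100,    # warning threshold in KB
        $total_kb * $crit_pct / 100,    # critical threshold in KB
        0, $total_kb
    );
}

print perfdata_item( 'netapp-cpuload', 1, '%', 75, 90, 0, 100 ), "\n";
print disk_perfdata( '/vol/root/', 190692, 33554432, 75, 90 ), "\n";
```

The %d conversions truncate fractional thresholds, which is why 90% of 33554432 KB shows up as 30198988.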

The following nagiosgraph map entries will allow you to graph both CPU Load and Disk Space Used per volume (by name):

# Service type: netapp-cpuload
#   check command: check_netapp -H Address -C community -v CPULOAD -w 75 -c 90
#   output: CPULOAD OK - CPU load: 1%
#   perfdata: netapp-cpuload=1%;75;90;0;100
/perfdata:netapp-cpuload=(\d+)%/
and push @s, [ netappcpuload,
	[ cpuload, GAUGE, $1 ] ];

# Service type: netapp-disk-used
#   check command:  check_netapp -H Address -C community -v DISKUSED -o /vol/volume/ -w 75 -c 90
#   output: DISKUSED OK - /vol/volume/ - total: 33554432 Kb - used 190692 Kb (1%) - free: 33363740 Kb
#   perfdata: NetApp /vol/root/ Used Space=190692KB;25165824;30198988;0;33554432
/perfdata:NetApp.*Space=(\d+)KB;(\d+);(\d+);\d+;(\d+)/
and push @s, [ netappdisk,
	[ diskused, GAUGE, $1*1024 ],
	[ diskwarn, GAUGE, $2*1024 ],
	[ diskcrit, GAUGE, $3*1024 ],
	[ diskmaxi, GAUGE, $4*1024 ] ];
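If you want to check the two map regexes before touching your Nagiosgraph installation, something like the following standalone sketch may help. It assumes the text handed to the map rules contains a line starting with "perfdata:" followed by the plugin's performance data, as in the comments above; map_perfdata is a hypothetical wrapper, and the names are quoted here only because the sketch runs under strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the two map rules to one block of text and return the entries
# that would be pushed onto @s (hypothetical wrapper for testing only).
sub map_perfdata {
    my ($text) = @_;
    my @s;
    if ( $text =~ /perfdata:netapp-cpuload=(\d+)%/ ) {
        push @s, [ 'netappcpuload', [ 'cpuload', 'GAUGE', $1 ] ];
    }
    if ( $text =~ /perfdata:NetApp.*Space=(\d+)KB;(\d+);(\d+);\d+;(\d+)/ ) {
        # The plugin reports KB; multiply by 1024 to store bytes.
        push @s, [ 'netappdisk',
            [ 'diskused', 'GAUGE', $1 * 1024 ],
            [ 'diskwarn', 'GAUGE', $2 * 1024 ],
            [ 'diskcrit', 'GAUGE', $3 * 1024 ],
            [ 'diskmaxi', 'GAUGE', $4 * 1024 ] ];
    }
    return @s;
}

my @samples = (
    'perfdata:netapp-cpuload=1%;75;90;0;100',
    'perfdata:NetApp /vol/root/ Used Space=190692KB;25165824;30198988;0;33554432',
);

for my $sample (@samples) {
    for my $entry ( map_perfdata($sample) ) {
        my ( $db, @sets ) = @{$entry};
        print "$db: ", join( ', ', map { "$_->[0]=$_->[2]" } @sets ), "\n";
    }
}
```

Note that the fourth capture group is the maximum (the total volume size); the minimum field is matched by \d+ but not captured.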

Here are the entries that need to be created in your serviceextinfo.cfg file to produce the corresponding graphs:

define serviceextinfo {
  service_description  NetApp-Load
  host_name       netapp1,netapp2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=netappcpuload,cpuload
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NetApp-DiskUsed-/vol/volume
  host_name       netapp1,netapp2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=netappdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

Here is my modified check_netapp Nagios plugin. You will need to upload it to your Nagios server.

#!/usr/bin/perl -w

# Copyright (c) 2006 Dy 4 Systems Inc.
#
# Parameter checks and SNMP v3 based on code by Christoph Kron
#  and S. Ghosh (check_ifstatus)
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
#
# Report bugs to ken.mckinlay@curtisswright.com, nagiosplug-help@lists.sf.net
#
# 2006.05.01 Version 1.0
#
#
#############################################################
#
# Updated by Ken Nerhood - http://nerhood.wordpress.net/
# 2006.06.19
#
# Added check for Failed Disks
# Corrected perfdata output for CPULOAD and DISKUSED
#    to make it compliant with Nagios Plugin Guidelines
#
#############################################################
#
# $Id: check_netapp,v 1.2 2006/05/01 13:44:16 root Exp root $

use strict;
use lib "/usr/local/nagios/libexec";
use utils qw($TIMEOUT %ERRORS &print_revision &support);
use Net::SNMP;
use Getopt::Long;
Getopt::Long::Configure('bundling');

my $PROGNAME = 'check_netapp';
my $PROGREVISION = '1.0';

sub print_help ();
sub usage ();
sub process_arguments ();

my ($status,$timeout,$answer,$perfdata,$hostname,$volume);
my ($seclevel,$authproto,$secname,$authpass,$privpass,$snmp_version);
my ($auth,$priv,$session,$error,$response,$snmpoid,$variable);
my ($warning,$critical,$opt_h,$opt_V);
my %snmpresponse;

my $state = 'UNKNOWN';
my $community='public';
my $maxmsgsize = 1472; # Net::SNMP default is 1472
my $port = 161;

my $snmpFailedFanCount = '.1.3.6.1.4.1.789.1.2.4.2';
my $snmpFailPowerSupplyCount = '.1.3.6.1.4.1.789.1.2.4.4';
my $snmpFailedDiskCount = '.1.3.6.1.4.1.789.1.6.4.7';
my $snmpUptime = '.1.3.6.1.2.1.1.3';
my $snmpcpuBusyTimePerCent = '.1.3.6.1.4.1.789.1.2.1.3';
my $snmpenvOverTemperature = '.1.3.6.1.4.1.789.1.2.4.1';
my $snmpnvramBatteryStatus = '.1.3.6.1.4.1.789.1.2.5.1';
my $snmpfilesysvolTable = '.1.3.6.1.4.1.789.1.5.8';
my $snmpfilesysvolTablevolEntryOptions = '.1.3.6.1.4.1.789.1.5.8.1.7';
my $snmpfilesysvolTablevolEntryvolName = '.1.3.6.1.4.1.789.1.5.8.1.2';
my $snmpfilesysdfTabledfEntry = '.1.3.6.1.4.1.789.1.5.4.1';
my $snmpfilesysdfTabledfEntrydfFileSys = '.1.3.6.1.4.1.789.1.5.4.1.2';
my $snmpfilesysdfTabledfEntrydfKBytesTotal = '.1.3.6.1.4.1.789.1.5.4.1.3';
my $snmpfilesysdfTabledfEntrydfKBytesUsed = '.1.3.6.1.4.1.789.1.5.4.1.4';
my $snmpfilesysdfTabledfEntrydfKBytesAvail = '.1.3.6.1.4.1.789.1.5.4.1.5';
my $snmpfilesysdfTabledfEntrydfPercentKBytesCapacity = '.1.3.6.1.4.1.789.1.5.4.1.6';

my %nvramBatteryStatus = (
        1 => 'ok',
        2 => 'partially discharged',
        3 => 'fully discharged',
        4 => 'not present',
        5 => 'near end of life',
        6 => 'at end of life',
        7 => 'unknown',
        8 => 'over charged',
        9 => 'fully charged',
);

# Just in case of problems, let's not hang Nagios
$SIG{'ALRM'} = sub {
        print "ERROR: No snmp response from $hostname (alarm timeout)\n";
        exit $ERRORS{'UNKNOWN'};
};

$status = process_arguments();
if ( $status != 0 ) {
        print_help();
        exit $ERRORS{'OK'};
}

alarm($timeout);

# do the query
if ( ! defined ( $response = $session->get_table($snmpoid) ) ) {
        $answer=$session->error;
        $session->close;
        $state = 'CRITICAL';
        print "$state:$answer for $snmpoid with snmp version $snmp_version\n";
        exit $ERRORS{$state};
}
$session->close;
alarm(0);

foreach my $snmpkey (keys %{$response} ) {
        my ($oid,$key) = ( $snmpkey =~ /(.*)\.(\d+)$/ );
        $snmpresponse{$oid}{$key} = $response->{$snmpkey};
}

if ( $variable eq 'FAN' ) {
        $state = 'OK';
        $state = 'WARNING' if ( ( defined $warning ) && ( $snmpresponse{$snmpFailedFanCount}{0} >= $warning ) );
        $state = 'CRITICAL' if ( ( defined $critical ) && ( $snmpresponse{$snmpFailedFanCount}{0} >= $critical ) );
        $answer = sprintf("Fans failed: %d",$snmpresponse{$snmpFailedFanCount}{0});
        $perfdata = sprintf("failedfans=%d",$snmpresponse{$snmpFailedFanCount}{0});
} elsif ( $variable eq 'UPTIME' ) {
        $state = 'OK';
        $answer = sprintf("System Uptime: %s",$snmpresponse{$snmpUptime}{0});
        $perfdata = sprintf("uptime=%s",$snmpresponse{$snmpUptime}{0});
} elsif ( $variable eq 'FAILEDDISK' ) {
        $state = 'OK';
        $state = 'WARNING' if ( ( defined $warning ) && ( $snmpresponse{$snmpFailedDiskCount}{0} >= $warning ) );
        $state = 'CRITICAL' if ( ( defined $critical ) && ( $snmpresponse{$snmpFailedDiskCount}{0} >= $critical ) );
        $answer = sprintf("Disks failed: %d",$snmpresponse{$snmpFailedDiskCount}{0});
        $perfdata = sprintf("faileddisks=%d",$snmpresponse{$snmpFailedDiskCount}{0});
} elsif ( $variable eq 'PS' ) {
        $state = 'OK';
        $state = 'WARNING' if ( ( defined $warning ) && ( $snmpresponse{$snmpFailPowerSupplyCount}{0} >= $warning ) );
        $state = 'CRITICAL' if ( ( defined $critical ) && ( $snmpresponse{$snmpFailPowerSupplyCount}{0} >= $critical ) );
        $answer = sprintf("Power supplies failed: %d",$snmpresponse{$snmpFailPowerSupplyCount}{0});
        $perfdata = sprintf("failedpowersupplies=%d",$snmpresponse{$snmpFailPowerSupplyCount}{0});
} elsif ( $variable eq 'CPULOAD' ) {
        $state = 'OK';
        $state = 'WARNING' if ( ( defined $warning ) && ( $snmpresponse{$snmpcpuBusyTimePerCent}{0} >= $warning ) );
        $state = 'CRITICAL' if ( ( defined $critical ) && ( $snmpresponse{$snmpcpuBusyTimePerCent}{0} >= $critical ) );
        $answer = sprintf("CPU load: %d%%",$snmpresponse{$snmpcpuBusyTimePerCent}{0});
        #$perfdata = sprintf("cpuload=%d",$snmpresponse{$snmpcpuBusyTimePerCent}{0});
        $perfdata = sprintf("netapp-cpuload=%d%%;%d;%d;0;100",$snmpresponse{$snmpcpuBusyTimePerCent}{0},$warning,$critical);
} elsif ( $variable eq 'TEMP' ) {
        $state = 'OK';
        $state = 'CRITICAL' if ( $snmpresponse{$snmpenvOverTemperature}{0} ==  2 );
        $answer = sprintf ("Over temperature: %s",($snmpresponse{$snmpenvOverTemperature}{0} == 1 ? 'no':'yes'));
        $perfdata = sprintf("overtemperature=%d",$snmpresponse{$snmpenvOverTemperature}{0});
} elsif ( $variable eq 'NVRAM' ) {
        $state = 'OK';
        $state = 'CRITICAL' if (( $snmpresponse{$snmpnvramBatteryStatus}{0} > 1 ) && ( $snmpresponse{$snmpnvramBatteryStatus}{0} < 9 ));
        $answer = sprintf ("NVRAM battery status: %s",$nvramBatteryStatus{$snmpresponse{$snmpnvramBatteryStatus}{0}});
        $perfdata = sprintf("nvrambatterystatus=%d",$snmpresponse{$snmpnvramBatteryStatus}{0});
} elsif ( $variable eq 'SNAPSHOT' ) {
        $state = 'OK';
        $answer = 'Snapshot status:';
        foreach my $key ( keys %{$snmpresponse{$snmpfilesysvolTablevolEntryOptions}} ) {
                if ( defined $volume ) {
                        if ( $snmpresponse{$snmpfilesysvolTablevolEntryvolName}{$key} eq $volume ) {
                                if ( $snmpresponse{$snmpfilesysvolTablevolEntryOptions}{$key} !~ /nosnap=off/ ) {
                                        $state = 'CRITICAL';
                                        $answer = sprintf ("%s %s Snapshots disabled;",
                                                        $answer,
                                                        $snmpresponse{$snmpfilesysvolTablevolEntryvolName}{$key});
                                } else {
                                        $answer = sprintf ("%s volume %s enabled",$answer,$snmpresponse{$snmpfilesysvolTablevolEntryvolName}{$key}) if $state ne 'CRITICAL';
                                }
                                last;
                        }
                } else {
                        if ( $snmpresponse{$snmpfilesysvolTablevolEntryOptions}{$key} !~ /nosnap=off/ ) {
                                $state = 'CRITICAL';
                                $answer = sprintf ("%s %s Snapshots disabled;",$answer,$snmpresponse{$snmpfilesysvolTablevolEntryvolName}{$key});
                        }
                }
        }
        $answer = sprintf ("%s all enabled",$answer) if $answer eq 'Snapshot status:';
        $perfdata = sprintf("");
} elsif ( $variable eq 'DISKUSED' ) {
        $state = 'OK';
        foreach my $key ( keys %{$snmpresponse{$snmpfilesysdfTabledfEntrydfFileSys}} ) {
                if ( defined $volume ) {
                        if ( $snmpresponse{$snmpfilesysdfTabledfEntrydfFileSys}{$key} eq $volume ) {
                                my $volume = $snmpresponse{$snmpfilesysdfTabledfEntrydfFileSys}{$key};
                                my $used = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
                                my $total = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
                                my $avail = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesAvail}{$key};
                                my $percent = $snmpresponse{$snmpfilesysdfTabledfEntrydfPercentKBytesCapacity}{$key};
                                $answer = sprintf("%s - total: %d Kb - used %d Kb (%d%%) - free: %d Kb",$volume,$total,$used,$percent,$avail);
                                $perfdata = sprintf("NetApp %s Used Space=%dKB;%d;%d;0;%d",$volume,$used,$total*$warning/100,$total*$critical/100,$total);
                                $state = 'WARNING' if ( ( defined $warning ) && ( $percent >= $warning ) );
                                $state = 'CRITICAL' if ( ( defined $critical ) && ( $percent >= $critical ) );
                                last;
                        }
                } else {
                        my $volume = $snmpresponse{$snmpfilesysdfTabledfEntrydfFileSys}{$key};
                        my $used = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
                        my $total = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
                        my $avail = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesAvail}{$key};
                        my $percent = $snmpresponse{$snmpfilesysdfTabledfEntrydfPercentKBytesCapacity}{$key};
                        $answer .= sprintf("%s - total: %d Kb - used %d Kb (%d%%) - free: %d Kb\n",$volume,$total,$used,$percent,$avail);
                        $perfdata .= sprintf("NetApp %s Used Space=%dKB;%d;%d;0;%d",$volume,$used,$total*$warning/100,$total*$critical/100,$total);
                        $state = 'WARNING' if ( ( defined $warning ) && ( $percent >= $warning ) && ( $state ne 'CRITICAL') );
                        $state = 'CRITICAL' if ( ( defined $critical ) && ( $percent >= $critical ) );
                }
        }
        if ( ( ! defined $answer ) && ( defined $volume ) ) {
                $state = 'UNKNOWN';
                $answer = "unknown volume: $volume";
                $perfdata = '';
        }
}


print "$variable $state - $answer|$perfdata\n";
exit $ERRORS{$state};

sub usage () {
        print "\nMissing arguments!\n\n";
        print "check_netapp -H <ip_address> -v variable [-w warn_range] [-c crit_range]\n";
        print "             [-C community] [-t timeout] [-p port-number]\n";
        print "             [-P snmp version] [-L seclevel] [-U secname] [-a authproto]\n";
        print "             [-A authpasswd] [-X privpasswd] [-o volume]\n\n";
        support();
        exit $ERRORS{'UNKNOWN'};
}

sub print_help () {
        print "check_netapp plugin for Nagios monitors the status\n";
        print "of a NetApp system\n\n";
        print "Usage:\n";
        print "  -H, --hostname\n\thostname to query (required)\n";
        print "  -C, --community\n\tSNMP read community (defaults to public)\n";
        print "  -t, --timeout\n\tseconds before the plugin times out (default=$TIMEOUT)\n";
        print "  -p, --port\n\tSNMP port (default 161)\n";
        print "  -P, --snmp_version\n\t1 for SNMP v1 (default), 2 for SNMP v2c\n\t\t3 for SNMP v3 (requires -U)\n";
        print "  -L, --seclevel\n\tchoice of \"noAuthNoPriv\", \"authNoPriv\", \"authpriv\"\n";
        print "  -U, --secname\n\tuser name for SNMPv3 context\n";
        print "  -a, --authproto\n\tauthentication protocol (MD5 or SHA1)\n";
        print "  -A, --authpass\n\tauthentication password\n";
        print "  -X, --privpass\n\tprivacy password in hex with 0x prefix generated by snmpkey\n";
        print "  -V, --version\n\tplugin version\n";
        print "  -w, --warning\n\twarning level\n";
        print "  -c, --critical\n\tcritical level\n";
        print "  -v, --variable\n\tvariable to query, can be:\n";
        print "\t\tCPULOAD - CPU load\n";
        print "\t\tDISKUSED - disk space used\n";
        print "\t\tFAILEDDISK - failed disks\n";
        print "\t\tFAN - fail fan state\n";
        print "\t\tNVRAM - nvram battery status\n";
        print "\t\tPS - power supply\n";
        print "\t\tSNAPSHOT - volume snapshot status\n";
        print "\t\tTEMP - over temperature check\n";
        print "\t\tUPTIME - up time\n";
        print "  -o, --volume\n\tvolume to query (defaults to all)\n";
        print "  -h, --help\n\tusage help\n\n";
        print_revision($PROGNAME,"\$Revision: 1.2 $PROGREVISION\$");
}

sub process_arguments () {
        $status = GetOptions (
                'V' => \$opt_V, 'version' => \$opt_V,
                'h' => \$opt_h, 'help' => \$opt_h,
                'P=i' => \$snmp_version, 'snmp_version=i' => \$snmp_version,
                'C=s' => \$community, 'community=s' => \$community,
                'L=s' => \$seclevel, 'seclevel=s' => \$seclevel,
                'a=s' => \$authproto, 'authproto=s' => \$authproto,
                'U=s' => \$secname, 'secname=s' => \$secname,
                'A=s' => \$authpass, 'authpass=s' => \$authpass,
                'X=s' => \$privpass, 'privpass=s' => \$privpass,
                'H=s' => \$hostname, 'hostname=s' => \$hostname,
                't=i' => \$timeout, 'timeout=i' => \$timeout,
                'v=s' => \$variable, 'variable=s' => \$variable,
                'w=i' => \$warning, 'warning=i' => \$warning,
                'c=i' => \$critical, 'critical=i' => \$critical,
                'o=s' => \$volume, 'volume=s' => \$volume,
        );

        if ( $status == 0 ) {
                print_help();
                exit $ERRORS{'OK'};
        }

        if ( $opt_V ) {
                print_revision($PROGNAME,"\$Revision: 1.2 $PROGREVISION\$");
                exit $ERRORS{'OK'};
        }

        if ( ! utils::is_hostname($hostname) ) {
                usage();
                exit $ERRORS{'UNKNOWN'};
        }

        unless ( defined $timeout ) {
                $timeout = $TIMEOUT;
        }

        if ( ! $snmp_version ) {
                $snmp_version = 1;
        }

        if ( $snmp_version =~ /3/ ) {
                if ( defined $seclevel && defined $secname ) {
                        # eq against ('a' || 'b') only ever compares with 'a'; match the full list instead
                        unless ( $seclevel =~ /^(?:noAuthNoPriv|authNoPriv|authPriv)$/ ) {
                                usage();
                                exit $ERRORS{'UNKNOWN'};
                        }

                        if ( $seclevel =~ /^(?:authNoPriv|authPriv)$/ ) {
                                unless ( defined $authproto && $authproto =~ /^(?:MD5|SHA1)$/ ) {
                                        usage();
                                        exit $ERRORS{'UNKNOWN'};
                                }
                                if ( ! defined $authpass ) {
                                        usage();
                                        exit $ERRORS{'UNKNOWN'};
                                } else {
                                        if ( $authpass =~ /^0x/ ) {
                                                $auth = "-authkey => $authpass";
                                        } else {
                                                $auth = "-authpassword => $authpass";
                                        }
                                }
                        }

                        if ( $seclevel eq 'authPriv' ) {
                                if ( ! defined $privpass ) {
                                        usage();
                                        exit $ERRORS{'UNKNOWN'};
                                } else {
                                        if ( $privpass =~ /^0x/ ) {
                                                $priv = "-privkey => $privpass";
                                        } else {
                                                $priv = "-privpassword => $privpass";
                                        }
                                }
                        }
                } else {
                        usage();
                        exit $ERRORS{'UNKNOWN'};
                }
        }

        # create the SNMP session
        if ( $snmp_version =~ /[12]/ ) {
                ($session,$error) = Net::SNMP->session(
                                        -hostname => $hostname,
                                        -community => $community,
                                        -port => $port,
                                        -version => $snmp_version,
                );
                if ( ! defined $session ) {
                        $state = 'UNKNOWN';
                        $answer = $error;
                        print "$state:$answer";
                        exit $ERRORS{$state};
                }
        } elsif ( $snmp_version  =~ /3/ ) {
                if ( $seclevel eq 'noAuthNoPriv' ) {
                        ($session,$error) = Net::SNMP->session(
                                                -hostname => $hostname,
                                                -community => $community,
                                                -port => $port,
                                                -version => $snmp_version,
                                                -username => $secname,
                        );
                } elsif ( $seclevel eq 'authNoPriv' ) {
                        ($session,$error) = Net::SNMP->session(
                                                -hostname => $hostname,
                                                -community => $community,
                                                -port => $port,
                                                -version => $snmp_version,
                                                -username => $secname,
                                                -authprotocol => $authproto,
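                                                # pass a key/value pair, not a single "-key => value" string
                                                ( $authpass =~ /^0x/ ? ( -authkey => $authpass ) : ( -authpassword => $authpass ) ),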
                        );
                } elsif ( $seclevel eq 'authPriv' ) {
                        ($session,$error) = Net::SNMP->session(
                                                -hostname => $hostname,
                                                -community => $community,
                                                -port => $port,
                                                -version => $snmp_version,
                                                -username => $secname,
                                                -authprotocol => $authproto,
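                                                # pass key/value pairs, not single "-key => value" strings
                                                ( $authpass =~ /^0x/ ? ( -authkey => $authpass ) : ( -authpassword => $authpass ) ),
                                                ( $privpass =~ /^0x/ ? ( -privkey => $privpass ) : ( -privpassword => $privpass ) ),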
                        );
                }
                if ( ! defined $session ) {
                        $state = 'UNKNOWN';
                        $answer = $error;
                        print "$state:$answer";
                        exit $ERRORS{$state};
                }
        } else {
                $state = 'UNKNOWN';
                print "$state: No support for SNMP v$snmp_version\n";
                exit $ERRORS{$state};
        }

        # check the supported variables
        if ( ! defined $variable ) {
                print_help();
                exit $ERRORS{'UNKNOWN'};
        } else {
                if ( $variable eq 'UPTIME' ) {
                        $snmpoid = $snmpUptime;
                } elsif ( $variable eq 'FAN' ) {
                        $snmpoid = $snmpFailedFanCount;
                } elsif ( $variable eq 'FAILEDDISK' ) {
                        $snmpoid = $snmpFailedDiskCount;
                } elsif ( $variable eq 'PS' ) {
                        $snmpoid = $snmpFailPowerSupplyCount;
                } elsif ( $variable eq 'CPULOAD' ) {
                        $snmpoid = $snmpcpuBusyTimePerCent;
                } elsif ( $variable eq 'TEMP' ) {
                        $snmpoid = $snmpenvOverTemperature;
                } elsif ( $variable eq 'NVRAM' ) {
                        $snmpoid = $snmpnvramBatteryStatus;
                } elsif ( $variable eq 'SNAPSHOT' ) {
                        $snmpoid = $snmpfilesysvolTable;
                } elsif ( $variable eq 'DISKUSED' ) {
                        $snmpoid = $snmpfilesysdfTabledfEntry;
                } else {
                        print_help();
                        exit $ERRORS{'UNKNOWN'};
                }
        }

        return $ERRORS{'OK'};
}


18 Comments

  1. Ian Collier

     /  October 13, 2006

    Most helpful.

    BTW, I realised today that you can monitor the snapshots by using something like:

    check_netapp-2 -H netapp -C public -v DISKUSED -o /vol/volume/.snapshot -w 10 -c 5

  2. David

     /  October 13, 2006

    I have found NFS and CIFS ops/s to be very helpful to check.

  3. Willem

     /  April 4, 2007

    check_netapp reports something about SNMP v1 back to me. I am running W2K3 servers. Is NetApp not compatible with other SNMP versions? I am not familiar with SNMP at all. I only report what I see in Nagios. Thought I had the right stuff when I saw netapp… Or is there a little trick?

  4. Ken M

     /  June 20, 2007

    I like the addition you made to the script I originally wrote when I first started playing with Nagios and plugins. I was planning on doing the same but just never got around to it.

  6. Ian Collier

     /  August 16, 2007

    Hi,

    Discovered an interesting problem. If volumes are over 2TB then netapp’s snmp reports negative values – approx the right magnitude, but negative. So, inserting ‘abs’ ahead of the appropriate values in the diskused section ensures that you get something that makes sense in nagios – and more importantly doesn’t make nagiosgraph barf.

    eg

    my $used = abs $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};

    –Ian

  7. Brian

     /  December 28, 2007

    I’m trying to get this going, are there any further updated info for nagios 3?

    How do i define the netapp host configs?
    do I add the “service types” to the commands.cfg?

    Thanks
    brian

  8. Brian,

    I’ve been on holiday for the past week. I will have to look at this when I get back to the office.

    As for Nagios 3 I have not looked at that yet and probably won’t for a couple of months. In meantime check out the Nagios Exchange version for the service config definitions. If I remember correctly I didn’t really change any of them.

    –ken

  9. Charles Richmond

     /  September 11, 2008

    Below is the diff of the check-netapp that corrects the negative numbers with volumes over 2TB and which converts output to GB for readability. The complete text can be found at http://www.iisc.com/check_netapp . The original check_netapp is by Ken Mckinlay with additional Nagios compatibility work by Ken Nerhood. My change is a relatively minor incremental one.

    The result of the change can be seen in this output:
    OLD/check_netapp -H x.x.x.x -C public -v DISKUSED -o /vol/vol_bpdimage/ -w 80 -c 90
    DISKUSED OK – /vol/vol_bpdimage/ – total: 1090519040 Kb – used 589970728 Kb (54%) – free: 500548312 Kb|NetApp /vol/vol_bpdimage/ Used Space=589970728KB;872415232;981467136;0;1090519040

    libexec/check_netapp -H x.x.x.x -C public -v DISKUSED -o /vol/vol_bpdimage/ -w 80 -c 90
    DISKUSED OK – /vol/vol_bpdimage/ – total: 1040 Gb – used 562 Gb (54%) – free: 477 Gb|NetApp /vol/vol_bpdimage/ Used Space=562GB;832;936;0;1040

    Note: I am calculating real GB using 1024*1024. If you want manufacturer’s GB then change the ‘1048576’ to ‘1000000’ in the diff below.

    [nagios@lrdcsvcdsk1 libexec]$ diff check_netapp ../OLD/check_netapp
    39,46d38
    < # Updated by Charles Richmond – http://www.iisc.com/
    < # 2008.09.11
    < #
    < # Modified to correct negative values for volumes larger than
    < # 2TB and modified ‘sprintf’ output to be Gb instead of Kb
    < #
    < #############################################################
    < #
    203,205c195,197
    < my $used = abs $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
    < my $total = abs $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
    my $used = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
    > my $total = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
    > my $avail = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesAvail}{$key};
    207,208c199,200
    < $answer = sprintf("%s - total: %d Gb - used %d Gb (%d%%) - free: %d Gb",$volume,$total/1048576,$used/1048576,$percent,$avail/1048576);
    $answer = sprintf("%s - total: %d Kb - used %d Kb (%d%%) - free: %d Kb",$volume,$total,$used,$percent,$avail);
    > $perfdata = sprintf("NetApp %s Used Space=%dKB;%d;%d;0;%d",$volume,$used,$total*$warning/100,$total*$critical/100,$total);
    215,217c207,209
    < my $used = abs $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
    < my $total = abs $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
    my $used = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesUsed}{$key};
    > my $total = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesTotal}{$key};
    > my $avail = $snmpresponse{$snmpfilesysdfTabledfEntrydfKBytesAvail}{$key};
    219,220c211,212
    < $answer .= sprintf("%s - total: %d Gb - used %d Gb (%d%%) - free: %d Gb\n",$volume,$total/1048576,$used/1048576,$percent,$avail/1048576);
    $answer .= sprintf("%s - total: %d Kb - used %d Kb (%d%%) - free: %d Kb\n",$volume,$total,$used,$percent,$avail);
    > $perfdata .= sprintf("NetApp %s Used Space=%dKB;%d;%d;0;%d",$volume,$used,$total*$warning/100,$total*$critical/100,$total);
    450d441
    <
    [nagios@lrdcsvcdsk1 libexec]$

    Charles Richmond http://www.iisc.com
    VDR2 Pit-os Talamban, Cebu City, RP

  10. Looks like Ian found the ‘abs’ before I did…

  11. I’m trying to get this working but the script seems to be missing some lines around 166. Perhaps something got lost in adding to this site?

    andrew.

  12. Andrew,

    Yes there was a problem with the script when I moved it to WordPress.com from my other host. I think I’ve fixed it now, so please try again and let me know if you have any problems.

    –ken

  13. Hi, I’m Thomas and I’m a developer of a new open source project named BrainPDM. As you can see from our web site, this open source application can store performance data from Nagios and graph the values as hourly, daily, weekly, monthly and yearly charts. If you want, you can try it and give us some feedback…

  14. Ian Collier

     /  November 21, 2008

    Of course, what I never got round to adding was that although wrapping that result in abs gets rid of the negative sign, and so stops Nagios returning errors, the overflow still means that the reported sizes are wrong: the value goes down between 2 and 4 TB and then up again from 4 TB, so 4.5 TB, for example, reports as 0.5 TB. I suspect this is a limitation of the information available via SNMP.

    –Ian

  15. A quick commercial plug, so take with a grain of salt, but if you wish to solve your NetApp monitoring issues with no configuration or coding, take a look at http://www.logicmonitor.com. It provides complete performance and fault monitoring (including per volume latency and IO operations), it requires no configuration, and deals with volume instance renumbering.
    Not free like Nagios, but requires no investment of time.

  16. Chris Wicklein

     /  April 1, 2009

    I don’t think the use of abs to correct negative numbers is correct. It looks like the problem is with unsigned 32-bit ints being misinterpreted as signed 32-bit ints. An easy way to fix this is with pack/unpack:

    $used = unpack("I", pack("i", $used));
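    The difference between the abs and pack/unpack approaches can be seen with a small worked example. This is only a sketch, assuming the KB counters really do wrap at 32 bits as described above and that you are running a 64-bit perl; the helper names are hypothetical, and it uses the l/L pack templates (exactly 32 bits) rather than i/I (native int size).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulate what a 32-bit signed integer does to a KB count:
# keep only the low 32 bits, then read them as a signed value.
sub as_signed32 {
    my ($kb) = @_;
    return unpack( 'l', pack( 'L', $kb % 4294967296 ) );
}

# Reinterpret a (possibly negative) 32-bit value as unsigned.
sub as_unsigned32 {
    my ($value) = @_;
    return unpack( 'L', pack( 'l', $value ) );
}

my $kb_per_tib = 1024 * 1024 * 1024;

for my $size_tib ( 1, 3, 4.5 ) {
    my $reported = as_signed32( $size_tib * $kb_per_tib );
    printf "%.1f TiB: reported=%d abs=%d unsigned=%d\n",
        $size_tib, $reported, abs($reported), as_unsigned32($reported);
}
```

    For a 3 TiB volume the reported value is negative; abs turns it into 1 TiB, while the unsigned reinterpretation recovers 3 TiB. At 4.5 TiB the value has wrapped past 32 bits entirely, so both approaches give 0.5 TiB and neither can recover the real size.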

  17. > It looks like the problem is with unsigned 32-bit ints being misinterpreted as signed 32-bit ints. […]

    To avoid this and other hassles of SNMP queries, I can really recommend using the native NetApp API. I have written several scripts this way now, and once you understand the logic of the API you get quite stable and predictable results.
