Nagiosgraph with Windows support

After reviewing the four main tools for graphing performance with Nagios (APAN, Nagiosgraph, Nagiostat, and PerfParse), I decided that Nagiosgraph was the easiest for me to get up and running. Out of the box it worked great for my Linux systems and my network tests, but I needed to add support for monitoring my Windows servers.

I have used APAN in the past, but it was really tough to configure. I also tried PerfParse and liked it. However, it required a lot more resources for the database than I was prepared to handle, and I could probably only keep 30 days of data. But it worked great.

To make things easier I installed the latest CVS nightly of the 1.4.0alpha Nagios Plugins. As of 20040817 these plugins supported performance data output for the check_nt plugin (the one that works with the NSClient service). Once these plugins were compiled and installed, I updated the nagiosgraph map file. Nagiosgraph uses this file to parse the plugin output and generate the stats.
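A map rule can be tried by hand before it goes into the live map file. The following is only a minimal sketch, not part of nagiosgraph. It assumes that nagiosgraph places the check result in $_ as lines such as "output:..." and "perfdata:..." and then evaluates the map rules, which push their results onto @s. The helper name apply_rule and the sample host and service names are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical harness for trying a nagiosgraph map rule by hand.
# Assumption: the check result is held in $_ as "name:value" lines and
# the map rules push their results onto @s.
use strict;
use warnings;

our @s;    # the array that map rules push onto

# Evaluate one rule (given as text) against one input string and
# return whatever the rule pushed onto @s.
sub apply_rule {
    my ($rule, $input) = @_;
    local $_ = $input;
    @s = ();
    {
        # Map files use barewords (ntmem, GAUGE), so relax strict here.
        no strict;
        no warnings;
        eval $rule;
    }
    die "rule failed: $@" if $@;
    return @s;
}

# The ntmem rule, quoted so that $1..$5 are not interpolated early.
my $rule = <<'RULE';
/perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntmem,
       [ memused, GAUGE, $1*1024**2 ],
       [ memwarn, GAUGE, $2*1024**2 ],
       [ memcrit, GAUGE, $3*1024**2 ],
       [ memmmax, GAUGE, $5*1024**2 ] ];
RULE

# Sample input built from the check_nt MEMUSE output.
my $input = join "\n",
    'servicedescr:NTmem',
    'hostname:ntserver1',
    'output:Memory usage: total:2467.75 Mb - used: 510.38 Mb (21%) - free: 1957.37 Mb (79%)',
    'perfdata:Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75';

# Print each database name followed by its data sources.
for my $db (apply_rule($rule, $input)) {
    my ($name, @sources) = @$db;
    print "database: $name\n";
    printf "  %-8s %-6s %.0f\n", @$_ for @sources;
}
```

If the script prints nothing, the pattern did not match the sample input, which is usually the first thing to check when no RRD file appears.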

Here are the entries that I added to the map file to support graphing of the Windows statistics (note that this requires the 1.4.0 plugins) as well as the fping service.

Update: I have corrected the errors pointed out in the comments and added an entry for the MSSQL plugin.

# Service type: ntload
#   check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
#   output: CPU Load 9% (5 min average) 11% (30 min average)
#   perfdata: '5 min avg Load'=9%;70;90;0;100 '30 min avg Load'=11%;70;90;0;100
#/perfdata:.*5 min avg Load'=(\d+)%;(\d+);(\d+);\d+;\d+ '30 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
/output:.*?(\d+)% .*?(\d+)% /
and push @s, [ ntload,
       [ avg05min, GAUGE, $1 ],
       [ avg30min, GAUGE, $2 ] ];

# Service type: ntmem
#   check command: check_nt -H Address -v MEMUSE -w 50 -c 90
#   output: Memory usage: total:2467.75 Mb - used: 510.38 Mb (21%) - free: 1957.37 Mb (79%)
#   perfdata: Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75
/perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntmem,
       [ memused, GAUGE, $1*1024**2 ],
       [ memwarn, GAUGE, $2*1024**2 ],
       [ memcrit, GAUGE, $3*1024**2 ],
       [ memmmax, GAUGE, $5*1024**2 ] ];

# Service type: ntdisk
#   check command: check_nt -H Address -v USEDDISKSPACE -lc -w 75 -c 90
#   output: c:\ - total: 25.87 Gb - used: 4.10 Gb (16%) - free 21.77 Gb (84%)
#   perfdata: c:\ Used Space=4.10Gb;19.40;23.28;0.00;25.87
/perfdata:.*Space=([.0-9]+)Gb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntdisk,
       [ diskused, GAUGE, $1*1024**3 ],
       [ diskwarn, GAUGE, $2*1024**3 ],
       [ diskcrit, GAUGE, $3*1024**3 ],
       [ diskmaxi, GAUGE, $5*1024**3 ] ];

# Service type: fping
#   output:FPING OK - 10.1.1.1 (loss=20%, rta=385.000000 ms)
#   perfdata: loss=20%;79;100;0;100 rta=0.385000s;2.000000;5.000000;0.000000
#/output:PING.*?(\d+)%.+?([.\d]+)\sms/
/perfdata:.*loss=(\d+)%.*rta=([.0-9]+)s;/
and push @s, [ fping,
       [ losspct, GAUGE, $1 ],
       [ rta,     GAUGE, $2 ] ];

# Service type: mssql
#   output:   OK - MS SQL Server 2000 has 42 user(s) connected:  18 appTrendCtrMgr, 1 nagios, 2 NTAUTHORITY\SYSTEM.
#   perfdata: users=36;;;;
#/output:.*MS SQL Server 2000 has ([0-9]+) use/
/perfdata:.*users=([0-9]+)/
and push @s, [ mssql,
       [ users, GAUGE, $1 ] ];
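
Commenters below report that some setups emit the perfdata label in quotes ('Memory usage'=...) while others emit it without them. As a sketch only, and assuming the rest of the perfdata format is unchanged, the ntmem and ntdisk patterns can make the closing quote optional with '? so that both forms match. These are hypothetical variants meant to replace the rules above, not to sit alongside them, since two matching rules would each push onto @s.

```perl
# Hypothetical quote-tolerant variants of the ntmem and ntdisk rules.
# '? accepts both   Memory usage=510.38Mb;...   and   'Memory usage'=510.38Mb;...
/perfdata:.*usage'?=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntmem,
       [ memused, GAUGE, $1*1024**2 ],
       [ memwarn, GAUGE, $2*1024**2 ],
       [ memcrit, GAUGE, $3*1024**2 ],
       [ memmmax, GAUGE, $5*1024**2 ] ];

# Same idea for   c:\ Used Space=4.10Gb;...   and   'c:\ Used Space'=4.10Gb;...
/perfdata:.*Space'?=([.0-9]+)Gb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntdisk,
       [ diskused, GAUGE, $1*1024**3 ],
       [ diskwarn, GAUGE, $2*1024**3 ],
       [ diskcrit, GAUGE, $3*1024**3 ],
       [ diskmaxi, GAUGE, $5*1024**3 ] ];
```

When copying these, make sure the quote is a plain ' and not a curly one.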

In addition, here are the entries that I created in the serviceextinfo.cfg file to produce the graphs:

define serviceextinfo {
  service_description  FPing
  host_name   host
  notes_url      /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=fping,rta&db=fping,losspct
  icon_image  graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTload
  host_name       ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg05min,avg30min
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTmem
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskC
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskD
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskE
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  MSSQL
  host_name       bsql1
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=mssql,users
  icon_image      graph.png
  icon_image_alt  View graphs
}

23 Comments

  1. It took a while to find your posting, but it’s very welcome and valuable, as we don’t have any Perl skills and the map file in nagiosgraph only contains Unix command mappings. We are now actively graphing our disk usage, memory and ping. CPU doesn’t work – I think the mapping is wrong – maybe there was a change from plugins 1.4 alpha to the release version. Any other maps and serviceextinfo entries would be very welcome!
    Regards,
    Adrian

  2. I just double-checked what I have running in production and it is exactly as above. I’m now running the released 1.4 plugins and everything is fine.

    One thing to double-check is whether you are running the correct check command. The one that I use is check_nt -H Address -v CPULOAD -l5,70,90,30,70,90 and the output should look like this: CPU Load 5% (5 min average) 2% (30 min average) | '5 min avg Load'=5%;70;90;0;100 '30 min avg Load'=2%;70;90;0;100

    If it doesn’t, then post/send me the command you are using and the output you are getting and I’ll see what I can do.

  3. Hi,

    We have it!

    We needed a \d in the map below to get it to work.

    # Service type: ntload
    #   check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
    #   output: CPU Load 9% (5 min average) 11% (30 min average)
    #   perfdata: '5 min avg Load'=9%;70;80;0;100 '30 min avg Load'=11%;70;90;0;100
    #/perfdata:.*5 min avg Load'=(\d+)%;(\d+);(\d+);\d+;\d+ '30 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
    /output:.*?(\d+)% .*?(\d+)% /
    and push @s, [ ntload,
           [ avg05min, GAUGE, $1 ],
           [ avg30min, GAUGE, $2 ] ];

    In the map file you have posted, you have (d+), and (\d+) seems to work for us.

    Many thanks for all your help.

    Regards,
    Adrian

  4. I’ve got it: This web interface is eating backslashes.
    So if there is d, where you expect number, you have to write down “backslash”d.

  5. Another small bug:
    in serviceextinfo.cfg it has to be memused instead of memfree, so the whole line would be:
    /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax

  6. I have updated the entry to reflect all of the corrections and changes submitted.

  7. Tim

     /  October 30, 2005

    Thanks for putting this up. It has helped immensely.

  8. hello there..

    thanks for the help..

    one question: how would one create a graph for checking disk space using check_nrpe?

    Input servicedescr:Disk Space C:
    Input hostname:acsjhba5sp02
    Input perfdata:
    Input lastcheck:1131898498
    Input output:OK: C:: Total: 37.3G – Used: 4.74G (12%) – Free: 32.5G (88%)

    Any help would be great..

    thanks

    Chris

  9. So i’m having problems getting the graphs or the RRD files written for check_nt cpu load.

    the command is:
    $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95

    the nagiosgraph map file entry is:
    # Service type: ntload
    #check command: $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95
    # output: CPU Load 9% (60 min average)
    #perfdata: '60 min avg Load'=9%;90;95;0;100
    #/perfdata:.'60 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
    /output:.*?(\d+)% /
    and push @s, [ ntload,
    [ avg60min, GAUGE, $1 ] ];

    the serviceextinfo entry is:
    define serviceextinfo {
    service_description Windows_Agent_CPU_Load
    # host_name
    hostgroup windows_domain_servers
    notes_url /cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg60min
    icon_image notify.gif
    icon_image_alt View Ping Graph
    }

    but i’m not getting graphs. The log file thinks its creating the RRD files, but they don’t show up. I’ve got other services running with nagiosgraph, so I’m pretty sure the basics are set up.

    any ideas?

  10. Jeff W

     /  December 6, 2005

    I had trouble with the CPU perfdata too. Here is the map file bit that is working for me.

    # Service type: ntload
    # check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
    # output: CPU Load 9% (5 min average) 11% (10 min average) 3% (30 min average)
    # perfdata: '5 min avg Load'=9%;80;90;0;100 '10 min avg Load'=11%;80;90;0;100 '30 min average Load'=3%;80;90;0;100
    /output:.*?Load (\d+)% \(5 min average\) (\d+)% \(10 min average\) (\d+)% /
    and push @s, [ ntload,
    [ avg05min, GAUGE, $1 ],
    [ avg10min, GAUGE, $2 ],
    [ avg30min, GAUGE, $3 ] ];

  11. John and Jeff,

    Yes, you will have to modify the map based on the CPU monitoring periods you want to graph. I wanted 5 and 30 minutes. If you want more periods, you need to add capture groups and data sources as Jeff did. If you want fewer, you need to remove them. You also need to adjust the pattern so that it matches the output your check actually produces.

  12. Hi,
    I think you’ve done a great work in putting this map and serviceextinfo.cfg on line, they are very useful.

    Regarding to cpu load, I had to modify line 6 in this way to get data correctly understood by insert.pl:

    /output:.*?(\d+)% .*?(\d+)% /

    Hope this helps.

    Sorry for my english, it’s not my main language.

    Greetings,
    Francesco.

  13. Terry Narine

     /  June 19, 2006

    Hello, this has been a brilliant find for me – thanks ever so much for sharing your experience with us newbies. I’ve pretty much got everything working just by copying your examples but… (did you see that coming?) I can’t get the memory maps up. A quick check shows they’re not there in the rrd directory, so the map may be off. My initial suspicion is that I’m using nsclient++ and this returns the status with quotes around the first words, e.g.:

    'Memory usage'=464.52Mb;2806.71;3608.63;0.00;4009.59

    Would that make a difference? I have no perl knowledge and would appreciate any input before I go barking up the wrong tree.

    Thanks again
    Terry

  14. Terry,

    I’m glad that it worked and you found it useful. If you have any NetApp products you might want to check out my new post on monitoring and graphing NetApp.

    As for your problem with memory usage, yes, the quotes will make a difference. You will need to modify your map entry for the memory service to include the quote. The following should work (if you copy this, make sure that you get a “regular” quote mark and not the “fancy curly” quotes):

    /perfdata:.*usage'=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/

    Good luck and let me know how it goes.

  15. vijitra

     /  February 1, 2007

    Anyone know how to create the script that insert check_mysql and check_ftp performance data to map file of nagiosgraph. Nagiosgraph can’t show the graph of these plugin.

  16. H. Eikelenboom

     /  July 30, 2007

    Can you please send me the configuration files how i can get it to work.
    Or one example of a host.
    I used nrpe and nsclient but I can’t get it to work.

  17. Paul Nijjar

     /  August 25, 2008

    I apologise if this question is overly dumb, but have your configuration examples disappeared from this post? I don’t see them.

  18. Sorry about that, I just moved my site and in the process this post got messed up. I’ve added the code back in.

  19. Hi, I’m Thomas and I’m a developer of a new open source project named BrainyPDM. As you can see from our web site, this open source application can store performance data from Nagios and graph the values as hourly, daily, weekly, monthly and yearly charts. If you want, you can try it and give us some feedback. The URL of our site is: http://www.brainypdm.org (on SourceForge: http://sourceforge.net/projects/brainypdm/)

  20. PJ

     /  May 13, 2010

    I’m trying to get both the CPU Load and Disk Space to graph properly. I used the Command and map file in the example for both. The CPU just doesn’t get created in the RRD folder while the Disk Space used does create a file but the show.cgi seems to expect this file name:

    /var/nagios/rrd/Fserver1/C%3A%20Drive%20Space___ntdisk.rrd

    But the output file is:

    /var/nagios/rrd/Fserver1/C%3A%5C%20Drive%20Space___ntdisk.rrd

    Do you have any ideas on either as to why? Mem usage works just fine, if that helps.

  21. PJ,

    I finally ran across the error today (I hadn’t looked at my graphs for many months).

    I don’t know what changed, though I’m guessing it was an upgrade to Nagios, but the perfdata lines that are passed to nagiosgraph no longer contain the single ' quote mark after Space and usage.

    For example, the map entry used to look like this:
    /perfdata:.*Space'=

    Once I removed the ' mark it started working again:
    /perfdata:.*Space=

    If you make that change and a similar change to the memory one you should be good to go.

  22. mwall

     /  December 19, 2010

    would it be ok with you to include these rules in the map examples in the nagiosgraph distribution?

  23. Absolutely. They are quite old, but with the change noted above I think you should be good.

    If you want I’ll see what else I may have mapped and let you know.
