Nagiosgraph with Windows support

After reviewing the four main tools for graphing performance with Nagios (APAN, Nagiosgraph, Nagiostat, and PerfParse), I decided that Nagiosgraph was the easiest for me to get up and running. Out of the box it worked great for my Linux systems and my network tests, but I needed to add support for monitoring my Windows servers.

I have used APAN in the past, but it was really tough to configure. I also tried PerfParse and liked it. However, it required a lot more resources for the database than I was prepared to handle, and I could probably only keep 30 days of data. But it worked great.

To make things easier I installed the latest CVS nightly of the 1.4.0alpha Nagios Plugins. As of 20040817 these plugins supported performance data output for the check_nt plugin (the one that works with the NSClient service). Once these plugins were compiled and installed, I updated the nagiosgraph map file. Nagiosgraph uses this file to parse the plugin output and performance data into the values it graphs.

Here are the entries that I added to the map file to support graphing of the Windows statistics (note that this requires the 1.4.0 plugin install) as well as the fping service.

Update: I have corrected the errors pointed out in the comments and added an entry for the MSSQL plugin.


# Service type: ntload
#   check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
#   output: CPU Load 9% (5 min average) 11% (30 min average)
#   perfdata: '5 min avg Load'=9%;70;90;0;100 '30 min avg Load'=11%;70;90;0;100
#/perfdata:.*5 min avg Load'=(\d+)%;(\d+);(\d+);\d+;\d+ '30 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
/output:.*?(\d+)% .*?(\d+)% /
and push @s, [ ntload,
       [ avg05min, GAUGE, $1 ],
       [ avg30min, GAUGE, $2 ] ];

# Service type: ntmem
#   check command: check_nt -H Address -v MEMUSE -w 50 -c 90
#   output: Memory usage: total:2467.75 Mb - used: 510.38 Mb (21%) - free: 1957.37 Mb (79%)
#   perfdata: Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75
/perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntmem,
       [ memused, GAUGE, $1*1024**2 ],
       [ memwarn, GAUGE, $2*1024**2 ],
       [ memcrit, GAUGE, $3*1024**2 ],
       [ memmmax, GAUGE, $5*1024**2 ] ];

# Service type: ntdisk
#   check command: check_nt -H Address -v USEDDISKSPACE -lc -w 75 -c 90
#   output: c:\ - total: 25.87 Gb - used: 4.10 Gb (16%) - free 21.77 Gb (84%)
#   perfdata: c:\ Used Space=4.10Gb;19.40;23.28;0.00;25.87
/perfdata:.*Space=([.0-9]+)Gb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntdisk,
       [ diskused, GAUGE, $1*1024**3 ],
       [ diskwarn, GAUGE, $2*1024**3 ],
       [ diskcrit, GAUGE, $3*1024**3 ],
       [ diskmaxi, GAUGE, $5*1024**3 ] ];

# Service type: fping
#   output:FPING OK - 10.1.1.1 (loss=20%, rta=385.000000 ms)
#   perfdata: loss=20%;79;100;0;100 rta=0.385000s;2.000000;5.000000;0.000000
#/output:PING.*?(\d+)%.+?([.\d]+)\sms/
/perfdata:.*loss=(\d+)%.*rta=([.0-9]+)s;/
and push @s, [ fping,
       [ losspct, GAUGE, $1 ],
       [ rta,     GAUGE, $2 ] ];

# Service type: mssql
#   output:   OK - MS SQL Server 2000 has 42 user(s) connected:  18 appTrendCtrMgr, 1 nagios, 2 NTAUTHORITY\SYSTEM.
#   perfdata: users=36;;;;
#/output:.*MS SQL Server 2000 has ([0-9]+) use/
/perfdata:.*users=([0-9]+)/
and push @s, [ mssql,
       [ users, GAUGE, $1 ] ];
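
Before adding an entry like these to the map file, it can help to try the regex against a sample line outside of Nagios. The following is a minimal standalone sketch, not part of nagiosgraph itself: the sample text and the @s array are stand-ins, the field layout ("perfdata:" and "output:" lines in one string) is an assumption based on what the entries above match against, and the names are quoted only because the script runs under "use strict".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, built from the ntmem example above.
my $sample = join "\n",
    "servicedescr:NTmem",
    "perfdata:Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75",
    "output:Memory usage: total:2467.75 Mb - used: 510.38 Mb (21%) - free: 1957.37 Mb (79%)";

my @s;            # stand-in for the array the map entries push onto
$_ = $sample;     # the map entries match against $_

# Same regex and push as the ntmem entry, with names quoted for strict mode.
/perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ 'ntmem',
       [ 'memused', 'GAUGE', $1*1024**2 ],
       [ 'memwarn', 'GAUGE', $2*1024**2 ],
       [ 'memcrit', 'GAUGE', $3*1024**2 ],
       [ 'memmmax', 'GAUGE', $5*1024**2 ] ];

# Print what would be stored, one data source per line (values in bytes).
for my $entry (@s) {
    my ($db, @sources) = @$entry;
    printf "%s %s %s %.0f\n", $db, @$_ for @sources;
}
```

If nothing is printed, the regex did not match the sample, which is usually the first thing to check when a graph stays empty.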

In addition, here are the entries that I created in the serviceextinfo.cfg file to produce the graphs:


define serviceextinfo {
  service_description  FPing
  host_name   host
  notes_url      /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=fping,rta&db=fping,losspct
  icon_image  graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTload
  host_name       ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg05min,avg30min
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTmem
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskC
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskD
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  NTdiskE
  host_name      ntserver1,ntserver2
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
  icon_image      graph.png
  icon_image_alt  View graphs
}

define serviceextinfo {
  service_description  MSSQL
  host_name       bsql1
  notes_url       /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=mssql,users
  icon_image      graph.png
  icon_image_alt  View graphs
}
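
The three NTdisk blocks differ only in the drive letter, so they could also be generated rather than typed by hand. This is a rough sketch under assumptions: the drive list, host names, and URL are copied from the entries above, and the disk_block helper is a hypothetical name, not something that ships with Nagios or nagiosgraph.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed values copied from the entries above; adjust to your setup.
my @drives = qw(C D E);
my $hosts  = 'ntserver1,ntserver2';
# Single quotes keep the Nagios macros ($HOSTNAME$ etc.) literal.
my $url    = '/nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$'
           . '&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi';

# Build one serviceextinfo block for the given drive letter.
sub disk_block {
    my ($drive) = @_;
    return join "\n",
        'define serviceextinfo {',
        "  service_description  NTdisk$drive",
        "  host_name       $hosts",
        "  notes_url       $url",
        '  icon_image      graph.png',
        '  icon_image_alt  View graphs',
        '}', '';
}

# Print the blocks, separated by blank lines, ready to paste into the file.
print disk_block($_), "\n" for @drives;
```

The output can be redirected into a scratch file and reviewed before it is merged into serviceextinfo.cfg.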

19 Comments on “Nagiosgraph with Windows support”

  1. Adrian Rath Says:

    It took a while to find your posting, but it's very welcome and valuable, as we don't have any Perl skills and the map file in nagiosgraph only contains Unix command mappings. We are now actively graphing our disk usage, memory and ping. CPU doesn't work. I think the mapping is wrong; maybe there was a change from the 1.4 alpha plugins to the release version. Any other maps and serviceextinfo entries would be very welcome!
    Regards,
    Adrian

  2. Ken Nerhood Says:

    I just double-checked what I have running in production and it is exactly as above. I'm now running the released 1.4 plugins and everything is fine.

    One thing to double-check is whether you are running the correct check command. The one that I use is check_nt -H Address -v CPULOAD -l5,70,90,30,70,90 and the output should look like this: CPU Load 5% (5 min average) 2% (30 min average) | '5 min avg Load'=5%;70;90;0;100 '30 min avg Load'=2%;70;90;0;100

    If it doesn't, then post or send me the command you are using and the output you are getting, and I'll see what I can do.

  3. Adrian Rath Says:

    Hi,

    We have it!

    We needed a d in the map below to get it to work.

    # Service type: ntload
    #   check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
    #   output: CPU Load 9% (5 min average) 11% (30 min average)
    #   perfdata: '5 min avg Load'=9%;70;80;0;100 '30 min avg Load'=11%;70;90;0;100
    #/perfdata:.*5 min avg Load'=(d+)%;(d+);(d+);d+;d+ '30 min avg Load'=(d+)%;d+;d+;d+;d+ /
    /output:.*?(d+)% .*?(d+)% /
    and push @s, [ ntload,
           [ avg05min, GAUGE, $1 ],
           [ avg30min, GAUGE, $2 ] ];

    In the map file you have posted, you have (d+) and (d+) seems to work for us.

    Many thanks for all your help.

    Regards,
    Adrian

  4. Bohumil Kriz Says:

    I’ve got it: this web interface is eating backslashes.
    So if there is a d where you expect a number, you have to write “backslash”d.
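
The effect of the lost backslashes is easy to check in a quick standalone test. This is only a sketch, using the sample ntload output from the post as an assumed input: without the backslash, d+ looks for literal letter d's, while \d+ captures the digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample output line taken from the ntload example in the post.
my $output = "output:CPU Load 9% (5 min average) 11% (30 min average) ";

# With the backslashes lost, "d+" only matches literal letter d's,
# and no "d" is directly followed by "%" in this line.
my $broken = ($output =~ /output:.*?(d+)% .*?(d+)% /) ? 1 : 0;

# With "\d+" the two percentages are captured.
my @loads = $output =~ /output:.*?(\d+)% .*?(\d+)% /;

print "broken regex matched: $broken\n";
print "captured loads: @loads\n";
```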

  5. Bohumil Kriz Says:

    Another small bug:
    in serviceextinfo.cfg it has to be memused instead of memfree, so the whole line would be:
    /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax

  6. Ken Nerhood Says:

    I have updated the entry to reflect all of the corrections and changes submitted.

  7. Tim Says:

    Thanks for putting this up. It has helped immensely.

  8. Chris Says:

    hello there..

    thanks for the help..

    one question: how would one create a graph for checking disk space using check_nrpe?

    Input servicedescr:Disk Space C:
    Input hostname:acsjhba5sp02
    Input perfdata:
    Input lastcheck:1131898498
    Input output:OK: C:: Total: 37.3G – Used: 4.74G (12%) – Free: 32.5G (88%)

    Any help would be great..

    thanks

    Chris


  9. So i’m having problems getting the graphs or the RRD files written for check_nt cpu load.

    the command is:
    $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95

    the nagiosgraph map file entry is:
    # Service type: ntload
    #check command: $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95
    # output: CPU Load 9% (60 min average)
    #perfdata: ‘60 min avg Load’=9%;90;95;0;100
    #/perfdata:.’60 min avg Load’=(\d+)%;\d+;\d+;\d+;\d+ /
    /output:.*?(\d+)% /
    and push @s, [ ntload,
    [ avg60min, GAUGE, $1 ] ];

    the serviceextinfo entry is:
    define serviceextinfo {
    service_description Windows_Agent_CPU_Load
    # host_name
    hostgroup windows_domain_servers
    notes_url /cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg60min
    icon_image notify.gif
    icon_image_alt View Ping Graph
    }

    but i’m not getting graphs. The log file thinks its creating the RRD files, but they don’t show up. I’ve got other services running with nagiosgraph, so I’m pretty sure the basics are set up.

    any ideas?

  10. Jeff W Says:

    I had trouble with the CPU perfdata too. Here is the map file bit that is working for me.

    # Service type: ntload
    # check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
    # output: CPU Load 9% (5 min average) 11% (10 min average) 3% (30 min average)
    # perfdata: ‘5 min avg Load’=9%;80;90;0;100 ‘10 min avg Load’=11%;80;90;0;100 ‘30 min average Load’=3%;80;90;0;100
    /output:.*?Load (\d+)% \(5 min average\) (\d+)% \(10 min average\) (\d+)% /
    and push @s, [ ntload,
    [ avg05min, GAUGE, $1 ],
    [ avg10min, GAUGE, $2 ],
    [ avg30min, GAUGE, $3 ] ];

  11. kbn Says:

    John and Jeff,

    Yes, you will have to modify the map based on the CPU monitoring periods you want to graph. I wanted 5 and 30 minutes. If you want more periods, you need to add variables as Jeff did. If you want fewer, you will need to remove them. You will also need to set your search string to look for the right text.
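
As an alternative to editing the regex for each set of periods, the averages can be collected in a loop. This is a standalone sketch and has not been tried inside nagiosgraph itself; the sample line is taken from Jeff's comment, and the data source names follow the avgNNmin pattern used in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample output with three averaging periods, as in comment 10.
my $output = "output:CPU Load 9% (5 min average) 11% (10 min average) 3% (30 min average)";

# Collect every "<load>% (<minutes> min average)" pair, however many there are.
my @sources;
while ($output =~ /(\d+)% \((\d+) min average\)/g) {
    my ($load, $minutes) = ($1, $2);
    push @sources, [ sprintf('avg%02dmin', $minutes), 'GAUGE', $load ];
}

# Stand-in for the map file's @s array.
my @s;
push @s, [ 'ntload', @sources ] if @sources;

print join(' ', @$_), "\n" for @sources;
```

The same loop handles a single 60 minute average or the 5/10/30 combination without further changes to the pattern.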


  12. Hi,
    I think you’ve done a great work in putting this map and serviceextinfo.cfg on line, they are very useful.

    Regarding CPU load, I had to modify line 6 in this way to get the data correctly understood by insert.pl:

    /output:.*?(\d+)% .*?(\d+)% /

    Hope this helps.

    Sorry for my english, it’s not my main language.

    Greetings,
    Francesco.

  13. Terry Narine Says:

    Hello, this has been a brilliant find for me – thanks ever so much for sharing your experience with us newbies. I’ve pretty much got everything working just by copying your examples but… (did you see that coming?) I can’t get the memory maps up. A quick check shows they’re not there in the rrd directory, so the map may be off. My initial suspicion is that I’m using nsclient++ and this returns the status with quotes around the first words, e.g.:

    ‘Memory usage’=464.52Mb;2806.71;3608.63;0.00;4009.59

    Would that make a difference? I have no perl knowledge and would appreciate any input before I go barking up the wrong tree.

    Thanks again
    Terry

  14. kbn Says:

    Terry,

    I’m glad that it worked and you found it useful. If you have any NetApp products you might want to check out my new post on monitoring and graphing NetApp.

    As for your problem with memory usage, yes, the quotes will make a difference. You will need to modify your map entry for the memory service to include the quote. The following should work (if you copy this, make sure that you get a “regular” quote mark and not the “fancy curly” quotes):

    /perfdata:.*usage'=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/

    Good luck and let me know how it goes.
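
For setups that mix agents reporting the label with and without quotes, the quote can be made optional in the pattern. The following standalone sketch assumes the two perfdata forms shown in the post and in Terry's comment; it is an illustration of the regex only, not a tested map entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perfdata as shown in the post (no quotes) and as reported for nsclient++ (quoted).
my @samples = (
    "perfdata:Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75",
    "perfdata:'Memory usage'=464.52Mb;2806.71;3608.63;0.00;4009.59",
);

my @used;
for my $line (@samples) {
    # The "'?" makes the closing quote optional, so either form matches.
    if ($line =~ /perfdata:.*usage'?=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/) {
        push @used, $1;
        print "used=$1 Mb, max=$5 Mb\n";
    }
}
```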

  15. vijitra Says:

    Does anyone know how to write map file entries that insert check_mysql and check_ftp performance data into nagiosgraph? Nagiosgraph can’t show graphs for these plugins.

  16. H. Eikelenboom Says:

    Can you please send me the configuration files so I can see how to get it to work?
    Or one example of a host.
    I used nrpe and nsclient but I can’t get it to work.

  17. Paul Nijjar Says:

    I apologise if this question is overly dumb, but have your configuration examples disappeared from this post? I don’t see them.

  18. kbn Says:

    Sorry about that, I just moved my site and in the process this post got messed up. I’ve added the code back in.

  19. TB Says:

    Hi, I’m Thomas and I’m a developer of a new open source project named BrainyPDM. As you can see from our web site, this open source application can store performance data from Nagios and graph the values, making hourly, daily, weekly, monthly and yearly charts. If you want, you can try it and give us some feedback. The URL of our site is: http://www.brainypdm.org (on SourceForge: http://sourceforge.net/projects/brainypdm/)

