Nagiosgraph with Windows support
After reviewing the four main tools for graphing performance with Nagios (APAN, Nagiosgraph, Nagiostat, and PerfParse), I decided that Nagiosgraph was the easiest for me to get up and running. Out of the box it worked great for my Linux systems and my network tests, but I needed to add support for monitoring my Windows servers.
I have used APAN in the past, but it was really tough to configure. I also tried PerfParse and liked it, and it worked great. However, it required far more database resources than I was prepared to provide, and I could probably only have kept 30 days of data.
To make things easier, I installed the latest CVS nightly of the 1.4.0alpha Nagios Plugins. As of the 20040817 snapshot, these plugins support performance data output for the check_nt plugin (the one that works with the NSClient service). Once these plugins were compiled and installed, I updated the nagiosgraph map file. This is the file nagiosgraph uses to parse plugin output and performance data into the values it stores and graphs.
Here are the entries I added to the map file to support graphing the Windows statistics (note that this requires the 1.4.0 plugins), as well as the fping service.
Update: I have corrected the errors pointed out in the comments and added an entry for the MSSQL plugin.
# Service type: ntload
# check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
# output: CPU Load 9% (5 min average) 11% (30 min average)
# perfdata: '5 min avg Load'=9%;70;80;0;100 '30 min avg Load'=11%;70;90;0;100
#/perfdata:.*5 min avg Load'=(\d+)%;(\d+);(\d+);\d+;\d+ '30 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
/output:.*?(\d+)% .*?(\d+)% /
and push @s, [ ntload,
[ avg05min, GAUGE, $1 ],
[ avg30min, GAUGE, $2 ] ];
# Service type: ntmem
# check command: check_nt -H Address -v MEMUSE -w 50 -c 90
# output: Memory usage: total:2467.75 Mb - used: 510.38 Mb (21%) - free: 1957.37 Mb (79%)
# perfdata: Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75
/perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntmem,
[ memused, GAUGE, $1*1024**2 ],
[ memwarn, GAUGE, $2*1024**2 ],
[ memcrit, GAUGE, $3*1024**2 ],
[ memmmax, GAUGE, $5*1024**2 ] ];
# Service type: ntdisk
# check command: check_nt -H Address -v USEDDISKSPACE -lc -w 75 -c 90
# output: c:\ - total: 25.87 Gb - used: 4.10 Gb (16%) - free 21.77 Gb (84%)
# perfdata: c:\ Used Space=4.10Gb;19.40;23.28;0.00;25.87
/perfdata:.*Space=([.0-9]+)Gb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
and push @s, [ ntdisk,
[ diskused, GAUGE, $1*1024**3 ],
[ diskwarn, GAUGE, $2*1024**3 ],
[ diskcrit, GAUGE, $3*1024**3 ],
[ diskmaxi, GAUGE, $5*1024**3 ] ];
# Service type: fping
# output:FPING OK - 10.1.1.1 (loss=20%, rta=385.000000 ms)
# perfdata: loss=20%;79;100;0;100 rta=0.385000s;2.000000;5.000000;0.000000
#/output:PING.*?(\d+)%.+?([.\d]+)\sms/
/perfdata:.*loss=(\d+)%.*rta=([.0-9]+)s;/
and push @s, [ fping,
[ losspct, GAUGE, $1 ],
[ rta, GAUGE, $2 ] ];
# Service type: mssql
# output: OK - MS SQL Server 2000 has 42 user(s) connected: 18 appTrendCtrMgr, 1 nagios, 2 NTAUTHORITY\SYSTEM.
# perfdata: users=36;;;;
#/output:.*MS SQL Server 2000 has ([0-9]+) use/
/perfdata:.*users=([0-9]+)/
and push @s, [ mssql,
[ users, GAUGE, $1 ] ];
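Before dropping a map entry into nagiosgraph, it can help to run its regular expression against a sample line. The following Python sketch mirrors the five entries above. Python's `re` treats `\d`, `[.0-9]` and `.*?` the same way Perl does. The sketch is a hypothetical test harness, not part of nagiosgraph; `MAP` and `parse` are illustrative names. It also assumes each sample is a single `output:` or `perfdata:` line. The map file is a series of Perl statements, so more than one entry could fire on the same input. For that reason the sketch collects every match rather than stopping at the first.

```python
import re

MB = 1024 ** 2  # same scaling as $1*1024**2 in the ntmem entry
GB = 1024 ** 3  # same scaling as $1*1024**3 in the ntdisk entry

# Hypothetical Python mirror of the map entries above.
# Each entry: (service type, compiled regex, [(data source, capture group, scale)]).
MAP = [
    ("ntload", re.compile(r"output:.*?(\d+)% .*?(\d+)% "),
     [("avg05min", 1, 1), ("avg30min", 2, 1)]),
    ("ntmem", re.compile(r"perfdata:.*usage=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)"),
     [("memused", 1, MB), ("memwarn", 2, MB), ("memcrit", 3, MB), ("memmmax", 5, MB)]),
    ("ntdisk", re.compile(r"perfdata:.*Space=([.0-9]+)Gb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)"),
     [("diskused", 1, GB), ("diskwarn", 2, GB), ("diskcrit", 3, GB), ("diskmaxi", 5, GB)]),
    ("fping", re.compile(r"perfdata:.*loss=(\d+)%.*rta=([.0-9]+)s;"),
     [("losspct", 1, 1), ("rta", 2, 1)]),
    ("mssql", re.compile(r"perfdata:.*users=([0-9]+)"),
     [("users", 1, 1)]),
]


def parse(line):
    """Return a list of (service type, {data source: value}) for every entry that matches."""
    results = []
    for service_type, pattern, datasources in MAP:
        m = pattern.search(line)  # unanchored, like Perl's =~ /.../
        if m:
            values = {name: float(m.group(group)) * scale
                      for name, group, scale in datasources}
            results.append((service_type, values))
    return results


if __name__ == "__main__":
    print(parse("output:CPU Load 9% (5 min average) 11% (30 min average)"))
    # [('ntload', {'avg05min': 9.0, 'avg30min': 11.0})]
```

The ntdisk pattern only starts matching at `Space=`, so the drive label (`c:\`, `d:\` and so on) does not matter. That is why the same `ntdisk` entry can serve the NTdiskC, NTdiskD and NTdiskE services in the configuration that follows.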
In addition, here are the entries I created in the serviceextinfo.cfg file to produce the graphs:
define serviceextinfo {
service_description FPing
host_name host
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=fping,rta&db=fping,losspct
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description NTload
host_name ntserver1,ntserver2
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg05min,avg30min
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description NTmem
host_name ntserver1,ntserver2
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description NTdiskC
host_name ntserver1,ntserver2
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description NTdiskD
host_name ntserver1,ntserver2
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description NTdiskE
host_name ntserver1,ntserver2
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntdisk,diskused,diskwarn,diskcrit,diskmaxi
icon_image graph.png
icon_image_alt View graphs
}
define serviceextinfo {
service_description MSSQL
host_name bsql1
notes_url /nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=mssql,users
icon_image graph.png
icon_image_alt View graphs
}
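Each notes_url above follows the same shape. It starts with the host and service macros. Then come one or more `db=` parameters, each listing a map service type followed by the data sources to plot. The FPing entry uses two `db=` parameters, one per data source. Writing these blocks by hand is where a wrong data-source name can creep in, so a small generator may help. The helpers below are hypothetical: `notes_url` and `serviceextinfo` are illustrative names, and the sketch only reproduces the layout used in this post. It deliberately does not URL-encode anything, so the `$HOSTNAME$` and `$SERVICEDESC$` macros are left exactly as written for Nagios to expand.

```python
def notes_url(graphs, base="/nagiosgraph/show.cgi"):
    """Build a notes_url; graphs is a list of (service type, [data sources]) tuples."""
    params = ["host=$HOSTNAME$", "service=$SERVICEDESC$"]
    for service_type, datasources in graphs:
        # One db= parameter per tuple: service type first, then its data sources.
        params.append("db=" + ",".join([service_type, *datasources]))
    return base + "?" + "&".join(params)


def serviceextinfo(description, hosts, graphs):
    """Render a serviceextinfo block in the same layout as the entries above."""
    return "\n".join([
        "define serviceextinfo {",
        f"service_description {description}",
        f"host_name {','.join(hosts)}",
        f"notes_url {notes_url(graphs)}",
        "icon_image graph.png",
        "icon_image_alt View graphs",
        "}",
    ])


if __name__ == "__main__":
    # Reproduces the NTmem block above.
    print(serviceextinfo("NTmem", ["ntserver1", "ntserver2"],
                         [("ntmem", ["memused", "memwarn", "memcrit", "memmmax"])]))
```

The data-source names passed in have to be the same names used in the corresponding map entry. The generator cannot check that for you.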
Tags: check_nt, graph, monitoring, nagios, opensource, windows
May 26, 2005 at 7:52 am
It took a while to find your posting, but it's very welcome and valuable, as we don't have any Perl skills and the map file in nagiosgraph only contains Unix command mappings. We are now actively graphing our disk usage, memory, and ping. CPU doesn't work. I think the mapping is wrong; maybe something changed between the 1.4 alpha plugins and the release version. Any other maps and serviceextinfo entries would be very welcome!
Regards,
Adrian
May 26, 2005 at 8:30 am
I just double-checked what I have running in production, and it is exactly as above. I'm now running the released 1.4 plugins and everything is fine.
One thing to double-check is whether you are running the correct check command. The one I use is check_nt -H Address -v CPULOAD -l5,70,90,30,70,90, and the output should look like this: CPU Load 5% (5 min average) 2% (30 min average) | '5 min avg Load'=5%;70;90;0;100 '30 min avg Load'=2%;70;90;0;100
If it doesn't, post or send me the command you are using and the output you are getting, and I'll see what I can do.
May 26, 2005 at 12:41 pm
Hi,
We have it!
We needed a \d in the map below to get it to work.
# Service type: ntload
# check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
# output: CPU Load 9% (5 min average) 11% (30 min average)
# perfdata: '5 min avg Load'=9%;70;80;0;100 '30 min avg Load'=11%;70;90;0;100
#/perfdata:.*5 min avg Load'=(\d+)%;(\d+);(\d+);\d+;\d+ '30 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
/output:.*?(\d+)% .*?(\d+)% /
and push @s, [ ntload,
[ avg05min, GAUGE, $1 ],
[ avg30min, GAUGE, $2 ] ];
In the map file you posted, you have (d+), and (\d+) seems to work for us.
Many thanks for all your help.
Regards,
Adrian
May 31, 2005 at 4:34 am
I’ve got it: This web interface is eating backslashes.
So wherever you see d where you expect a number, you have to write it with a backslash in front (\d).
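The effect is easy to see. Without the backslash, `(d+)` asks for one or more literal letter d's rather than digits, so the CPU line never matches and nothing is pushed for graphing. A quick check in Python, whose `re` module treats these two patterns the same way Perl does (the names `eaten` and `intact` are just for illustration):

```python
import re

line = "output:CPU Load 9% (5 min average) 11% (30 min average)"

eaten = re.compile(r"output:.*?(d+)% .*?(d+)% ")     # backslashes stripped by the web page
intact = re.compile(r"output:.*?(\d+)% .*?(\d+)% ")  # as intended in the map file

print(eaten.search(line))            # None: there is no "d% " anywhere in the line
print(intact.search(line).groups())  # ('9', '11')
```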
May 31, 2005 at 6:52 am
Another small bug:
In serviceextinfo.cfg, the NTmem entry has to use memused instead of memfree, so the whole line would be:
/nagiosgraph/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntmem,memused,memwarn,memcrit,memmmax
June 6, 2005 at 4:48 pm
I have updated the entry to reflect all of the corrections and changes submitted.
October 30, 2005 at 9:04 pm
Thanks for putting this up. It has helped immensely.
November 13, 2005 at 12:25 pm
Hello there,
thanks for the help.
One question: how would one create a graph for checking disk space using check_nrpe, with input like this:
Input servicedescr:Disk Space C:
Input hostname:acsjhba5sp02
Input perfdata:
Input lastcheck:1131898498
Input output:OK: C:: Total: 37.3G - Used: 4.74G (12%) - Free: 32.5G (88%)
Any help would be great.
thanks
Chris
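For output like this, the perfdata line is empty. A map entry would therefore have to read the numbers from the output text, as the commented-out `/output:.../` alternatives in the map entries above do. Here is a minimal sketch of such a regex, tested in Python against the output quoted in the question. It makes several assumptions. The `G` suffix is taken to mean binary gigabytes, matching the `1024**3` scaling used for ntdisk. The data-source names `disktotal`, `diskused` and `diskpct` are hypothetical. The function `parse_nrpe_disk` is only an illustration.

```python
import re

# Hypothetical pattern for the check_nrpe disk output quoted above.
# The lazy ".*?" between fields tolerates either "-" or an en dash as the separator.
NRPE_DISK = re.compile(r"output:.*Total: ([.0-9]+)G.*?Used: ([.0-9]+)G \((\d+)%\)")
GB = 1024 ** 3  # assumption: the plugin's "G" means binary gigabytes


def parse_nrpe_disk(line):
    """Return total/used bytes and used percentage, or None if the line does not match."""
    m = NRPE_DISK.search(line)
    if m is None:
        return None
    total, used, pct = m.groups()
    return {"disktotal": float(total) * GB, "diskused": float(used) * GB, "diskpct": int(pct)}


print(parse_nrpe_disk("output:OK: C:: Total: 37.3G - Used: 4.74G (12%) - Free: 32.5G (88%)"))
```

In the map file, the same pattern would go in an `/output:.../` line. It would be followed by an `and push @s, [ ... ]` statement using `$1*1024**3` and so on, in the same way as the ntdisk entry.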
November 23, 2005 at 2:00 pm
So I'm having problems getting the graphs or the RRD files written for check_nt CPU load.
The command is:
$USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95
The nagiosgraph map file entry is:
# Service type: ntload
#check command: $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 60,90,95
# output: CPU Load 9% (60 min average)
#perfdata: '60 min avg Load'=9%;90;95;0;100
#/perfdata:.'60 min avg Load'=(\d+)%;\d+;\d+;\d+;\d+ /
/output:.*?(\d+)% /
and push @s, [ ntload,
[ avg60min, GAUGE, $1 ] ];
The serviceextinfo entry is:
define serviceextinfo {
service_description Windows_Agent_CPU_Load
# host_name
hostgroup windows_domain_servers
notes_url /cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=ntload,avg60min
icon_image notify.gif
icon_image_alt View Ping Graph
}
But I'm not getting graphs. The log file says it's creating the RRD files, but they don't show up. I've got other services running with nagiosgraph, so I'm pretty sure the basics are set up.
Any ideas?
December 6, 2005 at 4:19 pm
I had trouble with the CPU perfdata too. Here is the map file bit that is working for me.
# Service type: ntload
# check command: check_nt -H Address -v CPULOAD -l5,70,90,30,70,90
# output: CPU Load 9% (5 min average) 11% (10 min average) 3% (30 min average)
# perfdata: '5 min avg Load'=9%;80;90;0;100 '10 min avg Load'=11%;80;90;0;100 '30 min average Load'=3%;80;90;0;100
/output:.*?Load (\d+)% \(5 min average\) (\d+)% \(10 min average\) (\d+)% /
and push @s, [ ntload,
[ avg05min, GAUGE, $1 ],
[ avg10min, GAUGE, $2 ],
[ avg30min, GAUGE, $3 ] ];
December 6, 2005 at 5:39 pm
John and Jeff,
Yes, you will have to modify the map based on the CPU monitoring periods you want. I wanted 5 and 30. If you want more periods, you need to add more captured values ($3 and so on) as Jeff did. If you want fewer, you will need to remove them. You will also need to adjust the regular expression so that it matches your plugin's output.
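Following that advice, the regular expression and the data-source names can both be derived from the list of periods passed to `-l`. Below is a hypothetical Python sketch; `cpu_load_entry` and `parse_cpu` are illustrative names. It assumes the output format shown in the comments above, where `N% (P min average)` is repeated once per period. The generated pattern is also valid Perl. It could be copied into an `/output:.../` line with one `[ name, GAUGE, $n ]` per period.

```python
import re


def cpu_load_entry(periods):
    """Build a regex and data-source names for check_nt CPULOAD output with these periods (minutes)."""
    pattern = r"output:.*?Load " + " ".join(rf"(\d+)% \({p} min average\)" for p in periods)
    names = [f"avg{p:02d}min" for p in periods]  # avg05min, avg30min, ... as in the map entries
    return re.compile(pattern), names


def parse_cpu(line, periods):
    """Return {data source: load percentage}, or None if the output has different periods."""
    regex, names = cpu_load_entry(periods)
    m = regex.search(line)
    return None if m is None else dict(zip(names, map(int, m.groups())))


print(parse_cpu("output:CPU Load 9% (5 min average) 11% (10 min average) 3% (30 min average)", [5, 10, 30]))
# {'avg05min': 9, 'avg10min': 11, 'avg30min': 3}
```

Because the period numbers are part of the pattern, a mismatch between `-l` and the map entry makes the match fail outright. The alternative would be quietly graphing the wrong values.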
April 6, 2006 at 6:07 am
Hi,
I think you've done great work putting this map and serviceextinfo.cfg online; they are very useful.
Regarding CPU load, I had to modify line 6 as follows to get the data correctly understood by insert.pl:
/output:.*?(\d+)% .*?(\d+)% /
Hope this helps.
Sorry for my English; it's not my first language.
Greetings,
Francesco.
June 19, 2006 at 6:22 pm
Hello, this has been a brilliant find for me. Thanks ever so much for sharing your experience with us newbies. I've pretty much got everything working just by copying your examples, but... (did you see that coming?) I can't get the memory graphs up. A quick check shows the files aren't there in the RRD directory, so the map may be off. My initial suspicion is that I'm using NSClient++, and it returns the performance data with quotes around the first words, e.g.:
'Memory usage'=464.52Mb;2806.71;3608.63;0.00;4009.59
Would that make a difference? I have no perl knowledge and would appreciate any input before I go barking up the wrong tree.
Thanks again
Terry
June 20, 2006 at 8:43 am
Terry,
I’m glad that it worked and you found it useful. If you have any NetApp products you might want to check out my new post on monitoring and graphing NetApp.
As for your problem with memory usage: yes, the quotes will make a difference. You will need to modify your map entry for the memory service to include the quote. The following should work (if you copy it, make sure that you get a "regular" straight quote mark and not the "fancy" curly quotes):
/perfdata:.*usage'=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)/
Good luck, and let me know how it goes.
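You may need a single map entry that handles both the original NSClient output and the quoted label from NSClient++. In that case the quote can be made optional with `'?`, which means the same thing in Perl and Python. The following is a small Python check against the two perfdata lines quoted in this post. Treat it as a sketch: the pattern has only been tried against these samples.

```python
import re

# "'?" accepts the label with or without the closing quote that NSClient++ adds.
MEM = re.compile(r"perfdata:.*usage'?=([.0-9]+)Mb;([.0-9]+);([.0-9]+);([.0-9]+);([.0-9]+)")

samples = [
    "perfdata:Memory usage=510.38Mb;1233.88;2220.98;0.00;2467.75",    # NSClient
    "perfdata:'Memory usage'=464.52Mb;2806.71;3608.63;0.00;4009.59",  # NSClient++
]
for s in samples:
    print(MEM.search(s).group(1))  # used memory in Mb
# 510.38
# 464.52
```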
February 1, 2007 at 1:52 pm
Does anyone know how to write map file entries that insert check_mysql and check_ftp performance data into nagiosgraph? Nagiosgraph can't show graphs for these plugins.
July 30, 2007 at 7:58 am
Can you please send me the configuration files showing how to get it to work,
or an example for one host?
I used NRPE and NSClient, but I can't get it to work.
August 25, 2008 at 10:12 pm
I apologise if this question is overly dumb, but have your configuration examples disappeared from this post? I don’t see them.
August 26, 2008 at 7:33 am
Sorry about that. I just moved my site, and in the process this post got messed up. I've added the code back in.
October 15, 2008 at 12:25 pm
Hi, I'm Thomas, and I'm a developer of a new open source project named BrainyPDM. As you can see from our website, this open source application can store performance data from Nagios and graph the values as hourly, daily, weekly, monthly, and yearly charts. If you want, you can try it and give us some feedback. The URL of our site is http://www.brainypdm.org (on SourceForge: http://sourceforge.net/projects/brainypdm/).