Wednesday, September 5, 2012

Server resource utilization monitoring

Pretty much every Linux distribution ships tools that help sysadmins monitor the utilization of common server resources such as CPU, memory and storage. For advanced monitoring and reporting we may need to install third-party tools (e.g. Nagios or FAN), but sometimes we just want a simple way to get a warning when a critical server resource is close to overload.

Here is a set of simple shell scripts (tested on RHEL and CentOS) you can use to monitor CPU, memory and storage space. If utilization goes over a threshold, these scripts will send you an email warning, hopefully soon enough for you to fix the problem before the system crashes. Assuming you have access to basic commands like vmstat, df, mail and crontab, these scripts can be set up even if you do not have root access to the server.
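Before setting anything up, it can help to confirm that those commands are actually available to the account that will run the scripts. The helper below is a minimal sketch; the function name "missing_commands" is hypothetical (not part of the scripts), and it relies only on the POSIX "command -v" builtin.

```sh
# Hypothetical helper: print every command name from the argument list that
# cannot be found in $PATH. Returns 0 if all were found, 1 otherwise.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return $status
}

# Example: check the four commands the monitoring scripts rely on.
# missing_commands vmstat df mail crontab || echo "Install the commands listed above first."
```

If everything is in place, "missing_commands vmstat df mail crontab" prints nothing and returns 0.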


CPU Monitor

This script sends you an email notification if the average CPU utilization over the last 10 seconds reaches or exceeds the configured threshold.
#!/bin/sh

#
#CONFIGURATION BLOCK:
#

#Set the alarm threshold here
#e.g. set to 90 to make it send notifications if average utilization goes past 90%
THRESHOLD=90
#Add your email address here
MAILTO="myemail@eventfy.com"
#The temp file below will be replaced every time this script runs
TEMPFILE=/tmp/cpu.temp

#
#LOGIC BLOCK:
#

HOSTNAME=`hostname`
rm -f $TEMPFILE 

#Calculate the CPU Utilization with the command below
#Note: this command will calculate the avg CPU utilization of last 10 secs
CPU=$(vmstat 10 2 | tail -1 | awk '{print $15}')
#Column 15 of vmstat's output is the idle percentage ("id"), so
#utilization is 100 minus idle. $(( )) is POSIX arithmetic, so this
#also works on systems where /bin/sh is not bash.
CPU=$((100 - CPU))
#echo "`date` - CPU Utilization: $CPU"

#Compare the current value with the threshold.
if [ "$CPU" -ge "$THRESHOLD" ]
then
echo "Warning - CPU utilization on server $HOSTNAME is ${CPU}%" >> $TEMPFILE
fi

#Send an email if /tmp/cpu.temp is present.
if [ -e $TEMPFILE ]
then
mail -s "$HOSTNAME CPU Warning" $MAILTO < $TEMPFILE
fi

rm -f $TEMPFILE
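The script above hard-codes field 15 as vmstat's idle ("id") column, which matches the procps layout on RHEL and CentOS. If you would rather not depend on the column position, here is a sketch (the function name "cpu_used_from_vmstat" is hypothetical) that looks up the "id" column in vmstat's header line and reports 100 minus idle for the last sample. As in the original, only the last line is used, because vmstat's first report is the average since boot.

```sh
# Hypothetical helper: read full vmstat output on stdin, find the "id" (idle)
# column from the header line, and print 100 - idle for the last sample.
cpu_used_from_vmstat() {
    awk 'NR == 2 { for (i = 1; i <= NF; i++) if ($i == "id") col = i }
         NR > 2  { idle = $col }
         END     { print 100 - idle }'
}

# Example: average utilization over a 10-second sample.
# CPU=$(vmstat 10 2 | cpu_used_from_vmstat)
```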


Memory Monitor

This script sends you an email notification if the percentage of RAM + swap in use reaches or exceeds the configured threshold.
#!/bin/sh

#
#CONFIGURATION BLOCK:
#

#Set the alarm threshold here
#e.g. set to 90 to make it send notifications if memory utilization goes past 90%
THRESHOLD=90
#Add your email address here
MAILTO="myemail@eventfy.com"
#The temp file below will be replaced every time this script runs
TEMPFILE=/tmp/memory.temp

#
#LOGIC BLOCK:
#

HOSTNAME=`hostname`
rm -f $TEMPFILE

#Calculate RAM + swap utilization from a single 'vmstat -s' snapshot.
#Note: 'vmstat -s' prints the current totals once (it is not averaged over
#time) and all values are in KiB. Depending on the procps version,
#"used memory" may also count buffers and page cache.
STATS=$(vmstat -s)
TOTAL_RAM=$(echo "$STATS" | grep "total memory" | awk '{print $1}')
#echo "`date` - TOTAL_RAM: $TOTAL_RAM"
USED_RAM=$(echo "$STATS" | grep "used memory" | awk '{print $1}')
#echo "`date` - USED_RAM: $USED_RAM"

TOTAL_SWAP=$(echo "$STATS" | grep "total swap" | awk '{print $1}')
#echo "`date` - TOTAL_SWAP: $TOTAL_SWAP"
USED_SWAP=$(echo "$STATS" | grep "used swap" | awk '{print $1}')
#echo "`date` - USED_SWAP: $USED_SWAP"

#POSIX integer arithmetic (the result is truncated to a whole percentage)
PERCENT_USE=$((100 * (USED_RAM + USED_SWAP) / (TOTAL_RAM + TOTAL_SWAP)))
#echo "`date` - Memory Utilization: $PERCENT_USE"

#Compare the current value with the threshold.
if [ "$PERCENT_USE" -ge "$THRESHOLD" ]
then
echo "Warning - memory utilization on server $HOSTNAME is ${PERCENT_USE}%" >> $TEMPFILE
fi

#Send an email if /tmp/memory.temp is present.
if [ -e $TEMPFILE ]
then
mail -s "$HOSTNAME Memory Warning" $MAILTO < $TEMPFILE
fi

rm -f $TEMPFILE
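If you prefer fewer pipelines, the four grep/awk lookups can be folded into one awk pass over a single "vmstat -s" snapshot. This is a sketch with a hypothetical function name ("memory_used_percent"); it matches the same labels the script greps for and truncates to a whole number, just like the shell arithmetic above.

```sh
# Hypothetical helper: read 'vmstat -s' output on stdin and print the integer
# percentage of (RAM + swap) in use, truncated like shell integer division.
memory_used_percent() {
    awk '/total memory/ { tr = $1 }
         /used memory/  { ur = $1 }
         /total swap/   { ts = $1 }
         /used swap/    { us = $1 }
         END { if (tr + ts > 0) printf "%d\n", 100 * (ur + us) / (tr + ts) }'
}

# Example:
# PERCENT_USE=$(vmstat -s | memory_used_percent)
```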


Disk Space Monitor

This script sends you an email notification if a target partition's percentage of used space reaches or exceeds the configured threshold.
#!/bin/sh

#
#CONFIGURATION BLOCK:
#

#Set the alarm threshold here
#e.g. set to 70 to make it send notifications if partition utilization goes past 70%
THRESHOLD=70
#Add your email address here
MAILTO="myemail@eventfy.com"
#The temp file below will be replaced every time this script runs
TEMPFILE=/tmp/diskspace.temp
#Set the target partition to a substring that uniquely identifies it in the output of 'df -P'
TARGET_PARTITION="LogVol00"

#
#LOGIC BLOCK:
#

HOSTNAME=`hostname`
rm -f $TEMPFILE

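#Calculate the partition utilization with the command below
#(head -n 1 keeps only the first match if the substring matches several lines)
PERCENT_USE=$(df -P | grep "$TARGET_PARTITION" | head -n 1 | awk '{print $5}' | sed 's/%//')
#echo "`date` - Disk space usage: $PERCENT_USE"

#Stop with an error if the target partition was not found in 'df -P'
if [ -z "$PERCENT_USE" ]
then
echo "Partition $TARGET_PARTITION not found in 'df -P' output" >&2
exit 1
fi

#Compare the current value with the threshold.
if [ "$PERCENT_USE" -ge "$THRESHOLD" ]
then
echo "Warning - partition ${TARGET_PARTITION} on server $HOSTNAME is ${PERCENT_USE}% full" >> $TEMPFILE
fi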

#Send an email if /tmp/diskspace.temp is present.
if [ -e $TEMPFILE ]
then
mail -s "$HOSTNAME Disk Space Warning" $MAILTO < $TEMPFILE
fi

rm -f $TEMPFILE
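The script above watches a single partition. If you would like to watch every filesystem reported by "df -P" instead, a variant like the following could work. It is a sketch rather than a tested replacement: "check_all_partitions" is a hypothetical name, it takes the threshold as its only argument, and it assumes mount points without spaces (awk splits fields on whitespace).

```sh
# Hypothetical helper: read 'df -P' output on stdin and print one warning line
# for every filesystem whose Capacity column ($5) is at or above $1 percent.
check_all_partitions() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0)
            printf "Warning - %s (mounted on %s) is %s%% full\n", $1, $6, pct
    }'
}

# Example use inside the disk script:
# df -P | check_all_partitions "$THRESHOLD" > $TEMPFILE
# if [ -s $TEMPFILE ]; then mail -s "$HOSTNAME Disk Space Warning" $MAILTO < $TEMPFILE; fi
```

Because the output is redirected into the temp file, the file always exists, so the send step tests for a non-empty file with "-s" rather than "-e".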


Setup

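1. With a text editor (e.g. vi), create one shell script file per monitor (e.g. cpu.sh, memory.sh and diskspace.sh) and paste the code above into it.
2. Edit the configuration block of each script (e.g. your email address, the target partition, etc.).
3. Make the scripts executable so cron can run them, e.g. chmod +x cpu.sh memory.sh diskspace.sh
4. Set up crontab to execute these scripts every X minutes: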

Use "crontab -e" to edit and save your changes. Assuming you keep these scripts in "/root/monitor" and want to run each one every 10 minutes, your crontab configuration would look like this:
0,10,20,30,40,50 * * * * /root/monitor/cpu.sh
1,11,21,31,41,51 * * * * /root/monitor/memory.sh
2,12,22,32,42,52 * * * * /root/monitor/diskspace.sh
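The cron implementation on RHEL and CentOS (cronie, derived from Vixie cron) also accepts step values, so the same schedule can be written as "*/10", "1-59/10" and "2-59/10". If you want to add the entries without opening an editor, a sketch like the one below prints them (the function name "monitor_crontab_entries" is hypothetical), and you can append them to your existing crontab by piping "crontab -l" plus the new lines into "crontab -".

```sh
# Hypothetical helper: print crontab entries that run the three monitors every
# 10 minutes, staggered by one minute, from the directory given as $1.
monitor_crontab_entries() {
    dir=$1
    printf '%s\n' \
        "*/10 * * * * $dir/cpu.sh" \
        "1-59/10 * * * * $dir/memory.sh" \
        "2-59/10 * * * * $dir/diskspace.sh"
}

# Example: append the entries to the current user's crontab.
# ( crontab -l 2>/dev/null; monitor_crontab_entries /root/monitor ) | crontab -
```

Running the append command twice will add duplicate entries, so check the result with "crontab -l".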

Notes

These scripts do not require special permissions and should run fine as a non-root user on most systems. If you do have root access, though, it may be a good idea to set them up under root.
These scripts are very lightweight, but it is still a good idea to stagger them so they don't run at the same time, as in the example above.
You can use "crontab -l" to list and review the crontab configuration.

Additional Notes

You should test these scripts on your server if you plan to rely on these notifications. A quick test is to set the thresholds to a low number and confirm that you receive the emails.

If you get the error "mail: command not found", you may need to run this as root:
yum install mailx
And finally, keep in mind that these scripts will keep sending you emails until the problem is fixed. It would be simple to change them to send emails only when the situation changes, but I was okay with an email every 10 minutes until the problem is fixed.
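If you do want notifications only when the situation changes, one way is to remember the last state in a file and send mail only when it flips between OK and ALARM. The sketch below is one possible approach, not part of the original scripts: "state_changed" and the state-file path are hypothetical, and a missing state file is treated as OK so the first run stays quiet when everything is fine.

```sh
# Hypothetical helper: store the new state ("OK" or "ALARM") in the state file
# given as $1 and succeed (return 0) only if it differs from the stored state.
# A missing state file counts as "OK".
state_changed() {
    statefile=$1
    new_state=$2
    old_state=$(cat "$statefile" 2>/dev/null || echo OK)
    echo "$new_state" > "$statefile"
    [ "$new_state" != "$old_state" ]
}

# Example use inside the CPU script (the state file path is hypothetical):
# if [ "$CPU" -ge "$THRESHOLD" ]; then STATE=ALARM; else STATE=OK; fi
# if state_changed /tmp/cpu.state "$STATE"; then
#     echo "CPU on $HOSTNAME is now $STATE (${CPU}%)" | mail -s "$HOSTNAME CPU $STATE" $MAILTO
# fi
```

With this approach you get one email when a threshold is crossed and one more when the resource recovers, instead of one every 10 minutes.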
