Setting up MySQL monitoring with Nagios

October 8th, 2008 by atomic

It’s been a while since my last post! I have a few half-written entries, but I figured I just get one out there so I don’t appear to have dropped off the face of the planet :)

Being more of a DBA than a sysadmin myself, I’ve noticed that there is a surprising dearth of information regarding how to actually get nagios set up to monitor MySQL, especially if you’re not experienced with nagios.

In my own experience, MySQL monitoring often ends up being a homegrown thing, especially if you’re a small shop. It is definitely a good idea to consolidate all of your monitoring, DB-related or not, into one solution: roll-your-own monitoring solutions can be brittle and burn you in the end. One example that comes to mind, a few years back i had a quick perl-based script to monitor replication and set it to mail a few people based on some threshold. Took me 1 hour to write and worked well enough, except, six months later, when some package upgrades were done on the machine I had the monitoring script running on, the mailer stopped working, and we found out the hard way about an alert situation with our replication.

Nagios

Nagios is something your systems/operations people are probably familiar with. In case you are the systems/ops person, it’s a 15-20 minute process to get nagios installed on a typical ubuntu machine. Make sure to set up nagios with the embedded perl interpreter in case you decide to make use of this feature in the future. Once you have it set up and monitoring, here’s how you can set up nagios to monitor your mysql databases:

You can grab some perl-based nagios plugins to monitor mysql here:

http://www.shinguz.ch/MySQL/nagios-mysql-plugins-0.3.tar.gz

Untar the plugins into /usr/local/nagios/libexec

If you want to check out how a particular plugin is executed, run it from the command line as follows:

vmware@vmware-desktop:~/dl/nagios-mysql-plugins-0.3$ perl check_repl_mysql_seconds_behind_master.pl --help

SYNOPSIS

check_repl_mysql_seconds_behind_master.pl flags

DESCRIPTION

Nagios plugin to check how many seconds slave is behind master...

FLAGS

help, ?      Print this help.
host, h      Host where database is located (default localhost)
password, p  Password of user root (default ><)
port         Port where database listens to (default 3306)
user, u      User for connecting to the database (default root)
critical|c   Seconds for critical level (default 600)
warning|w    Seconds for warning level (default 60)

PARAMETERS

none

As an example, let’s say we want to monitor the lag of a particular replicator. Nagios operates on thresholds, the first stage being a warning, the second being a critical alert.

Adding a Nagios Command

In your /usr/local/nagios/etc/objects/commands.cfg file, add the following “command” entry — you should see others like it. This specifies a type of command you will run on a given host:

# 'check_repl_mysql_seconds_behind_master' command definition
define command{
command_name    check_repl_mysql_seconds_behind_master
command_line    $USER1$/check_repl_mysql_seconds_behind_master.pl --host=$HOSTADDRESS$ --password=$ARG1
}

In this case we’re just using most of the default parameters, but if we wanted to specify a different critical or warning threshold, we could add it with –critical=$ARG2, etc.

Then, if you’re monitoring localhost (probably not in most cases) you would just add the following entry to /usr/local/nagios/etc/localhost.cfg:

define service{
use                             local-service         ; Name of service template to use
host_name                       localhost
service_description             MySQL Replication
check_command                   check_repl_mysql_seconds_behind_master!"<password>"
notifications_enabled           0
}

If you want to add another host to be monitored from this nagios instance, you can specify a new configration file, say, db01.cfg and write that to your etc/objects directory. Then add an entry to /usr/local/nagios/etc/nagios.cfg as follows, after making the appropriate adjustments to the hostname in the service:

cfg_file=/usr/local/nagios/etc/objects/db01.cfg

If everything has been configured correctly, you should be able to restart nagios with /etc/init.d/nagios restart, and then go into your nagios console and see the configuration entries for your new command and service, and also see if your slaves are far behind :-)

Once you have one service added in this way, adding new ones becomes a lot easier. I hope to investigate some more of the performance monitoring features of the nagios mysql plugins, but i’ll leave that for a future blog post.

Posted in monitoring, mysql, replication | 3 Comments »

mysql proxy 0.6.1 performance tests

July 2nd, 2008 by atomic

The mysql proxy project has tremendous potential to make mysql administration and usage easier. I decided to throw some load at it to get a feel for how stable and performant it is.

On EC2, I set up 6 “small” images in an example proxy setup:

- One client machine to run sysbench
- One machine to act as a mysql proxy machine, running 0.6.1 (FC4 binary)
- Four identical database servers, running mysql 5.0.45

The database configuration was largely default, with InnoDB configured for 64MB buffer pool (just enough to ensure the sysbench table was entirely in memory), 512MB log files, and 1024 max connections.

mysql-proxy was run with the following command:

mysql-proxy –proxy-backend-addresses=ip-10-251-66-63.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-71-21.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-43-70.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-75-52.ec2.internal:3306

Using the sysbench database benchmark utility to generate some heavy load on the boxes individually (to verify they were all more or less equal), and on the proxy itself, I got some interesting results:

Read/write operations/second:



Transactions/second:



Somewhat as I expected, mysql-proxy presents a bit of overhead when load is light. With only 8 concurrent threads, running against the proxy pointing to 4 equal database servers is slower than hitting one server itself.

As load goes up, however, you can see that the proxy scales reasonably well, while the accesses to the direct systems begin to tank rapidly after 64 concurrent threads (I lost some of my 128 thread results inadvertently and the instances are destroyed :( I’m fairly comfortable in interoplating the result to be roughly in between 64 and 256 )

Somewhat unsettling, however, was the fact that sysbench or mysql proxy failed to work beyond about 500 concurrent threads. My debug output for sysbench just stops here:

DEBUG: mysql_real_connect(0×9adec70, “ip-10-251-75-160″, “sbtest”, “(null)”, “sbtest”, 4040, “(null)”, CLIENT_MULTI_STATEMENTS)
DEBUG: mysql_init(0×9ae51b0)
DEBUG: mysql_options(0×9ae51b0, MYSQL_READ_DEFAULT_

On the servers themselves, I can see the 125+ connections to each individual box, and all looks normal. I tried tweaking some of the thread settings in ulimit and sysbench itself to no avail, but didn’t have a ton of time to play with it. I decided the results were already interesting enough and will see if I can revisit this another time.

Another reason i am not so concerned is that more than one proxy can be utilised in many typical environments, so the ability of a single mysql-proxy instance to handle 512+ concurrent connections might not be show-stopper.

If anyone out there has done some similar tests with mysql proxy, i welcome your feedback!

Posted in mysql, mysql-proxy, performance | 8 Comments »

A lesson in never letting your filesystem fill up too much

June 26th, 2008 by atomic

A good rule of thumb is that a DBA should not let the size of his/her database grow to much more than 50% of the available disk space. I’m sure everyone has a story of being burned in some way by not following this advice.

Before promoting 5.1 to some beefier hardware, I’ve got a few large tables sitting on a small SATA-based machine. Noticing that a few were growing faster than I expected, i decided to pause my loading and convert another one of them to the archive storage engine to save some space.

I certainly saved space.

By trying to do too much at one time, I had some other temp files lying around on the same filesystem, and my conversion to the archive storage engine failed at about 80% through with my disk full.

mysql> alter table w_stats engine = Archive;
ERROR 1030 (HY000): Got error -1 from storage engine

mysql 5.1 has an unfortunate bug that I discovered, and reported, that causes the unfinished archive files to remain after the failed conversion, and not the original data. My disk space problems are no longer much of a problem :)

Now I know that this was basically my fault, and that I should have been more careful with my system, but I would have preferred maybe an embarassing error message, a slap on the wrist, not the loss of a 150GB table! For example:

mysql> alter table w_stats engine = Archive;
ERROR 1030 (HY000):  df -h is your friend

Well, it sounded funny in my head at least…

Posted in mysql | 1 Comment »

« Previous Entries Next Entries »