Setting up MySQL monitoring with Nagios

October 8th, 2008 by atomic

It’s been a while since my last post! I have a few half-written entries, but I figured I just get one out there so I don’t appear to have dropped off the face of the planet :)

Being more of a DBA than a sysadmin myself, I’ve noticed that there is a surprising dearth of information regarding how to actually get nagios set up to monitor MySQL, especially if you’re not experienced with nagios.

In my own experience, MySQL monitoring often ends up being a homegrown thing, especially if you’re a small shop. It is definitely a good idea to consolidate all of your monitoring, DB-related or not, into one solution: roll-your-own monitoring solutions can be brittle and burn you in the end. One example that comes to mind, a few years back i had a quick perl-based script to monitor replication and set it to mail a few people based on some threshold. Took me 1 hour to write and worked well enough, except, six months later, when some package upgrades were done on the machine I had the monitoring script running on, the mailer stopped working, and we found out the hard way about an alert situation with our replication.

Nagios

Nagios is something your systems/operations people are probably familiar with. In case you are the systems/ops person, it’s a 15-20 minute process to get nagios installed on a typical ubuntu machine. Make sure to set up nagios with the embedded perl interpreter in case you decide to make use of this feature in the future. Once you have it set up and monitoring, here’s how you can set up nagios to monitor your mysql databases:

You can grab some perl-based nagios plugins to monitor mysql here:

http://www.shinguz.ch/MySQL/nagios-mysql-plugins-0.3.tar.gz

Untar the plugins into /usr/local/nagios/libexec

If you want to check out how a particular plugin is executed, run it from the command line as follows:

vmware@vmware-desktop:~/dl/nagios-mysql-plugins-0.3$ perl check_repl_mysql_seconds_behind_master.pl --help

SYNOPSIS

check_repl_mysql_seconds_behind_master.pl flags

DESCRIPTION

Nagios plugin to check how many seconds slave is behind master...

FLAGS

help, ?      Print this help.
host, h      Host where database is located (default localhost)
password, p  Password of user root (default ><)
port         Port where database listens to (default 3306)
user, u      User for connecting to the database (default root)
critical|c   Seconds for critical level (default 600)
warning|w    Seconds for warning level (default 60)

PARAMETERS

none

As an example, let’s say we want to monitor the lag of a particular replicator. Nagios operates on thresholds, the first stage being a warning, the second being a critical alert.

Adding a Nagios Command

In your /usr/local/nagios/etc/objects/commands.cfg file, add the following “command” entry — you should see others like it. This specifies a type of command you will run on a given host:

# 'check_repl_mysql_seconds_behind_master' command definition
define command{
command_name    check_repl_mysql_seconds_behind_master
command_line    $USER1$/check_repl_mysql_seconds_behind_master.pl --host=$HOSTADDRESS$ --password=$ARG1
}

In this case we’re just using most of the default parameters, but if we wanted to specify a different critical or warning threshold, we could add it with –critical=$ARG2, etc.

Then, if you’re monitoring localhost (probably not in most cases) you would just add the following entry to /usr/local/nagios/etc/localhost.cfg:

define service{
use                             local-service         ; Name of service template to use
host_name                       localhost
service_description             MySQL Replication
check_command                   check_repl_mysql_seconds_behind_master!"<password>"
notifications_enabled           0
}

If you want to add another host to be monitored from this nagios instance, you can specify a new configration file, say, db01.cfg and write that to your etc/objects directory. Then add an entry to /usr/local/nagios/etc/nagios.cfg as follows, after making the appropriate adjustments to the hostname in the service:

cfg_file=/usr/local/nagios/etc/objects/db01.cfg

If everything has been configured correctly, you should be able to restart nagios with /etc/init.d/nagios restart, and then go into your nagios console and see the configuration entries for your new command and service, and also see if your slaves are far behind :-)

Once you have one service added in this way, adding new ones becomes a lot easier. I hope to investigate some more of the performance monitoring features of the nagios mysql plugins, but i’ll leave that for a future blog post.

Posted in monitoring, mysql, replication | 3 Comments »

ALTER TABLE flakiness and mysql replication

May 20th, 2008 by atomic

Over the past few years of dealing with mysql in larger environments, one thing I’ve always felt that ALTER TABLE statements are flaky with replication, but could never really prove it. I never had a chance to dig into some of strange things I saw, and would tend to overlook and fix them

While working for a client, I encountered an issue that I could no longer ignore. In order to reload a small table from a master database to a number of slaves, I simply piped the output of mysqldump into the master and expected everything to flow into the slaves.

The bug is very specific, and probably not very common. If you send a statement like below, as mysqldump automatically adds, to your master:

/*!40000 ALTER TABLE table_name DISABLE KEYS */;

and have configured something like

replicate-rewrite-db=mydb->mydb_slave

on your slave,  it will cause memory corruption on the slave.

You might see something like this in your mysql error logs:

*** glibc detected *** /opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug: double free or corruption (out): 0×0a2eb1f0 ***
======= Backtrace: =========
/lib/libc.so.6[0x166f5d]
/lib/libc.so.6(cfree+0×90)[0x16a5b0]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(my_no_flags_free+0×64)[0x8481d8c]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(free_root+0×122)[0x848279a]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(free_rows+0×1c)[0x82e62d7]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(unpack_fields+0×3fb)[0x82e734b]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug[0x82e9a72]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(mysql_real_query+0xdf)[0x82e9c82]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug[0x82d6f66]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(handle_slave_io+0×3c7)[0x82dd0cd]
/lib/libpthread.so.0[0xcf82db]
/lib/libc.so.6(clone+0×5e)[0x1ce12e]
This goes to show the risk involved in using ‘nifty’ but perhaps not-so-often used features in a complex product like a database. If we’d just kept the same database names we’d have been fine. C’est la vie.

Posted in mysql, replication | No Comments »

Subtle mysqlhotcopy bug fix finally accepted!

April 2nd, 2008 by atomic

I’m happy to report that my patch for a potentially nasty bug in the mysqlhotcopy script has been accepted into the mysql codebase. It’s a great feeling to finally contribute something, however small, back to the mysql community.

If you have a master/slave environment with multiple slaves, and you do periodic backups of your system on the slave with mysqlhotcopy, the –record_log_pos parameter will pull the wrong co-ordinates from the master and cause you to restore new copies of your database that are potentially corrupt.
An example failure scenario can be found here.

Posted in mysql, mysqlhotcopy, replication | No Comments »