Sun caught in a pincer with MySQL

June 2nd, 2008 by atomic

Over the years, the database world has been buzzing with the strategic threat posed to the established players by upstart open-source database systems. Oracle and IBM would no longer be able to gouge defenseless small and medium-sized businesses of non-trivial portions of their IT budgets for a mere database licence. Oracle, IBM and Microsoft, for their part, have tried their best to respond to this threat, but it is clear that they cannot simply squash open-source products, but rather evolve with the changing landscape.

the countered threat from Oracle

Oracle made some strategic purchases in the past few years to establish a foothold in the embedded and front-end database market by acquiring Sleepycat (maintainers of BerkeleyDB) and InnoBase (makers of InnoDB storage engine for MySQL). These two also happened to provide the only two transactional backends for MySQL, whlie InnoDB is the only one to be used widely in practice. While this was looked upon as a disaster for MySQL, it was really not a major issue for a couple of reasons:

  • The InnoDB source has been GPL’d, so even if Innobase were to completely abandon maintenance of the codebase (which it has not), the community can step in to fix bugs. For now, the status quo prevails
  • MySQL acquired Netfrastructure (sp?) and begun the process of porting the backend to the new Falcon transactional storage engine
  • MySQL also began the process of fixing up some of the severe defecits in the MyISAM storage engine and branded it Maria

So, from a strategic perspective, it looks like MySQL is taking the right steps to counter the threat from Oracle. An infusion of money from Sun will speed up development on many of these initiatives.

a general picture of the database landscape

Let’s take a step back for the moment and take an unscientific look at a few players in the current database market. This is not rigorous, exhaustive market research, it’s just my observations over the past 9-10 years or so:

Teradata: The data-warehousing champ, a reputation for high quality but also prohibitively expensive for many with large data volumes

DB2: Strong in the institutional market, mainframes and data warehousing; not much use in the web/internet world

Oracle: The jack of all trades. Expensive, but no CTO would ever be fired for picking Oracle for almost any purpose, whether its an OLTP system or data warehouse

PostgreSQL: The Betacam of open source databases. Highly-functional, stable and scalable. Over the years, unfortunately saddled with a somewhat-unjustified reputation for being slow and difficult to use (in comparison to MySQL) and a militant userbase that spends an inordinate amount of time bashing MySQL instead of evangelising its capabilities

MySQL: A simple, fast database with a reduced featureset that works well for web applications

My belief is that MySQL owes its popularity mostly due to the fact it is perceived to be very fast when using it to build simple apps. Over the years it became the ‘default’ web database, with most hosting providers using it as the backend in combination with PHP. Many are now providing PostgreSQL hosting, but this wasn’t always the case.

mysql vs. postgresql

While they are both OSS databases, MySQL and PostgreSQL are very different. PostgreSQL has its roots in academa and the defense industry, and a trip into the source code is like a trip down a memory from a computer science class. MySQL, on the other hand, feels and has been developed much more like a commercial product, with a focus on functionality and speed and less interest in elegance and standards compliance. These are sweeping generalisations of course.

My reason for bringing up this comparison is to make the following important assertion:

For a particular project, given the choice, DBAs choose PostgreSQL, developers choose MySQL.

DBAs like tablespaces. Good query optimizers. Tables that don’t randomly corrupt.

Developers like databases that reduce their need to think. A system that will let you send 100 queries over 100 separate connections without any apparent overhead compared to sending 100 queries over the same connection.

I come to this conclusion after nearly 2.5 years as a MySQL “DBA” working for companies of various sizes in the internet industry. MySQL, in many cases, has been reduced to a glorified flat file system, and many non-junior developers do not even understand the most basic SQL optimization.

It’s not hard to see why MySQL is far more popular than PostgreSQL, given that developers are more numerous and higher up the application stack.

Amazon SimpleDB and Google BigTable

The users of MySQL may prove to be a fickle bunch, however. Oracle never was never that much of a threat all along, and two unlikely competitors in the database space will change the rules of the game. Developers like the simplicity and avoiding-of-thinking they get with MySQL. Once a site gets to a certain scale, the database become a major bottleneck. Complex yet surprisingly-robust sharding architectures have been developed to deal with scaling MySQL beyond the capacity of one machine, but this is not for the faint of heart.

BigTable and SimpleDB look ready to take developers back to the simpler days when MySQL was a fast, reliable persistent store, allowing them to focus on their strengths. HBase, while still very alpha-ish, also holds great promise. Many people have criticised BigTable and SimpleDB for being, well, just a big table and a simple database. But that’s precisely what MySQL was, and it did it quite well for a long time. Developers in the internet age simply don’t care about the things DBAs and database developers of a previous era did. They want three things: performance, availability and more performance. Strict ACID compliance is simply not that important in an age when entire internet empires are built off of clicks worth as little as $0.07 and page views worth $0.00001 — but high availability is important.

the pincer

Sun has acquired MySQL at a time when the old guard of the database world is becoming more aggressive, and the new guard of software-as-a-service providers are swooping into the space to appeal to IT managers initially happy with the open source licensing, but not thrilled with the non-trivial total cost of ownership (read: paying a DBA who must continually wake up at 230 to repair your corrupted MyISAM tables)

prognosis

Well, let’s put it this way — as someone that abandoned the proprietary database world a few years ago to work solely with open-source technologies, I feel that I need to start learning more about HBase, BigTable and the like to survive in this marketplace. MySQL may not have seen this strategic threat coming, but they better start working, quickly, to make MySQL scale better and more easily, or it will begin to lose its place as the “default” database of new web applications.

Posted in Uncategorized, mysql | 8 Comments »

ALTER TABLE flakiness and mysql replication

May 20th, 2008 by atomic

Over the past few years of dealing with mysql in larger environments, one thing I’ve always felt that ALTER TABLE statements are flaky with replication, but could never really prove it. I never had a chance to dig into some of strange things I saw, and would tend to overlook and fix them

While working for a client, I encountered an issue that I could no longer ignore. In order to reload a small table from a master database to a number of slaves, I simply piped the output of mysqldump into the master and expected everything to flow into the slaves.

The bug is very specific, and probably not very common. If you send a statement like below, as mysqldump automatically adds, to your master:

/*!40000 ALTER TABLE table_name DISABLE KEYS */;

and have configured something like

replicate-rewrite-db=mydb->mydb_slave

on your slave,  it will cause memory corruption on the slave.

You might see something like this in your mysql error logs:

*** glibc detected *** /opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug: double free or corruption (out): 0×0a2eb1f0 ***
======= Backtrace: =========
/lib/libc.so.6[0x166f5d]
/lib/libc.so.6(cfree+0×90)[0x16a5b0]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(my_no_flags_free+0×64)[0x8481d8c]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(free_root+0×122)[0x848279a]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(free_rows+0×1c)[0x82e62d7]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(unpack_fields+0×3fb)[0x82e734b]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug[0x82e9a72]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(mysql_real_query+0xdf)[0x82e9c82]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug[0x82d6f66]
/opt/mysql-5.0.51a-linux-i686-glibc23/bin/mysqld-debug(handle_slave_io+0×3c7)[0x82dd0cd]
/lib/libpthread.so.0[0xcf82db]
/lib/libc.so.6(clone+0×5e)[0x1ce12e]
This goes to show the risk involved in using ‘nifty’ but perhaps not-so-often used features in a complex product like a database. If we’d just kept the same database names we’d have been fine. C’est la vie.

Posted in mysql, replication | No Comments »

A case for Kettle for your next ETL or data warehouse project

May 10th, 2008 by atomic

I am, for the most part, a do-it-yourself type of person. I fix my own car if I can; I even have four healthy tomato plants growing in pots outside as we speak — the plants will take that little extra CO2 out of the air and give me great tasting tomatoes (soon… i hope!)

But I digress.

Whether to use an ETL tool such as Kettle (aka Penatho Data Integration) for a project involving large data transfers is a typical “build vs. buy” type of decision, one that is fairly well understood and I don’t wish to repeat it all here — putting together some Perl scripts to do the job, you typically get great performance, development speed and accessibility. This would need to be balanced against the benefits of ETL tools and their potential drawbacks (development speed, license costs and performance implications).

A few things have happened in the last couple of years that make some of the typical reasons why you’d build your own no longer valid.

cost

Kettle is an open-source product, so not only is it priced right, you have access to the source code, which can be surprisingly handy. The 2.x series had some nasty bugs, and there were many times we dug into the source code to diagnose problems at my previous company. Don’t worry though, the 3.0 series from my experience is vastly improved.

a new multi-core order

Entry-level desktops today typically come with dual-core CPUs, and four or more cores on server hardware
is now common. This trend will continue as chip makers shift towards adding cores to improve performance rather than simply increasing clock speed.

I’m willing to bet that your average Perl ETL script is not written in a multi-threaded fashion. Perhaps you can architect the process so that you can run several of them at once to take full advantage of the hardware, but this is essentially polluting your ETL logic with hardware dependencies.

With Kettle, leveraging all the cores on your ETL machine is almost free. Each transformation step instantiates a thread within the java virtual machine (JVM) which in turn will be run as a native thread. If you have more than a few steps in your transformation, and data is flowing through fairly evenly (no large bottlenecks), chances are that you are coming close to maximizing use of your hardware.

Surprisingly there are some still out there that have been underneath a rock since 1998 and think java is “slow”; all modern JVMs just-in-time compile your Java code natively and performance in some cases can be faster than C/C++. It is definitely much faster than any interpreted language like Perl.

In my experience, I’ve noticed Kettle can be much faster at sorting and grouping large volumes of data than MySQL itself, at least out of the box, without turning too many knobs on the MySQL end. Good software should do that. I will post some tests of this assertion when I get a chance.

visibility

Kettle allows me to see, if I want to, exactly how much data each step has processed and get row counts/second to measure throughput and performance. Now that I’ve been spoiled, I can never go back to issuing a query like the below:

SELECT uid, count(*)
FROM hundred_million_row_table GROUP BY uid;

The corresponding operation in Kettle:

Spoon UI in action

No more running iostat, top and hacking around to get a vague idea how long it might take MySQL (or whatever RDBMS) to run that beast of a query! Kettle gives you output at select intervals, so you can see how many rows the sort has processed (the most time consuming part of this process).

Posted in kettle, performance | No Comments »

mysql I/O performance analysis with iostat

April 2nd, 2008 by atomic

Here is a situation I’ve run into a few times when dealing with mysql databases. We’re trying to run a one-off query against a high-traffic, large table and the WHERE condition is against a non-indexed field. Let’s say our table is 5GB in size. We issue the following:

SELECT count(*) from five_gb_myisam_table WHERE non_idx_field = 'asdf';

and we wait…

and wait some more.

5GB is not a small table, but this ideally should not take more than a few minutes on a relatively modern system.

iostat is your friend

In cases like this, iostat -x 5 is your friend. While the query was running, this was a typical 5 second interval:

avg-cpu: %user %nice %sys %iowait %idle
2.30 0.00 1.30 96.40 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.70 172.50 1.20 3340.00 15.20 1670.00 7.60 19.32 1.02 5.90 5.60 97.22

While the numbers fluctuated, the effective read throughput never went much over 1800 kb/s. The drive is certainly not the problem — a quick run of hdparm -T -t showed an average read throughput of nearly 50mb/s on this 7200RPM SATA drive.

The numbers for this 5 second period tell an interesting story.

The system sent an average 172.5 read requests/second to the drive, with each read fetching only 19.32 512-byte sectors (about 9.6kb), giving you 9.6*172.5 = 1656kb/s.
The average wait time (apart from queuing) was 5.6ms, so (172.5+1.20)*5.6 = 972ms of each second was spent waiting on some I/O.

Needless to say, the statistics suggest the single disk head is scrambling around fetching data from all different parts of the drive and returning the data back quite slowly overall.

myisamchk statistics

Taking the server down to do some deeper investigation, myisamchk -r -i reports a key statistic:

Blocks/Record 1.34

The table has about 6 million rows, so over 8 million records are stored with as many as 34% of the rows fragmented. This is a common problem for myisam tables containing variable-length fields that are frequently updated. Refer to this page for details.

the way it should be

While I was running the stats above, i tried the same query on a different box with the same table and disk that i had recently restored. Running the same query takes nearly 1/20th of the time, and the iostat (with -m) shows:

avg-cpu: %user %nice %system %iowait %steal %idle
0.50 5.30 25.40 27.00 0.00 41.80


Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 47.50 0.00 378.64 0.00 40.62 0.00 219.69 1.68 4.30 2.01 76.01

Now that’s more like it!

We manage nearly double the reads/s, and each read is pulling 219.69 segments = 109kb of data. Our effective throughput is over 20 times what it was before.

the lesson

While painful for large tables, this is an example of why frequent OPTIMIZE statements, especially against myisam tables, are a very good idea. Not only do they defragment your tables, improving performance on any queries involving table scans, they reduce the risk of table corruption by reducing the number of pointers between split records.

Posted in iostat, mysql, performance | No Comments »

Subtle mysqlhotcopy bug fix finally accepted!

April 2nd, 2008 by atomic

I’m happy to report that my patch for a potentially nasty bug in the mysqlhotcopy script has been accepted into the mysql codebase. It’s a great feeling to finally contribute something, however small, back to the mysql community.

If you have a master/slave environment with multiple slaves, and you do periodic backups of your system on the slave with mysqlhotcopy, the –record_log_pos parameter will pull the wrong co-ordinates from the master and cause you to restore new copies of your database that are potentially corrupt.
An example failure scenario can be found here.

Posted in mysql, mysqlhotcopy, replication | No Comments »

Next Entries »