mysql proxy 0.6.1 performance tests

July 2nd, 2008 by atomic

The mysql proxy project has tremendous potential to make mysql administration and usage easier. I decided to throw some load at it to get a feel for how stable and performant it is.

On EC2, I set up 6 “small” images in an example proxy setup:

- One client machine to run sysbench
- One machine to act as a mysql proxy machine, running 0.6.1 (FC4 binary)
- Four identical database servers, running mysql 5.0.45

The database configuration was largely default, with InnoDB configured for 64MB buffer pool (just enough to ensure the sysbench table was entirely in memory), 512MB log files, and 1024 max connections.

mysql-proxy was run with the following command:

mysql-proxy –proxy-backend-addresses=ip-10-251-66-63.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-71-21.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-43-70.ec2.internal:3306 –proxy-backend-addresses=ip-10-251-75-52.ec2.internal:3306

Using the sysbench database benchmark utility to generate some heavy load on the boxes individually (to verify they were all more or less equal), and on the proxy itself, I got some interesting results:

Read/write operations/second:



Transactions/second:



Somewhat as I expected, mysql-proxy presents a bit of overhead when load is light. With only 8 concurrent threads, running against the proxy pointing to 4 equal database servers is slower than hitting one server itself.

As load goes up, however, you can see that the proxy scales reasonably well, while the accesses to the direct systems begin to tank rapidly after 64 concurrent threads (I lost some of my 128 thread results inadvertently and the instances are destroyed :( I’m fairly comfortable in interoplating the result to be roughly in between 64 and 256 )

Somewhat unsettling, however, was the fact that sysbench or mysql proxy failed to work beyond about 500 concurrent threads. My debug output for sysbench just stops here:

DEBUG: mysql_real_connect(0×9adec70, “ip-10-251-75-160″, “sbtest”, “(null)”, “sbtest”, 4040, “(null)”, CLIENT_MULTI_STATEMENTS)
DEBUG: mysql_init(0×9ae51b0)
DEBUG: mysql_options(0×9ae51b0, MYSQL_READ_DEFAULT_

On the servers themselves, I can see the 125+ connections to each individual box, and all looks normal. I tried tweaking some of the thread settings in ulimit and sysbench itself to no avail, but didn’t have a ton of time to play with it. I decided the results were already interesting enough and will see if I can revisit this another time.

Another reason i am not so concerned is that more than one proxy can be utilised in many typical environments, so the ability of a single mysql-proxy instance to handle 512+ concurrent connections might not be show-stopper.

If anyone out there has done some similar tests with mysql proxy, i welcome your feedback!

Posted in mysql, mysql-proxy, performance | 9 Comments »

Testing MySQL sequential IO performance with different IO schedulers - Part 1

June 23rd, 2008 by atomic

Testing MySQL sequential IO performance with different IO schedulers - Part 1

I stumbled upon an interesting article discussing the different Linux I/O schedulers that are available in the 2.6 kernel, and also discussing some of the deficiencies of the 2.4 “Linus elevator” I/O scheduler.

The default scheduler as of Kernel 2.6.18 is the “Completely fair queuing” scheduler, or CFQ. The previous default was the anticipatory scheduler between 2.6.0 and 2.6.17. Also available is a buffed-up version of the 2.4 scheduler called the deadline scheduler, and a “noop” scheduler. This article has a great discussion on what these schedulers mean for databases at a higher level.

Although old, some documentation in the kernel states  that database workloads should typically use the deadline scheduler. This tuning guide for Oracle on Linux gives, without much explanation, the same advice .

I don’t have the time at the moment to do a full analysis (hence part 1), but I did some quick tests and already got some pretty surprising results. I’m hoping to see if anyone else has done some tests along these lines that can comment further.

The test

I have a 140GB table sitting on a test machine with a simple 320GB SATA drive.

Changing the scheduler in newer kernels is a simple tunable parameter. You can change it to the deadline scheduler for your sda device, for example, as follows:

echo deadline > /sys/block/sda/queue/scheduler

I ran a simple count(*) on one of the partitions, 13GB in size, to see if there was a measurable difference between the schedulers. Memory size on this system is 4GB.

These are the results I got on an otherwise idle machine. The CFQ and anticipatory results were very consistent, almost to the second each time, while the deadline and no-op schedulers fluctuated as much as 50s per test. Not sure at this point why this is the case.

Scheduler
Average runtime (s)
Average Throughput (MiB/s)
CFQ
303
43.04
Anticipatory
370
35.4
Deadline
492
26.5
No-op
471
27.6



Now, these results come with the following caveats:

- These are not extensive, robust benchmarks, they’re just a few tests I ran against my very specific workload. I encourage people to try the different schedulers on different workloads and hardware and share results!
- These are test results with a large, sequential I/O workload that is very different from the random I/O intensive tests that most people are interested in.
- These are test results against a single ATA drive. These results are probably irrelevant in a hardware RAID context

In a future blog post, I will try to dig a bit deeper into these results and try the tests against hardware and software RAID. What i think can be taken from this already, and the referenced articles, is that CFQ should probably be used if your kernel has direct knowledge of the physical layout of your disks, eg. through LVM. Otherwise, deadline or even noop is probably your best bet if you are using hardware RAID, putting the burden on the hardware to decide how best to execute reads and writes.

Posted in mysql, performance | 3 Comments »

Building a data warehouse on a budget with MySQL 5.1

June 16th, 2008 by atomic

If there is one thing that a DBA or data warehouse architect can count on, it is that data volumes will increase while budgets will decrease.

This is why MySQL 5.1 and its partitioning capabilities are so interesting. I’m going to demonstrate how you can build a small/medium-sized data warehouse or data mart (1-10 TB range) on a shoe-string budget.

the mission

I decided to convert a relatively large statistics table (750m rows, 140GB in size in about 10 partitions) on a test machine from MyISAM to the Archive storage engine. After a long conversion process, my data, on disk, ended up being about 21GB, for an impressive compression ratio of 6.7:1.

Prior to MySQL 5.1, one of the drawbacks to the archive storage engine was that you could not index it; however, with partition pruning, you can get yourself a “free” index on a large archive table by splitting it into date-based chunks, whether by day, week, month or year, depending on the volume of data you want to make accessible.

I decided to see how long it would take to run a simple aggregate query against a portion of one of the partitions I had created.

mysql> explain partitions select count(*) from stats where StatsTime between ‘2007-05-01′ and ‘2007-05-31′;

+—-+————-+————–+————+——+—————+——+———+——+———-+————-+
| id | select_type | table        | partitions | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+—-+————-+————–+————+——+—————+——+———+——+———-+————-+
|  1 | SIMPLE      | stats        | p_2007q2   | ALL  | NULL          | NULL | NULL    | NULL | 59558354 | Using where |
+—-+————-+————–+————+——+—————+——+———+——+———-+————-+
1 row in set (0.00 sec)

On disk, the partition was about 1.7G:

-rw-rw—- 1 mysql mysql 1.7G Jun 16 17:16 stats#P#p_2007q2.ARZ

After issuing the query, I watched iostat. The file being accessed was completely absent from filesystem cache because i had previous run a long query against a large MyISAM table. This is what I saw, fairly consistently:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
24.96    0.00    0.75    0.88    0.00   73.42

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              48.55     0.40 62.44  0.60     7.36     0.00   239.32     0.39    6.20   5.42  34.20


As expected with a highly-compressed table, the query speed is CPU-limited: the disk can go much faster but we are fully utilizing one of the four cores (notice 25% cpu utilization).

Back-of-the-envelope calculations suggest that, at 7.3mb/s read throughput from disk, and a 1.7G file to be read, that the query will complete in about:

1700mb / 7.3mb/s = 232s = 3m52s

mysql> select count(*) from stats where StatsTime between ‘2007-05-01′ and ‘2007-05-31′;
+———-+
| count(*) |
+———-+
|        0 |
+———-+
1 row in set (3 min 52.19 sec)

Dead on!

the possibilities

What is interesting is that from this we can deduce the effective throughput of an archive table based on how much data the disk is feeding the cpu.

If we attempted to run such a query on the original MyISAM partition (about 11.5GB), in order to complete the result in 232s, we would require a throughput of :

11500 / 232 = 49mb/s

50mb/s is not lightning fast, it’s about what you’d expect from a single SATA disk, but it’s still fairly good considering the amount of compression we’re getting.

But wait! You still have 3 more cores left on the machine. Now to confirm another hypothesis:

Testing two simultaneous queries, against the same partition:

select count(*) from stats where StatsTime between ‘2007-05-01′ and ‘2007-05-29′;
select count(*) from stats where StatsTime between ‘2007-05-01′ and ‘2007-05-30′;

Most of the data was in fs cache, so our iostat looked like this for most of the time:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
50.70    0.00    0.85    0.00    0.00   48.45

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.50  0.00  0.80     0.00     0.01    13.00     0.00    0.38   0.37   0.03
hda               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Final runtime for both queries was 208s and 217s; suggesting that the queries were almost perfectly parallelized.

If we needed to run a beefy calculation against a year’s worth of data, for example, we could run four parallel queries (one per core per quarter) and aggregate the results for each query in a separate client. Now our “equivalent” 50mb/s throughput on one core has become 50mb/s * 4 = 200mb/s.

If you get a slightly faster disk, you could fully utilize an 8-core box for a throughput of 400mb/s.

And if your disk space is 750GB, factoring in the compression ratio, you are storing the equivalent of 5TB of non-compressed data with a potential read throughput of 200mb/s. Not too shabby for low-end hardware!

Posted in mysql, performance | 5 Comments »

A case for Kettle for your next ETL or data warehouse project

May 10th, 2008 by atomic

I am, for the most part, a do-it-yourself type of person. I fix my own car if I can; I even have four healthy tomato plants growing in pots outside as we speak — the plants will take that little extra CO2 out of the air and give me great tasting tomatoes (soon… i hope!)

But I digress.

Whether to use an ETL tool such as Kettle (aka Penatho Data Integration) for a project involving large data transfers is a typical “build vs. buy” type of decision, one that is fairly well understood and I don’t wish to repeat it all here — putting together some Perl scripts to do the job, you typically get great performance, development speed and accessibility. This would need to be balanced against the benefits of ETL tools and their potential drawbacks (development speed, license costs and performance implications).

A few things have happened in the last couple of years that make some of the typical reasons why you’d build your own no longer valid.

cost

Kettle is an open-source product, so not only is it priced right, you have access to the source code, which can be surprisingly handy. The 2.x series had some nasty bugs, and there were many times we dug into the source code to diagnose problems at my previous company. Don’t worry though, the 3.0 series from my experience is vastly improved.

a new multi-core order

Entry-level desktops today typically come with dual-core CPUs, and four or more cores on server hardware
is now common. This trend will continue as chip makers shift towards adding cores to improve performance rather than simply increasing clock speed.

I’m willing to bet that your average Perl ETL script is not written in a multi-threaded fashion. Perhaps you can architect the process so that you can run several of them at once to take full advantage of the hardware, but this is essentially polluting your ETL logic with hardware dependencies.

With Kettle, leveraging all the cores on your ETL machine is almost free. Each transformation step instantiates a thread within the java virtual machine (JVM) which in turn will be run as a native thread. If you have more than a few steps in your transformation, and data is flowing through fairly evenly (no large bottlenecks), chances are that you are coming close to maximizing use of your hardware.

Surprisingly there are some still out there that have been underneath a rock since 1998 and think java is “slow”; all modern JVMs just-in-time compile your Java code natively and performance in some cases can be faster than C/C++. It is definitely much faster than any interpreted language like Perl.

In my experience, I’ve noticed Kettle can be much faster at sorting and grouping large volumes of data than MySQL itself, at least out of the box, without turning too many knobs on the MySQL end. Good software should do that. I will post some tests of this assertion when I get a chance.

visibility

Kettle allows me to see, if I want to, exactly how much data each step has processed and get row counts/second to measure throughput and performance. Now that I’ve been spoiled, I can never go back to issuing a query like the below:

SELECT uid, count(*)
FROM hundred_million_row_table GROUP BY uid;

The corresponding operation in Kettle:

Spoon UI in action

No more running iostat, top and hacking around to get a vague idea how long it might take MySQL (or whatever RDBMS) to run that beast of a query! Kettle gives you output at select intervals, so you can see how many rows the sort has processed (the most time consuming part of this process).

Posted in kettle, performance | No Comments »

mysql I/O performance analysis with iostat

April 2nd, 2008 by atomic

Here is a situation I’ve run into a few times when dealing with mysql databases. We’re trying to run a one-off query against a high-traffic, large table and the WHERE condition is against a non-indexed field. Let’s say our table is 5GB in size. We issue the following:

SELECT count(*) from five_gb_myisam_table WHERE non_idx_field = 'asdf';

and we wait…

and wait some more.

5GB is not a small table, but this ideally should not take more than a few minutes on a relatively modern system.

iostat is your friend

In cases like this, iostat -x 5 is your friend. While the query was running, this was a typical 5 second interval:

avg-cpu: %user %nice %sys %iowait %idle
2.30 0.00 1.30 96.40 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.70 172.50 1.20 3340.00 15.20 1670.00 7.60 19.32 1.02 5.90 5.60 97.22

While the numbers fluctuated, the effective read throughput never went much over 1800 kb/s. The drive is certainly not the problem — a quick run of hdparm -T -t showed an average read throughput of nearly 50mb/s on this 7200RPM SATA drive.

The numbers for this 5 second period tell an interesting story.

The system sent an average 172.5 read requests/second to the drive, with each read fetching only 19.32 512-byte sectors (about 9.6kb), giving you 9.6*172.5 = 1656kb/s.
The average wait time (apart from queuing) was 5.6ms, so (172.5+1.20)*5.6 = 972ms of each second was spent waiting on some I/O.

Needless to say, the statistics suggest the single disk head is scrambling around fetching data from all different parts of the drive and returning the data back quite slowly overall.

myisamchk statistics

Taking the server down to do some deeper investigation, myisamchk -r -i reports a key statistic:

Blocks/Record 1.34

The table has about 6 million rows, so over 8 million records are stored with as many as 34% of the rows fragmented. This is a common problem for myisam tables containing variable-length fields that are frequently updated. Refer to this page for details.

the way it should be

While I was running the stats above, i tried the same query on a different box with the same table and disk that i had recently restored. Running the same query takes nearly 1/20th of the time, and the iostat (with -m) shows:

avg-cpu: %user %nice %system %iowait %steal %idle
0.50 5.30 25.40 27.00 0.00 41.80


Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 47.50 0.00 378.64 0.00 40.62 0.00 219.69 1.68 4.30 2.01 76.01

Now that’s more like it!

We manage nearly double the reads/s, and each read is pulling 219.69 segments = 109kb of data. Our effective throughput is over 20 times what it was before.

the lesson

While painful for large tables, this is an example of why frequent OPTIMIZE statements, especially against myisam tables, are a very good idea. Not only do they defragment your tables, improving performance on any queries involving table scans, they reduce the risk of table corruption by reducing the number of pointers between split records.

Posted in iostat, mysql, performance | No Comments »