<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Alex Tomic's tech blog</title>
	<atom:link href="http://tomictech.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://tomictech.com</link>
	<description></description>
	<pubDate>Thu, 19 Mar 2009 18:43:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>IBM would add further complications for MySQL</title>
		<link>http://tomictech.com/?p=28</link>
		<comments>http://tomictech.com/?p=28#comments</comments>
		<pubDate>Thu, 19 Mar 2009 18:43:57 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[ibm]]></category>

		<category><![CDATA[sun]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=28</guid>
		<description><![CDATA[The rumoured acquisition of Sun by IBM makes sense on many levels. IBM has always been a strong proponent of business-friendly open source and is probably more invested in Java than Sun itself. The indirect impact on recently-acquired MySQL is a bit personal for me, however.
Many moons ago, I worked in the IBM software lab [...]]]></description>
			<content:encoded><![CDATA[<p>The rumoured acquisition of Sun by IBM makes sense on many levels. IBM has always been a strong proponent of business-friendly open source and is probably more invested in Java than Sun itself. The indirect impact on recently-acquired MySQL is a bit personal for me, however.</p>
<p>Many moons ago, I worked in the <a href="http://www.ibm.com/software/ca/en/torontolab/">IBM software lab in Toronto</a> where DB2 UDB is primarily developed. I learned a great deal there, but something stirred within me, a desire for freedom and flexibility in software. I took a risk and moved to California, working for a number of start ups, believing that open source database technology (and software in general) was the wave of the future, and I wanted to work in companies that were willing to use OSS software.</p>
<p>To say the least, many of my then-current and former IBM colleagues that had taken cushy jobs with big companies to administer large DB2 installations, or were on track to do so, thought I was crazy. MySQL was a toy database, and PostgreSQL was just a research project! In retrospect I feel it was the right decision.</p>
<p>So, here we are, years later, and IBM may now end up owning MySQL.</p>
<p>On the positive side, I have no doubt of IBM&#8217;s commitment to open source and that they will continue to support MySQL with the tremendous resources it can bring to bear. It would be quite ironic if the Toronto lab had some people shift to MySQL development!</p>
<p>However, if IBM&#8217;s previous acquisitions are any indication, they will almost certainly push MySQL as a free &#8220;entry-level&#8221; database to a larger pool of customers and smooth the way for eventual upgrading to <span style="text-decoration: line-through;">more expensive</span> enterprise products. This is not inherently a bad thing, but it could translate into poor support and release-inflation (i.e. beta = alpha, GA = beta, etc) as a way to encourage captive customers to just bite the bullet and upgrade to DB2.</p>
<p>IBM may be able to squeeze quite a bit of profit out of MySQL in this way from enterprise customers who are used to this kind of abuse, but there are significant challenges:</p>
<p>They will need to go out of their way to convince skeptics like myself that quality and stability will not be compromised as MySQL gets integrated into IBM&#8217;s product offerings, otherwise MySQL users without deep pockets may begin defecting to postgres rather than be stuck with a key part of their infrastructure poorly-supported.</p>
<p>Then there is the still-unresolved problem of Oracle owning the maker of MySQL&#8217;s only viable transactional engine, InnoDB, an issue <a href="http://tomictech.com/2008/06/02/sun-caught-in-a-pincer-with-mysql/">I wrote about last year</a> when Sun acquired MySQL. With the release of the new InnoDB as a downloadable &#8220;plug-in&#8221;, Oracle has made it clear that they want to track and find as many of MySQL&#8217;s customers as they can.</p>
<p>Most important is that IBM will need to maintain MySQL&#8217;s reputation as the DB of choice for new web sites and other budget-conscious startups. Mindshare is everything. This is inherently difficult to track and acheive, because the next twitter will be developed by a few clever people who start up something for fun and just download all the OSS tools they need, based on what others before them did. Right now, typically, that&#8217;s a standard LAMP stack. If people starting up sites today begin to question the &#8220;M&#8221; part of LAMP (eg. swap postgresql for mysql), this effect on MySQL&#8217;s reputation and stature won&#8217;t be obvious for quite a while but it will be difficult to reverse.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=28</wfw:commentRss>
		</item>
		<item>
		<title>On the need for an agile approach to data warehousing</title>
		<link>http://tomictech.com/?p=19</link>
		<comments>http://tomictech.com/?p=19#comments</comments>
		<pubDate>Tue, 27 Jan 2009 20:34:47 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[agile development]]></category>

		<category><![CDATA[data warehousing]]></category>

		<category><![CDATA[agile]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=19</guid>
		<description><![CDATA[I&#8217;d like to take a step back from technical issues to distill some of my thoughts on the challenges of data warehousing in the 21st century.
Having worked on a number of warehouse projects in different industries over the years, I&#8217;ve encountered many challenges, some failures, some successes. One thing is certain: all organizations that have [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to take a step back from technical issues to distill some of my thoughts on the challenges of data warehousing in the 21st century.</p>
<p>Having worked on a number of warehouse projects in different industries over the years, I&#8217;ve encountered many challenges, some failures, some successes. One thing is certain: all organizations that have a reasonable amount of data should be building a data warehouse if they don&#8217;t already have one. In 2009, given the economic atmosphere, no one wants to wait as long, or pay as much, as they did in 1999 to get one.</p>
<p>While this is a huge opportunity for open-source competitors like MySQL, it comes with big challenges for an organization that thinks it will get a $10MM warehouse (in 1999 dollars) for $300,000 (2009 dollars).</p>
<p>My contention is that in a web-connected, high-traffic and high-speed world, a monolithic approach with a rigid set of requirements, and a project team isolated from the business through layers of product and project managers is an almost guarantee of a disastrous investment of resources into a product that will not satisfy the needs of an ever-changing business. Most of my reasons for believing this follow directly from the arguments <a id="wgec" title="made in favour of agile development" href="http://en.wikipedia.org/wiki/Agile_software_development">made in favour of agile development</a> in software engineering. In a data warehousing effort, I believe some of these reasons are even more compelling, yet many do not see a warehouse project as a typical example of something that can be made &#8220;agile&#8221;.</p>
<p>For those unfamiliar with agile development, I&#8217;ll crudely summarize it as follows: a project that is lightly specified and built with a continuous loop of user/business feedback is more flexible and produces better results. This usually takes the form of implementing small, usable features called &#8220;stories&#8221; every few weeks. They can be thought of as a series of small prototypes. The strongest argument for developing in this way is that requirements and the business environment can change rapidly. Users may know what they want at a given point in time (and even that&#8217;s a somewhat optimistic assumption) but may change their minds and produce more innovative results if they are able and encouraged to make course modifications after seeing initial prototypes.</p>
<p>In undertaking a new, or improved warehousing effort in an organization, it is crucial that users have access to and are encouraged to use new data that are being produced. This makes users more comfortable with the results they work with, easing over time any psychological aversion they may have to trusting new spins on numbers. More importantly, developers can write specified test cases until they are blue in the face, but nothing can replace the eyeballs of a user that can immediately tell when numbers do not look right.</p>
<p>In a monolithic approach, a bug can creep into the processing pipeline and cause many GB or TB of data to be incorrectly processed until someone notices it, if no one is working with the data. If users frequently make use of new data, bugs can be caught quickly and the time spent fixing them far less.</p>
<p><strong>A working example</strong></p>
<p>While this sounds good in theory, an entire book could be written on how examples of this could work in practice. I&#8217;ll just use a contrived example that i&#8217;m sure many can relate to.</p>
<p>Your company has grown quickly in the last couple of years, and has always collected in log form tons of data that never get used. Other priorities prevented much from being done with it, but recently someone in management read an article in a CIO/CTO magazine about &#8220;data mining&#8221; and now you&#8217;ve been stuck with the task of buiding a warehouse with all of this data.</p>
<p>In a monolithic, 1999-style approach, the CTO would have Oracle or IBM come in with a team of 10 highly-compensated consultants. A few feasability studies, $10MM and 2 years later, you&#8217;d have a brand-spanking-new data warehouse full of data that a large majority of which the spindles will not touch even once.</p>
<p>Luckily, we are in more enlightened, frugal times, and many companies now undertake this effort themselves after recovering from Oracle sticker shock. Here&#8217;s a series of starting tips to the company in the situation above:</p>
<ul>
<li>This is kind of a rule of thumb, but to begin with, forget about all historical data that is too old to impact payments or revenue. If your company reconciles payments and/or revenues, say, every 60 days, the degree of interest your analysts and management team have in data from 61 days ago compared to 59 days ago is probably substantial.</li>
<li>Listen to the CTO&#8217;s and management team&#8217;s vision for what they want to do with all this data. Listen to their thoughts and keep them in the back of your mind, but alsoremember that Rome wasn&#8217;t built in a day</li>
<li>Now that you&#8217;ve discarded the mental burden of having to come up with a way of scaling your computations on many years worth of data, get an off-the-shelf box with a couple of large hard drives, talk to your analysts about something interesting that can be done with the last 60 days worth of data with a maximum development time of 2-3 weeks. This will <strong>force</strong> the people that work with your data on a daily basis to decide what is most important to them. Be highly suspicous of those that say they need &#8220;everything&#8221;</li>
<li>After 3 weeks, you should have a self-maintaining system, processing whatever it processes on its own and without your intervention. Show it to the anlaysts, and get their feedback. If all goes well, they will be excited to use the new data and it will very quickly becomea critical component of their jobs . Remember another rule of thumb: once users have been given data, and actively use it, you take it away from them at your peril!</li>
<li>If your first &#8220;user story&#8221; was a success, most likely you will have to fight with the analysts over what data gets included next. Again, do not reprocess back data no matter how much they say it &#8220;would be useful&#8221; &#8212; stick to that 60-day window.</li>
<li>Rinse. Repeat.</li>
<li>After a few iterations, you should be in the enviable position of having a hacked together system running on an off-the-shelf box. And this is fine, because far more important things are solid &#8212; the user&#8217;s confidence in the data they are getting, the management team&#8217;s confidence that you are giving the business the data they need, and your confidence that what you are building is being used.</li>
<li>Now sit the managers and analysts down for &#8220;the talk&#8221; in which you describe to them that you need the next 3 weeks to prevent the system from breaking. This may involve getting proper hardware, or simply re-architecting the system. They will hate you for it just like a child will hate you for taking their steady supply of candy away, but remember that you must be a responsible parent.</li>
<li>Rinse. Repeat.</li>
<li>Receive accolades.</li>
</ul>
<p>I hope that no one takes my tips too seriously. Certainly referring to your users as &#8220;children&#8221; will not win you many allies. I also certainly do not advise a completely chaotic development model with zero planning. But I think those of you that have been involved in multi-year data warehouse projects with vast project teams doing gap analyses, feasibility studies and book-length project plans will agree that working on something collaboratively over a few short iterations can&#8217;t be all bad. You may even give the users something they want and not break the bank while you&#8217;re at it.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=19</wfw:commentRss>
		</item>
		<item>
		<title>Some partition movement tricks with MySQL 5.1</title>
		<link>http://tomictech.com/?p=16</link>
		<comments>http://tomictech.com/?p=16#comments</comments>
		<pubDate>Thu, 22 Jan 2009 01:56:19 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[partitioning]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=16</guid>
		<description><![CDATA[I have been using partitioning in MySQL 5.1 since it has been in late beta, and with the odd bump I&#8217;ve been quite satisfied with its stability and ease of use. Perhaps the days of maintaining nasty application logic to maintain tables like table_yyyymm manually are finally over!
While some control freaks out there might still [...]]]></description>
			<content:encoded><![CDATA[<p>I have been using partitioning in MySQL 5.1 since it has been in late beta, and with the odd bump I&#8217;ve been quite satisfied with its stability and ease of use. Perhaps the days of maintaining nasty application logic to maintain tables like table_yyyymm manually are finally over!</p>
<p>While some control freaks out there might still prefer the painful approach of manually keeping dated tables and using merge tables/views to link them together, you still have a lot of flexibility in how you manage individual partitions with the ALTER TABLE REORGANIZE/CHANGE/ADD PARTITION commands.</p>
<p>One trick i&#8217;ve discovered to work pretty well, is that you can swap in and out partitions right out from under MySQL&#8217;s nose. This can be useful in some situations.</p>
<p><strong>WARNING!!</strong> This is not supported by MySQL to the best of my knowledge, and possibly is a risky, bad idea. Proceed only if you think you know what you&#8217;re doing!</p>
<p>Let&#8217;s say we have a table big_table, with self-explanatory partitions</p>
<p>p_200811<br />
p_200812<br />
p_200901</p>
<p>In your data directory, assuming you are using MyISAM, you will see MySQL has created files like these:</p>
<p>big_table#P#p_200811.MYD/MYI<br />
big_table#P#p_200812.MYD/MYI<br />
big_table#P#p_200901.MYD/MYI</p>
<p>Now on the server, running live, to make sure MySQL releases any locks on files at the OS level:<br />
LOCK TABLE big_table WRITE;  # alternatively use FLUSH TABLES WITH READ LOCK to lock whole DB<br />
FLUSH TABLE big_table;</p>
<p>Now let&#8217;s say you want to do some intensive processing on your November data set. You can simply copy  the file to a temporary location, or possibly to another machine:</p>
<p>cp $DATADIR/dbname/big_table*200811* /tmplocation</p>
<p>Then you can unlock your table with UNLOCK TABLES;</p>
<p>Let&#8217;s say you have a box you want to do this on. Simply create a table-structure only copy of your table on that machine (*exactly* the same layout!! Bad Things(TM) will probably happen if this gets confused). Then repeat the procedure above in MySQL and on the OS:</p>
<p>mysql&gt; FLUSH TABLES WITH READ LOCK;<br />
$ cp /tmploc/big_table*200811* $DATADIR_SECOND_BOX/dbname/ # this overwrites the empty files; make sure they are owned by &#8220;mysql&#8221; or the db owner!<br />
mysql&gt; UNLOCK TABLES;</p>
<p>After unlocking, mysql will reload the files, and assuming the permissions and table structures are all correct, doing a count(*) on your date range should confirm that the data has moved over successfully. Now apply your datafixes, and repeat the procedure to return the data back to the host machine!</p>
<p>One possible application of this approach could be to distribute a very large index rebuild to a number of machines if your tables are partitioned into several chunks.</p>
<p>Given that InnoDB is sensitive to the dbname, I don&#8217;t expect that this would be possible with innodb tables, although I have not tried. I expect it to work with the ARCHIVE engine but also have not confirmed.</p>
<p>Hoping to get some comments and suggestions on how to improve on this process, or hear about other people&#8217;s experience hacking at partitioned tables.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=16</wfw:commentRss>
		</item>
		<item>
		<title>Setting up MySQL monitoring with Nagios</title>
		<link>http://tomictech.com/?p=15</link>
		<comments>http://tomictech.com/?p=15#comments</comments>
		<pubDate>Thu, 09 Oct 2008 04:33:48 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[monitoring]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[replication]]></category>

		<category><![CDATA[nagios]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=15</guid>
		<description><![CDATA[It&#8217;s been a while since my last post! I have a few half-written entries, but I figured I just get one out there so I don&#8217;t appear to have dropped off the face of the planet  
Being more of a DBA than a sysadmin myself, I&#8217;ve noticed that there is a surprising dearth of [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since my last post! I have a few half-written entries, but I figured I just get one out there so I don&#8217;t appear to have dropped off the face of the planet <img src='http://tomictech.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Being more of a DBA than a sysadmin myself, I&#8217;ve noticed that there is a surprising dearth of information regarding how to actually get nagios set up to monitor MySQL, especially if you&#8217;re not experienced with nagios.</p>
<p>In my own experience, MySQL monitoring often ends up being a homegrown thing, especially if you&#8217;re a small shop. It is definitely a good idea to consolidate all of your monitoring, DB-related or not, into one solution: roll-your-own monitoring solutions can be brittle and burn you in the end. One example that comes to mind, a few years back i had a quick perl-based script to monitor replication and set it to mail a few people based on some threshold. Took me 1 hour to write and worked well enough, except, six months later, when some package upgrades were done on the machine I had the monitoring script running on, the mailer stopped working, and we found out the hard way about an alert situation with our replication.</p>
<h3>Nagios</h3>
<p><a href="http://www.nagios.org/">Nagios</a> is something your systems/operations people are probably familiar with. In case <strong>you </strong>are the systems/ops person, it&#8217;s a 15-20 minute process to <a href="http://nagios.sourceforge.net/docs/3_0/quickstart.html">get nagios installed</a> on a typical ubuntu machine. Make sure to set up nagios with the embedded perl interpreter in case you decide to make use of this feature in the future. Once you have it set up and monitoring, here&#8217;s how you can set up nagios to monitor your mysql databases:</p>
<p>You can grab some perl-based nagios plugins to monitor mysql here:</p>
<p><span style="font-family: Calibri,Verdana,Helvetica,Arial;"><a href="redir.aspx?URL=http%3a%2f%2fwww.shinguz.ch%2fMySQL%2fnagios-mysql-plugins-0.3.tar.gz" target="_blank">http://www.shinguz.ch/MySQL/nagios-mysql-plugins-0.3.tar.gz</a></span></p>
<p>Untar the plugins into /usr/local/nagios/libexec</p>
<p>If you want to check out how a particular plugin is executed, run it from the command line as follows:</p>
<pre>vmware@vmware-desktop:~/dl/nagios-mysql-plugins-0.3$ perl check_repl_mysql_seconds_behind_master.pl --help

SYNOPSIS

check_repl_mysql_seconds_behind_master.pl flags

DESCRIPTION

Nagios plugin to check how many seconds slave is behind master...

FLAGS

help, ?      Print this help.
host, h      Host where database is located (default localhost)
password, p  Password of user root (default &gt;&lt;)
port         Port where database listens to (default 3306)
user, u      User for connecting to the database (default root)
critical|c   Seconds for critical level (default 600)
warning|w    Seconds for warning level (default 60)

PARAMETERS

none
</pre>
<p>As an example, let&#8217;s say we want to monitor the lag of a particular replicator. Nagios operates on thresholds, the first stage being a warning, the second being a critical alert.</p>
<h3>Adding a Nagios Command</h3>
<p>In your /usr/local/nagios/etc/objects/commands.cfg file, add the following &#8220;command&#8221; entry &#8212; you should see others like it. This specifies a type of command you will run on a given host:</p>
<pre># 'check_repl_mysql_seconds_behind_master' command definition
define command{
command_name    check_repl_mysql_seconds_behind_master
command_line    $USER1$/check_repl_mysql_seconds_behind_master.pl --host=$HOSTADDRESS$ --password=$ARG1
}
</pre>
<p>In this case we&#8217;re just using most of the default parameters, but if we wanted to specify a different critical or warning threshold, we could add it with &#8211;critical=$ARG2, etc.</p>
<p>Then, if you&#8217;re monitoring localhost (probably not in most cases) you would just add the following entry to /usr/local/nagios/etc/localhost.cfg:</p>
<pre>define service{
use                             local-service         ; Name of service template to use
host_name                       localhost
service_description             MySQL Replication
check_command                   check_repl_mysql_seconds_behind_master!"&lt;password&gt;"
notifications_enabled           0
}
</pre>
<p>If you want to add another host to be monitored from this nagios instance, you can specify a new configration file, say, db01.cfg and write that to your etc/objects directory. Then add an entry to /usr/local/nagios/etc/nagios.cfg as follows, after making the appropriate adjustments to the hostname in the service:</p>
<p>cfg_file=/usr/local/nagios/etc/objects/db01.cfg</p>
<p>If everything has been configured correctly, you should be able to restart nagios with /etc/init.d/nagios restart, and then go into your nagios console and see the configuration entries for your new command and service, and also see if your slaves are far behind <img src='http://tomictech.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Once you have one service added in this way, adding new ones becomes a lot easier. I hope to investigate some more of the performance monitoring features of the nagios mysql plugins, but i&#8217;ll leave that for a future blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=15</wfw:commentRss>
		</item>
		<item>
		<title>mysql proxy 0.6.1 performance tests</title>
		<link>http://tomictech.com/?p=14</link>
		<comments>http://tomictech.com/?p=14#comments</comments>
		<pubDate>Wed, 02 Jul 2008 17:11:13 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<category><![CDATA[performance]]></category>

		<category><![CDATA[sysbench]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=14</guid>
		<description><![CDATA[The mysql proxy project has tremendous potential to make mysql administration and usage easier. I decided to throw some load at it to get a feel for how stable and performant it is. On EC2, I set up 6 &#8220;small&#8221; images in an example proxy setup:- One client machine to run sysbench- One machine to [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://forge.mysql.com/wiki/MySQL_Proxy">mysql proxy project</a> has tremendous potential to make mysql administration and usage easier. I decided to throw some load at it to get a feel for how stable and performant it is. <br id="tw_32" /><br id="f6su" />On EC2, I set up 6 &#8220;small&#8221; images in an example proxy setup:<br id="lf25" /><br id="lf250" />- One client machine to run sysbench<br id="lf251" />- One machine to act as a mysql proxy machine, running 0.6.1 (FC4 binary)<br id="lf252" />- Four identical database servers, running mysql 5.0.45<br id="z.di" /><br id="z.di0" />The database configuration was largely default, with InnoDB configured for 64MB buffer pool (just enough to ensure the sysbench table was entirely in memory), 512MB log files, and 1024 max connections.<br id="z.di1" /><br id="z.di2" />mysql-proxy was run with the following command:<br id="z.di3" /><br id="z.di4" />mysql-proxy &#8211;proxy-backend-addresses=ip-10-251-66-63.ec2.internal:3306 &#8211;proxy-backend-addresses=ip-10-251-71-21.ec2.internal:3306 &#8211;proxy-backend-addresses=ip-10-251-43-70.ec2.internal:3306 &#8211;proxy-backend-addresses=ip-10-251-75-52.ec2.internal:3306<br id="z.di5" /><br id="z.di6" />Using the sysbench database benchmark utility to generate some heavy load on the boxes individually (to verify they were all more or less equal), and on the proxy itself, I got some interesting results:<br id="kjc84" /></p>
<p><span style="text-decoration: underline;">Read/write operations/second:</span><br />
<br id="a6q-" /><img src="http://spreadsheets.google.com/pub?key=pP8KmhvZIzEceeZoKrtmxYw&amp;oid=1&amp;output=image" alt="" /><br id="a6q-0" /><br id="xyrj0" /><span style="text-decoration: underline;">Transactions/second:</span></p>
<p><img src="http://spreadsheets.google.com/pub?key=pP8KmhvZIzEceeZoKrtmxYw&amp;oid=2&amp;output=image" alt="" /><br />
<br id="a6q-1" /></p>
<p>Somewhat as I expected, mysql-proxy presents a bit of overhead when load is light. With only 8 concurrent threads, running against the proxy pointing to 4 equal database servers is slower than hitting one server itself.<br id="eq68" /><br id="eq680" />As load goes up, however, you can see that the proxy scales reasonably well, while the accesses to the direct systems begin to tank rapidly after 64 concurrent threads (I lost some of my 128 thread results inadvertently and the instances are destroyed <img src='http://tomictech.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> I&#8217;m fairly comfortable in interoplating the result to be roughly in between 64 and 256 )<br id="p1qw" /><br id="p1qw0" />Somewhat unsettling, however, was the fact that sysbench or mysql proxy failed to work beyond about 500 concurrent threads. My debug output for sysbench just stops here:<br id="lrj3" /> <br id="lrj30" /> DEBUG: mysql_real_connect(0&#215;9adec70, &#8220;ip-10-251-75-160&#8243;, &#8220;sbtest&#8221;, &#8220;(null)&#8221;, &#8220;sbtest&#8221;, 4040, &#8220;(null)&#8221;, CLIENT_MULTI_STATEMENTS)<br id="lrj31" /> DEBUG: mysql_init(0&#215;9ae51b0)<br id="lrj32" /> DEBUG: mysql_options(0&#215;9ae51b0, MYSQL_READ_DEFAULT_<br id="lrj33" /></p>
<p>On the servers themselves, I can see the 125+ connections to each individual box, and all looks normal. I tried tweaking some of the thread settings in ulimit and sysbench itself to no avail, but didn&#8217;t have a ton of time to play with it. I decided the results were already interesting enough and will see if I can revisit this another time. <br id="u1ma" /> <br id="u1ma0" /> Another reason i am not so concerned is that more than one proxy can be utilised in many typical environments, so the ability of a single mysql-proxy instance to handle 512+ concurrent connections might not be show-stopper. <br id="ya:y" /> <br id="ya:y0" /> If anyone out there has done some similar tests with mysql proxy, i welcome your feedback!</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=14</wfw:commentRss>
		</item>
		<item>
		<title>A lesson in never letting your filesystem fill up too much</title>
		<link>http://tomictech.com/?p=13</link>
		<comments>http://tomictech.com/?p=13#comments</comments>
		<pubDate>Thu, 26 Jun 2008 16:01:04 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[archive]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=13</guid>
		<description><![CDATA[A good rule of thumb is that a DBA should not let the size of his/her database grow to much more than 50% of the available disk space. I&#8217;m sure everyone has a story of being burned in some way by not following this advice.
Before promoting 5.1 to some beefier hardware, I&#8217;ve got a few [...]]]></description>
			<content:encoded><![CDATA[<p>A good rule of thumb is that a DBA should not let the size of his/her database grow to much more than 50% of the available disk space. I&#8217;m sure everyone has a story of being burned in some way by not following this advice.</p>
<p>Before promoting 5.1 to some beefier hardware, I&#8217;ve got a few large tables sitting on a small SATA-based machine. Noticing that a few were growing faster than I expected, i decided to pause my loading and convert another one of them to the archive storage engine to save some space.</p>
<p>I certainly saved space.</p>
<p>By trying to do too much at one time, I had some other temp files lying around on the same filesystem, and my conversion to the archive storage engine failed at about 80% through with my disk full.</p>
<p>mysql&gt; alter table w_stats engine = Archive;<br />
ERROR 1030 (HY000): Got error -1 from storage engine</p>
<p>mysql 5.1 has <a href="http://bugs.mysql.com/bug.php?id=37648">an unfortunate bug</a> that I discovered, and reported, that causes the unfinished archive files to remain after the failed conversion, and not the original data. My disk space problems are no longer much of a problem <img src='http://tomictech.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Now I know that this was basically my fault, and that I should have been more careful with my system, but I would have preferred maybe an embarassing error message, a slap on the wrist, not the loss of a 150GB table! For example:</p>
<p>mysql&gt; alter table w_stats engine = Archive;<br />
ERROR 1030 (HY000):  df -h is your friend</p>
<p>Well, it sounded funny in my head at least&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=13</wfw:commentRss>
		</item>
		<item>
		<title>Testing MySQL sequential IO performance with different IO schedulers - Part 1</title>
		<link>http://tomictech.com/?p=12</link>
		<comments>http://tomictech.com/?p=12#comments</comments>
		<pubDate>Mon, 23 Jun 2008 19:52:06 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=12</guid>
		<description><![CDATA[Testing MySQL sequential IO performance with different IO schedulers - Part 1I stumbled upon an interesting article discussing the different Linux I/O schedulers that are available in the 2.6 kernel, and also discussing some of the deficiencies of the 2.4 &#8220;Linus elevator&#8221; I/O scheduler. The default scheduler as of Kernel 2.6.18 is the &#8220;Completely fair [...]]]></description>
			<content:encoded><![CDATA[<p>Testing MySQL sequential IO performance with different IO schedulers - Part 1<br id="ov.f" /><br id="ov.f0" />I stumbled upon an <a href="http://kerneltrap.org/node/7637">interesting article</a> discussing the different Linux I/O schedulers that are available in the 2.6 kernel, and also discussing some of the deficiencies of the 2.4 &#8220;Linus elevator&#8221; I/O scheduler. <br id="df9g" /><br id="gn2o" />The default scheduler as of Kernel 2.6.18 is the &#8220;Completely fair queuing&#8221; scheduler, or CFQ. The previous default was the anticipatory scheduler between 2.6.0 and 2.6.17. Also available is a buffed-up version of the 2.4 scheduler called the deadline scheduler, and a &#8220;noop&#8221; scheduler. <a href="http://www.fishpool.org/post/2008/03/31/Optimizing-Linux-I/O-on-hardware-RAID">This article</a> has a great discussion on what these schedulers mean for databases at a higher level. <br id="gn2o0" /><br id="ckz5" />Although old, <a href="http://www.linuxhq.com/kernel/v2.6/8/Documentation/as-iosched.txt">some documentation in the kernel</a> states  that database workloads should typically use the deadline scheduler. <a href="http://www.puschitz.com/TuningLinuxForOracle.shtml#KernelBootParameters">This tuning guide for Oracle on Linux</a> gives, without much explanation, the same advice . <br id="oi6j" /> <br id="df9g0" />I don&#8217;t have the time at the moment to do a full analysis (hence part 1), but I did some quick tests and already got some pretty surprising results. I&#8217;m hoping to see if anyone else has done some tests along these lines that can comment further.<br id="w1ve" /><br id="xlvy" />The test<br id="xlvy0" /><br id="w1ve0" />I have a 140GB table sitting on a test machine with a simple 320GB SATA drive.<br id="t9z6" /><br id="t9z60" />Changing the scheduler in newer kernels is a simple tunable parameter. You can change it to the deadline scheduler for your sda device, for example, as follows:<br id="t9z61" /><br id="t9z62" />echo deadline &gt; /sys/block/sda/queue/scheduler<br id="t9z63" /><br id="t9z64" />I ran a simple count(*) on one of the partitions, 13GB in size, to see if there was a measurable difference between the schedulers. Memory size on this system is 4GB.<br id="ndza" /><br id="ndza0" />These are the results I got on an otherwise idle machine. The CFQ and anticipatory results were very consistent, almost to the second each time, while the deadline and no-op schedulers fluctuated as much as 50s per test. Not sure at this point why this is the case.<br id="udh5" /><br id="udh50" /></p>
<div id="auaz">
<table id="h1di" border="1" cellspacing="0" cellpadding="3" width="100%" bordercolor="#999999">
<tbody id="auaz0">
<tr id="auaz1">
<td id="auaz2" width="33%">Scheduler<br id="auaz3" /></td>
<td id="auaz4" width="33%">Average runtime (s)<br id="auaz5" /></td>
<td id="auaz6" width="33%">Average Throughput (MiB/s)<br id="auaz7" /></td>
</tr>
<tr id="auaz8">
<td id="auaz9" width="33%">CFQ<br id="auaz10" /></td>
<td id="auaz11" width="33%">303<br id="auaz12" /></td>
<td id="auaz13" width="33%">43.04<br id="auaz14" /></td>
</tr>
<tr id="auaz15">
<td id="auaz16" width="33%">Anticipatory<br id="auaz17" /></td>
<td id="auaz18" width="33%">370<br id="auaz19" /></td>
<td id="auaz20" width="33%">35.4<br id="auaz21" /></td>
</tr>
<tr id="auaz22">
<td id="auaz23" width="33%">Deadline<br id="auaz24" /></td>
<td id="auaz25" width="33%">492<br id="auaz26" /></td>
<td id="auaz27" width="33%">26.5<br id="auaz28" /></td>
</tr>
<tr id="auaz29">
<td id="auaz30" width="33%">No-op<br id="auaz31" /></td>
<td id="auaz32" width="33%">471<br id="auaz33" /></td>
<td id="auaz34" width="33%">27.6<br id="auaz35" /></td>
</tr>
</tbody>
</table>
</div>
<p><br id="lued" /><br id="o_u3" /> Now, these results come with the following caveats:<br id="lued0" /><br id="lued1" />- These are not extensive, robust benchmarks, they&#8217;re just a few tests I ran against my very specific workload. I encourage people to try the different schedulers on different workloads and hardware and share results!<br id="lued2" />- These are test results with a large, sequential I/O workload that is very different from the random I/O intensive tests that most people are interested in. <br id="utbj" /> - These are test results against a single ATA drive. These results are probably irrelevant in a hardware RAID context<br id="uhuq" /> <br id="uhuq0" /> In a future blog post, I will try to dig a bit deeper into these results and try the tests against hardware and software RAID. What i think can be taken from this already, and the referenced articles, is that CFQ should probably be used if your kernel has direct knowledge of the physical layout of your disks, eg. through LVM. Otherwise, deadline or even noop is probably your best bet if you are using hardware RAID, putting the burden on the hardware to decide how best to execute reads and writes.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=12</wfw:commentRss>
		</item>
		<item>
		<title>Building a data warehouse on a budget with MySQL 5.1</title>
		<link>http://tomictech.com/?p=11</link>
		<comments>http://tomictech.com/?p=11#comments</comments>
		<pubDate>Tue, 17 Jun 2008 02:59:13 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[performance]]></category>

		<category><![CDATA[data warehouse]]></category>

		<category><![CDATA[datawarehouse]]></category>

		<guid isPermaLink="false">http://tomictech.com/?p=11</guid>
		<description><![CDATA[If there is one thing that a DBA or data warehouse architect can count on, it is that data volumes will increase while budgets will decrease.This is why MySQL 5.1 and its partitioning capabilities are so interesting. I&#8217;m going to demonstrate how you can build a small/medium-sized data warehouse or data mart (1-10 TB range) [...]]]></description>
			<content:encoded><![CDATA[<p>If there is one thing that a DBA or data warehouse architect can count on, it is that data volumes will increase while budgets will decrease.<br id="zwv8" /><br id="zwv80" />This is why MySQL 5.1 and its partitioning capabilities are so interesting. I&#8217;m going to demonstrate how you can build a small/medium-sized data warehouse or data mart (1-10 TB range) on a shoe-string budget.</p>
<h3>the mission</h3>
<p>I decided to convert a relatively large statistics table (750m rows, 140GB in size in about 10 partitions) on a test machine from MyISAM to the Archive storage engine. After a long conversion process, my data, on disk, ended up being about 21GB, for an impressive compression ratio of 6.7:1.<br id="c1y6" /><br id="c1y60" />Prior to MySQL 5.1, one of the drawbacks to the archive storage engine was that you could not index it; however, with partition pruning, you can get yourself a &#8220;free&#8221; index on a large archive table by splitting it into date-based chunks, whether by day, week, month or year, depending on the volume of data you want to make accessible.<br id="k5ha" /><br id="k5ha0" />I decided to see how long it would take to run a simple aggregate query against a portion of one of the partitions I had created.<br id="mveb0" /><span style="font-family: Courier New;"><br id="o8.1" />mysql&gt; explain partitions select count(*) from stats where StatsTime between &#8216;2007-05-01&#8242; and &#8216;2007-05-31&#8242;;</span><br id="tfq:" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;-+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;-+</span><br id="tfq:0" style="font-family: Courier New;" /> <span style="font-family: Courier New;">| id | select_type | table        | partitions | type | possible_keys | key  | key_len | ref  | rows     | Extra       |</span><br id="tfq:1" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;-+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;-+</span><br id="tfq:2" style="font-family: Courier New;" /> <span style="font-family: Courier New;">|  1 | SIMPLE      | stats        | p_2007q2   | ALL  | NULL          | NULL | NULL    | NULL | 59558354 | Using where | </span><br id="tfq:3" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;-+&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8211;+&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;+&#8212;&#8212;+&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;-+</span><br id="tfq:4" style="font-family: Courier New;" /> <span style="font-family: Courier New;">1 row in set (0.00 sec)</span><br id="tfq:5" /> <br id="y3_0" />On disk, the partition was about 1.7G:<br id="y3_00" /><br id="i6tf0" /> -rw-rw&#8212;- 1 mysql mysql 1.7G Jun 16 17:16 stats#P#p_2007q2.ARZ<br id="y3_01" /><br id="tfq:6" />After issuing the query, I watched iostat. The file being accessed was completely absent from filesystem cache because i had previous run a long query against a large MyISAM table. This is what I saw, fairly consistently:<br id="jgk2" /> <br id="jgk20" /><span style="font-family: Courier New;">avg-cpu:  %user   %nice %system %iowait  %steal   %idle</span><br id="tt4k0" style="font-family: Courier New;" /> <span style="font-family: Courier New;"> 24.96    0.00    0.75    0.88    0.00   73.42</span><br id="tt4k1" style="font-family: Courier New;" /> <br id="tt4k2" style="font-family: Courier New;" /> <span style="font-family: Courier New;">Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util</span><br id="tt4k3" style="font-family: Courier New;" /> <span style="font-family: Courier New;">sda              48.55     0.40 62.44  0.60     7.36     0.00   239.32     0.39    6.20   5.42  34.20</span><br id="tt4k4" style="font-family: Courier New;" /> <br id="wfq4" /><br id="lst7" />As expected with a highly-compressed table, the query speed is CPU-limited: the disk can go much faster but we are fully utilizing one of the four cores (notice 25% cpu utilization).<br id="ht7e" /><br id="ht7e0" />Back-of-the-envelope calculations suggest that, at 7.3mb/s read throughput from disk, and a 1.7G file to be read, that the query will complete in about: <br id="clrg" /><br id="clrg0" />1700mb / 7.3mb/s = 232s = 3m52s<br id="o40y" /><br id="o40y0" /><span style="font-family: Courier New;">mysql&gt; select count(*) from stats where StatsTime between &#8216;2007-05-01&#8242; and &#8216;2007-05-31&#8242;;</span><br id="b:9:1" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;&#8212;&#8212;-+</span><br id="b:9:2" style="font-family: Courier New;" /> <span style="font-family: Courier New;">| count(*) |</span><br id="b:9:3" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;&#8212;&#8212;-+</span><br id="b:9:4" style="font-family: Courier New;" /> <span style="font-family: Courier New;">|        0 | </span><br id="b:9:5" style="font-family: Courier New;" /> <span style="font-family: Courier New;">+&#8212;&#8212;&#8212;-+</span><br id="b:9:6" style="font-family: Courier New;" /> <span style="font-family: Courier New;">1 row in set (3 min 52.19 sec)</span><br id="b:9:7" /> <br id="f_pj" />Dead on!</p>
<h3>the possibilities</h3>
<p>What is interesting is that from this we can deduce the effective throughput of an archive table based on how much data the disk is feeding the cpu.</p>
<p>If we attempted to run such a query on the original MyISAM partition (about 11.5GB), in order to complete the result in 232s, we would require a throughput of :<br id="g:811" /><br id="g:812" />11500 / 232 = 49mb/s<br id="btud" /><br id="btud0" />50mb/s is not lightning fast, it&#8217;s about what you&#8217;d expect from a single SATA disk, but it&#8217;s still fairly good considering the amount of compression we&#8217;re getting.<br id="o.-q" /><br id="o.-q0" />But wait! You still have 3 more cores left on the machine. Now to confirm another hypothesis:<br id="x9js" /><br id="x9js0" />Testing two simultaneous queries, against the same partition:<br id="j3i60" /> <br id="j3i61" /> <span style="font-family: Courier New;">select count(*) from stats where </span><span style="font-family: Courier New;">StatsTime </span><span style="font-family: Courier New;">between &#8216;2007-05-01&#8242; and &#8216;2007-05-29&#8242;;</span><br id="v9x-" style="font-family: Courier New;" /><span style="font-family: Courier New;"> select count(*) from stats where </span><span style="font-family: Courier New;">StatsTime </span><span style="font-family: Courier New;">between &#8216;2007-05-01&#8242; and &#8216;2007-05-30&#8242;;</span><br id="ti:t" style="font-family: Courier New;" /><br id="b-lz" />Most of the data was in fs cache, so our iostat looked like this for most of the time:<br id="lf6c" /><br id="f_pj0" style="font-family: Courier New;" /><span style="font-family: Courier New;">avg-cpu:  %user   %nice %system %iowait  %steal   %idle</span><br id="f_pj1" style="font-family: Courier New;" /><span style="font-family: Courier New;"> 50.70    0.00    0.85    0.00    0.00   48.45</span><br id="f_pj2" style="font-family: Courier New;" /><br id="f_pj3" style="font-family: Courier New;" /><span style="font-family: Courier New;">Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util</span><br id="f_pj4" style="font-family: Courier New;" /><span style="font-family: Courier New;">sda               0.00     0.50  0.00  0.80     0.00     0.01    13.00     0.00    0.38   0.37   0.03</span><br id="f_pj5" style="font-family: Courier New;" /><span style="font-family: Courier New;">hda               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00</span><br id="f_pj6" /><br id="j3i6" />Final runtime for both queries was 208s and 217s; suggesting that the queries were almost perfectly parallelized.<br id="t5ms" /><br id="t5ms0" />If we needed to run a beefy calculation against a year&#8217;s worth of data, for example, we could run four parallel queries (one per core per quarter) and aggregate the results for each query in a separate client. Now our &#8220;equivalent&#8221; 50mb/s throughput on one core has become 50mb/s * 4 = 200mb/s. <br id="ieou" /><br id="ieou0" />If you get a slightly faster disk, you could fully utilize an 8-core box for a throughput of 400mb/s. <br id="ltyv" /><br id="ltyv0" />And if your disk space is 750GB, factoring in the compression ratio, you are storing the equivalent of 5TB of non-compressed data with a potential read throughput of 200mb/s. Not too shabby for low-end hardware!<br id="x.i:" /><br id="jgk29" /></p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=11</wfw:commentRss>
		</item>
		<item>
		<title>MySQL Repair/Optimize Partition Errors</title>
		<link>http://tomictech.com/?p=10</link>
		<comments>http://tomictech.com/?p=10#comments</comments>
		<pubDate>Fri, 06 Jun 2008 06:24:52 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<category><![CDATA[partitioning]]></category>

		<guid isPermaLink="false">http://tomictech.com/blog/?p=10</guid>
		<description><![CDATA[mysql 5.1 is nearing release, with the present release candidate 5.1.24.
The most important new feature, in my eyes, is the new partitioning capability. When I get some time, I will write up a more complete post on my experiences so far with 5.1 partitioning, but I am going to try to keep the turnover on [...]]]></description>
			<content:encoded><![CDATA[<p>mysql 5.1 is nearing release, with the present release candidate 5.1.24.</p>
<p>The most important new feature, in my eyes, is the new partitioning capability. When I get some time, I will write up a more complete post on my experiences so far with 5.1 partitioning, but I am going to try to keep the turnover on posts a bit higher, and post smaller things on here more regularly.</p>
<p>Partitioning has the potential to make large tables in mysql manageable once again. This is music to the ears of anyone that has had the misfortune of having to learn, the hard way, about MyISAM&#8217;s often <a href="http://peter-zaitsev.livejournal.com/11772.html">painfully slow</a> &#8220;Repair by keycache&#8221; loading and repairing of large tables with unique keys. Add that to MyISAM&#8217;s propensity to table corruption, especially with large tables, and you have a ticking timebomb on many pre-5.1 servers out there. If you are lucky, you can repair a 10GB table in a few hours, something you don&#8217;t want to have to wait around for at 3am which is invariably when those big tables get screwed up and you have to repair them.</p>
<p>Of course, many people use application-level partitioning to deal with this problem, or used the inadequate MERGE storage engine to deal with this. But partitioning at the database level has the potential to make things just so much easier. If you partition wisely, you can turn a 10GB behemoth table into a series of 500mb chunks, each of which is (I&#8217;m hoping here) less likely to corrupt, and far more manageable by MySQL individually.</p>
<p>Beware of <a href="http://bugs.mysql.com/bug.php?id=20129">this bug</a> though! It&#8217;s duplicated many times, and might catch you off guard if you plan to be an early adopter of 5.1.  Essentially, you can&#8217;t really repair or optimize partitioned tables, at least not directly. The documentation <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html">still contains references</a> to the ALTER TABLE &#8230; REPAIR PARTITION statements which don&#8217;t work at all, as of 5.1.24. For eg, with my partitioned InnoDB table tst_p:</p>
<pre>mysql&gt; optimize table tst_p;
+-------------+----------+----------+-----------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                  |
+-------------+----------+----------+-----------------------------------------------------------+
| stats.tst_p | optimize | note     | The storage engine for the table doesn't support optimize |
+-------------+----------+----------+-----------------------------------------------------------+
1 row in set (0.00 sec)
mysql&gt; alter table tst_p repair partition p_pre2007;
ERROR 1178 (42000): The storage engine for the table doesn't support repair partition</pre>
<p>Luckily, there is a workaround. Using ALTER TABLE REORGANIZE PARTITION, you can effectively rebuild the partition which will optimize/repair it. I&#8217;ve yet to test this with a truly corrupted table, however. The downside of this is that you have to have the partition definition handy. Luckily it seems to be smart enough to stop you from making a mess &#8212; notice the error in the first one, and the second statement goes through correctly:</p>
<pre>mysql&gt; alter table tst_p reorganize partition p_pre2007 into (partition p_pre2007 values less than (to_days('2007-12-31')));
ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges except for last partition where it can extend the range
mysql&gt;
mysql&gt; alter table tst_p reorganize partition p_pre2007 into (partition p_pre2007 values less than (to_days('2007-01-01')));
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0</pre>
<p>As I get more time i&#8217;ll post more cool partitioning tricks.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=10</wfw:commentRss>
		</item>
		<item>
		<title>Sun caught in a pincer with MySQL</title>
		<link>http://tomictech.com/?p=9</link>
		<comments>http://tomictech.com/?p=9#comments</comments>
		<pubDate>Tue, 03 Jun 2008 07:20:44 +0000</pubDate>
		<dc:creator>atomic</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[bigtable]]></category>

		<category><![CDATA[hbase]]></category>

		<category><![CDATA[postgresql]]></category>

		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://tomictech.com/blog/?p=9</guid>
		<description><![CDATA[Over the years, the database world has been buzzing with the strategic threat posed to the established players by upstart open-source database systems. Oracle and IBM would no longer be able to gouge defenseless small and medium-sized businesses of non-trivial portions of their IT budgets for a mere database licence. Oracle, IBM and Microsoft, for [...]]]></description>
			<content:encoded><![CDATA[<p>Over the years, the database world has been buzzing with the strategic threat posed to the established players by upstart open-source database systems. Oracle and IBM would no longer be able to gouge defenseless small and medium-sized businesses of non-trivial portions of their IT budgets for a mere database licence. Oracle, IBM and Microsoft, for their part, have tried their best to respond to this threat, but it is clear that they cannot simply squash open-source products, but rather evolve with the changing landscape.</p>
<h3>the countered threat from Oracle</h3>
<p>Oracle made some strategic purchases in the past few years to establish a foothold in the embedded and front-end database market by acquiring Sleepycat (maintainers of BerkeleyDB) and InnoBase (makers of InnoDB storage engine for MySQL). These two also happened to provide the only two transactional backends for MySQL, whlie InnoDB is the only one to be used widely in practice. While this was looked upon as a disaster for MySQL, it was really not a major issue for a couple of reasons:</p>
<ul>
<li>The InnoDB source has been GPL&#8217;d, so even if Innobase were to completely abandon maintenance of the codebase (which it has not), the community can step in to fix bugs. For now, the status quo prevails</li>
<li>MySQL acquired Netfrastructure (sp?) and begun the process of porting the backend to the new Falcon transactional storage engine</li>
<li>MySQL also began the process of fixing up some of the severe defecits in the MyISAM storage engine and branded it <a href="http://dev.mysql.com/downloads/maria/index.html">Maria</a></li>
</ul>
<p>So, from a strategic perspective, it looks like MySQL is taking the right steps to counter the threat from Oracle. An infusion of money from Sun will speed up development on many of these initiatives.</p>
<h3>a general picture of the database landscape</h3>
<p>Let&#8217;s take a step back for the moment and take an unscientific look at a few players in the current database market. This is not rigorous, exhaustive market research, it&#8217;s just my observations over the past 9-10 years or so:</p>
<p>Teradata: The data-warehousing champ, a reputation for high quality but also prohibitively expensive for many with large data volumes</p>
<p>DB2: Strong in the institutional market, mainframes and data warehousing; not much use in the web/internet world</p>
<p>Oracle: The jack of all trades. Expensive, but no CTO would ever be fired for picking Oracle for almost any purpose, whether its an OLTP system or data warehouse</p>
<p>PostgreSQL: The Betacam of open source databases. Highly-functional, stable and scalable. Over the years, unfortunately saddled with a somewhat-unjustified reputation for being slow and difficult to use (in comparison to MySQL) and a militant userbase that spends an inordinate amount of time bashing MySQL instead of evangelising its capabilities</p>
<p>MySQL: A simple, fast database with a reduced featureset that works well for web applications</p>
<p>My belief is that MySQL owes its popularity mostly due to the fact it is perceived to be very fast when using it to build simple apps. Over the years it became the &#8216;default&#8217; web database, with most hosting providers using it as the backend in combination with PHP. Many are now providing PostgreSQL hosting, but this wasn&#8217;t always the case.</p>
<h3>mysql vs. postgresql</h3>
<p>While they are both OSS databases, MySQL and PostgreSQL are very different. <a href="http://www.postgresql.org/docs/7.4/static/history.html">PostgreSQL has its roots in academa and the defense industry</a>, and a trip into the source code is like a trip down <a href="http://www.google.com/codesearch?hl=en&amp;q=show:BdE2aF3-8-4:_2IWH1yq2KM:Sya6mTOdewo&amp;sa=N&amp;ct=rd&amp;cs_p=http://ftp.osuosl.org/pub/nslu2/sources/postgresql-8.1.4.tar.bz2&amp;cs_f=postgresql-8.1.4/src/backend/optimizer/geqo/geqo_main.c&amp;start=1">a memory from a computer science class</a>. <a href="http://www.antihoe.org/oreilly/weblinux/sql/ch01_04.htm">MySQL, on the other hand</a>, feels and has been developed much more like a commercial product, with a focus on functionality and speed and less interest in elegance and standards compliance. These are sweeping generalisations of course.</p>
<p>My reason for bringing up this comparison is to make the following important assertion:</p>
<p><em>For a particular project, given the choice, DBAs choose PostgreSQL, developers choose MySQL.</em></p>
<p>DBAs like tablespaces. Good query optimizers. Tables that don&#8217;t randomly corrupt.</p>
<p>Developers like databases that reduce their need to think. A system that will let you send 100 queries over 100 separate connections without any apparent overhead compared to sending 100 queries over the same connection.</p>
<p>I come to this conclusion after nearly 2.5 years as a MySQL &#8220;DBA&#8221; working for companies of various sizes in the internet industry. MySQL, in many cases, has been reduced to a glorified flat file system, and many non-junior developers do not even understand the most basic SQL optimization.</p>
<p>It&#8217;s not hard to see why MySQL is far more popular than PostgreSQL, given that developers are more numerous and higher up the application stack.</p>
<h3>Amazon SimpleDB and Google BigTable</h3>
<p>The users of MySQL may prove to be a fickle bunch, however. Oracle never was never that much of a threat all along, and two unlikely competitors in the database space will change the rules of the game.  Developers like the simplicity and avoiding-of-thinking they get with MySQL. Once a site gets to a certain scale, the database become a major bottleneck. Complex yet surprisingly-robust <a href="http://highscalability.com/unorthodox-approach-database-design-coming-shard">sharding architectures</a> have been developed to deal with scaling MySQL beyond the capacity of one machine, but this is not for the faint of heart.</p>
<p><a href="http://labs.google.com/papers/bigtable.html">BigTable </a>and <a href="http://www.amazon.com/SimpleDB-AWS-Service-Pricing/b?ie=UTF8&amp;node=342335011">SimpleDB </a>look ready to take developers back to the simpler days when MySQL was a fast, reliable persistent store, allowing them to focus on their strengths. <a href="http://wiki.apache.org/hadoop/Hbase">HBase</a>, while still very alpha-ish, also holds great promise. Many people have criticised BigTable and SimpleDB for being, well, just a big table and a simple database. But that&#8217;s precisely what MySQL was, and it did it quite well for a long time. Developers in the internet age simply don&#8217;t care about the things DBAs and database developers of a previous era did. They want three things: performance, availability and more performance. Strict ACID compliance is simply not that important in an age when entire internet empires are built off of clicks worth as little as $0.07 and page views worth $0.00001 &#8212; but high availability <em>is </em>important.</p>
<h3>the pincer</h3>
<p>Sun has acquired MySQL at a time when the old guard of the database world is becoming more aggressive, and the new guard of software-as-a-service providers are swooping into the space to appeal to IT managers initially happy with the open source licensing, but not thrilled with the non-trivial total cost of ownership (read: paying a DBA who must continually wake up at 230 to repair your corrupted MyISAM tables)</p>
<h3>prognosis</h3>
<p>Well, let&#8217;s put it this way &#8212; as someone that abandoned the proprietary database world a few years ago to work solely with open-source technologies, I feel that I need to start learning more about HBase, BigTable and the like to survive in this marketplace. MySQL may not have seen this strategic threat coming, but they better start working, quickly, to make MySQL scale better and more easily, or it will begin to lose its place as the &#8220;default&#8221; database of new web applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomictech.com/?feed=rss2&amp;p=9</wfw:commentRss>
		</item>
	</channel>
</rss>
