Sun caught in a pincer with MySQL
atomic Over the years, the database world has been buzzing with the strategic threat posed to the established players by upstart open-source database systems. Oracle and IBM would no longer be able to gouge defenseless small and medium-sized businesses for non-trivial portions of their IT budgets in exchange for a mere database licence. Oracle, IBM and Microsoft, for their part, have tried their best to respond to this threat, but it is clear that they cannot simply squash open-source products and must instead evolve with the changing landscape.
the countered threat from Oracle
Oracle made some strategic purchases in the past few years to establish a foothold in the embedded and front-end database market by acquiring Sleepycat (maintainers of BerkeleyDB) and Innobase (makers of the InnoDB storage engine for MySQL). These two also happened to provide the only two transactional backends for MySQL, although InnoDB is the only one widely used in practice. While this was looked upon as a disaster for MySQL, it was really not a major issue for a couple of reasons:
- The InnoDB source has been GPL’d, so even if Innobase were to completely abandon maintenance of the codebase (which it has not), the community can step in to fix bugs. For now, the status quo prevails.
- MySQL acquired Netfrastructure (sp?) and began the process of porting the backend to the new Falcon transactional storage engine
- MySQL also began the process of fixing up some of the severe deficits in the MyISAM storage engine and branded it Maria
So, from a strategic perspective, it looks like MySQL is taking the right steps to counter the threat from Oracle. An infusion of money from Sun will speed up development on many of these initiatives.
a general picture of the database landscape
Let’s take a step back for the moment and take an unscientific look at a few players in the current database market. This is not rigorous, exhaustive market research; it’s just my observations over the past 9-10 years or so:
Teradata: The data-warehousing champ, a reputation for high quality but also prohibitively expensive for many with large data volumes
DB2: Strong in the institutional market, mainframes and data warehousing; not much use in the web/internet world
Oracle: The jack of all trades. Expensive, but no CTO would ever be fired for picking Oracle for almost any purpose, whether it’s an OLTP system or a data warehouse
PostgreSQL: The Betacam of open source databases. Highly-functional, stable and scalable. Over the years, unfortunately saddled with a somewhat-unjustified reputation for being slow and difficult to use (in comparison to MySQL) and a militant userbase that spends an inordinate amount of time bashing MySQL instead of evangelising its capabilities
MySQL: A simple, fast database with a reduced featureset that works well for web applications
My belief is that MySQL owes its popularity mostly to the fact that it is perceived to be very fast when used to build simple apps. Over the years it became the ‘default’ web database, with most hosting providers using it as the backend in combination with PHP. Many are now providing PostgreSQL hosting, but this wasn’t always the case.
mysql vs. postgresql
While they are both OSS databases, MySQL and PostgreSQL are very different. PostgreSQL has its roots in academia and the defense industry, and a trip into the source code is like a trip down memory lane to a computer science class. MySQL, on the other hand, feels like, and has been developed much more like, a commercial product, with a focus on functionality and speed and less interest in elegance and standards compliance. These are sweeping generalisations of course.
My reason for bringing up this comparison is to make the following important assertion:
For a particular project, given the choice, DBAs choose PostgreSQL, developers choose MySQL.
DBAs like tablespaces. Good query optimizers. Tables that don’t randomly corrupt.
Developers like databases that reduce their need to think. A system that will let you send 100 queries over 100 separate connections without any apparent overhead compared to sending 100 queries over the same connection.
I come to this conclusion after nearly 2.5 years as a MySQL “DBA” working for companies of various sizes in the internet industry. MySQL, in many cases, has been reduced to a glorified flat file system, and many non-junior developers do not even understand the most basic SQL optimization.
It’s not hard to see why MySQL is far more popular than PostgreSQL, given that developers are more numerous and higher up the application stack.
Amazon SimpleDB and Google BigTable
The users of MySQL may prove to be a fickle bunch, however. Oracle was never that much of a threat, and two unlikely competitors in the database space will change the rules of the game. Developers like the simplicity and avoiding-of-thinking they get with MySQL. Once a site gets to a certain scale, however, the database becomes a major bottleneck. Complex yet surprisingly-robust sharding architectures have been developed to deal with scaling MySQL beyond the capacity of one machine, but this is not for the faint of heart.
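To give a rough idea of what sharding means at its simplest, here is a minimal sketch in Python. It is an illustration under assumptions, not a description of any particular site's setup: the shard names and the `shard_for` and `group_by_shard` helpers are hypothetical, and a real deployment would map each shard name to an actual MySQL host.

```python
import hashlib

# Hypothetical shard names; in practice each would map to a MySQL server.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key, so a given key always maps to the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by shard so a multi-key lookup can issue one query per shard."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups


if __name__ == "__main__":
    for shard, keys in sorted(group_by_shard(["alice", "bob", "carol", "dave"]).items()):
        print(shard, keys)
```

Even this toy version hints at why the real thing is hard: with plain modulo hashing, adding a fifth shard remaps most keys, and any query that spans users has to be fanned out to several servers and merged in the application.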
BigTable and SimpleDB look ready to take developers back to the simpler days when MySQL was a fast, reliable persistent store, allowing them to focus on their strengths. HBase, while still very alpha-ish, also holds great promise. Many people have criticised BigTable and SimpleDB for being, well, just a big table and a simple database. But that’s precisely what MySQL was, and it did it quite well for a long time. Developers in the internet age simply don’t care about the things DBAs and database developers of a previous era did. They want three things: performance, availability and more performance. Strict ACID compliance is simply not that important in an age when entire internet empires are built off of clicks worth as little as $0.07 and page views worth $0.00001 — but high availability is important.
the pincer
Sun has acquired MySQL at a time when the old guard of the database world is becoming more aggressive, and the new guard of software-as-a-service providers is swooping into the space to appeal to IT managers who were initially happy with the open source licensing but are not thrilled with the non-trivial total cost of ownership (read: paying a DBA who must continually wake up at 2:30 to repair your corrupted MyISAM tables).
prognosis
Well, let’s put it this way: as someone who abandoned the proprietary database world a few years ago to work solely with open-source technologies, I feel that I need to start learning more about HBase, BigTable and the like to survive in this marketplace. MySQL may not have seen this strategic threat coming, but they had better start working, quickly, to make MySQL scale better and more easily, or it will begin to lose its place as the “default” database of new web applications.
Posted in Uncategorized, mysql
June 3rd, 2008 at 4:43 pm
Great article! Followed the link from slashdot. It will be interesting to see what happens in the DB ‘wars’ of the next 5 or 6 years. My money’s on the MySQL team though.
June 3rd, 2008 at 6:07 pm
@Bob: I honestly hope so too — i’ve got quite a bit invested in my knowledge of MySQL! But I can also see the writing on the wall, and don’t want to be one of those old-school people stuck in a previous era as the industry evolves
June 3rd, 2008 at 6:59 pm
I agree with your article’s slant.
The “real” problem in the relational database world is not really the pros and cons of any particular rdbms. The truth is that if you really know what you are doing you can successfully implement your system on almost any of the solutions out there today, proprietary or open source.
The real problem is - how many java / C# / language du jour developers are experienced in the theory behind relational design and performant access to the data it contains?
The truth is not many. I don’t mean this as a slander to our developer brethren - I have been a DBA on Wall Street for 10 years now and I learn something new about relational design almost every day.
Designing a data model that meets your requirements and performant access to it is *really* hard. Most developers spend 70-80% of their time in the application tier - that is where their skills lie. To expect your average application developer to understand all the intricacies of the relational storage model is just not realistic IMHO. Again, I have the highest respect for application developers, but if 80% of your job is optimal OO design, it’s simply unrealistic to also expect you to be an expert at relational design, and worse, to deal with the impedance mismatch between them elegantly.
OK - so you could employ an expert data modeler and SQL query author, but again to be realistic how many firms actually do that? How many modelers can actually communicate and partner effectively with the application team? (Hats off to the ones that do)
One way to solve this problem is to present an expert interface that takes this responsibility off the application team and tries to do the “right thing” most of the time. These interfaces do not have a good track record at present, but toolsets such as Hibernate continue to improve and are starting to show their value.
The heart of this problem is spanning this bridge so that everyone wins and remains productive in their expert areas. Progress has been made, but nothing yet has completely disarmed this serious problem for software development.
June 3rd, 2008 at 7:23 pm
@dave74737: I agree with your sentiments completely, and wanted to get more into the details you discussed, but figured i’d never get the post finished
One of the things I wanted to delve into more was the split between “RDBMSs” and “glorified hash-tables”, and how this relates to PostgreSQL and MySQL. PostgreSQL is evolving into a viable RDBMS alternative to Oracle and DB2 for systems of moderate scale. With “commodity” machines these days coming with 16-32GB of RAM, and multi-TB storage arrays doing 5000 IOPS costing mere thousands of dollars, performance can be impressive and the feature set is rich. For data warehousing especially, an easy upgrade path to commercial solutions like Greenplum is also available if you outgrow single-machine PostgreSQL.
MySQL is trying to keep up in the “real” RDBMS market, but I think anyone seriously considering replacing Oracle or DB2 for a non-web app where ACID and features are important should be thinking about PostgreSQL, and it will take a long time for MySQL to catch up in the minds of the hardcore DBAs.
On the other hand, MySQL’s traditional strength in the web world is going to be eroded once developers begin to realise just how easy it is to use something like bigtable or simpledb — none of them will shed a tear to know they can’t do cascading FK deletes if they can shut their brain off in how they access their data.
That, in a nutshell, is what i think is the pincer that Sun/MySQL will find themselves in.
June 3rd, 2008 at 10:26 pm
hey i followed your link from slashdot
this is a really good article, i’m a newbie at servers and i’ve been thinking about why we use isam at work, and if it’d be smart to move to innodb
i’m glad that you mentioned that innodb isn’t mysql anymore, sun bought it
June 3rd, 2008 at 11:24 pm
@vraa: Move off of MyISAM if you still can! I advise using MyISAM only if there is a very good reason, e.g. read-only and very infrequently-modified tables.
June 4th, 2008 at 9:06 am
While I can appreciate the complexities of RDBMS design / development (I used to do it to a degree - it’s hard), let me speak to the “turning off of the brain” attitude that is lurking here towards application devs who are in favour of the “big hash tables”.
Problem is, application developers (at least good OO ones I’ve been around) have __already__ modeled the domain objects for the application(s) in question. So for yet another tier to come in (a good DBA / DB dev), redesign it with a whole new set of constraints / behaviors, is, well, painful.
For example, the last project I worked on was driven entirely by the database design guys. Table after table, constraint after constraint, it grew into the application. However, we began to notice that most of the cascading FK deletes that were put into the model (because it was relationally “correct”), actually caused the application to deviate from the URS/FRS - users __wanted__ to delete an item from a list table without affecting the entity itself. Why? To prohibit future use of that value. And let’s not even begin to talk about the poor error reporting that is common in these environments - did you ever see a user’s face when they see “Violation of constraint ‘FK_tblSomeData_001’…” or an ‘ORA-10..’??? It means nothing to them, and it adds even more complexity to intercept these messages / conditions, and translate them into meaningful phrases.
Now, that said…there’s nothing wrong with either approach…it’s just that the 2 camps are approaching storage from 2 sides. Application devs see storage as a necessary facility, used only to maintain state of the actual business objects that are in use (since all parent/child relationships are already modeled). DB devs see storage as the entire application, where every aspect needs modeled and fit into the relational algebra. Each can be right at different times depending on the situation, but they are never both applicable at the same time.
As any smart developer (regardless of where in the stack you live), we should use the correct tools at the correct time. There is no single correct answer for storage method, search algorithm, etc.
We live, we learn.
June 4th, 2008 at 11:22 am
@MikeY: Your points are well-taken. While it was not my intention to sound like I am a bitter person on the DB side of an app / DB mudslinging contest, I can see why my language might have given that impression. My apologies to anyone that got that impression.
When I say “turning off your brain” I’m referring only to the manner in which data is stored, not the process of development itself. The way I see it, from the appdev side, the DB is just a necessary evil, something that lets you persist information. For many years flatfiles were common, but people needed things like concurrency, ACID transactions, etc. Now, RDBMSs seem to be getting less relevant, having been designed for a different time and era in computing. In the future, building an internet app on a traditional RDBMS instead of a distributed hashtable may make no more sense than building one on a series of flatfiles today. Why would you go through the pain of handling FK errors, deadlocks, etc. if you can just make one simple web service call to update or retrieve your data?