Drupal supports MySQL as its primary database. Postgres is also supported, but core development usually lags behind MySQL, and contributed modules rarely support it. We experimented with converting to DB2, since IBM was showing some signs of supporting it, but after being swarmed by IBM reps with dollar signs in their eyes, this started to seem like a bad idea.

We started looking at the database because it represents the first and most fundamental bottleneck for large Drupal sites (as for any dynamic site, for that matter). A single, uncached Drupal page can easily make over 100 calls to the database; Dries has a good blog post about this. With a bunch of sloppy modules this can easily run into several hundred calls per page. Even a cached page makes 10 or more database queries (though with memcached you can serve it with no database calls at all).

This bottleneck can be handled first and foremost by limiting your queries with good coding practices. Scrutinize contributed modules carefully for bad practices like querying the database in a loop instead of using a JOIN. Seems obvious, I know, but you'd be surprised. Beyond the basics, there are a few other fundamental strategies.
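To make the query-in-a-loop anti-pattern concrete, here is a toy sketch of the difference. The table and column names are made up for illustration; this only builds the SQL strings to show the shape of the problem:

```python
# Hypothetical illustration of the "query in a loop" anti-pattern vs. one JOIN.
# Table and column names (node, comments, nid) are assumptions for the example.

def comment_queries_in_loop(node_ids):
    """One query per node: N round-trips to the database."""
    return [f"SELECT * FROM comments WHERE nid = {nid}" for nid in node_ids]

def comment_query_with_join(node_ids):
    """One query total, letting the database do the work."""
    id_list = ", ".join(str(nid) for nid in node_ids)
    return (
        "SELECT n.nid, c.* FROM node n "
        f"JOIN comments c ON c.nid = n.nid WHERE n.nid IN ({id_list})"
    )

# 100 nodes on one page: 100 queries vs. 1.
assert len(comment_queries_in_loop(range(100))) == 100
```

The loop version multiplies database round-trips by the number of items on the page, which is exactly how a module racks up several hundred queries per request.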

Slave Labor

MySQL supports setting up databases in a master-slave relationship called replication. One database (the master) sends all of its write statements (INSERT, UPDATE, DELETE) to a second (the slave), which keeps the two (mostly) in sync all the time, so you are then free to run SELECT statements against either database.

It isn't that difficult to redirect a single read from the master to a slave, and you can make short work of redirecting your most frequently called SELECT statements to the slaves. You can even grep SQL statements and dynamically send them to the slave, but this is ill-advised. The gotcha is replication lag.

The typical replication lag is well under a second. Here's a graph of a four-day period:

Typical MySQL Replication Lag (in milliseconds)

As you can see, the lag never exceeded half a second. While half a second is not long, it is far longer than the time it takes to serve a typical request on a healthy site. It is not uncommon for Drupal modules to write to the database and then read the data back within a single request, and it is very common to write the data in a POST and then redirect to a GET which immediately reads it back. If you are writing to the master and reading from the slave, either of these scenarios will look like a site failure. If users don't see their comments, pictures, hopes, and dreams show up on your site immediately, they will post repeatedly or leave in disgust.

With this in mind, you need to be certain that you know under what conditions a SELECT statement is made before you send it to a slave.
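One common way to stay safe is to pin a request to the master as soon as it writes anything. This is a minimal sketch of that idea, not Drupal's actual database layer; note that it does nothing for the POST-then-redirect-to-GET case, since the GET is a fresh request:

```python
# Hedged sketch of "sticky master after write" query routing. This is an
# illustration of the idea, not Drupal's API. It only covers reads and writes
# within a single request; cross-request cases (POST -> redirect -> GET) need
# something more, such as briefly pinning the whole session to the master.

class QueryRouter:
    def __init__(self):
        self.wrote_this_request = False

    def route(self, sql):
        verb = sql.strip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            self.wrote_this_request = True
            return "master"
        # Reads go to the slave only if this request hasn't written yet.
        return "master" if self.wrote_this_request else "slave"

r = QueryRouter()
assert r.route("SELECT * FROM node") == "slave"
assert r.route("INSERT INTO node VALUES (1)") == "master"
assert r.route("SELECT * FROM node") == "master"  # read-after-write stays safe
```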

One Bad Apple

In a high-volume setup, it only takes one bad query to bring your site down. We recently had a situation where a query generated automatically from a View (more on that later) might take as long as a minute to run. This query was not run very frequently, and most of the time, even in high-traffic periods, everything was fine. However, if too many of these queries were run under moderately heavy traffic, the MySQL threads would start piling up. Once you have a stressed database, queries that normally take 10 milliseconds or less can start taking several seconds; multiply that by 1,000 and you've quickly used up all your database connections, and your site is down.

The most effective way to find the bad queries is with the slow query log, which records queries that take longer than a set threshold. We keep this at its lowest threshold (1 second) and leave it on all the time. I have a DBA go through this log at least once a month and after every major release. Queries that were fine last month can become slow once a table gets too big, and queries that were fine in QA can collapse under a real-world load. A great tool for trawling through these logs is mysqlsla, which will very quickly bring issues to light.
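For reference, the relevant my.cnf settings look roughly like this. Option names vary by MySQL version (this is the 5.0-era syntax current when this was written; newer versions use `slow_query_log` instead), so check the manual for yours:

```ini
# my.cnf -- enable the slow query log at the lowest threshold.
# Paths are examples; option names are MySQL 5.0-era.
[mysqld]
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time  = 1   # seconds; 1 is the minimum on MySQL 5.0
log-queries-not-using-indexes
```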

Hazy Views

A View is an automated way of creating queries in Drupal. The powerful thing about them is that many modules use Views to allow for flexible data sets. For instance, if an image gallery module gets its data from a View, you can create galleries of photos, blog posts, videos, or anything with an image. They are great, and we use them all the time.

Of course, whenever you are automating query creation, you are asking for trouble. Views will not create the best query every time, and they are frequently the source of queries that show up in the slow query log. Sometimes you can fix one by adding an index, but more often you need to throw away the View entirely and write a specific query for a specific type of content. So it goes.

Sharding Possibilities

Really, really big sites have sometimes dealt with database bottlenecks by sharding, which means breaking up your data across several databases. Facebook, Friendster, Flickr, and LiveJournal all do this in one form or another. I don't know of anyone doing it (truly) in Drupal. We do a modest form of sharding by breaking different sections of the site into different Drupal instances running off of different databases. To the user it looks like one site, but behind the scenes it is several sites, with Apache handling request distribution based on URL paths.
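The routing logic behind that "modest sharding" is simple enough to sketch. The URL prefixes and backend names below are hypothetical; in practice this lives in the Apache configuration, not application code:

```python
# Hedged sketch of path-based request distribution across separate Drupal
# instances, each with its own database. Prefixes and names are made up.

ROUTES = {
    "/forums": "drupal-forums (db: forums_db)",
    "/photos": "drupal-photos (db: photos_db)",
}
DEFAULT = "drupal-main (db: main_db)"

def backend_for(path):
    # Match on whole path segments so "/photosx" doesn't hit the photos shard.
    for prefix, backend in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return DEFAULT

assert backend_for("/forums/topic/42") == "drupal-forums (db: forums_db)"
assert backend_for("/about") == "drupal-main (db: main_db)"
```

Because each instance owns its own tables outright, there is none of the cross-shard query complexity that true sharding (splitting one table across databases) would bring.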

We’ve occasionally considered the effort involved in truly sharding Drupal, breaking up the node table or something along those lines, but in the end we’ve always fallen back on simpler strategies which have proven to be effective.

My Favorite Tune

There are many more common MySQL tuning techniques that I didn't mention but that you should look into: MyISAM vs. InnoDB, adding RAM, and config tuning. All of this is good for any site using MySQL and is not Drupal-specific.

Where Do You Go From Here?

If all this isn't doing it for you, there are a few other things we've considered but not tried. Commercial replication products are available for scaling your master databases horizontally, but they have limits. MySQL does support master-master replication out of the box, but as far as I know, everyone who has tried this with Drupal has failed to get it to work, as the databases quickly get out of sync. There are also hardware solutions tuned to get the most from a single MySQL master.

If you're considering any of these, however, you should probably take another look at your architecture, because in the end your site will only be truly ready to scale if it's built that way, and with a little effort (OK, more than a little) you can build it in Drupal.

Your Drupal On Memcached

November 25, 2009

The first post on how to run big Drupal has got to be on memcached. It is only one aspect of managing high volume with Drupal, but it is the critical one. If you don't think you need memcached, then you don't need to be reading this blog. If you desperately need to be reading this blog because you can't keep your site up, you need memcached.

Memcached is a distributed caching system that is the lifeblood of most major sites on the internet. It is certainly the lifeblood of any Web 2.0 site that encourages user interaction and is therefore truly dynamic. I won't go into the details of how memcached works and what makes it so good; suffice it to say the clientele speaks for itself.

Drupal has a good emphasis on caching, and future posts will go deeper into the mechanics of how Drupal manages it. By default, however, all the caching is managed in MySQL. This may seem counter-intuitive, since caching is typically used to avoid going to the database, but if you think about it, it makes sense. The caching mechanism simply denormalizes the data and sticks it into cache tables. In this way you get all the goodness of separating data from presentation, something Drupal excels at, without all the resource-sucking JOINs. MySQL is fast, and this works fine for most Drupal sites, but not for big Drupal sites.
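The pattern behind those cache tables is worth spelling out. This is a rough sketch of the idea only; the function name echoes Drupal's cache API but is not its actual signature:

```python
# Rough sketch of the caching pattern described above: the expensive,
# JOIN-heavy build runs once, and the denormalized result is stored under a
# single key (a cache table row, or later a memcached entry). Names are
# illustrative, not Drupal's real API.

CACHE = {}  # stands in for a {cache} table or a memcached bucket

def get_rendered_node(nid, build):
    key = f"node:{nid}"
    if key in CACHE:
        return CACHE[key]   # cheap single-key lookup, no JOINs
    html = build(nid)       # expensive denormalizing build happens once
    CACHE[key] = html
    return html

calls = []
build = lambda nid: calls.append(nid) or f"<div>node {nid}</div>"
assert get_rendered_node(7, build) == "<div>node 7</div>"
assert get_rendered_node(7, build) == "<div>node 7</div>"
assert calls == [7]  # the expensive build ran only once
```

Swapping MySQL for memcached changes where `CACHE` lives, not the shape of the pattern, which is why the memcache module can slot in underneath Drupal's existing caching.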

For big Drupal sites the database will always be your first bottleneck. Of course, once you start using memcached and optimize your MySQL you will find new bottlenecks, but this is your first one. Drupal has a module for almost anything, and that includes a pretty good memcache module for swapping out the MySQL caching. You will need to patch the caching mechanism as well to get the most out of this, but there's a module for that too. Here are a few more hard-earned lessons on the setup.

Use multiple memcached daemons per server. Drupal conveniently breaks up the caching by object type to store it in different MySQL tables, and you'll want to use this same mechanism to break up your memcached traffic as well. Each daemon can run on a different port with different memory settings. You can combine some, but the biggest ones you'll want to run separately, particularly your page cache, node cache, menu cache, path cache, block cache, and probably your views cache (though earlier versions of Drupal only cache view query syntax, not view data). The advantage of doing this is that you can manage your cache sizes more granularly and, most importantly, measure your cache effectiveness. If you lump all your cache buckets together, it is much more difficult to measure your hit rates and traffic volume, which is critical for tuning. We use Cacti to measure and monitor memcached, but that's a future post.
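The bin-to-daemon split looks something like the table below. Ports and memory sizes here are illustrative assumptions, not a recommendation; in Drupal the real mapping is configured in settings.php for the memcache module:

```python
# Hedged sketch of one cache bin per memcached daemon, mirroring Drupal's
# per-table cache split. Hosts, ports, and sizes are made-up examples.

CACHE_BINS = {
    "cache_page":  {"server": "memcache1:11211", "memory_mb": 512},
    "cache_menu":  {"server": "memcache1:11212", "memory_mb": 64},
    "cache_path":  {"server": "memcache1:11213", "memory_mb": 64},
    "cache_block": {"server": "memcache1:11214", "memory_mb": 128},
}

def server_for(bin_name):
    # Smaller, unlisted bins can share a catch-all daemon.
    return CACHE_BINS.get(bin_name, {"server": "memcache1:11299"})["server"]

assert server_for("cache_page") == "memcache1:11211"
assert server_for("cache_filter") == "memcache1:11299"
```

With each big bin on its own daemon, per-bin hit rates and evictions show up separately in your monitoring instead of being averaged into one opaque number.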

Use separate memcached servers. It is tempting to run your memcached daemons on your Apache servers, but resist that temptation. When traffic spikes, you need to know whether it is Apache or memcached that is causing the high load (hint: it's Apache). Also, if your Apache servers start getting backed up under load, they will slow down your memcached, which will take down your whole farm. By separating your memcached servers you are less likely to get into a cascading load issue. More on that in future posts as well. Finally, as with separate daemons, separate servers will help you tune and diagnose your setup.

Scale your servers to meet your needs. Even pre-launch, you can get a decent idea of how much memory and how many servers you'll want. Generally these servers don't need much CPU; the amount of RAM necessary will depend on your application. Do the math! I cannot emphasize this enough and will continually come back to it in this blog, because it is something that is consistently left out. How much memory do you need? Look at your object sizes in the MySQL cache tables and add them up (hint: probably not a lot). You don't need to be exact; you just need to get within the correct order of magnitude. We typically run a 3:1 ratio of Apache servers to memcached servers. When we separated the servers out on mylifetime.com it made a tremendous difference.
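The math really is back-of-envelope. Every number below is a hypothetical example, but the shape of the calculation is the point: sum the observed cache table sizes and add headroom:

```python
# "Do the math": back-of-envelope memcached sizing from the observed sizes of
# the MySQL cache tables. All figures are hypothetical examples.

cache_table_mb = {
    "cache_page":  400,  # e.g. ~20k cached pages at ~20 KB each
    "cache_menu":  30,
    "cache_path":  10,
    "cache_block": 60,
}

headroom = 1.5  # growth plus memcached's own per-item overhead
needed_mb = sum(cache_table_mb.values()) * headroom

assert needed_mb == 750.0  # well within one modest server's RAM
```

Order of magnitude is all you need: 750 MB versus 7.5 GB is the difference between one small box and a rack, and the cache tables tell you which one you're in.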

That's it for now. Future posts will cover MySQL optimization, Apache optimization, diagnostic tools and monitoring packages, and much more, all with Drupal specifically in mind.