Your Drupal On Memcached

November 25, 2009

The first post on how to run big drupal has got to be on memcached. It is only one aspect of managing high volume with drupal, but it is the critical one. If you don’t think you need memcached then you don’t need to be reading this blog. If you desperately need to be reading this blog because you can’t keep your site up, you need memcached.

Memcached is a distributed caching system that is the lifeblood of most major sites on the internet. It is certainly the lifeblood of any Web 2.0 site that encourages user interaction and is therefore truly dynamic. I won’t go into the details of how memcached works and what makes it so good; suffice it to say the clientele speaks for itself.

Drupal has a good emphasis on caching and future posts will go deeper into the mechanics of how Drupal manages caching. By default, however, all the caching is managed in mysql. This may seem counter-intuitive since typically caching is used to avoid going to the database, but if you think about it this makes sense. The caching mechanism simply denormalizes the data and sticks it into cache tables. In this way you get all the goodness of separating data from presentation, something which drupal excels at, but you don’t get all the resource-sucking JOINs. MySQL is fast, and this works fine for most drupal sites, but not big drupal sites.

For big drupal sites the database will always be your first bottleneck. Of course once you start using memcached and optimize your mysql you will find new bottlenecks, but this is your first one. Drupal has a module for almost anything and this includes a pretty good memcache module for swapping out the mysql caching. You will need to patch the caching mechanism as well to get the most out of this, but there’s a module for that too. Here are a few more hard-earned lessons on the setup.

Use multiple memcached daemons per server. Drupal conveniently breaks up the caching by object type and stores each type in a different mysql table. You’ll want to use this same mechanism to break up your memcached traffic as well. Each daemon can run on a different port with different memory settings. You can combine some, but the biggest ones you’ll want to run separately, particularly your page cache, node cache, menu cache, path cache, block cache and probably your views cache (though earlier versions of drupal only cache view query syntax, not view data). The advantage to doing this is that you can manage your cache sizes more granularly and, most importantly, measure your cache effectiveness. If you lump all your cache buckets together it is much more difficult to measure your hit rates and traffic volume, which is critical for tuning. We use cacti to measure and monitor memcached, but that’s a future post.
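To make the point about measuring hit rates concrete, here is a minimal sketch in Python. It is not how we collect our numbers (that is cacti, and a future post). It only relies on the fact that memcached answers its "stats" command with lines of the form "STAT name value", including the get_hits and get_misses counters. The function names, the port numbers and the sample replies are all made up for illustration; in practice you would read one reply per daemon.

```python
def parse_stats(raw):
    """Parse the text reply of memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Only "STAT <name> <value>" lines carry data; the final "END" is skipped.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_rate(stats):
    """Return hits / (hits + misses), or 0.0 if there were no gets yet."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical replies from two daemons, one per cache bin (numbers invented).
SAMPLE_REPLIES = {
    "cache_page (port 11211)": "STAT get_hits 9000\r\nSTAT get_misses 1000\r\nEND\r\n",
    "cache_menu (port 11212)": "STAT get_hits 500\r\nSTAT get_misses 1500\r\nEND\r\n",
}

for name, raw in SAMPLE_REPLIES.items():
    print(f"{name}: hit rate {hit_rate(parse_stats(raw)):.1%}")
```

With one daemon per bin, each reply describes exactly one cache, so a poor hit rate points straight at the bin that needs more memory or a different caching strategy. With everything lumped into one daemon you would only get a single blended number.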

Use separate memcached servers. It is tempting to run your memcached daemons on your apache servers, but resist that temptation. When traffic spikes, you need to know if it is your apache or memcache that is causing the high loads (hint: it’s apache). Also, if your apache servers start getting backed up under load it will slow down your memcached, which will take down your whole farm. So by separating your memcached servers you will be less likely to get into a cascading load issue. More on that in future posts as well. Finally, as with separate daemons, separate servers will help you tune and diagnose your setup.

Scale your servers to meet your needs. Even pre-launch you can get a decent idea of how much memory and how many servers you’ll want. Generally these servers don’t need much CPU. The amount of RAM necessary will depend on your application. Do the math! I cannot emphasize this enough and will continually go back to it in this blog because it is something that is consistently left out. How much memory do you need? Look at your object sizes in the mysql cache tables and add them up (hint: probably not a lot). You don’t need to be exact, you just need to get it within the correct order of magnitude. We typically run 3:1 apache servers to memcached servers. When we separated the servers out on mylifetime.com it made a tremendous difference.
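Here is what "do the math" can look like as a minimal Python sketch. Everything specific in it is an assumption for illustration: the table sizes are invented, the function name is hypothetical, and the 2x headroom factor is just a placeholder for whatever safety margin you are comfortable with. The only real input is the size of each cache table as reported by your own database.

```python
import math


def memcached_ram_mb(table_bytes, headroom=2.0):
    """Estimate total memcached RAM in MB from cache table sizes in bytes.

    `headroom` is an assumed safety multiplier, not a recommendation.
    """
    total = sum(table_bytes.values())
    return math.ceil(total * headroom / (1024 * 1024))


MB = 1024 * 1024

# Invented sizes; substitute the numbers from your own cache tables.
SAMPLE_TABLE_BYTES = {
    "cache_page": 400 * MB,
    "cache_menu": 50 * MB,
    "cache_block": 30 * MB,
    "cache_views": 20 * MB,
}

for table, size in sorted(SAMPLE_TABLE_BYTES.items()):
    print(f"{table}: {size // MB} MB")
print(f"estimated memcached RAM: {memcached_ram_mb(SAMPLE_TABLE_BYTES)} MB")
```

The answer only has to land in the right order of magnitude. In this invented case it comes out around a gigabyte in total, which is the kind of number that tells you whether you need one small box or several.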

That’s it for now. Future posts will cover mysql optimization, apache optimization, diagnostic tools and monitoring packages, and much more, all with drupal specifically in mind.
