WordPress Amazon S3 CDN plug-in


First of all, you kids are going to have to forgive me for a non-photography post: I’ve coded a brand new thing that I’m rather excited about, and I figured I’d share it with the world.

Basically, I had a rather unusual problem: I’d been using Donncha’s rather awesome WP Super Cache for a long time. Then, in a moment of weakness, I switched the Photocritic blog to W3 Total Cache, mostly because I quite fancied using its Amazon S3 support. The catch was that W3 Total Cache doesn’t play nice with WP Super Cache, so one of them had to go. Which is fine, I thought. But I was wrong: it turns out that to get the most out of W3 Total Cache, you need oodles of memory. Within minutes of me turning off WP Super Cache and turning on W3 Total Cache, my server was on its knees, begging for mercy. Then it crashed.

And I decided to come up with a better solution.  


A bit of background

It should probably be mentioned that for the past two years, Photocritic (and Small Aperture, my Travel Blog, my company website, and a couple of other small sites) have been hosted on a single server at Slicehost. They’re lots of awesome, but you basically pay by the amount of RAM available to your ‘slice’ of the server. Which is a bit sucky when you run WordPress, because WP is pretty memory-hungry. All that time, I have been running WP Super Cache, along with a nasty little hack of a plug-in that served most of my image assets from Amazon S3.

My original plug-in was buggy as hell; add a link to an image, and it went haywire. A slightly malformed image tag, and it would break the page. Not very pretty.

Anyway, I tried W3 Total Cache, and whilst I like its completeness and flexibility, my server simply didn’t. Of course, I could have tried to tune the server to the plug-in, but considering that my nasty hack of an S3 plug-in and WP Super Cache actually gave better results… I figured it was time for a new idea.

What is a CDN?

Without a CDN, your server works itself into an early grave by having to serve each image to each user. Even on a relatively simple site like Photocritic, there can be 40-50 assets on a single web page. Multiply that by a thousand web pages and ten thousand visitors, and it all adds up.

CDN stands for Content Delivery Network. Basically, it’s the idea that if you have a server serving the same things over and over again, it may make more sense to get some other server to do it.

On Photocritic, for example, a page might have about 50 images on it. Your browser will first load the HTML page, then realise that it needs to fetch each of these 50 images, too. So, in effect, it makes 51 requests to my server: one for the HTML, then one for each of the 50 images. This takes up a lot of bandwidth and server processing power.

With a CDN, your server only has to worry about the dynamic parts of the page: the HTML coming out of WordPress.

A better idea, then, is to serve only the requests you really need to, like the HTML, because this changes from time to time (every time someone leaves a comment or I publish a new post, a lot of different pages change slightly). Images, however, change very rarely.

So, with a CDN, you serve the HTML page yourself and tell the browser to fetch the images from somewhere else: in this case, the Amazon S3 service.

You could, of course, upload your images to S3 manually, but this loses you some flexibility, and makes publishing new posts a pain in the arse. Also, if all your posts include direct links to S3 and you later decide to stop using it (or switch to a different solution), you’re buggered.
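To put rough numbers on the request counts above, here is a back-of-the-envelope sketch in Python. It assumes every visitor views exactly one page with 50 images and that all 50 images are moved to the CDN; real pages also pull in stylesheets, scripts and so on, so treat it as illustrative only.

```python
def requests_to_origin(page_views: int, assets_per_page: int, use_cdn: bool) -> int:
    """Count the requests that reach your own web server.

    Simplified model (an assumption): each page view costs one HTML request,
    plus one request per asset unless the assets are served from a CDN.
    """
    per_view = 1 if use_cdn else 1 + assets_per_page
    return page_views * per_view


if __name__ == "__main__":
    views = 10_000  # assumption: ten thousand visitors, one page each
    print(requests_to_origin(views, 50, use_cdn=False))  # 510000
    print(requests_to_origin(views, 50, use_cdn=True))   # 10000
```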

Where does caching come into all of this?

When you look at a WordPress page, a lot of things happen in the background. The server has to load in the blog settings, then the theme (that’s the design of the page). Then, it has to pull in all the comments, the categories, all the bits and pieces in the sidebar, etc.

To a webserver, that’s actually quite a lot of work: every time you ‘render’ a page, it has to make dozens of calls to a database to put the page together. Like this:

Without caching

Of course, there’s nothing wrong with this: you have to render the page, or nobody would be able to see anything.

The challenge is this: how often does the page actually change? I update my blog perhaps a few times per month. In addition, there are about 30 or 40 comments per day. That means that about 95% of the time somebody loads a page on the Photocritic site, it is identical to the last time somebody loaded it. So why am I abusing my server by making it re-create the page every time somebody wants to look at it?

Of course, I don’t have to: this is where caching comes in. WP Super Cache does this bit for me, and the way it works is interesting. The server basically skips the whole ‘page generation’ bit and serves static HTML pages instead.

If there isn’t a cached version of a page, the user has to wait a little longer than usual whilst the cached version is generated. Once it’s finished, they get the static version. The next visitor to the same page gets the cached page, served much faster (and with less strain on the server):

The red bit is where a cached version of a page is created.
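For the curious, here is a minimal Python sketch of that ‘generate once, then serve the static copy’ idea. It is not WP Super Cache’s actual code: the `render` function is a hypothetical stand-in for WordPress building the page, and the cache directory layout is an assumption made for the example.

```python
from pathlib import Path
from typing import Callable


class StaticPageCache:
    """Sketch of a page cache that keeps each rendered page as a static HTML file."""

    def __init__(self, cache_dir: str, render: Callable[[str], str]):
        self.cache_dir = Path(cache_dir)
        self.render = render  # hypothetical stand-in for WordPress building the page

    def _file_for(self, path: str) -> Path:
        # e.g. "/2010/05/some-post/" -> <cache_dir>/2010/05/some-post/index.html
        # (no sanitising of ".." and the like; a real cache would need it)
        return self.cache_dir / path.strip("/") / "index.html"

    def get(self, path: str) -> str:
        cached = self._file_for(path)
        if cached.exists():
            # Cache hit: hand back the stored HTML without rendering anything.
            return cached.read_text(encoding="utf-8")
        # Cache miss: this visitor waits while the page is built and saved.
        html = self.render(path)
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_text(html, encoding="utf-8")
        return html
```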

How caching and CDN work together

Caching and CDN solve two different challenges of serving web pages:

Caching takes care of all the database requests needed to render a single HTML page. Instead of being rendered every time someone loads it, each page gets rendered once and then served to my users many times, until the cache ‘expires’ (I have mine set to refresh the cached pages every hour or so) or the page changes.

The CDN takes care of the static assets, so the server doesn’t have to serve the same image files again and again. By offloading this job to another server, my server has to deal with 80-90% fewer requests per page load.

Together, these two solutions dramatically improve the user experience of my WordPress blog, and they also mean that my server can handle more traffic, responds faster to requests, and is cheaper to run.
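The cache expiry rule from the caching paragraph above (refresh every hour or so, or as soon as a page changes) can be sketched like this. Again, this is a hypothetical in-memory Python illustration, not how WP Super Cache stores or expires its files; `render`, `max_age` and `invalidate` are names made up for the example, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringPageCache:
    """Sketch: cached pages are re-rendered after max_age seconds or on invalidate()."""

    def __init__(self, render: Callable[[str], str], max_age: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render    # hypothetical stand-in for building the page
        self.max_age = max_age  # "every hour or so" -> 3600 seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self._pages: Dict[str, Tuple[str, float]] = {}  # path -> (html, cached_at)

    def get(self, path: str) -> str:
        entry = self._pages.get(path)
        if entry is not None:
            html, cached_at = entry
            if self.clock() - cached_at < self.max_age:
                return html  # still fresh: serve the cached copy
        # Missing or expired: render again and remember when we did it.
        html = self.render(path)
        self._pages[path] = (html, self.clock())
        return html

    def invalidate(self, path: str) -> None:
        """Drop a page as soon as it changes, e.g. when a comment is added."""
        self._pages.pop(path, None)
```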

Enter the WP Kamps Amazon S3 CDN!

So I sat down, threw away my original plug-in (which was called “S3 Hack” for an excellent reason: it was a hack, never meant for long-term use), and started afresh.

The logic in the plug-in works a little something like this:

Logic diagram for the S3 CDN plugin v0.1

Or, if you are more of a list person:

  1. Captures the entire page
  2. Finds any image tags
  3. Checks the SRC attribute of each image tag
  4. If the image referred to is hosted on Photocritic …
  5. … checks whether the image exists on the S3 CDN …
  6. … and replaces the URL with the CDN version if it is found
  7. If the image isn’t found, tries to copy the file to the S3 CDN
  8. Returns the HTML page with the relevant URLs rewritten

Sounds pretty simple, yes? And it is, of course. But it works incredibly well: the image assets are served from Amazon S3, and the HTML pages are generated by WordPress from the database (or served by WP Super Cache, if they are cached).
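Here is a rough Python rendition of the steps in that list. The plug-in itself is PHP and bundles PHP Simple HTML DOM for the parsing; this sketch swaps in a regular expression for brevity, which is exactly the kind of shortcut that can trip over malformed HTML, so don’t lean on it. `SITE_HOST`, `CDN_BASE` and the `upload` callback are hypothetical placeholders, and only absolute image URLs on your own host are considered.

```python
import re
from typing import Callable, Set
from urllib.parse import urlparse

SITE_HOST = "www.example.com"                         # hypothetical: your blog's hostname
CDN_BASE = "https://example-bucket.s3.amazonaws.com"  # hypothetical: your S3 bucket's URL

# Step 3: find the src attribute of each <img> tag (single or double quotes).
# A regex is a shortcut; a '>' inside an earlier attribute value will fool it.
IMG_SRC = re.compile(r"(<img\b[^>]*?\ssrc\s*=\s*)([\"'])(.*?)\2", re.IGNORECASE)


def rewrite_page(html: str, cdn_keys: Set[str], upload: Callable[[str], bool]) -> str:
    """Rewrite local image URLs to their CDN copies (steps 2 to 8 above)."""

    def swap(match: "re.Match[str]") -> str:
        prefix, quote, src = match.groups()
        url = urlparse(src)
        # Step 4: only absolute URLs on our own host are candidates.
        if url.netloc != SITE_HOST:
            return match.group(0)
        key = url.path.lstrip("/")
        # Steps 5 and 6: already on the CDN, so point the tag at it.
        if key in cdn_keys:
            return f"{prefix}{quote}{CDN_BASE}/{key}{quote}"
        # Step 7: not there yet; try to copy it, but keep the local URL for now.
        if upload(key):
            cdn_keys.add(key)
        return match.group(0)

    # Steps 2 and 8: scan the captured page and return it with URLs rewritten.
    return IMG_SRC.sub(swap, html)
```

In this sketch, the first time a page is rendered a local image is only copied to S3; from the next request on, its URL points at the CDN. Whether the real plug-in rewrites the URL on the same request, right after a successful upload, isn’t spelled out in the list, so the sketch plays it safe.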

Can I try it?

You’d be crazy to, because it’s an early beta version. But if you fancy it, knock yourself out.

To install it:

  • Get an Amazon AWS S3 account
  • Download WP Kamps Amazon S3 CDN Plugin
  • Open the S3-replace.php file, and edit the preferences near the top of the file
  • Upload it to your Plugins folder, then activate it via your WordPress admin panel
  • If you are using a caching plug-in, clear your cache.

… And that’s it, really. Your image files will now be served from the Amazon S3 CDN solution.

Version history

v0.5.7 – Adds a database layer to avoid extraneous calls to S3 to check whether the files are there.

v0.2 – Re-packages the 0.1 release with fewer extraneous files.

v0.1 – First release of the plug-in.
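The v0.5.7 database layer boils down to remembering which files are already on S3, so each page view doesn’t have to ask Amazon. Below is a hypothetical sketch using Python’s built-in sqlite3 module; the use of SQLite, the `cdn_files` table and the `check_s3` callback are all assumptions for illustration, not the plug-in’s actual storage.

```python
import sqlite3
from typing import Callable


class CdnIndex:
    """Sketch: a local record of which files are already on S3."""

    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS cdn_files (path TEXT PRIMARY KEY)")

    def mark_uploaded(self, path: str) -> None:
        with self.db:  # commits the insert
            self.db.execute("INSERT OR IGNORE INTO cdn_files (path) VALUES (?)", (path,))

    def is_on_cdn(self, path: str, check_s3: Callable[[str], bool]) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM cdn_files WHERE path = ?", (path,)
        ).fetchone()
        if row is not None:
            return True  # answered locally: no request to S3
        # Unknown file: ask S3 (check_s3 is a hypothetical callable) and remember a "yes".
        if check_s3(path):
            self.mark_uploaded(path)
            return True
        return False
```

Only positive answers are stored here, so a file that appears on S3 later is still picked up. The flip side, in this sketch at least, is that a file deleted from S3 by hand stays marked as present until its row is removed.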


Q: How do I update an image in the CDN?

A: At the moment, you can’t do it directly. All you can do is delete the image from the CDN manually. You can also delete all the files off the CDN; it’ll rebuild over time, as people request your files.

Q: How can I turn it off?

A: Just disable the plug-in, and clear your cache – your files will be served from your own server again, and you can safely delete them from your S3 account.

Q: I need help…

A: At the moment, I haven’t got the capacity to offer much in the way of advice, I’m afraid. In due course, I’m hoping to be able to create proper documentation and an admin configuration screen within WordPress so you don’t have to edit your files manually.

Huge thanks go to…

… W-Shadow for his ‘How to filter whole pages in WordPress’ article.

… S.C. Chen for his PHP Simple HTML DOM framework, which is included in this plug-in.

… Donovan Schönknecht for his PHP S3 library, also included as part of this plug-in.

… Matt Kane of CleVR and BeeTight fame, for pointing me in the right direction on a couple of tricky questions I had right at the start of developing this thing.

What’s still to come?

  • Admin panel to change the settings of the plugin
  • Automatic creation of buckets in S3
  • Configuration test

Do you enjoy a smattering of random photography links? Well, squire, I welcome thee to join me on Twitter.

© Kamps Consulting Ltd. This article is licenced for use on Pixiq only. Please do not reproduce wholly or in part without a license.