
6 reasons your Magento site went down

This article is about extreme traffic overwhelming your site.

There’s plenty of marketing you can do to drive traffic, but TV appearances are by far the most hard-hitting. Your target audience is sitting on the sofa with a laptop or tablet at the ready, and they’re all going to hit your site within the same 10 seconds or so. Email campaigns can have a similar effect, but usually you can stagger delivery to limit the impact. Read on especially if you’re a startup planning on launching with a big bang.

Specifically, we’re talking about: error pages, full page caching, database tuning, backend cache scaling, CDNs, and load testing.

Before we start, I’m assuming your site is well hosted and generally performs well under normal conditions (Magento category page Time To First Byte < 1.0 second without FPC). If it doesn’t, stop reading and fix that first; everything below will be a band-aid rather than a solution.

Here are six things you can do to prepare for huge traffic spikes.
 
 

1. Plan for failure

Despite everything else in this article, it can be hard to gauge just how big a load spike you’ll get. Get error pages in place, at every level possible. Your developers or digital agency should be able to knock something up in no time.

Do: Think about including a discount code on error pages, encouraging your customers to come back later. I’m told it really works.

Do: Make it look nice and on-brand, and include contact details. Some error pages are light-hearted and fun – I’ve even seen Pac-Man embedded to play while waiting – but the main message needs to encourage a repeat visit. Default or generic error pages do not inspire confidence in your brand.

Don’t: Serve the error page’s images, CSS, or other assets from your web servers, in case they’re not responding. Host these assets elsewhere; a CDN would be perfect.
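
As a sketch, if nginx sits in front of your web servers, a branded static error page might be wired up like this (paths are illustrative, and the page itself should reference its CSS and images from the CDN):

# Serve a branded static page whenever the PHP backend is down or overloaded
error_page 502 503 504 /errors/busy.html;

location = /errors/busy.html {
    root /var/www/static;   # small local file; its CSS/images live on the CDN
    internal;               # only reachable via error_page, not directly
}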

 
 

2. Full Page Cache

A good Full Page Cache is essential to absorb the majority of your traffic, and the majority of server load.

It can be quite a complicated thing to get right though, so I wrote a separate article with my thoughts on Magento Full Page Cache.

If you’re doing it right, your pages and categories should be coming back in under 100 milliseconds.
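
A quick way to check is to time the first byte with curl – request the same page twice, so the second hit is served from the cache (the URL is illustrative):

# Time To First Byte; on a cache hit this should be well under 100ms
curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" http://www.example.com/some-category.html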

 

3. Database

First, use persistent connections. I’ve seen the sudden influx of DB connections overwhelm the TCP stack on the DB host. Make sure MySQL is configured to accommodate them, though: every single PHP-FPM or Apache child process could be hanging onto a connection. We can do some quick maths here: if you have ten web servers with pm.max_children=250, then your MySQL max_connections needs to be at least 2500. Add a few more for monitoring or diagnostics.

Configure it in local.xml:

<connection>
    <host><![CDATA[magento_db_host]]></host>
    <username><![CDATA[magento_db_user]]></username>
    <password><![CDATA[magento_db_pass]]></password>
    <dbname><![CDATA[magento]]></dbname>
    <active>1</active>
    <persistent>1</persistent>
</connection>
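
On the MySQL side, a quick sanity check – the 2600 below is illustrative (web servers × pm.max_children, plus headroom):

# Compare the configured limit with the peak actually used
mysql -e "SHOW VARIABLES LIKE 'max_connections'; SHOW STATUS LIKE 'Max_used_connections';"

# Raise it at runtime, then persist the same value under [mysqld] in my.cnf
mysql -e "SET GLOBAL max_connections = 2600;"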

Replication?

No.
Three points about this:

  1. The database is not normally the bottleneck; CPU for PHP execution is. You will want a reasonably powerful machine with good I/O, but so long as your FPC is effective, it’s very unlikely that you’ll need to scale out.
  2. Broken shopping carts, and other weird behaviour. Replication lag – even a second or two – can be long enough to cause problems. For the most part, Magento’s read/write separation does account for replication delay, but third-party extensions might not, which matters most when they touch the cart or checkout.
  3. Replication is not resilience. I want to mention this because I think a lot of people ask for replication out of a misunderstanding that it’ll make the site more resilient or Highly Available. A master-slave setup still has single points of failure, and Magento will error out if it can’t connect to the master or ANY of your slaves. A multi-master implementation could work if you have a floating IP, but in my experience the complexity far outweighs the benefit. At Rackspace, our go-to HA solution is to run MySQL (usually Percona) under the Red Hat Cluster Suite, and that works brilliantly: Magento gets one database connection, on a floating IP that moves between resilient nodes. The database servers are often less powerful than the web servers; see point 1.

That’s it for the database. I’m not going into general DB optimisation here, but Major Hayden’s mysqltuner.pl is a good start.

 
 

4. Backend Cache Scaling

Most of the time, you’ll be sharing your Redis cache between web nodes. This is important for managing the cache via the Magento admin. At massive scale, though, you can overwhelm the physical network and the TCP stack on the Redis host, and run into performance problems because Redis is single-threaded.

[Diagram: too many web servers sharing one Redis instance]

The simple solution is to install a local Redis instance on each web server, and connect over localhost or a UNIX socket. This cuts out the network load completely, and scales out with the web tier.

[Diagram: one Redis instance on each web server]
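
The cache section of app/etc/local.xml would then point at the local instance. A sketch, assuming the stock Cm_Cache_Backend_Redis backend and an illustrative socket path:

<global>
    <cache>
        <backend>Cm_Cache_Backend_Redis</backend>
        <backend_options>
            <server>/var/run/redis/redis.sock</server>
            <port>0</port>             <!-- port 0 means use the UNIX socket -->
            <database>0</database>
        </backend_options>
    </cache>
</global>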

The major disadvantage of doing this, though, is that management operations – clearing the cache, or general invalidation when you make changes – will no longer happen across the board. Here’s a quick-and-dirty proof-of-concept bash script for clearing out all your caches at once, assuming you also configure each instance to listen on your local or isolated network:

REDIS_SERVERS="192.168.100.5 192.168.100.6 192.168.100.7 192.168.100.8 192.168.100.9"
for server in $REDIS_SERVERS; do
    # redis-cli closes the connection cleanly; a bare netcat pipe would
    # hang waiting for Redis to hang up
    redis-cli -h "$server" -p 6379 FLUSHALL
done

NB: This kind of cache setup is only for extreme cases; 99% of the time a single Redis instance is fine for the Magento cache. I like to use a second one for the Enterprise full_page_cache. You could look into Redis sharding for ultimate performance, but that’s more complicated than we need here. This is for a one-off event, and when it’s done you can scale back to a single Redis instance for easier cache management.

NB: Cache storage must not be confused with session storage; the two are easily conflated when the same technology is involved. Despite the above, I would still advise keeping all your sessions in one place, mainly because I don’t like to rely on load balancers’ session persistence. Session traffic is very unlikely to saturate the network the way cache traffic can. I prefer Memcached over Redis for sessions; it’s simple and multi-threaded. On that note, ensure MAXCONN and CACHESIZE are suitably configured.
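
On a RHEL-type system, for example, those live in /etc/sysconfig/memcached. The values here are illustrative; size them against your PHP worker counts and traffic:

PORT="11211"
USER="memcached"
MAXCONN="4096"      # at least the total PHP-FPM children across all web servers
CACHESIZE="1024"    # MB of RAM available for sessions
OPTIONS="-l 192.168.100.10"   # bind to the internal network only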

 
 

5. CDN

Content Delivery Networks are not a magic solution, and usually have no effect on those initial page loads, nor the PHP load on your web servers. While some CDNs do have full page caching features, I haven’t seen anyone successfully integrate them into an application as complex as Magento.

What a CDN will do, however, is speed up the delivery of extra content for the overall page load. Especially if you’ve got an ocean between your customers and your server(s). If all your customers are in the same country as your server, though, it probably won’t be that much faster and might not be worth the effort.

The biggest advantage for me is reducing the network load on your infrastructure. On most Magento stores (most websites in general), the bulk of the actual data transferred is product imagery. Offloading that to a CDN will definitely help to avoid network saturation, and load on network devices like firewalls and load balancers.

You need to be using a CDN which pulls from origin; the days of trying to upload with ImageCDN are long gone. For faster page loads, you can use separate URLs for your skin, media, and JavaScript elements, leveraging parallel downloads. Once those are set up, it’s pretty trivial in Magento to configure the URLs under System > Configuration > Web. It might be a little more work for SSL, but if most of your window shopping is done over plain HTTP then start with the unsecure base URLs for the quickest win.
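
As an illustrative sketch, the unsecure base URLs might end up looking something like this (cdn.example.com stands in for your CDN hostname):

System > Configuration > General > Web > Unsecure:
    Base Skin URL:       http://cdn.example.com/skin/
    Base Media URL:      http://cdn.example.com/media/
    Base JavaScript URL: http://cdn.example.com/js/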

 
 

6. Load testing

You need to know where your website actually stands in terms of traffic, and you need to do it properly.

Don’t: rely on tools like siege, or services like loader.io and blitz.io. They can be extremely useful, of course, but only if you are able to interpret the results properly. Unless you have a deep understanding of HTTP – protocol headers, cookies, unique session IDs – these tools probably won’t help you all that much.
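
To illustrate the problem, compare a bare request with one that carries a session cookie, as a real browser would (the URL is illustrative):

# A naive benchmark repeats the same anonymous request, which mostly
# just measures your full page cache
curl -s -o /dev/null -w "%{time_total}s\n" http://www.example.com/

# A real visitor carries a session cookie, and journeys through the cart
# and checkout bypass the FPC entirely
curl -s -o /dev/null -c /tmp/jar.txt -b /tmp/jar.txt \
     -w "%{time_total}s\n" http://www.example.com/checkout/cart/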
 
Do: get it done professionally. You need a test that can mimic real human user journeys and repeat them at massive scale. JMeter is good, but doing it right can be complicated and very time-consuming. I would argue this is best left to professionals who do just that. It’s not cheap, but it’s a necessary investment in your website’s future. Your hosting provider might offer professional load testing services, or refer you to someone who can. Soasta are excellent.