02 Oct

How to deal with Black Friday

Black Friday, the day after Thanksgiving (US), is one of the biggest high street shopping days of the year.  Makes sense – a lot of Americans are off work that day, and for many it’s close to the last payday before Christmas.

In recent years, most of the action has moved into digital retail, which is good news for shoppers who want to avoid an actual fistfight over that last HD television. There’s even a .blackfriday Top Level Domain to capitalise on the increased consumer engagement.

The global rise of Black Friday for online retailers has certainly been interesting, but not without challenges. Even some of the most respected retailers can struggle to deal with the increased traffic. It’s usually more than just one day, though. Last year we saw plenty of “pre Black Friday” sales, and many of the deals will be on until Christmas. Argos ran a whole Black Friday week, and Amazon led with their own Prime Day sales and prolonged campaigns.

One thing I found most interesting: many retailers see traffic or conversions increase even when they’re not advertising specific sales or deals; people just seem to be in a “buy now” mindset regardless. Some of our customers have been caught out by that.

Here are the things you should be doing to prepare your online store for Black Friday, or any other big sale event.

See also: another article I wrote about common pitfalls with extreme traffic. 

Performance testing

First of all, you need to know how much traffic your website can actually handle, and you need to measure it properly. If you have them, use analytics from last year’s Black Friday and Cyber Monday to set realistic traffic targets.

The key to success here is to use accurate user journeys and conversion rates in your load tests. For example, you might convert at 5% all year round, but as much as 50% on Black Friday if you’ve got a great deal on and your customers all pile into the checkout. I’ve seen this happen, where a site was overwhelmed by a much higher conversion rate than had been tested. A good problem to have, but it needed rapid action. Luckily we were already using Cloud, so it didn’t take long to scale out further. Additionally, our DBA team found and fixed a database locking issue (see below), which could have been picked up earlier with the right performance testing.

Human think times also play a big part in this kind of testing; they, too, can have an order-of-magnitude effect on the concurrency figures a test produces.
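
To illustrate the idea, here’s a minimal sketch of a single scripted journey with think times, using curl and a cookie jar so the session persists between steps. The URLs and sleep values are placeholders, not real figures – a proper load test would replay many of these journeys concurrently with a dedicated tool.

    #!/usr/bin/env bash
    # Minimal sketch of ONE user journey with human think times.
    # URLs, paths and sleep values are placeholders for your own journeys.
    BASE="https://www.example.com"   # hypothetical store URL
    JAR="$(mktemp)"                  # cookie jar, so the session persists between steps

    curl -s -c "$JAR" -b "$JAR" -o /dev/null "$BASE/"                   # land on the homepage
    sleep 10                                                            # think time: browsing
    curl -s -c "$JAR" -b "$JAR" -o /dev/null "$BASE/category/deals"     # browse a category
    sleep 20                                                            # think time: reading
    curl -s -c "$JAR" -b "$JAR" -o /dev/null "$BASE/product/example-tv" # view a product
    sleep 30                                                            # think time: deciding
    # Only a fraction of journeys should continue to cart/checkout;
    # this is where your conversion-rate assumption belongs.

    rm -f "$JAR"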

Don’t: just rely on tools like siege, or services like loader.io and blitz.io. They can be extremely useful of course, but only if you are able to interpret the results properly. Unless you have a deep understanding of HTTP headers, cookies, and unique session IDs, these tools might not give you the insight you need.

Do: get it done professionally. You need a test that can mimic actual human user journeys, and repeat them on a massive scale. Apache JMeter is good for that, if you have the resources/bandwidth, but doing it right can be complicated and very time consuming. I would argue this is best left to professional performance testers; this is specialist work and a necessary investment in your website’s future. Your hosting provider might offer professional load testing services, or refer you to someone who can. Soasta are excellent.

Do: allow plenty of time. If you come out with a laundry list of changes, you need to be able to act on them before the shopping season starts.


Be ready to scale infrastructure

Black Friday is one of the best examples of how Cloud resources can be used to boost capacity when needed. Or, you might just need to increase your physical server resources with a CPU upgrade. More likely, it’s both. Talk to your hosting provider about Cloud Bursting or Hybrid hosting models, which can work really well for seasonal workloads.

A quick note about autoscaling, if you use it: use time-based (scheduled) scaling events, or scale out manually ahead of time, rather than waiting for the autoscaling algorithms to detect the load and kick in. It can take quite a few minutes before the new servers/containers are in place and ready, and you might be losing sales in that time. Bear in mind that everyone else in your public Cloud might be doing the same, so allow extra time in case the provisioning APIs are a little slower than usual.
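
For example, if you happen to be on AWS with an Auto Scaling group, a scheduled action looks something like this (the group name, date and capacities are placeholders; other clouds have their own equivalents):

    # Hypothetical AWS example: scale up ahead of the rush instead of
    # waiting for autoscaling alarms to react. All names/values are placeholders.
    aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name web-frontend \
      --scheduled-action-name black-friday-scale-up \
      --start-time "2016-11-25T06:00:00Z" \
      --min-size 8 --max-size 20 --desired-capacity 12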

Go big or go home: Cloud resources, even with high CPU/memory, are great value when they’re only online for a few days. Don’t be afraid to spin up those large instances. A couple of hundred dollars/pounds/euros is definitely worth the investment for the one weekend when you can’t afford downtime.


Have analytics in place

…such as NewRelic, AppDynamics, or similar application performance monitoring (APM) tools. You need to know whether your customers are getting a good experience, and if they’re not, these tools will help you pinpoint areas for improvement at the application level.


Investigate database locks

Review your code. Log slow queries. Database-level locks can be a key limiting factor in concurrent transactions, and no amount of servers, CPU or memory will fix that. It is usually a case of changing table engines or removing/reworking locking queries. Investing in professional DBA time will be money well spent, especially if you can line that up with a performance/load test.
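
As a starting point – and assuming a MySQL/MariaDB backend, which is typical for Magento – these are the sort of checks a DBA might begin with (credentials are assumed to be configured in ~/.my.cnf):

    # Enable the slow query log at runtime and log anything over 1 second.
    mysql -e "SET GLOBAL slow_query_log = 'ON';"
    mysql -e "SET GLOBAL long_query_time = 1;"

    # Look for sessions stuck waiting on locks, then dig into InnoDB lock detail.
    mysql -e "SHOW FULL PROCESSLIST;" | grep -i lock
    mysql -e "SHOW ENGINE INNODB STATUS\G"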


Cache all the things

It almost goes without saying: most applications will include some kind of caching layer, and if yours doesn’t, there are dedicated HTTP caching tools and techniques. For Magento, I’ve written a couple of articles on Full Page Cache and the Mirasvit FPC extension, which I really like.

However, don’t just assume that because the “Full Page Cache” box is ticked, it’s all rosy. Test thoroughly to make sure caches are actually working as expected. It’s not uncommon for a Magento extension to block the Full Page Cache from being used properly. Be wary of any unique URL parameters in use, especially from mailshots (more on that later), which can sometimes bypass your caches completely.

If you can run time curl -I against your homepage, product and category pages, and they return within about 200 milliseconds, you’re probably OK. Take the time to understand the HTTP response headers; they usually give a good indication of whether or not you are hitting a cache somewhere in the application stack.
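
For example (the hostname is a placeholder, and the exact header names will depend on your stack):

    # Time a headers-only request:
    time curl -sI https://www.example.com/ > /dev/null

    # Or ask curl for the time-to-first-byte directly:
    curl -s -o /dev/null -w 'TTFB: %{time_starttransfer}s\n' https://www.example.com/

    # Then inspect the caching-related response headers; Age and X-Cache
    # are common for Varnish and CDNs:
    curl -sI https://www.example.com/ | grep -iE 'cache|age:|expires|set-cookie'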

https://tools.pingdom.com/ is a good way to measure that Time To First Byte (TTFB) and other factors in page rendering speed. You might need to run the test two or three times to get a feel for what’s being cached, and where you can make improvements. Combine these with NewRelic APM data, for example, and perhaps varnishstat if appropriate; this should give you a solid grasp of how effective your cache setup really is.
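
If Varnish is part of your stack, a quick hit/miss sanity check might look like this (counter names carry a MAIN. prefix on newer Varnish versions):

    # Rough cache hit/miss ratio from Varnish's own counters.
    varnishstat -1 | grep -E 'cache_hit|cache_miss'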


Use a CDN

At the very least, you should be using a CDN for static assets (images, CSS, etc.), which can often be the bulk of the traffic – especially if you do business overseas, where geographic latency comes into the mix. This will reduce load at the network level, and keep your server resources free to concentrate on dynamic content like shopping carts and checkout. In Magento, it’s trivial to change the URLs for your Media and Skin elements so that they are served via a CDN.
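
In Magento 1 this comes down to a couple of configuration values (plus their web/secure equivalents for HTTPS). Here’s a sketch using the n98-magerun CLI tool – the CDN hostname is a placeholder, and the same values can be set through the admin configuration instead:

    # Hypothetical Magento 1 example; cdn.example.com is a placeholder.
    n98-magerun config:set web/unsecure/base_media_url "https://cdn.example.com/media/"
    n98-magerun config:set web/unsecure/base_skin_url "https://cdn.example.com/skin/"
    n98-magerun cache:flush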

Running your entire domain behind a CDN can also offer a host of security benefits, like WAF and DDoS protection. If you’re not already using the likes of CloudFlare, Incapsula, Akamai, Fastly, et al, then talk to your dev or hosting provider about implementation.

Consider full-page or aggressive caching at the CDN level where possible. Caution: this has its challenges, and is perhaps a topic for another article, but done right it can help you serve traffic at a significantly bigger scale.
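
A quick way to sanity-check whether the CDN is actually caching a page is to request it twice and compare the cache headers – the header names vary by provider (Cloudflare uses CF-Cache-Status, many others use X-Cache or Age), and the hostname is a placeholder:

    # Request the same URL twice; the second response should show a cache hit.
    for i in 1 2; do
      curl -sI https://www.example.com/ | grep -iE 'cf-cache-status|x-cache|age:'
      echo '---'
    done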


What not to do:

Don’t send customers straight to dynamic pages from mailshots or social campaigns.

If 20,000 Facebook users all hit a page which has to be uniquely generated on your servers, you will definitely have performance/scaling issues sooner or later. 

URL parameters are the critical thing here: links from mailshots or ad campaigns often carry unique or tracking identifiers. The content is the same, but each request is unique enough to bypass your various caches. To find out whether that’s happening, your web server access logs should tell the full story of any URL parameters making their way through (see the example below). If you can’t remove the URL parameters at source – you might not be able to change the way your mailshot provider works – then make sure your application (or cache) is going to ignore or ‘normalize’ the parameters. You could achieve this with Varnish VCL (only if you are already using Varnish), or perhaps work around it with a mod_rewrite rule to strip the parameters.
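
For example, something like this will show the most common query-string parameter names hitting the site – the log path is a placeholder, and a standard combined-format access log is assumed:

    # Count the most common query-string parameter names in the access log.
    awk '{print $7}' /var/log/nginx/access.log \
      | grep -o '?.*' \
      | tr '&' '\n' \
      | sed -e 's/^?//' -e 's/=.*//' \
      | sort | uniq -c | sort -rn | head -20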

Don’t overdo the AJAX. If your pages are static or cached (good), but have a ton of dynamic JavaScript making requests, then you might still be causing undue load on the application layer.

Try to avoid that: don’t start unnecessary sessions; skip the POST requests where no session is present; perhaps even remove some of the dynamic blocks from your templates. Examples:

  • Do you really need the whizzy cart contents popping up onMouseOver(), causing server load on every page? Or, would a simple link to the shopping cart do the job?
  • Recently Viewed Products: often a unique block which can be removed to reduce traffic.
  • Search autocomplete: depending on your implementation, this might be making an AJAX call on every keystroke.

Obviously this is a functionality vs. performance trade-off, but it might be the one thing keeping your website online during the big sale event.

Instead, use as much static content as possible, because this will always be faster and create the least load on your infrastructure. If you don’t have time for a code change, then in an emergency you could use web server rewrite rules (or similar) to return a 503 for those requests.

Landing pages should be flat HTML if possible, perhaps offering links through to the real dynamic content. Having this first static step can reduce server load by an order of magnitude. If it absolutely has to be a dynamic page, then make sure it’s fully cacheable (see: Cache all the things, and Use a CDN), does not carry unique URL parameters, and doesn’t have too many dynamic elements.
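
A quick smoke test for a landing page (the URL is a placeholder): a flat, cacheable page shouldn’t be setting session cookies or sending no-cache/private headers.

    # If this returns Set-Cookie or Cache-Control: no-cache/private headers,
    # the landing page is probably starting sessions and dodging your caches.
    curl -sI https://www.example.com/black-friday.html | grep -iE 'set-cookie|cache-control'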


Performance testing, again 

Ideally you want to measure the difference after any changes to your infrastructure, application stack, code, or database.

This can be an iterative process – rinse and repeat until you’re really happy with the performance and capacity.