02 Oct

How to deal with Black Friday

Black Friday, the day after Thanksgiving (US), is one of the biggest high street shopping days of the year.  Makes sense – a lot of Americans are off work that day, and for many it’s close to the last payday before Christmas.

In recent years, most of the action has moved into digital retail, which is good news for shoppers who want to avoid an actual fistfight over that last HD television. There’s even a .blackfriday Top Level Domain to capitalise on the increased consumer engagement.

The global rise of Black Friday for online retailers has certainly been interesting, but not without challenges. Even some of the most respected retailers can struggle to deal with the increased traffic. It’s usually more than just one day, though. Last year we saw plenty of “pre-Black Friday” sales, and many of the deals will be on until Christmas. Argos ran a whole Black Friday week, and Amazon leads with its own Prime Day sales and prolonged campaigns.

One thing I found most interesting: many retailers see traffic or conversions increase even when not advertising specific sales or deals. Some of our customers have been caught out by that.   People just seem to have a “buy now” mindset regardless.

Here are the things you should be doing to prepare your online store for Black Friday, or any other big sale event.

See also: another article I wrote about common pitfalls with extreme traffic. 

Performance testing

First of all, you need to know how much traffic your website can actually handle, and you need to find that out properly. If you have them, use the analytics from last year’s Black Friday and Cyber Monday as your baseline.

The key to success here is to use accurate user journeys and conversion rates in your load tests. For example, you might convert at 5% all year round, but as much as 50% on Black Friday if you’ve got a great deal on and your customers all pile into the checkout. I’ve seen this happen: a site was overwhelmed because the real conversion rate was far higher than the one we had tested. A good problem to have, but it needed rapid action. Luckily we were already using Cloud, so it didn’t take long to scale out further. Additionally, our DBA team found and fixed a database locking issue (see below), which could have been picked up earlier with the right performance testing.

Human think times also play a big part in this kind of testing; getting them wrong can have an order-of-magnitude effect on the concurrency figures produced.
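
To illustrate the difference – with made-up numbers, not results from any real test – a quick bit of Little’s Law shows how much think time changes the load a given number of “users” generates:

  # Rough sizing: concurrent users ≈ arrival rate x (response time + think time)
  # 50 users arriving per second, 0.5s server response, 9.5s of human think time:
  echo "50 * (0.5 + 9.5)" | bc    # ~500 concurrent users on site
  # The same arrival rate with zero think time looks like a far smaller test:
  echo "50 * 0.5" | bc            # ~25 concurrent users

Get the think time wrong and a “500 user” test can be simulating a small fraction – or a large multiple – of the load you will actually see.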

Don’t: just rely on tools like siege, or services like loader.io and blitz.io. They can be extremely useful of course, but only if you are able to interpret the results properly. Unless you have a deep understanding of HTTP headers, cookies and unique session IDs, these tools might not give you the insight you need.

Do: get it done professionally. You need a test that can mimic actual human user journeys and repeat them on a massive scale. JMeter is good for that, if you have the resources and bandwidth, but doing it right can be complicated and very time-consuming. I would argue this is best left to professional performance testers; this is specialist work and a necessary investment in your website’s future. Your hosting provider might offer professional load testing services, or refer you to someone who can. Soasta are excellent.
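
For what it’s worth, once you (or your testers) have a realistic test plan, JMeter can run it headless from the command line. The file names here are hypothetical:

  # Run the test plan in non-GUI mode: -n = non-GUI, -t = test plan, -l = results log
  jmeter -n -t black-friday-journeys.jmx -l results.jtl
  # JMeter 3+ can then build an HTML dashboard from the results:
  jmeter -g results.jtl -o report/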

Do: do this in plenty of time.  If you come out with a laundry list of changes, you need to be able to act on them before the shopping season starts.

 

Be ready to scale infrastructure

Black Friday is one of the best examples of how Cloud resources can be used to boost capacity when needed. Or, you might just need to increase your physical server resources with a CPU upgrade. More likely, it’s both. Talk to your hosting provider about Cloud Bursting or Hybrid hosting models, which can work really well for seasonal workloads.

Quick note about autoscaling, if you use it. Use time-based events, or scale manually ahead of time, rather than waiting for autoscaling algorithms to detect the load and kick in. It might be quite a few minutes before the new servers/containers are in place and ready, and you might be losing sales in that time. Bear in mind that everyone else in your public Cloud might be doing the same, so allow extra time in case the provisioning APIs are a little slower than usual.
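
As a sketch of the time-based approach – assuming AWS here, with a made-up group name, date and sizes – a scheduled scaling action can bring the extra capacity up before the rush:

  # Scale out ahead of the event instead of waiting for load-based triggers
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name shop-web-asg \
    --scheduled-action-name black-friday-scale-out \
    --start-time 2025-11-28T05:00:00Z \
    --min-size 6 --max-size 12 --desired-capacity 8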

Go big or go home: Cloud Resources, even with high CPU/memory, are great value when only online for a few days. Don’t be afraid to spin up those large instances. A couple of hundred dollars/pounds/euros is definitely worth the investment for the one weekend where you can’t afford downtime.

 

Have analytics in place

…such as NewRelic, AppDynamics, or similar. You need to know if your customers are getting a good experience. If they’re not, these tools will help you pinpoint areas for improvement at the application level.

 

Investigate database locks

Review your code. Log slow queries. Database-level locks can be a key limiting factor in concurrent transactions, and no amount of servers, CPU or memory will fix that. It is usually a case of changing table engines or removing/reworking locking queries. Investing in professional DBA time will be money well spent, especially if you can line that up with a performance/load test.
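
If you’re on MySQL, something like this is a reasonable starting point (the one-second threshold is illustrative – tune it to your site):

  # Turn on the slow query log at runtime (set the same in my.cnf so it persists)
  mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"
  # During a load test, look for queries stuck waiting on locks
  mysql -e "SHOW FULL PROCESSLIST;"
  # InnoDB lock waits and recent deadlocks
  mysql -e "SHOW ENGINE INNODB STATUS\G"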

 

Cache all the things

Goes without saying – many applications will include some kind of caching layer. If not, there are specific HTTP caching tools and techniques. For Magento, I’ve written a couple of articles on Full Page Cache, and the Mirasvit FPC extension which I really like.

However, don’t just assume because the “Full Page Cache” box is ticked, it’s all rosy. Test thoroughly to make sure caches are actually working as expected. It’s not uncommon for a Magento extension to be blocking the Full Page Cache from being used properly. Be wary of any unique URL parameters in use, especially from mailshots (more on that later), which can sometimes completely bypass your caches.

If you can time curl -I your homepage, product and category pages, and they return within about 200 milliseconds, you’re probably OK.  Take the time to understand the  HTTP response headers; they usually give you a good indication of whether or not you are hitting a cache somewhere in the application stack.
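
To make that concrete (the hostname is a placeholder, and the exact header names will vary with your stack):

  # Time the response and pull out the cache-related headers
  time curl -sI https://www.example.com/ | grep -iE 'cache|age|x-varnish'
  # Look for things like Age:, X-Cache: HIT or an X-Magento-Cache-Debug header,
  # and repeat for a product page and a category page, not just the homepage.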

https://tools.pingdom.com/ is a good way to measure Time To First Byte (TTFB) and other factors in page rendering speed. You might need to run the test two or three times to get a feel for what’s being cached, and where you can make improvements. Combine these with NewRelic APM data, for example, and perhaps varnishstat if appropriate; this should give you a solid grasp of how effective your cache setup really is.
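
You can also measure TTFB locally with curl’s timing variables – again, run it two or three times so you can see the difference between a cold and a warm cache:

  curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s  total: %{time_total}s\n' https://www.example.com/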

 

Use a CDN

At the very least, you should be using a CDN for static assets (images, CSS, etc) which can often be the bulk of the traffic. Especially if you do business overseas, where geographic latency comes into the mix. This is going to reduce load at the network level, and keep your server resources free to concentrate on the dynamic content like shopping carts and checkout.  In Magento, it’s trivial to change the URLs for your Media and Skin elements, to serve them via a CDN. 
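
As a rough sketch – cdn.example.com is a placeholder, and on Magento 1 the equivalent settings live under System > Configuration > General > Web – Magento 2.2+ lets you set the base URLs from the CLI:

  # Serve media and static assets from the CDN hostname
  bin/magento config:set web/unsecure/base_media_url https://cdn.example.com/media/
  bin/magento config:set web/unsecure/base_static_url https://cdn.example.com/static/
  bin/magento cache:flush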

Running your entire domain behind a CDN can also offer a host of security benefits, like WAF and DDoS protection. If you’re not already using the likes of CloudFlare, Incapsula, Akamai, Fastly, et al, then talk to your dev or hosting provider about implementation.

Consider full-page or aggressive caching at the CDN level where possible. Caution: this has its challenges, and is perhaps one for another article. But done right, it can help you serve traffic on a significantly bigger scale.

 

What not to do:

Don’t send customers to dynamic pages – e.g. from mailshots or social campaigns.

If 20,000 Facebook users all hit a page which has to be uniquely generated on your servers, you will definitely have performance/scaling issues sooner or later. 

URL parameters. Critically, links from mailshots or ad campaigns often have unique or tracking identifiers. Same content, but each request is unique enough to bypass your various caches. To find out if that’s happening, your web server access logs should tell the full story of any URL parameters making their way through.  If you can’t remove the URL parameters at source – you might not be able to change the way your mailshot provider works –  then make sure your application (or cache) is going to ignore or ‘normalize’ the parameters. You could achieve this with Varnish VCL (only if you are already using Varnish) or perhaps work around it with a mod_rewrite rule to strip the parameters.  
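
If Varnish is already in your stack, a VCL sketch along these lines is the usual approach – the parameter list is illustrative, so match it to whatever your campaigns actually append:

  sub vcl_recv {
      # Strip common tracking parameters so campaign traffic can share cached pages
      if (req.url ~ "(\?|&)(utm_[a-z]+|gclid|fbclid)=") {
          set req.url = regsuball(req.url, "&(utm_[a-z]+|gclid|fbclid)=[^&]*", "");
          set req.url = regsuball(req.url, "\?(utm_[a-z]+|gclid|fbclid)=[^&]*", "?");
          set req.url = regsub(req.url, "\?&", "?");
          set req.url = regsub(req.url, "\?$", "");
      }
  }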

Don’t overdo the AJAX. If your pages are static or cached (good), but have a ton of dynamic JavaScript making requests, then you might still be causing undue load on the application layer.

Try to avoid that: don’t start unnecessary sessions; skip the POST requests where no session is present; perhaps even remove some of the dynamic blocks from your templates. Examples:

  • Do you really need the whizzy cart contents popping up onMouseOver(), causing server load on every page? Or, would a simple link to the shopping cart do the job?
  • Recently Viewed Products: often a unique block which can be removed to reduce traffic.
  • Search autocomplete: depending on your implementation, this might be making an AJAX call on every keystroke.

Obviously this is a functionality vs. performance trade-off, but it might be the one thing keeping your website online during the big sale event.

Instead, use as much static content as possible, because this will always be faster and create the least load on your infrastructure. In an emergency, if you don’t have time for a code change, you could use web server rewrite rules (or similar) to return a 503 for those requests, as sketched below.
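
For example (an .htaccess-style sketch – the path is made up, so match it to whatever your theme actually requests), mod_rewrite can shed a chatty AJAX endpoint with a 503:

  RewriteEngine On
  # Emergency switch: answer this endpoint with a 503 rather than lose the whole site
  RewriteRule ^ajax/cart/summary - [R=503,L]

Crude, but it keeps the rest of the site up while you work on a proper fix.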

Landing pages should be flat HTML if possible, perhaps then offering links through to the real dynamic content. Having this first step can reduce server load by an order of magnitude.  If it absolutely has to be a dynamic page, then make sure it’s fully cacheable (see: Cache all the things, CDN), does not have unique URL parameters, and the page doesn’t have too many dynamic elements.

 

Performance testing, again 

Ideally you want to measure the difference after any changes to your infrastructure, application stack, code, or database.

This can be an iterative process – rinse and repeat until you’re really happy with the performance and capacity.  

 

31 May

Can you run Magento on a Plesk server?

TL;DR – Yes, you can. But don’t. There are a lot of reasons why you shouldn’t.

If you must… scroll down for some tips.

I get asked this question a lot by dev agencies or shared hosting resellers – often already using a control panel like Plesk. They’ll pick up a new customer who happens to be using Magento, upload it to their shared server like every other site, and then call us when it doesn’t work.

This applies loosely to any shared hosting environment – Magento on Plesk, Magento on cPanel, ServerAdmin, or any DIY solution with multiple websites on one server or VPS.

This article should be useful for Plesk server administrators, and Ecommerce CTOs who are looking at Magento hosting.

The right way

In most cases, a small dedicated VM, VPS or Cloud Server can be used to host a small Magento site in a more isolated way. The LAMP config can be honed to perfection for that site, and there’s no need to tip-toe around other websites on the same server.

A VPS or Cloud Server with 4G of memory is about where you need to start.

If you’re talking about a busier website with dedicated resources anyway, a control panel like Plesk will probably just hinder your ability to configure things at a low level. In that case, get a decent sys admin instead.

Security

Let me start with some doom-mongering. Shared servers get compromised all the time. Ecommerce is an area where a compromise can do serious damage to business reputation.

On a shared server, you have no control over how many other sites are there, how long ago they updated WordPress (for example) or whether there are compromised sites running rogue on that server that the owner hasn’t even noticed.

It’s a numbers game – the chance of any one site being compromised might be relatively slim, but there might be 200 sites on that server. This is why shared environments (of any kind) usually don’t meet PCI compliance requirements. On the upside, panels like Plesk do go to great lengths to try to separate websites: users, permissions, config like PHP open_basedir, etc. Application-level compromises may not affect your site; but if it’s a root-level compromise then you’ve had it.

A more everyday problem might be that you’re on a compromised server which is sending a lot of spam.  Your crucial transactional emails could get lost in the server’s million-strong mail queue, or filtered as spam because of the sending server’s IP reputation.

Remember: if your Magento site is running on a shared server, then the security of your business is in the hands of the person running that server. Quite often – especially if they’re relying on control panels – those people are not very knowledgeable on security topics, let alone the underlying LAMP config. I hate to say it, but it’s true more often than not. Disclosure: my first job in tech was with an IT services company, reselling shared hosting on Plesk servers. I definitely didn’t know much about security. Ignorance was bliss.

Performance

For Magento to work well, you need to tweak the LAMP stack a fair bit. Plesk and other control panels can limit how easy/feasible this is, either server-wide or per website.

High level examples:

  • Varnish was popular for Magento 1 and is now a crucial part of a Magento 2 stack. Varnish config is very website-specific, so implementing Varnish on a shared server is very difficult. Possible, but will undoubtedly cause problems for other websites, and the VCL will be very messy. It’s just not practical.
  • PHP version: You might even be stuck with an older PHP version, to cater for a legacy website on that same server.

Recent Plesk versions do give a lot of granularity for PHP config, even offering different PHP versions per domain, so that’s good. But those extra PHP versions might be outside of your server’s main package management. Are they getting the latest updates? See: security.

One special mention here is a PHP setting called open_basedir. Used per website, it restricts PHP to only a certain few directories – exactly the sort of thing you want on a shared environment. Plesk uses it by default as a sensible security measure. But – and it’s a big but – open_basedir effectively disables the PHP realpath cache, an internal PHP cache which massively speeds up PHP file includes by caching filesystem paths. It makes sense, because the realpath cache is global within PHP, so having access to that cache could break the open_basedir restriction. The downside is a big performance hit; PHP has to query the filesystem for every single include(). In Magento, we all know that’s going to be a lot. The impact can be severe.
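
A rough check from the command line (bear in mind Plesk’s per-domain web settings can differ from the CLI defaults):

  # Is open_basedir set for this PHP, and what are the realpath cache settings?
  php -i | grep -E 'open_basedir|realpath_cache'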

Resources

Shared servers can sometimes be packed to the rafters with small sites. And that’s fine – many low traffic sites use barely any resource. But even a low traffic Magento site can consume a fair amount of memory and CPU. Remember that low human traffic doesn’t mean there aren’t a dozen search engine bots constantly hitting the Magento site. Layered navigation on a large catalogue can lead to a lot of crawling;  you need to factor that in.

I’ve seen this several times, where a new or growing Magento site can engulf the CPU or memory on a shared server, causing downtime for many other sites. For example, Magento recommends a 512M memory limit – and it’s not uncommon to set a memory_limit of 2GB to allow for a large product import. What if you have 12G in total, and 11G is already used by other sites? At 512M per PHP process, that remaining 1G might only be two visitors at once.

Server resources and website performance go hand in hand, but do think about the impact that the different websites will have on each other. In terms of resources (not necessarily security) it’s probably OK to run a busy Magento site and a few other, smaller sites (like a blog). But if you’re trying to run 20 Magento sites on one Plesk server, they’re going to trip over each other sooner or later. See below for how to mitigate that.

SEO makes no difference

A little off topic, but I just wanted to mention this as a non-argument. In case anyone mentions Search Engine Optimisation as a reason not to use a shared server, or shared IP, it absolutely doesn’t matter. Matt Cutts said so, and he is pretty senior at Google. Sharing our diminishing IPv4 space with tools like SNI is very necessary, and it will not harm your SEO rankings. Or just use IPv6 already.

 

If you must…

If you’ve read this far, you probably have commercial or technical reasons why you can’t avoid using a shared server. Here is how to run Magento on a Plesk server (or any shared hosting environment).

  1. Use a Full Page Cache Magento extension, which will massively reduce the server load impact as well as speed up page loads for your customers.  I recommend Mirasvit FPC, instead of spending weeks with Varnish config.
  2. Set open_basedir=none – if you are happy with the security implications. You can do that in Plesk under the advanced scripting options per domain – here are some instructions. My advice is to only do that for the Magento website(s), but leave the other sites restricted.
  3. Ask your administrator for general PHP optimisations to the global PHP config; the most important of which is to use an Opcode cache like Zend OPcache.
  4. Stick to the main PHP version on the server where possible, so it’ll get security updates. Anything above PHP 5.4 should be OK, but watch for EOL package sources.
  5. Use Redis for Magento cache, as long as you have the memory (1G should be plenty). Plesk doesn’t need to know or care that Redis is there, but it should be configured with requirepass to prevent other sites accessing the data. If you are running multiple Magento sites, you should use a separate Redis instance for each.
  6. For Magento sessions – just use <session_save>files</session_save> . Performance impact is minimal and it’s one less thing to worry about. Otherwise – if available – use memcached.
  7. MySQL – set max_user_connections globally. I usually set it to ~80% of max_connections, so it’s not too limiting but prevents any one site using all the connections and effectively bringing down every database-driven site on that server (see the sketch after this list).
  8. Set limits on the resource usage, where possible. Plesk can use Apache mod_bw to do that – see this guide. I wouldn’t limit by Kb/s – just use the overall connections. Again, the value here is difficult to judge and will be different for each site/server. Start with about the number of CPU cores you’re happy for this site to consume.
  9. Use a CDN, like CloudFlare. It’s free, and gives an immediate boost to page load times, especially for overseas customers. Headers for GeoIP information are also really useful if you’re an international store. It helps to reduce server/network load, and tools like the WAF (~$20/mo) can help with security.
  10. Understand the resource limits – I’m expecting even a bottom end dedicated Plesk server will have at least 4 CPU cores and 12 or 24G RAM.  If you’re trying all this on a 1G VPS, you’re doing it wrong.
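
On the max_user_connections point (item 7), a sketch with illustrative numbers – 400 is roughly 80% of a max_connections of 500; put the same values in my.cnf so they survive a restart:

  mysql -e "SET GLOBAL max_connections = 500; SET GLOBAL max_user_connections = 400;"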