18 Jun

Magento deployment checklist

Just deployed a new Magento site? Or migrated to a new host?

Here are some things you should do before you launch…

Secure it

  • Magento base code may not include the latest security patches!  Check the download page for the latest and apply them. For example, Community 1.9.1.0 is still vulnerable to Shoplift out of the box.
  • Change your Admin path – anything other than the default /admin, to make your Magento backend harder to find or brute-force.  This is usually configured in your app/etc/local.xml.
    ...
      <admin>
        <routers>
          <adminhtml>
            <args>
              <frontName><![CDATA[something_unique]]></frontName>
            </args>
          </adminhtml>
        </routers>
      </admin>
    </config>
  • Admin password: if you were using “admin123” while in development, now is a good time to change it.
  • File permissions: these tend to get overlooked in development. Here’s a handy guide on what they should be.
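If you want to double-check the admin path before launch, here’s a minimal Python sketch that reads the frontName out of local.xml. It assumes the admin/routers/adminhtml/args/frontName layout shown above. It only looks at local.xml, so a custom admin URL configured elsewhere (for example through the Admin configuration) won’t show up here. Treat it as a sanity check, not proof.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def admin_front_name(local_xml: str) -> Optional[str]:
    """Return the admin frontName set in app/etc/local.xml, or None if it isn't set there."""
    root = ET.fromstring(local_xml)  # the root element is <config>
    node = root.find("./admin/routers/adminhtml/args/frontName")
    if node is None or node.text is None:
        return None
    return node.text.strip()  # CDATA content arrives as plain text

def uses_default_admin_path(local_xml: str) -> bool:
    """True if the backend is still on /admin (no override, or an override of 'admin')."""
    name = admin_front_name(local_xml)
    return name is None or name.lower() == "admin"
```

For example, run uses_default_admin_path(open("app/etc/local.xml").read()) from your document root before go-live.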

Enable caches

Magento caches are often disabled during development, but in production it’s essential that they’re all ON. In the Admin panel, go to System > Cache Management and enable them. Related article: Magento Full Page Cache
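If you have database access, you can cross-check this too. As far as I know, Magento 1 keeps each cache type’s on/off switch in the core_cache_option table: one row per cache code, with value 1 for enabled. That layout is an assumption worth verifying on your own install. Here’s a Python sketch that takes the rows from SELECT code, value FROM core_cache_option; and lists anything still switched off:

```python
from typing import Iterable, List, Tuple, Union

def disabled_caches(rows: Iterable[Tuple[str, Union[int, str]]]) -> List[str]:
    """Return the cache codes whose value is not 1, i.e. cache types still switched off."""
    # MySQL drivers may hand the value back as an int or a string, so normalise it.
    return sorted(code for code, value in rows if int(value) != 1)
```

An empty list means everything is on.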

Log cleaning

This relates to the log_* tables in the database (log_visitor, log_url and friends). They’re a bit like access logs that aren’t rotated by default. That’s bad because it bloats your database, wastes space in your InnoDB buffer pool, and makes database backups more cumbersome.

Go to System > Configuration > System > Log Cleaning and enable it. The default 30-day retention should be fine.
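To make the retention setting concrete: with N days of retention, anything logged before “now minus N days” becomes eligible for cleaning. If you’re switching this on for a site that has been running a while, it can be worth estimating how big the first clean-up will be. Here’s a Python sketch of that arithmetic. It assumes you’ve already pulled the relevant timestamps from one of the log_* tables (log_visitor, say). The exact boundary Magento uses is my assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable

def rows_past_retention(timestamps: Iterable[datetime], now: datetime, days: int) -> int:
    """Count log rows older than the retention window (roughly what Log Cleaning removes)."""
    cutoff = now - timedelta(days=days)
    return sum(1 for ts in timestamps if ts < cutoff)
```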

Cron job

The Magento cron job should be run every five minutes. Top tip: run cron.sh instead of cron.php. The shell script first checks it’s not already running, then runs the PHP, preventing overlaps.

*/5 * * * *  /bin/sh /path/to/docroot/cron.sh

As of Magento 1.9.1, the cron job is responsible for sending customer email so it’s more important than ever.
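Going back to cron.sh’s overlap protection, the idea is simply “if the previous run is still going, skip this one”. cron.sh has its own way of checking. The Python sketch below swaps in an advisory file lock (flock) instead, which is a common way of getting the same effect. It’s Unix-only, the lock file path is whatever you choose, and none of this is Magento’s own code.

```python
import fcntl
from typing import IO, Callable, Optional

def try_acquire(lock_path: str) -> Optional[IO]:
    """Take an exclusive, non-blocking lock; return the open file if we got it, else None."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()  # another run holds the lock
        return None
    return handle

def run_exclusively(lock_path: str, job: Callable[[], None]) -> bool:
    """Run job() unless a previous run still holds the lock. Returns True if it ran."""
    handle = try_acquire(lock_path)
    if handle is None:
        return False  # previous run still going: skip this tick rather than overlap
    try:
        job()
        return True
    finally:
        handle.close()  # closing the file releases the flock
```

The design choice is to skip rather than wait. If a run takes longer than five minutes, the next tick simply does nothing instead of stacking another PHP process on top.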

Indexing

If you are using Community Edition, indexing may not be a problem to start with, but one day it’s going to cause issues. Here’s my advice on how to configure Magento indexing.

Error pages

Death, taxes, and Magento 503s. Even a well-tuned Magento application infrastructure can be complex, and one day something will break. Or maybe we’re talking planned maintenance, installing a new extension for example. Here’s a great tutorial on customising your Magento error pages. It’s best to get this done early on, so you’re ready for the unexpected.

Top tip: reverse proxies such as load balancers and Varnish (if you use it) will probably show their own 503 page when something is broken. Talk to your hosting provider about customising these; default error pages don’t inspire confidence in your brand.

The goal here is a nice customer experience even when something is broken. Make sure there’s a phone number or email address at the very least. Including a discount code will encourage customers to come back and buy later.

Load test

If time allows, the last thing to do before launch is a load or performance test, to get insight into what your new solution can really do. It’s best to do this before you have real-world traffic; otherwise load testing is basically DoS’ing yourself. Load testing is also likely to show up areas for configuration and performance optimisation, which is always a good thing.

Don’t: rely on tools like siege, or services like loader.io and blitz.io. They can be extremely useful, of course, but only if you are able to interpret the results properly. Unless you have a deep understanding of HTTP headers, cookies and unique session IDs, these tools probably won’t help you all that much.

Do: get it done professionally. You need a test that can mimic actual human user journeys and repeat them on a massive scale. JMeter is great, but doing it right can be complicated and very time-consuming. I would argue this is best left to professionals who do just that. It’s not cheap, but it’s a necessary investment in your website’s future. Your hosting provider might offer professional load testing services, or refer you to someone who can. Soasta are excellent.

20 Apr

Foolproof Magento Indexing

This is for Community Edition and Enterprise Editions before 1.13.

Once you have established whether Magento indexing is breaking your site, here is the simple 1-2-3 solution.

Generally, reindexing in the daytime on a busy site can cause problems, and by default Magento will fully reindex after any product/catalogue changes. The gist of this is that you probably don’t want that to happen in peak business hours.

1. Manual indexes.

Two of the indexes are more likely to cause you problems than any of the others – the URL rewrites and the Fulltext search. Set them to manual – the others should be OK.

You can change this under System > Index Management.

Alternatively you can set this directly in the database:

mysql> UPDATE index_process SET mode='manual' WHERE indexer_code='catalog_url';
mysql> UPDATE index_process SET mode='manual' WHERE indexer_code='catalogsearch_fulltext';
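To confirm the change took, read back SELECT indexer_code, mode FROM index_process; and check that neither of the two heavy indexes is still set to real_time (Update on Save). Here’s a small Python sketch of that check over the returned rows:

```python
from typing import Iterable, List, Tuple

# The two indexes recommended for manual mode above.
HEAVY_INDEXERS = {"catalog_url", "catalogsearch_fulltext"}

def still_update_on_save(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """From (indexer_code, mode) rows, list heavy indexers whose mode is still 'real_time'."""
    return sorted(code for code, mode in rows if code in HEAVY_INDEXERS and mode == "real_time")
```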

2. Configure a cron job to do that manual reindex, every day.

crontab -e -u username

username is the user that runs your PHP-FPM, or just apache for mod_php. I try to avoid having root run these jobs, because it creates lock files in Magento’s var/ directory that the application user will then be unable to work with.

Your added cron job should look something like this:

@daily /usr/bin/php /path/to/magento/documentroot/shell/indexer.php reindexall >/dev/null 2>&1

I’ve used @daily as a cron shortcut, which is usually midnight (server timezone). You could be more specific if you like, for example if you need to avoid other jobs like database backups. This is in addition to the normal Magento cron running every 5 minutes.
Obviously you need to replace /path/to/magento/documentroot with whatever’s relevant in your hosting environment.
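If you’re moving the job to dodge something like a nightly database backup, remember that a full reindex on a real shop can easily take half an hour or more, so check the two windows don’t overlap. @daily is the same as 0 0 * * *, and something like 0 2 * * * would start it at 2am instead. Here’s a trivial Python sketch of the overlap check. The times and durations are made up.

```python
from datetime import datetime, timedelta

def windows_overlap(start_a: datetime, length_a: timedelta,
                    start_b: datetime, length_b: timedelta) -> bool:
    """True if the half-open intervals [start, start + length) of two jobs intersect."""
    return start_a < start_b + length_b and start_b < start_a + length_a

# Hypothetical schedule: a 30-minute reindex versus a one-hour backup starting at 00:15.
midnight = datetime(2015, 4, 20)
backup_start, backup_len = midnight + timedelta(minutes=15), timedelta(hours=1)
reindex_len = timedelta(minutes=30)

print(windows_overlap(midnight, reindex_len, backup_start, backup_len))                       # @daily: True
print(windows_overlap(midnight + timedelta(hours=2), reindex_len, backup_start, backup_len))  # 0 2 * * *: False
```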

If you don’t have access or confidence to do this via SSH, your hosting provider should be able to help.
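On the root-owned lock files mentioned above: if the reindex has ever been run as root, a quick way to spot the damage is to look for files under Magento’s var/ directory that aren’t owned by your application user. Here’s a minimal, Unix-only Python sketch:

```python
import os
from typing import List

def files_not_owned_by(directory: str, expected_uid: int) -> List[str]:
    """Walk directory and return every file whose owner uid differs from expected_uid."""
    mismatched = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example: files_not_owned_by("/path/to/magento/documentroot/var", pwd.getpwnam("apache").pw_uid), swapping in your own path and user (pwd is in Python’s standard library).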

3. Ignore the banner.

It might seem like a silly thing to mention, but I’ve often seen cases where a diligent member of staff followed the banner’s advice and ran the reindex, unaware that it was causing problems and would be done by the cron job anyway. If you have a large team of admin staff, just be sure to let them all know.

Business critical updates?

When I suggest this, I’m often greeted with something like, “…but it’s absolutely essential that new products are searchable and available via their URLs IMMEDIATELY!”

You have a few choices here:

  1. Think about your business requirements vs. impact vs. cost. Do you really need that? All the time? If it’s just occasionally, then continue as above and deal with the occasional manual reindex in the daytime.

  2. Third party code. This extension claims to do the job. There are probably others, too. I can’t vouch for it because I’m not really a developer. As with all third party extensions, the fewer the better and, of course, YMMV.

  3. Buy the Enterprise Edition. Or upgrade if you’re on EE < 1.13. There are plenty of other reasons for this, but index management is a major factor. If you have enough products for indexing to be an issue, and it really is business critical that your indexes are up-to-the-minute fresh, and you need vendor escalation with your software, then it’s a no-brainer. Talk to your finance director, bite the bullet, and invest in software that does the job out of the box.

27 Mar

Is Magento indexing breaking my site?

Magento indexing in Community Edition 1.x (and older EE ≤ 1.12) is an absolute train wreck.

Sooner or later it’s going to take your site down, or key parts of it like searching and checkout.

This article is about how to tell whether or not this is a problem for your site.

The worst thing is that the default behaviour is to reindex every time you save a product. Chances are this will be business hours, when your content editors are working on products. Not good.

System > Index Management.

The two that usually take the longest are:

  1. “Catalog URL Rewrites”: this holds references to every product, including old products if you’re keeping links for SEO.
  2. “Catalog Search Index”: by default this is a MyISAM table with a FULLTEXT index, so the whole table gets locked during reindexing. On MySQL 5.6 and its variants we can switch this table to InnoDB for row-level locking, but there is usually still some disruption.

How long does it take?

Longer for larger catalogs, but you can find out from the database:

mysql> SELECT * FROM index_process;

Or, for a more focused summary:

mysql> SELECT indexer_code, TIMESTAMPDIFF(SECOND, started_at, ended_at) AS duration FROM index_process ORDER BY duration DESC;
+---------------------------+----------+
| indexer_code              | duration |
+---------------------------+----------+
| catalog_url               |       16 |
| catalog_product_attribute |        2 |
| catalog_category_product  |        2 |
| catalogsearch_fulltext    |        2 |
| catalog_product_price     |        1 |
| catalog_product_flat      |        0 |
| catalog_category_flat     |        0 |
| cataloginventory_stock    |        0 |
| tag_summary               |        0 |
+---------------------------+----------+
9 rows in set (0.00 sec)
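If you’d rather pull the raw rows and do the arithmetic yourself, here’s a Python sketch of what that query computes. One wrinkle is worth handling. As I understand it, while an index is mid-run, started_at is newer than ended_at (which still belongs to the previous run), so the plain subtraction goes negative. The sketch reports those rows as unknown. Rows are assumed to be (indexer_code, started_at, ended_at) tuples.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# One index_process row: (indexer_code, started_at, ended_at); timestamps may be NULL/None.
Row = Tuple[str, Optional[datetime], Optional[datetime]]

def index_durations(rows: Iterable[Row]) -> List[Tuple[str, Optional[int]]]:
    """Seconds per indexer, longest first, like the TIMESTAMPDIFF query.

    If a timestamp is missing, or ended_at is earlier than started_at (the index is
    probably mid-run, so ended_at still belongs to the previous run), report None.
    """
    results: List[Tuple[str, Optional[int]]] = []
    for code, started, ended in rows:
        if started is None or ended is None or ended < started:
            results.append((code, None))
        else:
            results.append((code, int((ended - started).total_seconds())))
    # Known durations first (longest to shortest), unknown ones last.
    return sorted(results, key=lambda item: (item[1] is None, -(item[1] or 0)))
```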

This was just with the stock sample data, and it took 16 seconds for the catalog_url index. Most real-world shops will take several minutes; half an hour is pretty normal. I’ve seen it take hours where there are tens of thousands of SKUs.

Tell-tale signs

When indexing is causing a problem, you’re most likely to see Magento errors like this in the var/report/ directory of your Magento installation:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

A common suggestion on Stack Overflow is to increase innodb_lock_wait_timeout, but I don’t think that helps. Here’s why:
The default timeout is 50 seconds, but many reverse proxies, load balancers and similar components are set to time out sooner, often after around 30 seconds, at which point you’re likely to see a 503 Service Unavailable, a 502 Bad Gateway or something similar. Regardless, your customers are very unlikely to wait 50 seconds or longer for a page to load. Increasing this value might make the SQL errors go away, but it doesn’t address the root cause.
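To see when the lock waits are happening, you can tally the report files that mention the 1205 error by hour. This rough Python sketch assumes each report is a file in var/report/ whose text includes the SQL error message. It also uses each file’s modification time as a stand-in for when the error occurred.

```python
import os
from collections import Counter
from datetime import datetime
from typing import Iterable, Iterator, Tuple

LOCK_WAIT = "1205 Lock wait timeout exceeded"

def read_reports(report_dir: str) -> Iterator[Tuple[float, str]]:
    """Yield (modification time, contents) for each file in a var/report/ directory."""
    for name in os.listdir(report_dir):
        path = os.path.join(report_dir, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                yield os.path.getmtime(path), handle.read()

def lock_waits_by_hour(reports: Iterable[Tuple[float, str]]) -> Counter:
    """Count reports mentioning the 1205 error, keyed by local 'YYYY-MM-DD HH:00'."""
    counts: Counter = Counter()
    for mtime, text in reports:
        if LOCK_WAIT in text:
            counts[datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

For example: lock_waits_by_hour(read_reports("/path/to/magento/documentroot/var/report")).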

More subtly, if you’re using Nginx, especially behind a reverse proxy, you might see 499 errors in your access log. 499 isn’t an official RFC-defined status code; it’s Nginx’s way of recording that the client (which may be the reverse proxy in front of it) closed the connection before Nginx could send a response. I wanted to mention it because sometimes this is the only error you’ll see, even when Magento, PHP and MySQL aren’t throwing errors lower down the stack.
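Because 499s can be the only clue, it helps to count them per minute and line any spikes up with your indexing times. Here’s a minimal Python sketch that assumes Nginx’s default combined log format. Adjust the regular expression if you log differently.

```python
import re
from collections import Counter
from typing import Iterable

# Timestamp, request and status from nginx's default "combined" log format.
COMBINED = re.compile(r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) ')

def status_per_minute(lines: Iterable[str], status: str = "499") -> Counter:
    """Count responses with the given status per minute, keyed like '19/Mar/2015:13:43'."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED.search(line)
        if match and match.group("status") == status:
            counts[match.group("time")[:17]] += 1  # drop seconds and timezone
    return counts
```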

Timing is key

Two ways to see exactly when it’s happening.

mysql> SELECT * FROM index_process;

The timestamps in this table will show you the most recent runs, and also remind you which indexes are set to update on save (mode="real_time").

  • Your web server’s access log, searched for reindex requests.

    This will show exactly when the Admin Dashboard was used to trigger indexing.

    # grep '[Rr]eindex' access_log
    127.0.0.1 - - [19/Mar/2015:13:43:34 +0000] "GET /index.php/admin/process/reindexProcess/process/1/key/3587af3c674a88e5304db11774e36326/ HTTP/1.1" 302 - "https://www.domain.com/index.php/admin/process/list/key/1129d37eabefa571c1956a19f45f632b/" "User Agent String"
    127.0.0.1 - - [19/Mar/2015:13:43:34 +0000] "GET /index.php/admin/process/reindexProcess/process/1/key/3587af3c674a88e5304db11774e36326/ HTTP/1.1" 302 - "https://www.domain.com/index.php/admin/process/list/key/1129d37eabefa571c1956a19f45f632b/" "User Agent String"
    127.0.0.1 - - [19/Mar/2015:13:44:04 +0000] "POST /index.php/admin/process/massReindex/key/0d75d56b5e90243c0175922deecb4e43/ HTTP/1.1" 302 - "https://www.domain.com/index.php/admin/process/list/key/1129d37eabefa571c1956a19f45f632b/" "User Agent String"
    
  • “massReindex” shows up when using the bulk reindex option
  • “reindexProcess/process/N” is a single index refresh, where N corresponds to the ID in the index_process table.
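If you have a lot of log to get through, the same information can be pulled out programmatically. This Python sketch assumes the combined log format shown above. It matches /process/ anywhere in the request path, so it should still work if you’ve changed the admin frontName.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Timestamp, then an admin indexing action anywhere in the request path.
REINDEX = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|POST) \S*?/process/'
    r'(?:(?P<mass>massReindex)|reindexProcess/process/(?P<pid>\d+))/'
)

def reindex_events(lines: Iterable[str]) -> List[Tuple[str, str, Optional[int]]]:
    """Return (timestamp, "mass" or "single", index_process ID or None) per reindex request."""
    events: List[Tuple[str, str, Optional[int]]] = []
    for line in lines:
        match = REINDEX.search(line)
        if match is None:
            continue
        if match.group("mass"):
            events.append((match.group("time"), "mass", None))
        else:
            events.append((match.group("time"), "single", int(match.group("pid"))))
    return events
```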

So, take the timestamp in the log here, let’s say 13:44:04. I know from earlier that it takes about 23 seconds to get through all my indexes. Log entries are generally written when a request finishes, so, assuming the logged timestamp marks the end of the request, I can count backwards to work out that we could have had website disruption between 13:43:41 and 13:44:04. (Apache’s default %t format actually records when the request was received; if that’s what your log uses, count forwards instead.) Not much of an issue for my test site, but in the real world we’re usually looking at several minutes, which is plenty of time for your customers to get bored and shop somewhere else.
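The arithmetic above is easy to script. The 23 seconds is simply the sum of the durations in the earlier table (16 + 2 + 2 + 2 + 1). This Python sketch takes a log timestamp and a total duration. The stamp_marks_end flag is my addition. It covers logs whose timestamp records when the request arrived rather than when it finished.

```python
from datetime import datetime, timedelta
from typing import Tuple

def disruption_window(logged_at: str, total_seconds: int,
                      stamp_marks_end: bool = True) -> Tuple[datetime, datetime]:
    """Return (start, end) of possible disruption around a reindex request.

    logged_at is an access-log timestamp such as "19/Mar/2015:13:44:04 +0000".
    If it records when the request finished, count backwards; if it records
    when the request arrived, count forwards.
    """
    stamp = datetime.strptime(logged_at, "%d/%b/%Y:%H:%M:%S %z")
    span = timedelta(seconds=total_seconds)
    return (stamp - span, stamp) if stamp_marks_end else (stamp, stamp + span)

start, end = disruption_window("19/Mar/2015:13:44:04 +0000", 16 + 2 + 2 + 2 + 1)
print(start.strftime("%H:%M:%S"), "to", end.strftime("%H:%M:%S"))  # 13:43:41 to 13:44:04
```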

I’ve found this to be really useful when supporting customers wondering why their website broke at X time on X day.

The solutions?

… another post for another day. UPDATE: here’s that post.