How to achieve zero-downtime deployments with Magento Commerce

By James Halsall

Magento Commerce was released more than two and a half years ago, and yet the Magento community still does not have a solution for blue-green deployment: an automated release technique that switches traffic between two production environments so that deploys cause no downtime. In this article I explore a solution for achieving zero-downtime deployments with Magento Commerce.

The benefits of blue-green deployment

Martin Fowler, a trusted authority on the design of enterprise software, has a great overview of the benefits of blue-green deployments. In essence, you have two production environments that are as identical as possible. At any one time one of them is live, and, as you prepare a new release of your software, you do your final stage of testing in the other environment.

"The fundamental idea is to have two easily switchable environments to switch between." – Martin Fowler

Here at Inviqa, we use blue-green deployments as a mechanism for deploying releases across multiple nodes (or servers) in which we stagger the removal of old containers and the deployment of new containers, with no impact on customer experience. 

We can better understand the process if we consider the following example for a simple two-node setup with a centralised database:

  1. Build new Docker image on build server
  2. Deploy new container to 1st node (also applying any required changes to the centralised database)
  3. Remove old container from 1st node
  4. Deploy new container to 2nd node
  5. Remove old container from 2nd node
  6. Deploy complete

It’s clear that, during a blue-green deployment process, both old and new containers can receive HTTP requests, so it’s essential that both the old and the new code work with any database schema changes applied during the deployment process.
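As a concrete illustration of that constraint, consider adding a nullable column during a deploy: the old code simply ignores it, but the new code must tolerate rows that the old containers are still writing without the column populated. A minimal sketch (the loyalty_tier column and the fallback value are hypothetical, not part of any real project):

    <?php
    // New-release code reading a column that is being added during this deployment.
    // Rows written by the old containers will not have 'loyalty_tier' populated,
    // so the code falls back rather than assuming the value exists.
    use Magento\Sales\Model\Order;

    function resolveLoyaltyTier(Order $order): string
    {
        $tier = $order->getData('loyalty_tier'); // null for rows created by old code

        return $tier !== null ? (string) $tier : 'standard';
    }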

We use blue-green at Inviqa to achieve zero-downtime deployments, something that has historically been a major problem with Magento Commerce. There have been numerous rumblings in the community about achieving zero downtime, but they always fall short of the mark.

The challenge of automated deployments with Magento

To be clear about what we mean, a zero-downtime deployment model should meet the following criteria:

  • No maintenance page
  • Works with container-based deployment (but also single- or multi-server traditional deployments)
  • Has no impact on user journeys (e.g. checkout process)

There have been several attempts to resolve this problem, but they do not meet the above criteria. Some of the solutions rely on a maintenance page. Some may work well but do not support container-based configurations. Others deem 20 seconds of downtime acceptable; however, when working in ecommerce we must strive for zero downtime in the truest sense of the term.

The greatest difficulty with Magento’s deployment process is the mechanism by which it applies database changes during a deploy: the setup:upgrade command. This is executed as part of any deployment and applies data and schema changes to the target database for every module whose version recorded in the database is behind the version declared in the application code.
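For context, this is roughly what a conventional version-gated setup script looks like (and what the approach described later avoids). The Acme_Example module, table, and column names below are hypothetical; the point is that the change only runs when the database’s recorded version is behind the one declared in the module’s module.xml:

    <?php
    // app/code/Acme/Example/Setup/UpgradeSchema.php (hypothetical module)
    namespace Acme\Example\Setup;

    use Magento\Framework\DB\Ddl\Table;
    use Magento\Framework\Setup\ModuleContextInterface;
    use Magento\Framework\Setup\SchemaSetupInterface;
    use Magento\Framework\Setup\UpgradeSchemaInterface;

    class UpgradeSchema implements UpgradeSchemaInterface
    {
        public function upgrade(SchemaSetupInterface $setup, ModuleContextInterface $context)
        {
            $setup->startSetup();

            // Runs only when the version stored in the setup_module table is below
            // 1.1.0; setup:upgrade then records 1.1.0 as the new database-side version.
            if (version_compare($context->getVersion(), '1.1.0', '<')) {
                $setup->getConnection()->addColumn(
                    $setup->getTable('acme_example_entity'),
                    'notes',
                    [
                        'type' => Table::TYPE_TEXT,
                        'length' => 255,
                        'nullable' => true,
                        'comment' => 'Optional notes',
                    ]
                );
            }

            $setup->endSetup();
        }
    }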

When Magento receives a request it checks the module versions on disk against those recorded in the database and returns an error response if they are out of sync. The premise of this check is sound, but in practice it is completely at odds with the zero-downtime philosophy. Figure 1 shows this in action.

" "

Magento checks module versions at request time in good faith; it’s trying to be helpful. Of course, if we forgot to execute the setup:upgrade command during a deployment we would want to know about it. However, there really must be a better way of handling this scenario. Not only is this error displayed when the database versions are behind those of the modules on disk; it also appears when the database contains a more recent version.

This is completely undesirable, because nearly all database changes can be made in a backwards-compatible manner, meaning that both old and new code will continue to work once the changes are applied. The rest of this article describes a better way of rolling out database changes.

Achieving zero downtime with Magento Commerce

If we can’t use Magento’s setup:upgrade (and thus its module versioning system) to trigger database changes, then we must look to something else. This is where mature third-party libraries can help us. As always, there is a robust solution to this problem already available in the community outside Magento. Phinx, for example, is a database migrations package that provides:

  • Schema changes
  • Data changes
  • Database versioning
  • Painless rollbacks

Sound too good to be true? Well, the premise is actually very simple. With a thin wrapper, courtesy of Inviqa’s MX_PhinxMigrations module, we can trigger database updates without using Magento’s built-in setup-script mechanism.

The beauty of this solution is that Magento is completely unaware of the changes happening in the database and continues to serve traffic, provided we don’t do something silly like deleting a table that is still referenced in our code.

There are examples and instructions in the module itself, but after installing it in your Magento instance the new workflow becomes something like this:

  1. Run bin/phinx create to add a new database migration file
  2. Phinx will ask which of your Magento modules to create the migration file in (it checks for an etc/migrations folder in each of your modules, so create one of these folders first)
  3. Once the migration file has been generated, write your database changes as required and even provide the changes needed to roll back (see the sketch after this list)
  4. Run bin/phinx migrate during your deployment process, immediately after bin/magento setup:upgrade, to apply the database changes (and don’t forget to run bin/phinx migrate as part of your local environment provisioning too)
  5. If you need to roll back, run bin/phinx rollback, which reverts the last migration that was applied
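To give a feel for step 3, here is a minimal sketch of a backwards-compatible Phinx migration (the file name, table, and column are hypothetical). Adding a nullable column keeps both old and new code happy, and the down() method provides the rollback used by bin/phinx rollback:

    <?php
    // app/code/Acme/Example/etc/migrations/20180101000000_add_loyalty_tier.php
    // (hypothetical path; the file skeleton is generated by bin/phinx create)
    use Phinx\Migration\AbstractMigration;

    class AddLoyaltyTier extends AbstractMigration
    {
        public function up()
        {
            // Backwards-compatible: old code simply ignores the new nullable column.
            $this->table('sales_order')
                ->addColumn('loyalty_tier', 'string', ['limit' => 32, 'null' => true])
                ->update();
        }

        public function down()
        {
            // Reverted by bin/phinx rollback if the release needs to be backed out.
            $this->table('sales_order')
                ->removeColumn('loyalty_tier')
                ->update();
        }
    }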

So, how exactly does this help us achieve blue-green and zero-downtime deploys?

Well, if we never change module versions then at any one time we can have containers serving old code and containers serving new code, each of them being 100% happy because the module versions match between database and code.

It is worth stressing two things: keep executing setup:upgrade as part of your workflow (in case third-party or core Magento modules get updated), and never change your bespoke modules’ versions in their module.xml files, as doing so recreates the module version mismatch problem we discussed earlier in the article and, with it, potential downtime.

Caches

Another problematic aspect of zero-downtime and blue-green deployments is the caching of data in Magento. Magento offers different types of caching natively, but the most common in a production set-up is an in-memory cache such as Redis. 

In container-based environments we run Redis and Magento in separate containers, meaning that during deployment we only need to replace our Magento containers. This presents a problem for blue-green deploys because, if we are serving HTTP traffic from both old and new containers during the deployment process, we could potentially end up with invalid cache data, depending on which container handles the request.

To circumvent this problem we use different Redis databases for our new and old containers. With some simple scripting in our deploy process we alternate between the two available Redis databases as we make deployments. 
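The database number Magento uses sits in the cache configuration in app/etc/env.php. A trimmed excerpt, with illustrative host, port, and database values:

    <?php
    // app/etc/env.php (excerpt). The 'database' value is what alternates from one
    // deployment to the next; page_cache is configured in the same way.
    return [
        'cache' => [
            'frontend' => [
                'default' => [
                    'backend' => 'Cm_Cache_Backend_Redis',
                    'backend_options' => [
                        'server' => 'redis',
                        'port' => '6379',
                        'database' => '1', // the previous release used database 0
                    ],
                ],
            ],
        ],
    ];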

We can better understand this process by breaking it down into steps as below:

  1. Build new Magento image on build server
  2. Determine the Redis database number that is currently being used by the live containers (see the sketch after this list)
  3. Create new containers from the newly built image, specifying the alternate Redis database number
  4. Deploy the new containers using the blue-green cycle (add one new container, remove one old container)
  5. Deploy complete
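Step 2 can be as simple as reading the env.php of the live release and picking the other of the two database numbers. A minimal sketch, assuming a hypothetical helper script on the build server and a /var/www/current path that points at the live release:

    <?php
    // deploy/select-redis-db.php (hypothetical helper, run by the deploy tooling).
    // Prints the Redis database number that the new containers should be given.
    $liveConfig = include '/var/www/current/app/etc/env.php';

    $current = (int) ($liveConfig['cache']['frontend']['default']['backend_options']['database'] ?? 0);
    $next = $current === 0 ? 1 : 0; // alternate between the two available databases

    echo $next . PHP_EOL;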

With the above process we are now able to handle HTTP requests on both new and old containers alike, because they use different Redis databases for their cache. The only caveat is ensuring that the Redis instance is capable of holding two sets of Magento cache data at any one time.

Current limitations and drawbacks

No approach is without its limitations, and this one has a few that we have carefully considered and accepted:

  1. Upgrading third-party and core Magento modules (i.e. when updating Magento itself) will cause some downtime due to the module versions changing on disk
    1. To get the most out of this approach to zero-downtime deployments it is better to avoid third-party modules wherever possible and practical
  2. A small amount of scripting is required to ensure the use of two Redis databases for caching old and new data during the deployment process
  3. A larger Redis instance is required to be able to cache potentially twice the amount of data

Conclusion

Whilst Magento’s database migration mechanism presents a challenging landscape for zero-downtime and blue-green deployments, there is, as always, plenty of scope to get creative.

The MX_PhinxMigrations module is still in its infancy and, at some point in the future, we would like to experiment with tighter integration with Magento to offer features including:

  • Automatically applying Phinx migrations after setup:upgrade is executed
  • Integrating Phinx commands (migrate, rollback etc.) into the bin/magento tool

Eventually I would love to see an officially endorsed solution for zero-downtime deployments from Magento, but in the meantime the MX_PhinxMigrations module offers a much easier way of handling database schema changes within our deployments.