Using Terraform to Enforce AWS Infrastructure Best Practices
Managing infrastructure is becoming increasingly complex as the application stack evolves with new technologies at every layer. What starts out as a simple setup with a few servers, a database, and a basic network layer can become a mess as requirements grow, if the infrastructure code repositories are not split and structured effectively. As application teams become more empowered, they want to take control of more cloud resources (e.g. DNS, identity management, storage). What is the most effective way of handling this with current automation technologies, without adding unnecessary intricacies?
Configuration management tools like Ansible, Puppet and Chef have helped to an extent. But to simplify infrastructure management effectively, you also need to consider integrating infrastructure creation methodologies with other downstream cloud platform services. This holistic approach to infrastructure management is what Terraform delivers.
There are similar solutions to Terraform on the market. AWS, for example, offers CloudFormation, which manages AWS infrastructure at scale. CloudFormation is a useful tool in certain AWS-focused use cases but, from experience, Terraform is the better option in the most complex situations, thanks to the level of engineering rigour that can be applied to it (helped by the community) and the flexibility that it offers.
Let’s look at how Terraform can be used to empower developers to self-serve the cloud resources they need, while keeping risk low and every change auditable.
Automate All the Things: Infrastructure, Storage and Networking
Terraform is very extensible and can be integrated with any service that has an API. This enables automation at every step of the SDLC. You can automate the creation and deletion of test environments, for instance, or configure downstream services like CDN and DNS servers to work with your infrastructure. Terraform works across the infrastructure, storage, and networking layers and, in this sense, it is less an infrastructure automation tool and more an orchestration tool. While configuration management tools like Ansible, Puppet and Chef handle the configuration of instances after creation, Terraform handles the instance creation itself as well as broader integration with third-party services.
Empower Teams to Scale Infrastructure
Ops is often seen as the bottleneck to innovation. This is because ops teams are flooded with requests from Dev, QA, and product teams for the provisioning of new infrastructure, or changes to existing configurations. With Terraform, ops teams can deliver a self-service experience to other teams, enabling them to define the level of scale they need. It’s as simple as changing a number in a file!
These teams still don’t need to know how the infrastructure is being configured or what the exact configuration is. In this sense, it is a black-box experience, which leaves ops teams in full control of the process.
Importantly, this frees product teams to scale according to usage, or lets QA teams create new test environments by duplicating existing ones. The benefits multiply when you consider that teams can also share infrastructure configurations with each other and avoid duplication of effort. This brings uniformity across the development pipeline and all without having to go through a complex approval process that burdens both ops teams and their internal customers.
On the topic of approval processes: you should build a resilient, well-governed automated pipeline that verifies every new Terraform template before it is published. This ensures that the checks the ops team previously carried out by hand now run automatically whenever new infrastructure code is published. Also, when shifting the responsibility for infrastructure creation to developers, you need to ensure that the resources used are billed back to the teams that consumed them, so a solid tagging strategy for every cloud resource (internal or external) is key.
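One of those automated checks could enforce the tagging strategy. The sketch below is only an illustration: it assumes the plan has been exported with terraform show -json, that taggable resources expose a tags attribute (as AWS provider resources do), and that the required tag keys are team and cost_center, which are hypothetical names chosen for this example. Tags whose values are only known at apply time would need extra handling.

```python
import json

# Hypothetical tag keys; substitute whatever your billing policy requires.
REQUIRED_TAGS = {"team", "cost_center"}


def missing_tags(plan: dict, required=REQUIRED_TAGS) -> dict:
    """Map each resource address to the required tag keys it lacks.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    Resources being deleted, or without a `tags` attribute, are skipped.
    """
    problems = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if after is None or "tags" not in after:
            continue  # deletion, or a resource type that cannot be tagged
        tags = after.get("tags") or {}  # `tags` is null when none are set
        gap = sorted(set(required) - set(tags))
        if gap:
            problems[rc["address"]] = gap
    return problems


def load_plan(path: str) -> dict:
    """Read a plan previously exported as JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A pipeline step might call missing_tags(load_plan("plan.json")) and fail the build if the result is non-empty, printing the offending addresses so that the owning team can fix them.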
Avoid Last-minute Surprises
Terraform includes a plan command that lets you review configuration changes before you apply them. This is a good way to catch minor details that you might have overlooked and to avoid surprises post-deployment. For example, you may notice that an EC2 instance doesn’t have the right IAM role assigned to it and so would be unable to access certain data stored in an S3 bucket. These situations are rare given the power of templates, but when they do occur you can spot and fix them quickly. Comparable tools offer previews of their own (CloudFormation has change sets, for example), but in Terraform the plan step is built into the everyday workflow, which makes for more reliable deployments.
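To make that review quicker on large changes, the plan can be summarised programmatically. The following is a minimal sketch, assuming the plan has been exported with terraform show -json; the function names are invented for this example. It counts planned actions and lists any resource that would be destroyed, since those are usually the changes that deserve a second look.

```python
from collections import Counter


def summarise_actions(plan: dict) -> Counter:
    """Count planned actions, e.g. 'create', 'update', 'delete'.

    A replacement appears in the plan JSON as two actions on one resource
    and is reported here under a combined key such as 'delete+create'.
    """
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        counts["+".join(actions)] += 1
    return counts


def destructive_addresses(plan: dict) -> list:
    """Return addresses of resources the plan would delete or replace."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc.get("change", {}).get("actions", [])
    ]
```

A reviewer, or an automated gate, could then require explicit sign-off whenever destructive_addresses returns anything at all.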
Having an auditable build pipeline for infrastructure code execution is advised at scale. This ensures that all activity affecting live resources in the cloud is audited, that it is executed with verified parameters, and that it doesn’t pose any risk to live services. One way to do this is to interrogate the exit code of terraform plan as part of an automated execution pipeline; when run with the -detailed-exitcode flag, the exit code reports whether the plan contains changes. Leave Terraform execution and state management to the machines!
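As a rough illustration of interrogating the plan's exit code, the sketch below wraps terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The function names, the working directory argument and the tfplan file name are assumptions made for this example, not part of any established pipeline.

```python
import subprocess

# Exit codes used by `terraform plan -detailed-exitcode`.
PLAN_NO_CHANGES = 0
PLAN_ERROR = 1
PLAN_HAS_CHANGES = 2


def classify_plan_exit(code: int) -> str:
    """Translate the plan exit code into a pipeline decision."""
    if code == PLAN_NO_CHANGES:
        return "no-changes"
    if code == PLAN_HAS_CHANGES:
        return "changes"
    return "error"  # 1, or anything unexpected, is treated as a failure


def run_plan(workdir: str, plan_file: str = "tfplan") -> str:
    """Run the plan non-interactively and save it for a later apply."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         f"-out={plan_file}"],
        cwd=workdir,
    )
    return classify_plan_exit(result.returncode)
```

A pipeline could skip the apply stage on "no-changes", pause for approval on "changes", and stop immediately on "error". Saving the plan with -out means the apply stage executes exactly what was reviewed.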
Handle Failover in the Cloud
Remember the AWS S3 outage last year? It took down many of the top websites that rely on S3 for storage. You can avoid that situation by having a fallback option in case an AWS service fails. However, running a multi-cloud setup is tedious, as each cloud vendor has its own quirks. Running multi-cloud Terraform requires an engineering investment: each cloud provider has its own provider-specific Terraform resources, so the desired state has to be replicated using the target provider’s resources. With that in mind, Terraform will allow you to go beyond AWS and have a backup on Azure or Google Cloud, for instance, though not without some time investment. It takes each platform’s resource dependencies into account and manages them for you, leaving you to focus on just the infrastructure you need. Even so, developers still need to understand each cloud provider’s ‘resource lingo’ to know what to declare in configuration in order to build the desired state with Terraform.
Terraform offers a community-driven, extensible, multi-cloud and straightforward method of creating and managing cloud infrastructure (internal and external). At scale it can be tricky, but with the right practices, guidelines and automated checks in place, Terraform enables development teams to flourish, cutting lead times sharply and delivering business value much more quickly while remaining compliant and well-governed.