
If your organization is considering or preparing a move to Amazon Web Services (AWS), chances are you are feeling a little overwhelmed. Whether it is your first step toward the public cloud or part of a multi-cloud strategy, the prospect can seem daunting with so many aspects to address.

The good news is, millions of organizations have been through the same process, and you will benefit from their collective experience by starting your journey with a mature ecosystem and a wealth of information. Having spent a year migrating two datacenters to AWS myself, I want to share the lessons I learned along the way and the tips I wish my former self had known.

While this is written from the perspective of an AWS migration, the main takeaways apply to migrations to other cloud providers such as Microsoft Azure and Google Cloud.

1. Upskill Your Teams

The Cloud and DevOps era makes it easy to think better tooling is the answer to all the problems found in traditional IT shops. But no matter how sophisticated your infrastructure is, your services are designed, implemented and managed by people. At its essence, DevOps is first and foremost a cultural movement. You can achieve some tactical improvements by implementing better tooling (CI/CD, Infrastructure as Code, Configuration Management), but your organization will still be held back by the bottlenecks inherent to a classic structure.

Adopting a culture of collaboration and trust between your teams can be the single most crucial factor supporting your digital transformation initiative and long-term success. To be honest, changing an organization’s culture and the way people interact is hard, much harder than deploying a shiny new solution. Change can be uncomfortable, and people might be concerned about losing their status in a top-down structure. But one of the chief issues is conflicting priorities between different groups.

Developers have an incentive to deliver features fast and break things (hello Facebook), while operations folks would rather avoid any change that might result in an unwanted 3 am call. This disconnect creates silos between teams and reinforces a culture of “us against them”: change requests taking forever, works-on-my-machine excuses and throwing the hot potato over the wall.

To tackle this problem, the DevOps movement advocates sharing the same goals, incentives and failures between development and infrastructure teams. The Site Reliability Engineering (SRE) methodology goes a little further by utilizing Service Level Objectives (SLO) and error budgets to objectively decide how to split engineering cycles between feature development and platform reliability work. No matter which approach you favor, you will need people at every level of the organization to buy into the vision.
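To make the error-budget idea concrete, here is a minimal Python sketch of the arithmetic behind it. The 99.9% target and request counts are illustrative assumptions, not figures from SRE practice at any particular organization:

```python
# Minimal sketch: turn an availability SLO into an error budget and use it
# to decide between feature work and reliability work.
# The SLO value and request counts below are illustrative assumptions.

def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = total_requests * (1 - slo)  # the budget, in failed requests
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures.
remaining = error_budget_remaining(slo=0.999, total_requests=1_000_000, failed_requests=250)
print(f"{remaining:.0%} of the error budget left")
if remaining <= 0:
    print("Budget exhausted: shift engineering cycles to reliability work")
```

The point of the exercise is that the decision becomes a number both teams agree on in advance, rather than a negotiation after every incident.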

It might also be tempting to hire a batch of DevOps engineers and call your organization Agile. This approach is almost certainly doomed to fail, as it boldly assumes new employees with zero knowledge of your organization's culture, processes and group dynamics will have the clout to change the way things have been done for years. It also misses the premise that embracing DevOps is a journey of continuous improvement, and you cannot take a shortcut to avoid the long and difficult process of figuring it out.

You can certainly hire external resources to expand the talent pool, benefit from a fresh perspective and accelerate the initiative, but your journey has to start from within. And the good news is, your existing teams are incredibly well positioned to execute the cloud initiative thanks to their knowledge and expertise of your environment. You can turn them into cloud-aware teams by actively involving them in the cultural transformation and enabling their upskilling.

There are several avenues for an organization to facilitate the development of its teams' cloud skill set. Classroom training is a time-tested approach, but the rigid format might not be for everyone and the cost is substantial. Online training is inexpensive and lets people learn at their own pace. I highly recommend acloud.guru, probably one of the best resources available on cloud computing, whose courses focus on hands-on labs. This is particularly helpful for candidates preparing for AWS certifications.

The best way to learn is by practicing. Fortunately, the AWS on-demand model is low-cost and makes it easy to deploy sandbox environments. This practice can be encouraged at the organizational level by setting up a lab AWS account dedicated to learning and experimentation.

Finally, creating a Cloud Center of Excellence (CCoE) can go a long way to provide a forum for interested people to exchange ideas, set standards and disseminate knowledge across the organization. One way is hosting Lunch and Learn sessions to share knowledge in a casual atmosphere; another is building a proof of concept to showcase the capabilities of the platform to your product teams.

2. Choose What to Migrate

Depending on the size of your organization, you might be looking at migrating dozens or hundreds of applications. Migrating the entire portfolio could easily stretch the project to months or even years. But the truth is, you do not have to migrate everything to the cloud, or at least not at first.

In fact, some systems might not gain much from the change, or might cause more trouble than benefits. Think of legacy systems that will be decommissioned soon or could advantageously be replaced by a Commercial Off-The-Shelf (COTS) product or a SaaS offering. There is also the question of migrating the application as-is (lift and shift) or using the opportunity to modify it to take advantage of modern architecture patterns and cloud-native features.

This is why performing an assessment of the portfolio is useful. For each application, identify the business and technical context, the parties involved, the current state and the potential rewards and risks of moving it. This information will help you formulate its cloud readiness status and identify the applications benefiting most from the cloud.

Knowing which applications to migrate is half the battle. You also want to decide which ones are worth investing engineering cycles in. In general, you would prefer to modify an application that is either business critical and could benefit from better reliability or performance, or would get a nice boost for minimal effort. The 6R strategy framework can help by providing a clear understanding of your options: you assign each application one of six strategies depending on its context and return on investment.

  • Rehost (aka lift and shift) typically involves replicating the source servers and deploying copies to target instances in the cloud.
  • Replatform involves rebuilding the application in the cloud and modifying its components to utilize new technologies or cloud services.
  • Rearchitect means fundamentally re-engineering the architecture or codebase to unlock the benefits found in cloud-native applications.
  • Repurchase means replacing the application with a third-party product and migrating its data.
  • Retain is the choice to keep the application as it is for the time being, until further review.
  • Retire means the application will no longer be needed, or its duties will be moved to other systems.
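The assessment data can feed a simple decision helper. Here is a toy Python sketch of that mapping; the criteria, their order and the field names are my own illustrative assumptions, not part of the 6R framework, and a real portfolio assessment weighs far more factors:

```python
# Toy decision helper mapping an application's assessment data to one of
# the 6R strategies. The criteria and their priority order are illustrative
# assumptions for the sake of the example.

def choose_strategy(app: dict) -> str:
    if app.get("end_of_life"):
        return "Retire"
    if app.get("saas_alternative"):
        return "Repurchase"
    if app.get("blocked"):              # e.g. licensing or compliance constraint
        return "Retain"
    if app.get("business_critical") and app.get("high_rework_value"):
        return "Rearchitect"
    if app.get("quick_cloud_wins"):     # small changes unlock managed services
        return "Replatform"
    return "Rehost"                     # default: lift and shift

portfolio = [
    {"name": "legacy-crm", "saas_alternative": True},
    {"name": "billing", "business_critical": True, "high_rework_value": True},
    {"name": "intranet"},
]
for app in portfolio:
    print(app["name"], "->", choose_strategy(app))
```

Even a crude rule set like this forces the assessment conversation to produce explicit, comparable answers per application.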

You want to start with simple applications that introduce limited impact and risk to the business. It probably goes without saying, but use the migrations of the development, QA and UAT environments to practice, iron out the details and uncover potential problems before attempting to cut over production.

Having a plan for what and how you should migrate will help you allocate your resources where they will bring the most value. If possible, start the discovery and planning early to leave room to change course and adapt to new information if necessary.

3. Build Solid Foundations

There is a difference between having a few employees playing with a sandbox environment and building and running business infrastructure. As your teams are reaching a level of comfort and maturity with the technology, the organization as a whole can start preparing the transition.

Before you can deploy even a single application, you have to build the foundations of your cloud infrastructure. It is critical to pay special attention to planning and design before anything is deployed. Some design decisions will be difficult and costly to change once you are running workloads built upon them. You will have to make trade-offs, but always try to plan with future growth and extensibility in mind.

It is worth deploying as much as possible, if not your entire platform, using an Infrastructure as Code (IaC) approach. It will allow you to easily redeploy failed resources, track changes and collaborate through source version control, and deploy infrastructure at light speed.

By being vendor agnostic, Terraform might be a better option than CloudFormation, keeping your IaC templates in one language for any environment (Azure, GCP and even on-premises datacenters). While I have not tried it yet, you can also look at Landing Zone, released by AWS to help customers get set up faster with pre-built templates.

Design and create an account structure. A popular pattern is to have separate AWS accounts for billing, log consolidation and shared infrastructure (CI/CD, Configuration Management, Monitoring, Backup), plus one for each application environment (development, QA, production).

You will have to design and configure the networking connecting your cloud (VPCs in AWS) and on-premises environments (VPN or Direct Connect links to your datacenters). Favor a broad IP addressing scheme (a /16 rather than a /24) that leaves a generous amount of space for future use and does not overlap with on-premises networks.
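A quick way to sanity-check an addressing plan is Python's standard `ipaddress` module. The CIDR ranges below are illustrative examples, not a recommendation for your network:

```python
import ipaddress

# Sketch: carve a broad /16 VPC range into per-environment /20 subnets and
# verify it does not overlap the on-premises network. The CIDRs shown are
# illustrative assumptions.

vpc = ipaddress.ip_network("10.50.0.0/16")
on_prem = ipaddress.ip_network("10.0.0.0/20")

assert not vpc.overlaps(on_prem), "VPC range collides with on-premises addressing"

# A /16 yields sixteen /20 subnets (4,094 usable hosts each): plenty of headroom.
subnets = list(vpc.subnets(new_prefix=20))
for env, subnet in zip(["dev", "qa", "prod"], subnets):
    print(f"{env}: {subnet}")
```

Catching an overlap at this stage costs nothing; renumbering a VPC after workloads depend on it is exactly the kind of decision that becomes painful to change later.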

Federating your Active Directory with SAML and role-based access will simplify the management of your user permissions in AWS. Whenever possible, use IAM roles to grant permissions to EC2 instances rather than access and secret keys.

Build the tooling that will be used to provision and automate the management of your platform. A Continuous Integration / Continuous Delivery (CI/CD) pipeline will help you automate the build, test and deployment of new releases as soon as code is committed to the repositories. A configuration management tool (Chef, Puppet) might be useful if you still need to provision virtual machines and operating systems.

If these systems are new to your organization, try to deploy and run them in Docker containers to benefit from the get-go from great scalability and cost-effective compute. AWS's managed Kubernetes service (EKS) is fairly new but is a great way to get familiar with the orchestrator that is likely to become the industry standard.

4. Discover your Dependencies

One of the biggest challenges of migrating on-premises applications is identifying exactly the configuration you will have to set for them to work in their new environment. More often than not, the people who built the systems have long left the organization, and the documentation, when available, is rarely detailed or up to date.

You can attempt to gather enough information to prepare the migration, but you will likely miss undocumented or arcane details that will make the cutover painful. (Like spending hours on a Saturday afternoon troubleshooting a migration started at 6 am because an undocumented TCP connection needed to be opened.)

Your biggest ally in this battle is a solution that collects data on the on-premises systems to analyze their current state. AWS released Application Discovery Service (ADS) last year, and I cannot emphasize enough what a life-saver it was for identifying networking dependencies.

It works by installing an agent on the operating system to collect and send discovery data to the service about running processes, performance and inbound/outbound network connections. You can then export the data from the service in CSV format, but be warned: with hundreds of flat files containing millions of rows, making sense of the data can be tricky.

I kept my sanity by automating the entire process with a PowerShell service handling the export, sanitization and insertion of the data into a SQL Server database used for aggregation and analysis. By using a combination of SQL queries, Excel formulas and PivotTables, I was able to generate in less than two hours reports that used to take me three or four days. In later iterations, the automation was even capable of generating the Terraform code to deploy the Security Groups (AWS's firewall) opening the ports needed by the application.
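The heart of that aggregation step is simple enough to sketch in a few lines of Python. This is not the PowerShell automation described above, just an illustration of the idea: collapse raw per-connection rows into distinct destination/port pairs that can become Security Group rules. The column names are assumptions; real ADS exports use different headers:

```python
import csv
import io
from collections import defaultdict

# Sketch of the aggregation step: collapse raw connection rows from an
# ADS-style network export into distinct (destination, port) rules.
# Column names and sample data are illustrative assumptions.

raw_export = """source_ip,dest_ip,dest_port
10.0.1.10,10.0.2.20,1433
10.0.1.11,10.0.2.20,1433
10.0.1.10,10.0.2.30,443
"""

rules = defaultdict(set)  # dest_ip -> set of ports observed in the export
for row in csv.DictReader(io.StringIO(raw_export)):
    rules[row["dest_ip"]].add(int(row["dest_port"]))

# Each aggregated line is a candidate Security Group ingress rule.
for dest, ports in sorted(rules.items()):
    print(f"allow {sorted(ports)} -> {dest}")
```

From there, emitting Terraform is just string templating over the aggregated pairs instead of reading millions of raw rows by hand.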

This automation was tailored to our specific needs at the time, but since the ADS launch, AWS has released a Python repository to automate the export part of the process. If there is interest, I might look at replatforming the PowerShell automation into a Lambda service that could easily be reused.

5. Start Small and Iterate

You might be tempted to use the shift toward AWS as a big bang project to completely redefine the way you run your infrastructure. While understandable, trying to do too much at once may reduce the focus and the quality of the migration. It is better to aim for a few easy wins to gather experience and build confidence in the process.

Adopting an Agile approach will help you clarify your goals and deliver steady improvements over time. By ordering the highest priorities in the backlog, you will be sure the organization is always getting the best ROI on the effort invested in the infrastructure. If you are using Scrum, make sure all the parties involved are trained and clearly understand the format and value of the ceremonies (yes, even the retrospective, to continuously improve). Keep your sprints short (ideally two weeks) to commit to frequent increments that are usable right away and can be expanded in the next iterations.

Following the same logic, you might want to consider lift and shift for most applications before attempting replatform or rearchitect strategies. This approach has the benefit of letting you exit your on-premises environment quickly while unlocking easily reachable benefits. By moving all our applications in less than two years, I was able to help the company shut down two datacenters before the renewal of their contracts and cut our operating expenditures in half. Having the applications already rehosted in AWS will also make it much easier to replatform or rearchitect them, because all the heavy lifting (data replication, on-demand source servers, networking integrated into VPCs and Security Groups) is already done, so you can focus on the high-impact work.

Conclusion

By keeping these practices in mind, you will set yourself and your organization up to avoid some of the common pitfalls and the frustration of learning them the hard way. This post is obviously not exhaustive and reflects my own experience. If you are interested in learning more, you can dig deeper into the 6R strategies, the Cloud Adoption Framework (CAF) and the Well-Architected Framework.

I want to thank Michael Pelliccia and Laurent Mandorla at AWS for their help reviewing and providing valuable feedback while I was writing this piece. If any part of it was helpful to you, or if you have anything you would like to share (or even disagree with), please reach out to me. I would love to hear your thoughts. Until then, happy migrations!


  • Sammy Barras

    Senior DevOps Consultant

    Sammy is a technical leader with 10 years of experience leading and delivering infrastructure projects leveraging DevOps, SRE and Agile methodologies. His greatest asset is the passion he has for technology and the ability to use it to drive improvements for his clients.