Ben Saunders

11 November 2019

What Do the FCA Guidelines Mean for Your Multi-Cloud Strategy?

According to the Financial Conduct Authority (FCA) guidelines on outsourcing IT, firms must be able to “know how [they] would transition to an alternative service provider and maintain business continuity”.

For organisations that realise that the future of digital services belongs to the cloud, but want to remain in line with key financial guidelines, this could mean only one thing: multi-cloud.

But how should highly regulated organisations build out their multi-cloud strategy in light of these guidelines?

In this blog, I’ll consider the key requirements of the FCA guidelines on cloud adoption in financial services and the common assumptions I see in how people react to their ambiguity, and then offer Contino’s top tips on how highly regulated financial services organisations can build a sound multi-cloud strategy. I’ll share some insights covering:

What assessments do you need?
What risks do you need to consider?
What information do you need to know about your workloads?
What skills and resources do you need to have in place?

Let’s first consider some of the guidelines and their status as regulatory requirements.

What Do the Regulations Mean for Your Multi-Cloud?

The guidance given by the FCA is trying to do one thing: reduce risk.

This can be approached from four main angles: operational, concentration, data and exit risk.

We’ll look at each in turn, summarising what is required and highlighting what action you may need to take as a financial institution.

Reducing Operational Risk

The operational perspective is all about securing your day-to-day operations. Namely: any outsourcing agreement must “not worsen the firms [sic] operational risk”.

Key requirements:

Documented and tested risk assessment (also for supply chain)
Skills and resources to mitigate risk
Documented business case justifying risks

The central pillar of your operational risk strategy must be a solid risk assessment.

This must identify all the critical or important functions that the financial institution provides (e.g. current accounts, payments, loans, credit cards, savings accounts) and the risks associated with these services (e.g. technical, financial, political etc.).

Your risk assessment must be documented and reviewed on a regular basis (e.g. quarterly or annually). Every risk identified must be assigned an owner who will accept, manage or mitigate it with a clear action plan, and a Material Risk Taker (MRT) must be wholly accountable for the risks identified as part of the overarching cloud strategy.
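One way to keep such a register honest is to check it mechanically. The Python sketch below is illustrative only: the `Risk` fields, the `Treatment` values and the `register_gaps` helper are hypothetical names, not an FCA-prescribed schema. It flags any risk without an owner, an accept/manage/mitigate decision or an action plan, and any register without an accountable MRT.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Treatment(Enum):
    ACCEPT = "accept"
    MANAGE = "manage"
    MITIGATE = "mitigate"


@dataclass
class Risk:
    # Hypothetical register entry; adapt the fields to your own risk framework.
    risk_id: str
    service: str                # critical or important function, e.g. "payments"
    category: str               # e.g. "technical", "financial", "political"
    owner: Optional[str] = None
    treatment: Optional[Treatment] = None
    action_plan: str = ""


def register_gaps(risks: list[Risk], accountable_mrt: Optional[str]) -> list[str]:
    """Describe every gap that would leave a risk unowned or untreated."""
    gaps = [] if accountable_mrt else ["register: no accountable Material Risk Taker"]
    for r in risks:
        if not r.owner:
            gaps.append(f"{r.risk_id}: no owner")
        if r.treatment is None:
            gaps.append(f"{r.risk_id}: no accept/manage/mitigate decision")
        if not r.action_plan:
            gaps.append(f"{r.risk_id}: no action plan")
    return gaps


if __name__ == "__main__":
    risks = [
        Risk("R1", "payments", "technical", owner="Head of Payments Platform",
             treatment=Treatment.MITIGATE, action_plan="Replicate across two regions"),
        Risk("R2", "savings", "political"),
    ]
    for gap in register_gaps(risks, accountable_mrt="Chief Technology Officer"):
        print(gap)
```

Running a check like this as part of each quarterly or annual review makes unowned risks visible before the regulator finds them.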

This includes “ensur[ing] staff have sufficient skills and resources to oversee and test the outsourced activities”. The aim of this is to have “sufficient in-house ability to supervise their outsourcing arrangements, and to take control of the relevant functions if things go wrong”.

This is massive, as it means you cannot rely on your provider to police themselves. You must have the necessary skills and arrangements on hand to maintain your service, regardless of what happens with the provider.

This increases the importance of an investment strategy to upskill your own engineering communities so that they have the right skills to leverage cloud-native technologies. This is also relevant to the ‘exit risk’, which we explore later: if your engineers have the right skills to get workloads into a cloud service provider, they will also have the right skills to get you out of a cloud service provider, should it be required.

That said, your risk assessment cannot be limited only to your public cloud provider(s). You must also “identify all the service providers in the supply chain and ensure that the requirements on the firm can be complied with throughout the supply chain.”

If any firm in your supply chain is providing you with a service that carries “material outsourcing risk” (i.e. a ‘critical or important function’), then you might have more critical apps in the cloud than you think! Beware: if you are consuming SaaS products, they will almost invariably be in the cloud. Analyse any SaaS product in advance to make sure that its Service Level Agreements (SLAs) and architecture meet your regulatory requirements for availability and can meet your required Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
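Part of that SaaS analysis can be a like-for-like comparison of what the provider offers against what each function requires. A minimal Python sketch, assuming you have already extracted availability, RTO and RPO figures from the provider’s SLA; the `ResilienceTerms` type and `shortfalls` helper are hypothetical, and the figures in the example are illustrative rather than taken from any real provider.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ResilienceTerms:
    availability: float  # fraction, e.g. 0.9995 for 99.95%
    rto: timedelta       # Recovery Time Objective: maximum time to restore service
    rpo: timedelta       # Recovery Point Objective: maximum window of data loss


def shortfalls(offered: ResilienceTerms, required: ResilienceTerms) -> list[str]:
    """List every way a provider's offered terms fall short of your requirement."""
    issues = []
    if offered.availability < required.availability:
        issues.append(f"availability {offered.availability:.4%} < {required.availability:.4%}")
    if offered.rto > required.rto:
        issues.append(f"RTO {offered.rto} exceeds {required.rto}")
    if offered.rpo > required.rpo:
        issues.append(f"RPO {offered.rpo} exceeds {required.rpo}")
    return issues


if __name__ == "__main__":
    # Illustrative figures only, not taken from any real provider's SLA.
    saas_offer = ResilienceTerms(0.999, timedelta(hours=4), timedelta(minutes=15))
    our_need = ResilienceTerms(0.9995, timedelta(hours=2), timedelta(minutes=15))
    print(shortfalls(saas_offer, our_need))
```

An empty list means the stated terms meet the requirement on paper; you still need to review the provider’s architecture to judge whether it can actually deliver them.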

Taking these risks must also be justifiable from a business perspective: for “critical or important operational functions or material outsourcing” you must have a “clear and documented business case … to use one or more service providers”. Writing a business case for the cloud is an art and a science, which Contino has explored in more detail in our white paper How to Make a Winning Business Case for DevOps and the Cloud!

What Should I Do?

The key takeaway here is that many financial organisations, upon first adopting the cloud, struggle to fully understand how their core products, business service lines and customer journeys hang together architecturally.

So the starting point is always to understand what the as-is state is and what your provisional to-be architecture could look like.

At Contino, we often say “you can’t improve what you can’t measure”; in the case of scaled cloud adoption it’s rather that “you can’t move what you can’t see”. Understanding how your technology architectures and business processes hang together from a customer experience, business logic and technological perspective is vitally important to make risk-based decisions that will underpin your cloud adoption and future workload placement over time.

In other words, can the customer do what they want to do? Does the business get the revenue it needs? Does your technology enable both these while meeting security/compliance requirements in a sustainable fashion?

As a starter for ten, choose one business service line in each of your core product sets. Identify the components where value could be derived from adopting public cloud, and establish a repeatable framework that other parts of the organisation can use. Steps along the following lines usually work:

Categorise business service lines and align to product sets
Analyse and document customer journey flows within each business service line
Identify dependent services and technology components that underpin the customer journey
Review the technology architectures of each component and ensure that they are capable of meeting business and regulatory availability targets
Review support agreements and existing SLAs, Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure that these match the organisation’s risk appetite
Cross-reference discovery and analysis against the organisation’s existing risk documentation and records
Identify deltas between the target Service Level Agreements (SLAs) for each customer journey and the Service Level Objectives (SLOs) of the underpinning services
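That last step is mostly arithmetic. If a journey needs every underpinning service to be up at once, and we assume their failures are independent (a simplification), the journey’s availability is roughly the product of the individual SLOs. A minimal Python sketch with hypothetical service names and figures:

```python
from math import prod


def journey_availability(service_slos: dict[str, float]) -> float:
    """Availability of a journey that needs every service up at once.

    Assumes failures are independent and the services are chained in series.
    """
    return prod(service_slos.values())


def slo_delta(target_sla: float, service_slos: dict[str, float]) -> float:
    """Positive: the journey beats its target SLA. Negative: a shortfall to close."""
    return journey_availability(service_slos) - target_sla


if __name__ == "__main__":
    # Hypothetical services behind a "make a payment" journey.
    payment_journey = {"mobile-api": 0.9995, "payments-service": 0.9999, "core-banking": 0.999}
    print(f"journey availability: {journey_availability(payment_journey):.4%}")
    print(f"delta vs 99.95% target: {slo_delta(0.9995, payment_journey):+.4%}")
```

With these illustrative numbers the journey lands at roughly 99.84%, short of a 99.95% target even though each service looks healthy on its own, which is exactly the kind of delta this step is meant to surface.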

(Unfamiliar with SLAs/SLOs? Check out our white paper on how to execute site reliability engineering in the enterprise!)

Mitigating Concentration Risk

Concentration risk is the key issue that necessitates multi-cloud.

Concentration risk is defined as “the reliance that firms themselves may have on any single provider.” It’s about making sure that you don’t put yourself in a situation where you have all your mission-critical eggs in one basket.

The central point is this: “monitor concentration risk and consider what action it would take if the outsource provider failed”.

So what do you need to do to mitigate concentration risk in the eyes of the FCA?

Key requirements:

Know the criticality of your workloads in the cloud
Know where these workloads are
Have a tested plan for how you can transfer these to a different provider in the event of provider failure

Regarding workloads, note that different requirements apply to different functions. Most important here is whether the function being outsourced is “critical or important”: one whose failure would “materially impair the continuing compliance of a firm”. Undertake a discovery assessment so you know which workloads you have, where they run and how materially important each one is. Within the cloud, use tagging standards and well-architected principles to understand your systems and their dependencies.
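A tagging standard is only useful if it is enforced. The sketch below assumes a hypothetical standard of three mandatory tags (`business-service`, `criticality`, `owner`) and a fixed vocabulary for criticality; the resource records are plain dictionaries you would populate from your own inventory tooling, not any provider’s API.

```python
REQUIRED_TAGS = {"business-service", "criticality", "owner"}  # hypothetical standard
ALLOWED_CRITICALITY = {"critical-or-important", "non-critical"}


def tag_violations(resources: list[tuple[str, dict[str, str]]]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to the problems found with its tags."""
    problems: dict[str, list[str]] = {}
    for resource_id, tags in resources:
        issues = [f"missing tag '{key}'" for key in sorted(REQUIRED_TAGS - set(tags))]
        criticality = tags.get("criticality")
        if criticality is not None and criticality not in ALLOWED_CRITICALITY:
            issues.append(f"unknown criticality '{criticality}'")
        if issues:
            problems[resource_id] = issues
    return problems


if __name__ == "__main__":
    inventory = [
        ("vm-payments-01", {"business-service": "payments",
                            "criticality": "critical-or-important", "owner": "payments-team"}),
        ("bucket-reports", {"criticality": "high"}),
    ]
    for resource_id, issues in tag_violations(inventory).items():
        print(resource_id, issues)
```

A fixed criticality vocabulary matters as much as the presence of the tag: free-text values such as “high” cannot be mapped reliably to the FCA’s “critical or important” distinction.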

When it comes to creating a tested plan for moving to a different provider, we suggest taking an experimental, learning-based approach to home in on a tried-and-tested strategy that you can take to the FCA with confidence:

Identify a small, low-risk workload in your existing cloud that would make a good candidate for an experimental migration to a new cloud
Execute the experimental low-risk migration
Whether you fail or succeed: learn from what went well and what didn’t go so well
Apply the lessons learned to the next experiment
Continue experimenting, scaling the migration more widely each time
Write up the results of your experiments into a documented strategy along with evidence of the experiments
Consult with the FCA to see if they approve of your battle-tested strategy!

In this way, you slowly prove out a strategy based on real-world conditions that can be scaled across your workloads. This is what is most likely to gain FCA approval.

Being transparent is a crucial part of an effective engineering culture, and here it applies as much externally as internally. That is to say, you should aim to be transparent with the FCA, for your sake as well as theirs. Update them frequently and ensure a tight feedback loop between them and your cloud teams. This will save you from spending months on a strategy that the FCA then rejects out of hand for reasons that could easily have been addressed with a little openness months earlier.

Note that it is unrealistic to hold total, instant portability of your workloads between clouds as an end goal.

With strong configuration management and modern engineering practices (e.g. containerisation, infrastructure-as-code) you might succeed in getting portability of your application tier, possibly your data tier. But the infrastructure and networks underneath these are constantly changing and differ from provider to provider (e.g. GCP networking cannot be lifted and shifted into Azure...without adopting newly released services!).

Reduce Data and Security Risk

How you approach data and security is critical when it comes to reducing risk.

Key requirements:

Conduct a wide-reaching security/data risk assessment
Know data sensitivity
Know how to remove data

Firms “should carry out a security risk assessment that includes the service provider and the technology assets administered by the firm … [c]onsider data sensitivity and how the data are transmitted, stored and encrypted, where necessary”.

The FCA also stipulates that you must “know how [you] would remove data from the service provider’s systems on exit”. But, while service providers are great at helping you get your data into their cloud, they are not so forthcoming when it comes to getting your data out.

What Should I Do?

Regarding security readiness for public cloud, we suggest that taking your existing ‘on-premise’ security and compliance controls and enforcing them in a cloud environment is often the wrong position to take.

As part of your cloud adoption strategy you should consider which of your existing security controls should be adopted, which should be adapted and which should be retired.

Using frameworks such as those from the Cloud Security Alliance (CSA), the Center for Internet Security (CIS) and the National Institute of Standards and Technology (NIST), and embedding them through practices such as compliance-as-code, will give your organisation a consistent security pattern that can be applied across each of the major cloud providers, in turn establishing a uniform way of handling security across a heterogeneous cloud estate.
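To illustrate the compliance-as-code idea, here is a minimal Python sketch in which each control is written once, as a named check over a provider-neutral description of a resource. The control names and the normalised fields (`encrypted_at_rest`, `public_access`) are hypothetical; in practice, an adapter per provider would translate native configuration into this shape.

```python
from typing import Callable

# Each control is written once, as a named predicate over a provider-neutral record.
# Field names are hypothetical; per-provider adapters would populate them.
CONTROLS: dict[str, Callable[[dict], bool]] = {
    "storage-encrypted-at-rest": lambda r: r.get("encrypted_at_rest") is True,
    "storage-not-public": lambda r: r.get("public_access") is False,
}


def evaluate(resources: list[dict]) -> list[tuple[str, str, str]]:
    """Return (provider, resource_id, failed_control) for every failing check."""
    failures = []
    for resource in resources:
        for control_name, check in CONTROLS.items():
            if not check(resource):
                failures.append((resource["provider"], resource["id"], control_name))
    return failures


if __name__ == "__main__":
    estate = [
        {"provider": "aws", "id": "logs", "encrypted_at_rest": True, "public_access": False},
        {"provider": "azure", "id": "reports", "encrypted_at_rest": True, "public_access": True},
        {"provider": "gcp", "id": "archive", "encrypted_at_rest": False, "public_access": False},
    ]
    for failure in evaluate(estate):
        print(failure)
```

Treating a missing field as a failure is a deliberate, conservative choice: an unknown state is reported rather than assumed compliant.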

In this instance, leveraging third-party tooling (e.g. Dome9, Evident.io) to administer and manage your cloud environments’ security posture, in addition to embracing higher-level services (e.g. AWS Security Hub, Azure Security Center, Google Cloud Security Command Center), will enable your organisation to demonstrate better operational resilience to the regulator through real-time telemetry and hardened audit capabilities.

Regarding data, it’s important to build a view of the tiering and sensitivity of the data you’re prepared to push into the cloud. This assessment must be wide-reaching and include a data residency policy, a data loss strategy and a data segregation strategy.

It’s critical to have strong config management practices around schemas and tables as well as stringent data management. This entails knowing what data you have, where it comes from, what it is, who has access, what it’s for and how frequently it’s used. You must understand the criticality of the data (i.e. does it support mission-critical services?). Knowing what data is critical and what is relatively unimportant means you can create a triaged data migration strategy.
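A triaged migration plan can start as simply as grouping datasets by whether they support a critical service and how sensitive they are. The sketch below is a hypothetical illustration: the 1 to 4 sensitivity scale, the threshold of 3 and the choice to move non-critical, less sensitive data first (in the spirit of starting with low-risk workloads) are all assumptions to adjust to your own classification scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    supports_critical_service: bool
    sensitivity: int  # hypothetical scale: 1 = public ... 4 = highly restricted
    size_gb: float


def wave_for(d: Dataset) -> int:
    """Wave 1: non-critical, low sensitivity. Wave 2: non-critical, sensitive.
    Wave 3: critical, low sensitivity. Wave 4: critical, sensitive."""
    return 1 + 2 * int(d.supports_critical_service) + int(d.sensitivity >= 3)


def migration_plan(datasets: list[Dataset]) -> dict[int, list[str]]:
    """Group dataset names by wave, smallest datasets first within each wave."""
    plan: dict[int, list[str]] = {}
    for d in sorted(datasets, key=lambda ds: (wave_for(ds), ds.size_gb)):
        plan.setdefault(wave_for(d), []).append(d.name)
    return plan


if __name__ == "__main__":
    catalogue = [
        Dataset("card-transactions", True, 4, 900.0),
        Dataset("marketing-analytics", False, 2, 120.0),
        Dataset("branch-opening-hours", False, 1, 0.1),
        Dataset("customer-kyc-documents", True, 4, 300.0),
        Dataset("hr-records", False, 3, 5.0),
    ]
    for wave, names in migration_plan(catalogue).items():
        print(f"wave {wave}: {names}")
```

If your exit strategy instead needs the most critical data protected first, reverse the wave ordering; the point is that the ordering is an explicit, reviewable decision rather than an accident.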

Note that higher-level services (e.g. AWS RDS, Google BigQuery) differ enough between providers that, if you use them, you are effectively “locked” into that service. To lower this risk, there are third-party managed data platforms (e.g. MongoDB Atlas, CockroachDB, Confluent) that exist specifically to provide data portability, which you can deploy across your different providers as part of a sound data replication strategy. This helps address business continuity, disaster recovery and lock-in issues!

And remember: “lock-in” and “concentration risk” have existed in financial services for decades! ...Unisys mainframe anyone?!

Lastly, ensure there is a contract in place with your provider that states that they commit to helping you to get your data out when the time comes!

Reduce Exit Risk

What if you need to leave a cloud? You need to be prepared.

Key requirements:

Have a documented and tested exit strategy
Maintain the service level required by regulation

The guidelines state that “[f]irms need to ensure that they are able to exit outsourcing plans, should they wish to, without undue disruption to their provision of services”. Regulations make it clear that you need a documented and tested exit strategy that will, crucially, enable you to meet the regulated level of service for a given workload.

Once you have one or two workloads in a given provider, you need to understand what your exit strategy would be for each application. These must be in-depth: “have exit plans and termination arrangements that are understood, documented and fully tested”. Over time, we’d recommend having an exit strategy based around your business service lines or customer journeys.

What Should I Do?

Say, for example, that you had a critical payments system that regulations mandated be 99.99999% available, with a recovery point objective of zero. Your exit strategy would have to ensure that you can still meet this level of service, while you exit your cloud provider.
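To see how demanding that example is, it helps to turn the availability percentage into an error budget: allowed downtime over a period is (1 − availability) × the length of the period. A minimal Python sketch, assuming an average 365.25-day year:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year length, allowing for leap years


def allowed_downtime_seconds(availability: float, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Error budget: the downtime permitted over a period at a given availability."""
    return (1 - availability) * period_seconds


if __name__ == "__main__":
    for target in (0.9999, 0.99999, 0.9999999):
        print(f"{target:.5%} available -> {allowed_downtime_seconds(target):,.1f} s of downtime per year")
```

At 99.99999% the budget comes to roughly 3.2 seconds a year, compared with about 52.6 minutes a year at 99.99%, which leaves essentially no room for a conventional cut-over window during an exit.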

Achieving this goes back to having really good configuration management practices and architectural principles. You don’t want to be dealing with a monolithic app here! Make sure all applications are as modular as possible, which will support incremental migration patterns to maintain system uptime.

As you move to cloud, keep a configuration management database (CMDB) recording what those apps are, what they do and where they are hosted. Strong and clear tagging strategies can help in this regard, particularly for PaaS- and IaaS-based services.
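Once that information is in a CMDB, it can answer the first question of any exit plan: what exactly do we run on this provider, grouped by business service line? A minimal Python sketch; the `AppRecord` fields and provider labels are hypothetical placeholders for whatever your CMDB actually stores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    name: str
    business_service: str  # e.g. "payments", "loans"
    provider: str          # hypothetical labels: "aws", "azure", "gcp", "on-prem"
    service_model: str     # "iaas", "paas" or "saas"
    critical: bool


def exit_inventory(cmdb: list[AppRecord], provider: str) -> dict[str, list[str]]:
    """Group every app hosted on `provider` by business service line, critical apps first."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for rec in sorted(cmdb, key=lambda r: (not r.critical, r.name)):
        if rec.provider == provider:
            grouped[rec.business_service].append(rec.name)
    return dict(grouped)


if __name__ == "__main__":
    cmdb = [
        AppRecord("payments-gateway", "payments", "aws", "paas", True),
        AppRecord("fraud-scoring", "payments", "aws", "iaas", True),
        AppRecord("statement-archive", "current-accounts", "aws", "iaas", False),
        AppRecord("loan-quotes", "loans", "azure", "paas", False),
    ]
    print(exit_inventory(cmdb, "aws"))
```

Grouping by business service line rather than by individual application keeps the output aligned with exit strategies built around service lines and customer journeys.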

You also need to have a plan for a variety of timing scenarios, i.e. what do you do if you need to exit in two years? Two months? What about two days?!

Critically, when you are negotiating with a cloud service provider, make sure you have a contractual agreement in place that guarantees they will help you to exit with minimal disruption and provide you with the required support to do so. As part of this, you need to understand which triggers might force a move away from a cloud provider, be it political events, sanctions, embargoes, price increases, terrorism etc., and have a plan for each.

Common Mistakes People Make When Going Multi-Cloud

The issue that our clients have is that these FCA guidelines are ambiguous and open to interpretation. Exactly what is an “appropriate” security exposure?

In response to these ambiguous regulations, we often see customers over-engineer a solution, especially in the wake of the 2008 financial crisis. In particular, I meet a lot of people who think they need to:

Meet the above guidelines all at once
Have ‘real time’ portability i.e. be able to move instantly
Achieve total cloud neutrality i.e. be able to deploy all workloads across all clouds

While I have nothing against taking bold steps to reduce risk, when people attempt the above they invariably fail to deliver a functional multi-cloud setup. Here’s why:

You Don’t Need to Move All at Once

I often see organisations trying to create a ‘cloud broker’, an overarching platform that manages and gives access to (say) AWS, Azure and GCP all at once.

Avoid the cloud broker scenario! Attempting to build two clouds in parallel is very complex. Customers mistakenly think this would lead to ‘true portability’. For reasons that I outline in the section below, this is not the case.

By trying to move too fast, they bite off more than they can chew and end up with a platform that can only operate at the lowest common denominator across their clouds.

You will not be served by trying to match the guidelines perfectly straight away!

You Can’t Achieve Real-Time Portability...Without a Sound Data Replication Strategy

Moving apps between clouds is the easy part!

The real problem you will have is the common denominator: your data.

If you have significant data in the public cloud, moving it to another cloud is a challenge, particularly at high volumes.

The sheer magnitude and complexity of your data make migrations into a single provider a challenge, never mind between providers. The data schema for one cloud is unlikely to simply convert to another cloud and will require significant work.

This is a massive hindrance, especially if you want to make use of higher-level services (RDS or BigQuery, for example), i.e. if you want to get the value out of your data!

This is where you need to document a risk-based decision to embrace higher-level services, with your business case attached as evidence for the regulator.

Aiming for Total Cloud Neutrality Creates More Risk

One lesson I have learned speaking with many customers is that they focus too much on achieving total cloud neutrality, i.e. having any workload run in any cloud.

They will try to load balance an app across AWS, Azure and GCP. But this adds more risk, due to the increased complexity and cost. It also means that your engineers need to understand three clouds, which is setting an impossibly high bar for your workforce.

So how can you approach multi-cloud in a way that works but that still reduces risk over the long-term?

How to Approach Multi-Cloud

The problem with the cloud broker approach is that it approaches each public cloud provider as if they are the same: they are not!

By standardising your approach you end up with the lowest common denominator. Take, for example, your control framework (i.e. change/release/incident management). While your controls can remain the same, how they are applied must be tailored to each cloud.

You end up with multiple tiers (app, data, networking etc.) all interlinked across multiple cloud providers - which brings more risk and actually requires more engineering skills (introducing cost complexities).

By neglecting the differences between your clouds you paradoxically increase the risk that multi-cloud sought to lessen!

So rather than trying to migrate to three clouds simultaneously, we suggest doing one cloud really well, then scaling to another cloud, using the lessons from the first migration to improve your roadmap.

The end goal is for the second cloud to be as good as the first, but not immediately. This way you are moving towards cloud neutrality in the long-term, but in an incremental fashion that reduces risk and minimises cost.

This must then be underpinned by an effective cloud-native operating model, which covers not only technology but your processes and people.

Upgrading your technology is worthless if you don’t bring your people with you! Invest heavily in educating your engineers in cloud-native ways of working through a combination of professional certifications, classroom teaching and hands-on learning in sandbox environments.

The Cloud Service Providers Will Solve the Portability Conundrum…In Time

I highly suspect it will only be a matter of time until each of the major providers address the workload portability conundrum for their customers.

There have been recent promises from both Google and Microsoft to enable customers to orchestrate and manage compute resources “anywhere”, and to extend the use of their bespoke services and APIs so that they can run in each major cloud provider: i.e. you can run Azure services on Google compute power and vice versa. To this end, these providers have recently announced Anthos and Azure Arc, respectively (maybe AWS will announce something on this front at re:Invent in December 2019!).

Each of the cloud service providers (CSPs) has also released extensions of its APIs into customer data centres through the likes of Outposts (AWS), Azure Stack (Microsoft) and Anthos/GKE (Google). This could eliminate the need for complex brokerage solutions and avoid significant expenditure over time. The rate of innovation from CSPs is staggering, and I truly believe it’s only a matter of time until they solve this conundrum as a consortium of sorts.

In the same way that banking customers can switch their current accounts in under seven days, I wouldn’t be surprised to see some form of legislation imposed on financial institutions and cloud providers to deliver similar agility and freedom of movement for business-critical applications (e.g. payment systems). This would further force the providers’ hand to do what they can to eliminate the issue of portability.

A Roadmap for Cloud Adoption and Migration

Contino has a battle-tested framework for approaching cloud adoption and migration: Momentum.

Derived from Contino’s experiences across hundreds of enterprise cloud migrations, Momentum is a data-driven roadmap for transforming enterprises to a cloud-native operating model.

We designed it to facilitate an organic, delivery-first cloud migration that proceeds wave by wave, gaining momentum each time.

It’s based on a few key principles that avoid the pitfalls so common to enterprise migrations:

Based on organic waves, not a big-bang approach
Data-driven, rather than opinion-driven
Focused on quickly delivering tangible results, rather than PowerPoint decks
Works with the business, rather than in a tech silo
Product mindset, rather than project mindset.

This strategic, step-by-step approach is perfect for moving towards an effective multi-cloud setup without falling into any of the various traps that are lurking!

Our white paper The Definitive Guide to Cloud Migration in the Enterprise: Here Be Dragons! goes into lots of detail about how it works.

Final Remarks

I hope that helped to clarify what is and is not required as a result of the FCA guidelines, and gave you an idea of some tried-and-tested approaches to fulfilling them. Hopefully, I have also been able to share some valuable lessons from prior engagements with other financial institutions.

There are certainly other areas to consider, and we will be publishing more thought leadership and running events over the next few months to help support our customers with these challenges.

If you have any questions please reach out to me at ben.saunders@contino.io.

Let me know how you get on!