Dow Jones: Scaling DevOps with a Federated Model
Dow Jones & Company is an American publishing and financial information firm best known for the publication of the Dow Jones Industrial Average and related market statistics.
Recent significant events had created institutional silos across the technology team, resulting in multiple technical, organizational and geographical barriers spanning various regions and time zones.
The Dow Jones CTO had started a DevOps initiative to centralize common operations tasks under a newly created DevOps team. However, following internal changes, this team had limited resources to meet its new responsibilities and became a bottleneck, constrained by staffing, prioritization, leadership and overall funding. Adding to this dynamic was a mandate to move 75% of data center resources to the cloud (Amazon Web Services) across all companies in the group. Given an absence of expertise, there was a risk that this migration would be a simple 'lift and shift', carried out without any strategic overview.
As part of our rigorous DevOps Maturity Assessment, Contino consultants conducted a series of in-depth interviews and observations over a two-week period in multiple locations (New York, South Brunswick and Minneapolis) to establish each team's practices and the differences between teams. The assessment captured discrepancies between groups working on the Amazon Web Services (AWS) implementation: some were experiencing great success whilst others were struggling. It also found significant differences in the levels of agile and operational maturity. For example, some teams used only basic testing practices whilst others were advanced in their agile development approaches.
Specific barriers to progress that were identified included:
- Replatforming to AWS without adequate foundations or architectural/operational strategies in place
- Technology-only adoption of agile
- Institutional silos within the technology group
- Renaming operations to DevOps
One of Contino's core recommendations was to transition the DevOps group to a federated DevOps model. The existing, highly centralized DevOps team structure was seen as too 'command and control' and too inflexible to meet ever-changing requirements, and it was creating bottlenecks as a result.
Contino proposed the creation of a federated tooling and services team that would serve technology by championing, evangelizing and governing the shared and desired development and operational outcomes of each business unit. This would be achieved by creating Centres of Excellence across DevOps disciplines, leveraging open source tools and ways of working extensively to create DevOps assets and templates for reuse across technology. The concept of 'trust but verify' was put forward: development teams are given appropriate tooling and access to do their job, but their work is verified to ensure they are operating within established boundaries, with complete visibility and transparency built into each team’s work for risk mitigation.
Contino consultants were embedded in the new regional teams to upskill them and ensure a smooth transition to the federated model.
In parallel, the CTO asked Contino to determine whether a Site Reliability Engineering (SRE) function would support the increasing demands placed on the company’s engineering community, which was firefighting multiple requests from different business units with no visibility of what was coming its way.
Contino oversaw the introduction of a Tooling and Services Operating Model Framework that ensured policy, process and validation would be baked into the company’s toolset from the outset. As a result, anyone using the tools would be following policy. Meanwhile, SRE would give operations a more robust capability and ensure that releases were of the highest quality. Platforms would be continually pushed to their limits, with improved performance engineering.
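To make the idea of policy being baked into the toolset more concrete, the following is a minimal Python sketch of a 'trust but verify' gate. It is an illustration only, not the tooling used at Dow Jones: the `DeploymentRequest` type, the `verify` and `deploy` functions, and the specific rules (approved regions, required tags, encrypted storage) are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical policy values, chosen purely for illustration.
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}
REQUIRED_TAGS = {"owner", "cost-centre"}


@dataclass
class DeploymentRequest:
    """A hypothetical request a development team submits through shared tooling."""
    team: str
    region: str
    tags: dict = field(default_factory=dict)
    encrypted_storage: bool = False


def verify(request):
    """Return a list of policy violations; an empty list means the request is compliant."""
    violations = []
    if request.region not in ALLOWED_REGIONS:
        violations.append(f"region {request.region!r} is not approved")
    missing = sorted(REQUIRED_TAGS - set(request.tags))
    if missing:
        violations.append("missing tags: " + ", ".join(missing))
    if not request.encrypted_storage:
        violations.append("storage must be encrypted")
    return violations


def deploy(request, audit_log):
    """Record every decision for visibility, and allow only compliant requests."""
    violations = verify(request)
    audit_log.append(
        {"team": request.team, "allowed": not violations, "violations": violations}
    )
    return not violations


if __name__ == "__main__":
    log = []
    ok = DeploymentRequest(
        team="news-apps",
        region="us-east-1",
        tags={"owner": "news-apps", "cost-centre": "1234"},
        encrypted_storage=True,
    )
    bad = DeploymentRequest(team="data-feeds", region="eu-west-1")
    print(deploy(ok, log), deploy(bad, log))
    print(log[-1]["violations"])
```

In this sketch teams are trusted to submit their own requests, while the shared tool checks each one and writes every decision to an audit log, so that following the tool's path and following policy amount to the same thing.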
Adoption of a federated DevOps team model would bring a number of advantages, including:
- A central team ensures that application infrastructure and application configuration meet the basic needs of all business units, eliminating redundant technologies and driving consistency across tooling and implementation
- Innovations generated from project-based work can be repurposed across all business units
- The overall Continuous Delivery capability and maturity of teams is enhanced through targeted Subject Matter Expert (SME) secondment
It was forecast that an SRE function would have a positive impact in a number of areas, such as:
- An improvement in the overall quality of the product produced, by encouraging engineering excellence from the delivery team
- Efficiencies from the introduction of fundamental agile project management principles, which freed up the time of almost two full-time equivalent (FTE) resources
- A reduction in the time taken to resolve production issues by cutting the time taken to identify and fix bugs
- Continual demand and capacity planning to help ensure that the product can scale to meet future needs