
“Accelerating Value Delivery & Empowering Autonomous Teams”: Jen McVicker with Atlassian (Video + Transcript)

March 27, 2024

Jen McVicker (Atlassian Senior Enterprise Technical Architect) has over 25 years of experience in software development and delivery and is an expert at building cross-functional teams responsible for the entire lifecycle of an application or service. She discusses how to reduce risk by adopting a loosely-coupled architecture & categorizing services into tiers. Attendees will leave with one key takeaway – measure the right things!



In this ELEVATE session, Jen McVicker (Atlassian Senior Enterprise Technical Architect) discusses how adopting DevOps culture, strategies, and tools can reduce time to market and increase developer satisfaction and productivity by bringing together development and operations teams to work as a cohesive unit with shared goals.

Like what you see here? Our mission-aligned Girl Geek X partners are hiring!


Transcript of ELEVATE Session:

Jen McVicker:

Today I’m going to share how you can reduce time to market and increase developer satisfaction and productivity by adopting DevOps culture, strategies, and tools. Let’s look at DevOps culture first.

In the world of software engineering, there’s historically been constant tension between deployment frequency on the development side and site reliability on the operations side.

When you have separate operations and development teams, this can result in a tug of war. The operations team is trying to limit deployments because deployments introduce change, which always carries some risk of failure, while product development teams are trying to deliver value to customers as quickly as possible.

How can we resolve this struggle? Back in the late 2000s software development and operations communities started raising concerns about this tug of war. They realized that these apparently diametrically opposed groups could actually work much more effectively if they worked together as a cohesive unit with a shared set of goals.

The term DevOps was coined by Patrick Debois in 2009 to represent this new way of working. But it’s not just as simple as mashing together two groups and telling them they’re now a team. You’ve heard of forming, storming, norming, and performing, right? It’s called Tuckman’s model, and it describes the life cycle of a team.

When a new team is created, they don’t immediately begin working together as a cohesive unit. This is when it’s important to develop working agreements because sooner or later conflict starts to arise within the group. And this is actually a good thing. It means that your team is starting to trust each other. As long as you have team agreements in place to ensure that everyone treats each other with respect and that each person’s opinion is considered valuable, you can harness that conflict to drive the collective team to deliver better outcomes.

Why? Because the knowledge of the collective group is greater than that of any one person. When the group starts to really coalesce as a team, they’ll begin sparring on problems and come up with solutions that take into account the diversity of experience among all team members. But teams need stability to develop this level of trust. That means not swapping team members in or out very often because every time the team changes, they’re going to regress for a little bit.

If you have a long-lived stable autonomous team that has developed trust, they’ll get to this point. Performing. And this is when you see real dividends pay off. I snuck something in there though. Did you notice? Autonomous. What does that word mean in terms of software development? Well, autonomous teams are independent. An autonomous team is made up of a cross-functional group of people who can move quickly because they don’t have dependencies on other teams. Ideally, all the skills needed to perform the work are encapsulated within the team.

The key is to eliminate as many dependencies on other teams as possible while keeping your team at a reasonable size, about five to 10 people. Now, another key element here is value-aligned, and by this I mean that the team is responsible for the entire flow of a stream of value to the customer. For example, an e-commerce solution. It needs to include products, the mechanism for adding products to the shopping cart, the checkout process, but it could also include things like a search engine for finding products to add to the cart, or customer tracking to gather data around customers who abandon their cart.

Finally, autonomous teams are empowered. They’re empowered to experiment, learn, and pivot as needed without a lot of bureaucracy, empowered to decide how they will work and what tasks to prioritize. And most of all, empowered to decide the best way to achieve the outcomes they’ve been asked to deliver.

Now, this involves a lot of trust that the team will deliver on their objectives, but if you empower your teams to be autonomous, you’ll be astounded at what they can accomplish. There’s a secret to getting the best results from an autonomous team though, and I just mentioned it.

Outcomes over output. Simply put, rewarding outcomes over output means measuring the right things. People are incentivized to deliver what gets measured because they’re held accountable to those metrics, so make sure you’re measuring the right things. Let’s take an example. You ask the team to build an avatar feature for the user profile section of your community website. You think that customers will come back to the site more frequently if they can see each other’s faces or cartoon robot faces in this case. The team takes a few sprints to deliver the feature because they need to implement a new backend process to optimize and store the images.

Now, you finally launched the feature, and it results in a very mild uptick in monthly active users for a couple of months, but then it drops back down. The effort and expense that went into building that feature may ultimately not produce the desired outcome. What if instead you ask the team to increase customer engagement on the website? Do you see the difference? In the first you’re telling the team how you want them to solve the problem. In the second, you’re telling the team the problem you want them to solve. How do they know how to solve the problem though? Well, it’s not easy, but it is simple. Experiments.

Much like with AI-generated art, you’re unlikely to get the outcome you’re looking for on the first try. If Agile has taught us anything, and I hope that it has over the past couple of decades, it’s that we have to iterate: capitalizing on the things that work and pivoting to something different when the feedback tells us we’re going down the wrong path, until we get to the outcome we ultimately want.

This is the power of experimentation and autonomy. When the people who are closest to the work are also closest to the feedback, they can learn from that data and make the best decision about what experiment to try next without having to get approvals and buy-in from three levels of management and a dozen other stakeholders first. By making teams accountable to outcomes instead of output, you incentivize them to quickly learn what works and what doesn’t, which naturally leads to a learning-centered culture, one of the key principles of DevOps. The faster you can get feedback, the faster you can learn what works and what doesn’t. So let’s take a look at some of those DevOps strategies and best practices.

Ten years ago, Google decided to research why some companies have high-performing software teams and why others fall short. Google’s research team, known as DORA, released the first State of DevOps report in 2014, and in it they introduced four key metrics that correlate to strong engineering performance. Those metrics are deployment frequency, which measures how often you deploy changes to production; change lead time, which measures how long it takes from the time a developer first commits code to the time it is released to end users; change failure rate, which measures what percentage of changes pushed to production cause a failure; and mean time to recovery, which measures how long it takes to restore service after a failure. They’re collectively known as the DORA metrics. And you’ll notice that these are metrics that directly tie to strategic outcomes.
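To make those definitions concrete, here is a minimal sketch, in Python, of how a team might compute the DORA metrics from its own deployment records. The data model and field names here are hypothetical, not any particular tool’s API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    committed_at: datetime                # first commit that went into this change
    deployed_at: datetime                 # when the change reached production
    failed: bool                          # did this change cause a failure in production?
    restored_at: datetime | None = None   # when service was restored, if it failed

def dora_metrics(deployments: list[Deployment], window_days: int = 30) -> dict:
    """Compute the four DORA metrics over one window of deployment records."""
    if not deployments:
        return {}
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "change_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

In practice these numbers usually come straight out of your CI/CD and incident-tracking systems; the point is simply that each metric is derived from data your pipeline already produces.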

Deployment frequency and change lead time point to how quickly you can run an experiment and get feedback. Change failure rate and the mean time to recovery tie directly to code quality and site reliability. By focusing on these four metrics, your organization can reduce downtime and development cost while delivering value to your customers more rapidly. Now, all four of these are important, but today I’m just going to focus on deployment frequency and change failure rate.

By looking at these two metrics, you can build a robust process that mitigates risk and reduces time to market resulting in faster feedback loops. Let’s take a look at a common bottleneck for deployment frequency that I see at a lot of organizations. It’s the Change Approval Board or CAB. CABs are usually made up of leaders across different areas of the company’s technology stack. Those CAB meetings often don’t happen more than once or twice a week because it’s expensive to have a lot of highly paid people sitting around to give the thumbs up or down on a laundry list of changes.

Because of this, many changes are delayed several days before they’re actually released in production. What’s the solution? Do we eliminate the CAB altogether? Heck no. CABs serve a very important purpose. Their primary focus is site reliability. Remember that yellow ops side of our infinite loop?

The CAB is a key player here. Not only do they review proposed deployments in order to identify potential dependencies or downstream effects, they can also ensure that resources from any affected systems will be on call to address problems that might crop up during deployment such as database or network admins.

This is critical for complex deployments that may affect many areas of an application. Ideally, though, those dependencies and downstream effects are already known and communicated throughout the development process, and the people deploying the changes would have the ability to resolve any problems arising with the change because that change would be isolated from other applications and services.

How can we reach this ideal state? First step, a loosely coupled architecture. Loosely coupled architecture is a fancy way of saying that the software is made up of multiple independent services, sometimes called components, that interact with each other through APIs. For example, let’s say we have an online store. You may have a service that allows people to search for products. That service would call your product catalog service, and it does so by calling a specific URL, which is the catalog’s API endpoint, and passing along search criteria such as a keyword to get the results.

The catalog service API then returns zero or more records that match the search criteria. The search service doesn’t care how the items are stored in the catalog, how they get added or removed or updated. All the search service cares about is that when a keyword is sent to that URL, specific values will be returned, such as the name, price, and thumbnail image of the item.

This means that the catalog service can be updated at any time without affecting the search service at all as long as the parameters that get sent and the values returned by the API stay the same. You could add more product information such as customer ratings or even swap out the database that stores the products for a new one.

As long as the API endpoints don’t change and the same values are returned, the search engine will continue to function.
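As a rough illustration of that contract, here is what the search side of the example might look like in Python. The URL, parameter, and field names are made up for this sketch; only the shape of the interaction matters:

```python
import requests

# Illustrative endpoint; a real catalog service would publish its own URL.
CATALOG_API = "https://catalog.example.internal/api/v1/products"

def search_products(keyword: str) -> list[dict]:
    """The search service relies only on the catalog's API contract:
    send a keyword, get back matching items with name, price, and thumbnail."""
    response = requests.get(CATALOG_API, params={"keyword": keyword}, timeout=5)
    response.raise_for_status()
    # How the catalog stores, adds, or updates products is invisible here.
    return [
        {"name": item["name"], "price": item["price"], "thumbnail": item["thumbnail"]}
        for item in response.json()
    ]
```

The catalog team can rewrite their service or swap out its database entirely; as long as this endpoint keeps accepting a keyword and returning those fields, the code above never needs to change.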

There are major benefits to a loosely coupled architecture, but there’s one big drawback. It turns into software sprawl. Now, software sprawl occurs when the number of applications or software components within an environment rapidly grows and changes. Sometimes this term is used in reference to the tools that your organization uses, but today I’m talking about the applications and services that your organization creates.

Sprawl makes it really difficult for traditional software project management to scale, including CABs, and it can be a nightmare for engineering teams to keep track of where dependencies exist, who owns what service, and when changes are being deployed.

How can we solve this? Well, hang tight. We’re going to get there in just a minute, but first, we need to take a little detour and talk about service tiers.

Service tiers are a great way to manage risk. I’m going to do a quick walkthrough here of what service tiers are, so we’re all on the same page.

Tier one, also sometimes called tier zero, includes your most critical services. Any downtime for a tier one service is going to have a significant impact on customers or the company’s bottom line. An example of a tier one service could be a credit card processor.

Tier two failures will cause a serious degradation of the customer experience, but they don’t completely block customers from interacting with the product. Tier two services might include that product search that we just talked about. Customers could still purchase an item by browsing through categories even if search is down, but it’s a poorer experience.

Tier three services have a very minor impact on customers or limited effects on internal systems. Tier three services might be something like that avatar display that we talked about earlier. And finally, we come to tier four. These services would have no significant effect on customers or internal business users.

Tier four services could include things like a sales report that temporarily fails to generate. Now we have one more point to cover before we move on to tools, and it’s about the difference between continuous delivery and continuous deployment.

These are actually related concepts that are often mistakenly used interchangeably. Continuous delivery is defined by Google as the ability to release software changes on demand quickly, safely, and sustainably. It does not mean that the code has been deployed to production.

Rather, it means that the main trunk of your repository in your staging environment must be ready to be deployed to a live production environment at any time. Any testing or scanning that needs to happen before the code is released to production must be done before the code is pushed to that pre-production environment. Now, remember, change failure rate? This is how we move the needle on that metric by ensuring that our code is deployment-ready and fully tested before merging it into our staging environment.

Continuous deployment takes this one step further and automates the release to production. Now, it doesn’t necessarily mean that the code is immediately released to production as soon as it reaches the staging environment. Those deployments might still be scheduled to run at a specific time. The key to remember here is that continuous deployment is dependent on continuous delivery.
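One way to picture the difference is a pipeline sketch like the following, in Python with hypothetical check and step names. Continuous delivery is everything up to and including the staging step; continuous deployment is the single extra, automated step at the end:

```python
def promote_to_staging(build_id: str) -> None:
    print(f"{build_id}: merged to staging and release-ready")

def deploy_to_production(build_id: str) -> None:
    print(f"{build_id}: released to production automatically")

def release_ready(checks: dict[str, bool]) -> bool:
    # Continuous delivery: every test and scan runs *before* code reaches staging,
    # so whatever sits on the staging branch is always safe to release.
    return all(checks.values())

def run_pipeline(build_id: str, checks: dict[str, bool], continuous_deployment: bool) -> None:
    if not release_ready(checks):
        raise RuntimeError("Build is not releasable; fix failures before merging to staging")
    promote_to_staging(build_id)
    if continuous_deployment:
        # The only difference: the release to production needs no manual trigger.
        deploy_to_production(build_id)

run_pipeline("build-42", {"unit_tests": True, "security_scan": True}, continuous_deployment=True)
```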

Okay, now we can talk about tools. Remember, software sprawl? Well, software sprawl can be tamed by adopting an IDP, an Internal Developer Platform to catalog and document metadata about services. At Atlassian, we use Compass, an IDP we built specifically for the purpose of bringing clarity to loosely coupled microservices architecture. An IDP solves the common problems of lack of visibility into the status and risk level of services and the connections between them.
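To give a feel for what that cataloging looks like, here is a small sketch of the kind of metadata an IDP tracks for each service. This is illustrative only, not Compass’s actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    name: str
    owner_team: str
    tier: int                                # 1 = most critical, 4 = least critical
    active: bool = True
    depends_on: list[str] = field(default_factory=list)

catalog = [
    ServiceRecord("checkout", "payments", tier=1, depends_on=["product-catalog", "card-processor"]),
    ServiceRecord("product-search", "discovery", tier=2, depends_on=["product-catalog"]),
    ServiceRecord("avatar-display", "community", tier=3),
]

# Questions like "who depends on the product catalog?" become a simple lookup.
print([s.name for s in catalog if "product-catalog" in s.depends_on])
# ['checkout', 'product-search']
```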

For instance, an IDP can surface each service’s tier, whether the service is still active, and the dependencies between different components. Now, suppose you’re working on a service with dependencies on several other services. Rather than waiting to deploy all the changes at once, you can have each service deployed to production separately behind a feature flag and then enable all the changes at once when you’re ready to launch.
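Here is a minimal sketch of that pattern with a hand-rolled, in-memory flag; real setups use a feature-flag service so flags can be flipped without redeploying anything:

```python
feature_flags = {"new-checkout-flow": False}  # code is live in production, but dormant

def legacy_checkout(cart: list[str]) -> str:
    return f"legacy checkout of {len(cart)} items"

def new_checkout(cart: list[str]) -> str:
    return f"new checkout of {len(cart)} items"

def checkout(cart: list[str]) -> str:
    # Flip the flag to launch; flip it back to instantly back out a misbehaving
    # change without touching the deployment itself.
    if feature_flags["new-checkout-flow"]:
        return new_checkout(cart)
    return legacy_checkout(cart)

print(checkout(["socks", "mug"]))  # takes the legacy path until the flag is enabled
```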

Feature flags can minimize risk by allowing you to deploy changes to production in a dormant state and enable them later, and they give you the flexibility to quickly back out changes that aren’t performing the way you expect. So let’s look at this all together now. When software is designed with a loosely coupled architecture, we reduce the risk of introducing changes to a single service in the application.

If we assign an appropriate tier to this service and document any dependencies, we can identify which services are low risk. When the DevOps team is practicing continuous delivery, we know that the code and staging is in a consistently deployable state, having already passed testing and security scans. With feature flags, we can control when the new change takes effect separately from the process of deployment.

If that DevOps team is responsible for the end-to-end life cycle of the service and is held accountable for the quality of their deployments through tracking the change failure rate, they’re naturally going to focus on ensuring that this metric is as favorable as possible. Now, if all these things are true, then there’s no real benefit to having the CAB review changes for low-tier services prior to deployment. Even if a deployment does fail, you’ve ensured that it’s only going to have a minor effect on customers or internal business processes at the very worst, and it can be rolled back quickly using feature flags.
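If you wanted to codify that reasoning, a team’s pre-approval rule might look something like this sketch; the thresholds and names are illustrative, not a prescribed policy:

```python
def needs_cab_review(tier: int, continuous_delivery: bool, behind_feature_flag: bool) -> bool:
    """Low-tier services that ship through continuous delivery and can be rolled back
    with a feature flag qualify as pre-approved changes and skip the CAB."""
    pre_approved = tier >= 3 and continuous_delivery and behind_feature_flag
    return not pre_approved

print(needs_cab_review(tier=3, continuous_delivery=True, behind_feature_flag=True))  # False: ship it
print(needs_cab_review(tier=1, continuous_delivery=True, behind_feature_flag=True))  # True: the CAB still reviews tier one
```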

By implementing a continuous delivery process for services with a loosely coupled architecture, you can create a set of standards for pre-approved changes, which will unlock the ability to deliver value to customers on a more frequent basis. And by enabling autonomous teams that are empowered to solve business problems by focusing on outcomes over output, we can ensure that they’re incentivized to quickly learn from mistakes, take ownership of the success of the application, and create fast feedback loops.

If you only take one thing away from this session, I hope it’s this: you get what you measure. Measure the right things. And we’ve really just scratched the surface of DevOps here today. There is so much more to DevOps than I could cover in a 20-minute presentation.

If you’re interested in helping your teams develop a more mature DevOps tool chain and process, feel free to scan this QR code and send me an email or reach out to me on LinkedIn, and thank you so much for your time today. I’m grateful to have had the opportunity to present to you all at the 2024 Elevate Conference.

