Ever had to convince a group of sysadmins to adopt Agile? In the middle of a merger? And you’re brand-new to their organization? In this experience report, I’ll share the change management techniques and iterative approach that helped us make a successful (but not perfectly smooth) transition from an ad hoc sysadmin group to a high-performing Agile DevOps team.
As a scrum master, I help teams make Agile transformations, handle rapid growth and deal with institutional change – just not usually all at once. I was faced with exactly this challenge when I joined NPR’s Digital Media division in April 2017. I started as the scrum master for just one team, an experienced Agile product team working on the NPR website. But less than a month later, in May 2017, our Chief Digital Officer announced a merger with a separate NPR division in Boston and I got a new project.
In order to help manage all of the rapid change and growth we were about to experience, my manager asked me to take on our small sysadmin team and coach them in Agile methods. At the same time, we’d have to manage a physical infrastructure move and knowledge transfer from our Boston colleagues who chose to leave after the merger. What could possibly go wrong?
NPR, or National Public Radio, is an independent, nonprofit multimedia organization founded in 1970 with the mission to create a more informed public. The Digital Media division in D.C., reporting to NPR’s Chief Digital Officer, supports the newsroom and audience through digital content management and publishing systems, the NPR.org website, apps, and other public-facing and internal services.
From 2008 through 2017, NPR member stations across the country were supported by Digital Services, a Boston-based division that provided tools and services like station websites, analytics/reporting, and general technical support. Like Digital Media, it also reported to the Chief Digital Officer.
The two divisions’ infrastructure and development needs significantly overlapped, so in May 2017, our Chief Digital Officer Tom Hjelm announced that Digital Services would merge with Digital Media and move to NPR headquarters in D.C. All Boston-based Digital Services employees were offered relocation, but most chose to remain in Boston and transition out of their roles.
As a result, Digital Media needed to support a new customer base, a new set of products, different software stacks, and completely distinct hardware and infrastructure that needed to be shut down, shipped to D.C. facilities, and spun back up without interruption. We needed a plan. Maybe even more than one.
3. GETTING STARTED
The first thing I did, still in my first month of work, was sit down with our existing Digital Media team (three local sysadmins and one remote) to understand their current workloads and workflows. What did they do? How did they handle deadlines? Their answers: “whatever someone asks us to do” and “whenever they start yelling.” (They were joking. Mostly.) I asked more questions, because I’m a scrum master, and slowly found out more.
The sysadmins support not only Digital Media’s product and audio production teams but also other NPR divisions, including the digital archivist division and the digital media production group, and they regularly work with broadcast engineering. This meant they were supporting a website with 36.7 million users/month per Google Analytics [1]; NPR’s internal content management system; two main apps with a combined 2.9 million users across platforms per Google Analytics [2]; the division responsible for fact-checking, transcribing, archiving, retrieving, and conducting analysis on demand for every single story NPR ever aired; and the team responsible for making sure that newscasts like Morning Edition and All Things Considered reach their 13.9 million weekly listeners (per Nielsen Audio Nationwide [3]) on any platform other than radio.
Figure 1. Example of supported projects
Despite all that, there was no single way of working together and no established process. Requests from this range of teams could come in through the shared sysadmin email inbox, the team’s Slack channel, a direct Slack message, or even a random hallway conversation. It was all ad hoc, but it worked well enough, according to the team lead.
The sysadmin team lead had been there for over a decade, starting when the entire Digital Media division was small enough to fit into a 12-person conference room, and this informal way of working was the way it had always been. The sysadmins all hung out in the same physical or virtual space, so they just talked about work and eventually it got done. That was fine, right? No need for more change than we were already experiencing or any kind of new process.
I talked to the other sysadmin team members to understand their perspective, and it was incredibly helpful. Tyler, the senior sysadmin, shared his frustrations with never being sure what he should prioritize and constant help requests interrupting his work on more complicated projects. His comments were echoed by other team members.
My manager encouraged me to talk to additional stakeholders and other teams, and she pointed me towards the right people. She’d been at NPR for more than eight years, and she shared history and context that helped me understand the current situation and relationships better. I learned even more from the new conversations. No one really understood what happened to any request they made of the sysadmin team, who was working on it, or when they could expect it to be completed. They also had no idea what else the team was working on, or how important it was.
I heard stories of emailing the shared mailbox and never hearing back, or hearing separately from two sysadmins who weren’t aware they were both responding. People told me, “I just wait till I see Tyler in the hallway and then I ask him for help with my devbox. He gets annoyed but he usually does it pretty fast.” And “We email Vick directly for help with audio extraction since he set up a tool for us and he’s the only one who knows how it works.”
I gathered a good baseline of existing issues and identified where Agile frameworks could help resolve them. But the sysadmin team was about to deal with a whole new challenge: adding the infrastructure and support of the Digital Services division, a previously separate entity that was now merging with Digital Media. Digital Services had a dozen distinct products and services of their own that were used by many of the 262 Member Stations across the country.
Figure 2. Logo station map
4. PICKING A FRAMEWORK … OR TWO
4.1 First Steps
We had to find a way to manage all of this work, and we had to find it fast. The rest of the department was using Scrum, with boards, daily standups, and all the ceremonies. I thought that introducing all the ceremonies might be too much too soon, but daily standups and trying out a lightweight Scrum board seemed like a good place to start. There was some initial resistance even to that from our lead and the Digital Services lead – “we’re too busy to meet every day!” – but team members like Tyler and Vick were willing to try it.
Having allies who were interested in trying Agile was a huge help in getting started. I also talked to both leads individually and reassured them that the daily standup meeting would last 15 minutes, require no preparation, and exist solely to help the team coordinate all the work we were facing.
It was a bit awkward at first, but I was able to get the team talking with some gentle prodding, a little joking around, and the support of my allies on the team. It quickly became evident that with so much going on, there was actual value in meeting briefly at the start of the day to share what everyone was working on and what obstacles they were facing. We had less duplication of effort, more communication among team members, and even some knowledge transfer about the systems we were inheriting and the products they supported.
However, the Scrum board was not as successful. I tried to make it as lightweight as possible, basically using the cards to take notes on work discussed in standup without forcing user story framing, acceptance criteria, or even estimation on the team. It took a few weeks, but we got most of our work onto the board: merger tasks, support for our existing infrastructure, new requests, production issues, and more!
And it was a mess.
Figure 3. Mock-up of first scrum board
We had the work in one place, but it was all over that place. We had stories relating to the merger mixed up with requests for redirects and stage environment issue reports as well as pieces of longer-term projects like upgrades that we were trying to keep moving. It was very chaotic, but on the plus side, it was also an accurate reflection of everything the team had to deal with. It just wasn’t helping them very much.
4.2 Iterating and Adapting
My manager Kim suggested that the cards might be too lightweight, and now that the team was used to looking at the board, it might help to add more structure. We started grouping work like the merger and upgrade projects by epic, and using swimlanes to keep it separated on the board.
Figure 4. Mock-up of revised scrum board
This did help, but we still had a lot of unplanned work – support work for other departments, production support for existing infrastructure, etc. – flooding the board. We needed to try something else.
I had a couple of ideas, but held off on suggesting anything so the team could talk about their own ideas at our retrospective after the initial phase of the merger was complete, in late fall of 2017. If they came up with something I was also thinking about, that was a good sign that it might work and it would have better adoption coming from the team itself. If they came up with something I hadn’t considered, even better.
The state of the board came up right away at the retrospective, and so did several ideas. One person suggested Kanban as a good way to handle ops workflow, but another team member pointed out that it wouldn’t accommodate our larger project work or the items tied to the sprint cycle the rest of the teams were on. Plus, it wouldn’t solve the problem of the team not knowing who should jump on the ops items as they came in. At this point, our new team member Zach mentioned that his previous job had used a portal and a “goalie” who checked all the incoming ops tickets, solved what they could, and handed the rest off to the team as needed. There was some disagreement about whether this could work with our board, but the idea seemed really appealing to most of our team.
4.3 Trying Out Scrum + Kanban
I’d spent some time with our tools, so I knew that we could set up a separate Kanban board with an automatic assignee who would get notified about incoming tickets. Even better, we could connect that to a service desk portal with a customizable request form so that the wide range of people, teams, and departments that needed our help could have one easy place to ask for it and to track the status of their request. Plus, we could collect some important information on the form that would make it easier for our team to understand and prioritize the work and minimize the back and forth of getting all the necessary details.
Figure 5. Screenshot of portal
It took a little time to set up the new service desk portal and linked Kanban board, plus some help from our technical writer Sarah Hersh to make sure our request options and workflow were clear and easy to understand. I had some free time over the 2017 holiday season to test it, and in January 2018 we were ready to try it out.
It wasn’t the smoothest beginning. Sarah had also helped us create communications about the new method of requesting help, but of course not everyone read the emails. We had to redirect Slack messages, emails, and random drop-by requests by asking people to use the portal, and pushing back like that wasn’t easy for our new team members. I did my best to help out by showing people how to create tickets, getting my fellow scrum masters to remind their teams of the new process, and making sure the team knew they could use me to help deflect pressure.
The dual board system actually went more smoothly for the team itself. In standup, we would go through the Scrum project board as usual, and then move over to the Kanban board. The team member on service desk rotation would give a quick update on the tickets they were working on, and ask for help or hand off items as needed. The new process really helped the team get more focused on their work, and while it didn’t eliminate interruptions, it reduced them considerably.
Figure 6. Mock-up of Scrum board
Figure 7. Screenshot of Kanban board
4.4 Advantages and Benefits
As we continued to use this Scrum + Kanban approach, we discovered additional benefits. New team members would get a service desk rotation after their initial onboarding, and the range of requests that came in acted as a training aid. They learned about our systems much more quickly than before.
The approach also improved our ability to plan accurately. We were no longer trying to factor in interruptions for every team member and guessing at what each of them would be able to accomplish. Instead, we counted one team member’s capacity as primarily devoted to service desk work, so the other team members could plan on having time to focus on development and project work. This really helped us complete the shift from being a primarily reactive team into one that could be more proactive in planning and executing their own work as well as reacting to requests and issues.
For stakeholders and other teams in the department, it also took away the “black box” aspect of ops work. They could see the progress of their requests, who was assigned to them, and how many other requests we were handling at the same time. It helped them understand that their quick ask might actually be more complicated, or might be one of ten requests that came in that day.
5. WHAT WE LEARNED
Change fatigue is very real, but sometimes dealing with overwhelming change is unavoidable. Even then, it’s faster to go slow. Our Agile transformation started without planning, grooming, retros, or even a fully detailed Scrum board, but that was all the team could handle at that point. And then once we’d begun, it was much easier to pull in additional Scrum ceremonies because the team had already committed to the first steps, and had seen that I was willing to adapt to their needs and not use things that weren’t working. If I’d tried to impose a full Scrum framework from day one, the resistance would have been much higher, adoption would have been much slower, and it would have taken us more than eight or nine months to get to a process that worked for us.
Allies are essential. I had the support of my manager and my scrum master colleagues, but also of several team members who were eager to try a more structured way of handling work. Without those allies on the team, it would have been much more difficult to suggest changes to how we worked. If you don’t have allies, it’s worth trying to understand why, and either change your approach or see if you can change their minds.
Scrum is great for product teams, but it couldn’t handle our interrupt-based work. Kanban is great for service desk and interrupt-based work, but couldn’t handle our longer or more complex projects. As a DevOps team that was truly handling both development and operations work, we needed the Scrum + Kanban approach to handle our overall workflow.
Keep trying things! If we’d settled on Scrum as good enough and kept trying to force interrupt work into a swimlane on a Scrum board, we wouldn’t have gotten to where we are now. And we’re not stopping here. As our division grows, we’re experimenting with embedding DevOps members in specific product teams as needed, and we’re continuing to refine how we use our two-board Scrum + Kanban system. It’s a workflow tool, a communications tool, and a training tool, and I can’t wait to see what it and the team become next.
I’d like to start by thanking Kim Bryant, my former manager at NPR, and my current co-workers Stephanie Oura and Alexander Diaz for their support during this Agile transformation. I’m also grateful for past and present members of our DevOps team – Shain Miley, Tyler Sullens, Ted Neykov, Vick Krishnaswamy, Zach Dixon, Sarah Hersh, Ahsan Lake, Grant Dickie, and Anthony Ghandi – without whom this experience would never have happened! Huge thanks as well to Nara Kasbergen and the Tech Conference Speakers group for their encouragement and support, and to the NPR Audience Insights team for their research and information. Lastly, I want to thank Curtis Michelson for his excellent feedback and suggestions for this report, and of course for his enthusiastic support of public radio.
[1, 2] Google Analytics 3-mo avg (Jun 2017 – Aug 2017).
[3] ACT 1 based on Nielsen Audio Nationwide, Fall 2018, Persons 12+.