Converting a successful, ongoing program to Agile while maintaining 2M lines of software code presents unique challenges that are not covered in most books and courses on Agile; this talk describes successful strategies used on a criminal justice program. This is an up-to-the-minute account of our (continuing) experiences instituting Agile practices, paying down technical debt, and addressing the concerns of skeptical governance and oversight groups. This paper explores the motivations for instituting Agile on an already-successful program, the challenges we encountered, and the lessons learned.
There are many examples in the literature of unsuccessful programs "rebooted" and rescued through the use of Agile software development models. And there is a wealth of information available on how to institute Agile practices such as test automation and refactoring into new software projects. But what about a highly successful, longstanding, waterfall-based government program with over 2 million lines of code and 15 years of history? How do you institute DevOps practices on a project that uses 100% Microsoft platforms? How do you write automated tests and refactor mercilessly on a project where the majority of the business logic is written in SQL? How do you deliver valuable software when the oversight framework requires you to maintain over 50 separate documents? How can we institute Scrum while simultaneously undergoing CMMI level 3 Certification?
This experience report describes specific practices that we adopted during our two-year (and counting) experience transforming a criminal justice program to Agile. This talk attempts to provide context and insight into when and how specific approaches might provide the most benefit for legacy modernization projects or projects answering to skeptical oversight groups. While this experience report may be of interest to anyone interested in Agile software development, it is particularly suited to anyone who works, or wants to work, in an Agile context while doing any of the following: working with legacy code; maintaining and enhancing a large pre-existing software application (especially one that is already in production); working in a large company or government environment; working with oversight or other groups who may be unaware of, skeptical of, or unable to use Agile; or complying with standards such as PMBOK and CMMI that have historically been applied to traditional (non-Agile) projects.
The project supports a prominent criminal justice program that operates both domestically and internationally. The software consists of an integrated suite of applications that have been operational in production while being continuously maintained and enhanced over the last 15 years. Since the late 90s when the software was first developed, new capabilities and enhancements have been continually brought online.
2.1 Motivations for Moving to Agile
By most accounts, the program has been highly successful. Since going live over 15 years ago, the program has been continuously maintained and operated without significant service interruptions, and it has undergone at least three major transitions from one prime contractor to another. Each prime contractor in turn has had the project successfully appraised at CMMI maturity level three, a rating that implies a high degree of process discipline. Although significant pieces of the original design remain in place, successive development projects have been able to make substantial improvements over the years, enabling the program to keep pace with the needs of the criminal justice community. The program has successfully rolled out seven major and dozens of minor software releases to its users. Given the long and successful history of the program using traditional waterfall-based software development practices, what is the motivation for moving to Agile?
Quicker and more predictable development cycles. The single most important motivation for moving to Agile is to deliver enhanced capabilities to support the mission as quickly and reliably as possible. Given its vital ongoing mission, deciding whether to adopt a set of practices that have been shown to lead to quicker and more predictable delivery of value seems like an easy choice. The faster we can get improved capabilities and tools into the hands of those charged with our safety, the better.
Automated testing. While not strictly a requirement of adopting an Agile lifecycle such as Scrum, test automation is a best practice that is closely associated with Agile. Over the last 15 years the software has grown to incorporate an enormous set of highly complex functionality, but without employing any test automation. Changes and enhancements have multiplied on top of each other to the point where any new addition of functionality or defect repair entails significant risk of breakage of existing functions. Most of the original authors of the software components have long since left the program, and many components have been maintained by up to dozens of different software practitioners. The result is that designs have become more brittle over time, to the point where it appeared that “fixing one problem breaks two more.” More and more manual testers are required to provide adequate coverage of all existing system functions within a reasonable timeframe. By contrast, automated test cases can be run continuously at virtually zero cost, while freeing up expensive manual testing resources to apply their knowledge and experience creatively to the testing process. In fact, the program may well have adopted automated testing practices regardless of methodology—even if the program had ultimately decided against adopting Scrum.
Ability to rapidly shift development priorities. The criminal justice software is used in a wide variety of contexts. Sometimes problems with a given software release are encountered in the field well into the development period for the subsequent software release. Rather than having to lock in all functional requirements up front, Agile provides the ability to introduce new requirements even late into a development cycle. While there are trade-offs involved whenever new requirements are added, Agile’s short, incremental planning cycles provide the ability to quickly address urgent defects and to implement popular user requests, and get them out to the field faster and with minimal disruption.
3. CASE STUDY
3.1 Getting Started: Up Front Decisions and Their Ramifications
There were a number of key decisions and strategic choices that were made on the project as part of the shift to an Agile software development life cycle based on Scrum. The following sections discuss a number of these up front choices and how they affected the project.
Mutual Commitment to Agile. The first and most important decision regarding the project was for both the contractor and the government agency to make a full commitment to Agile. The decision to switch to an Agile process was written directly into the contract, so both sides knew that change was afoot. In this instance, the fact that the re-compete process resulted in the selection of a new prime contractor may have provided a beneficial side effect: since much of the contractor team was new, one process was as good as another. The “command and control” structure typically in place for a Federal government agency may have also come into play: once Agile was mandated from up above, the staff readily adopted the new program.
Whole Team Training. Soon after the project started, the entire team, including all members on both the contractor and the government side, attended an Agile “bootcamp” class specifically designed for Government customers. The class was calibrated for professionals with extensive experience working in the waterfall-centric and high ceremony world of government contracting. It focused on the underlying motivations for Agile, and introduced a number of Agile practices in very practical cost versus benefit terms with point-by-point comparisons to traditional practices. While no training class can make instant converts of everyone, this class was universally well received, and even the most skeptical seemed to be willing to give Agile a try.
Consolidation Into a Single Tool to Manage Scope and Schedule. The project had traditionally used a suite of software tools to manage project requirements, project schedule, and software change management. Each tool required expertise to manage and maintain, and generating reports that incorporated data from multiple tools was complex and difficult. Early on the team decided to migrate the legacy Requirements, Change Requests/Defects, help desk tickets, and test cases from their former homes within RequisitePro, DevTrack, MS Access, and MS Word, respectively, into Microsoft Team Foundation Server (TFS). The migration was lengthy and difficult, but the end result is a vastly streamlined process that is conducive to automated reporting. The team has started producing high quality automated reports from the TFS system, and we expect to create more over time.
Mutual Commitment to Invest in Test Automation. Recognizing that the cost of maintaining high quality with 100% manual testing would escalate out of control, all agreed up front that the project would expend significant efforts to establish the discipline of test automation. Over the two years, the team struggled to establish automated tests, encountering a number of obstacles—most of which would not apply to a brand new project. The upfront commitment to test automation proved especially important during sprint planning, where the team allocates a time budget (measured in story points) amongst a large set of proposed efforts (captured as user stories). The team was able to devote a significant percentage of time on a consistent basis to technical improvements such as establishing test automation.
Sprint Zero: When Do We Start Sprint 1? We knew when we started the project that we couldn’t wait until all of the planned technical underpinnings were in place before officially starting Scrum. On the other hand, starting Scrum without any development workstations or team members did not make much sense, either. We decided on the following simple set of targets that would tell us when to start Scrum:
- Staffing reaches critical mass,
- Every team member has an imaged PC,
- Bare bones configuration management is in place, and
- A rough initial backlog is identified.
As it turns out, it took us three months to achieve the targets listed above and kick off Sprint 1. Some people have called this initial startup period Sprint Zero, while others claim such a term smacks too much of waterfall. In the Federal Government contracting world, contracts can take months or even years to be awarded, negotiated, and initiated. Few companies can afford to keep a team fully staffed and ready to go for that long. By contrast, many commercial companies commission new projects and staff them with existing employees who already have a desk, equipment, and training. In such a situation a Sprint Zero seems like a waste of time.
3.2 Challenges We Encountered and How We Handled Them
The following sections discuss some of the challenges we encountered during the course of the project. Although some of the challenges were known about up front, all of them had interesting and sometimes surprising ramifications.
Federal Government Outsourcing. The project faces challenges that are common to most software projects for US Federal Government agencies, including the fact that the work is performed at the prime contractor’s site, rather than at the customer’s site, and the fact that projects are generally awarded for a limited amount of time. Each new prime contractor has a limited window of time in which to achieve base project objectives in order to secure additional option years, which may discourage long-‐term thinking such as investing in automated testing capabilities that will pay benefits over time. Project documentation requirements are generally higher, due to the ongoing nature of many government programs and the planned transitions from one contractor to another.
Significant Documentation To Maintain. Given the criticality of the software and the large user base, there is a need to maintain extensive end-user documentation and training materials. At the same time, maintaining comprehensive technical project documentation is thought to reduce the risk of knowledge loss due to contractor turnover. As part of the transition from waterfall to Agile, the customer and the project team have re-examined the need for extensive technical documentation. Several documents were eliminated, and some others have been largely or completely replaced with automatically generated versions. However, much more work remains to be done. Over the last eighteen months, the project team found that more than 15% of total team effort was spent maintaining documentation. While the customer has indicated this amount matches their expectations, the team continues to look for ways to reduce the burden of maintaining manual documentation. In summary, the team found:
- On a highly regulated, life-critical program, relatively few existing documents could be eliminated or consolidated outright.
- Significant effort is required to automatically generate high quality documents to replace manually generated ones: “out of the box” solutions are rarely acceptable.
- Even documents containing significant original content can be improved: in several cases the team was able to convert a lengthy, manually maintained document into a much shorter manually maintained document with an embedded Excel report generated out of a tool.
- The bottom line is that when it comes to documentation there is no “silver bullet”; rather, we are making a gradual transformation towards a reduced dependency on manually maintained documentation.
Non-Agile Governance: PMBOK, CMMI Compliance Required. While the customer agreed to adopt Agile methods, the governance and oversight groups still required compliance with the standards espoused by the Project Management Body of Knowledge (PMBOK). At the same time, the contract requirements stipulated that the project would be assessed at maturity level three (“Defined”) for the Capability Maturity Model Integration (CMMI) standard. The project was able to balance competing concerns by quantifying all project efforts and capturing them in an automated tool, and hiring a team member to focus exclusively on documents and audits. At the price of a manageable amount of task overhead, we have been able to generate most of the reports we need to satisfy the concerns of the governance and oversight stakeholders.
Manual Testing Efforts Complex and Difficult. During the interim period while our test automation efforts have yet to reach a substantial level of code coverage, the project continues to rely heavily on the efforts of expert testers using largely manual tests. Given the fact that manual testing is the most important constraint on the success of the project, how can the team maximize the effectiveness and efficiency of the expert testers? The answer is clear: we must remove impediments to manual testing. Experience has shown us that testers spend a significant proportion of the total testing time preparing the test environment or chasing down false positives (configuration errors). Therefore, solving these issues should yield excellent return on investment. We are therefore focusing efforts on automating deployments, automated detection and repair of configuration issues, and other issues related to the discipline of DevOps and continuous delivery.
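The idea of automatically detecting configuration issues before testers start work can be illustrated with a minimal sketch. It assumes that each test environment's expected and actual settings can be read into simple key/value dictionaries; the function name `find_config_drift` and the setting names are hypothetical and are not taken from the project's actual tooling.

```python
from typing import Dict, List


def find_config_drift(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return human-readable differences between expected and actual settings."""
    problems = []
    # Settings that should exist: report them if missing or different.
    for key in sorted(expected):
        if key not in actual:
            problems.append(f"missing setting: {key}")
        elif actual[key] != expected[key]:
            problems.append(
                f"{key}: expected {expected[key]!r}, found {actual[key]!r}"
            )
    # Settings present in the environment that nobody asked for.
    for key in sorted(set(actual) - set(expected)):
        problems.append(f"unexpected setting: {key}")
    return problems


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    expected = {"db_server": "test-sql-01", "app_version": "7.2"}
    actual = {"db_server": "test-sql-01", "app_version": "7.1"}
    for line in find_config_drift(expected, actual):
        print(line)
```

Running a check of this kind right after each deployment would let a configuration error surface as an environment problem rather than as a false positive during manual testing.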
3.3 Benefits We Observed
The project has enjoyed significant benefits from our adoption of Agile practices and some of the key decisions we made. The following paragraphs discuss some of the more interesting or surprising ones.
Increased Stakeholder Collaboration. Without question, the single biggest benefit we have enjoyed since we started our transition to Agile has been the level of positive collaboration we have achieved. Having two representatives of the customer team co-located with the contractor team on an almost-full-time basis has been absolutely invaluable. The team has been able to clarify requirements questions virtually immediately. The regular feedback and status that customer team members provide to outside stakeholders have greatly helped manage customer expectations, especially when things go wrong. For example, when some tasks took longer than originally planned, the customer was not surprised, because they had followed the process all along and knew that the team had already put risk mitigation plans into motion.
Ability to Reset Priorities Midstream. During the planning session for our first major project release, the customer provided a comprehensive set of business user stories up front—to seed the business release backlog. However, our Agile format enabled the customer to optimize flexibility and progressively refine and adapt the backlog to insert new, high value stories in place of others during the course of the project. Several urgent issues arose unexpectedly during the domestic and international roll out of the previous software release. Feedback from the user community and other venues identified a set of highly desired features. The customer was able to adjust the backlog for each new sprint such that the team always worked on the user stories that had the highest possible value.
Figure 1. Story Points Completed After One Year: Pre-planned versus Emergent
In the end, nearly 50% of all of the team’s effort related to business user stories applied to emergent user stories that were not present as part of the customer’s originally identified scope for the project.
Improved Insights Due to Quantitative Measurements. Our quantitative approach to Agile enabled us to gather metrics that highlighted several interesting phenomena:
Figure 2. Story Points Completed After One Year: Technical versus Business versus Documentation
- Approximately 10% of the team’s total effort was spent updating documentation during the first year.
- During one four-week sprint, the team spent 370 hours performing break-fix (fixing testing environments, troubleshooting deployments, etc.).
- The team spent a plurality of their time implementing technical user stories rather than business user stories—primarily in setting up the TFS change management infrastructure and creating all of the testing environments.
- The total time required to manually execute all system test cases, performance test cases, and installation test cases consumed the efforts of the entire team for a full sprint. Test case execution was captured via a set of technical (as opposed to business) user stories.
The team was able to leverage the data to improve planning (understanding that we should budget 4 weeks at the end of each release cycle for hardening) and make the cost-versus-benefit case for DevOps (to reduce the 370 hours we spent during the first release manually maintaining our environments).
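The kind of breakdown behind these measurements (effort share by user story category) is simple arithmetic over completed story points. The sketch below assumes that completed stories can be exported as (category, story points) pairs; the function name `effort_share` and the sample numbers are invented for illustration and are not the project's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def effort_share(stories: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return each category's percentage of total completed story points."""
    totals: Dict[str, int] = defaultdict(int)
    for category, points in stories:
        totals[category] += points
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing completed, so no shares to report
    return {
        category: round(100.0 * points / grand_total, 1)
        for category, points in totals.items()
    }


if __name__ == "__main__":
    # Made-up sample: (category, story points) for completed user stories.
    completed = [
        ("technical", 8),
        ("business", 5),
        ("documentation", 2),
        ("technical", 5),
    ]
    for category, share in sorted(effort_share(completed).items()):
        print(f"{category}: {share}%")
```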
3.4 Continuing Challenges
Although the team has realized many benefits from moving from waterfall to Agile, significant challenges remain. The following are some of the most difficult challenges we continue to struggle with.
Meeting Our Sprint Commitment. During the first eighteen months on the project, the team completed eighteen consecutive sprints. During that time the team delivered a substantial amount of business value and started paying down technical debt. Although the team came close several times during the first seventeen sprints, we did not fully achieve our sprint goal until sprint eighteen. We have analyzed the reasons for this, and we believe they include optimism bias, difficulty breaking down user stories into manageable sizes, and reliance on “specialist” team members who end up becoming bottlenecks for the team. We continue to work on improving our estimates, and we are committed to meeting our sprint commitments going forward.
Expanding Outside of Traditional Role Boundaries. It has been challenging for some of our team members to expand outside of the boundaries of their traditional roles. In the team we sometimes still hear “not my job” phrases such as: “I am a programmer, so I should stop testing and start fixing bugs.” During sprint planning, some testers may not feel as if their input is necessary or important when estimating development-intensive user stories. Realistically, we understand that role boundaries that have been reinforced for decades cannot be overcome overnight. We hope that in the future pair programming (for example, pairing a tester with a developer) will help the team to work together more cohesively.
Institutionalizing Test Automation. When adopting Agile practices on a pre-existing project with significant technical debt, the first objective is to “stop the bleeding”: first, one must stop adding more debt. A good way to do this is to ensure that all code changes are fully refactored and backed by at least one automated unit test. Unfortunately, after more than two years, the project has still not reached this milestone. Why is automated unit testing so difficult on the project? The answer is twofold:
- Most software modules do not currently support automated unit testing.
- Most project developers do not have experience writing automated unit tests.
Our approach to solve the problem has two components:
- Practice writing automated unit tests via pair programming.
- Analyze all 18 of the major code modules within the software and devise an automated testing strategy for each.
Because the software was developed over many years by several different teams, some of the components have radically different designs than others—and will therefore require different testing strategies. Our plan is to devote senior resources to analyze each module in turn to devise an automated testing strategy and develop a small number of automated unit tests that can be used as templates for further development.
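As an illustration of what a template test for SQL-based business logic might look like, the sketch below uses Python's standard `unittest` and `sqlite3` modules. It is not the project's approach: the in-memory SQLite database is a stand-in assumption for the real database platform, and the `cases` table, the query, and all names are hypothetical.

```python
import sqlite3
import unittest

# Hypothetical business rule expressed in SQL: count open cases per office.
OPEN_CASES_BY_OFFICE = """
    SELECT office, COUNT(*) AS open_cases
    FROM cases
    WHERE status = 'OPEN'
    GROUP BY office
    ORDER BY office
"""


def open_cases_by_office(conn):
    """Run the rule against a connection and return (office, count) rows."""
    return conn.execute(OPEN_CASES_BY_OFFICE).fetchall()


class OpenCasesByOfficeTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh, isolated in-memory database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE cases (id INTEGER PRIMARY KEY, office TEXT, status TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO cases (office, status) VALUES (?, ?)",
            [("East", "OPEN"), ("East", "OPEN"), ("East", "CLOSED"), ("West", "OPEN")],
        )

    def tearDown(self):
        self.conn.close()

    def test_counts_only_open_cases(self):
        self.assertEqual(
            open_cases_by_office(self.conn), [("East", 2), ("West", 1)]
        )
```

Saved as a module, the test can be run with `python -m unittest`. The pattern of building a small known data set, running the rule, and asserting on the rows is what a template would carry over; the database engine and schema would differ per module.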
4. WHAT WE LEARNED
Our ongoing transition to Agile has just begun, and we plan to continue our journey to realize the benefits of increased quality and decreased risk. The following is a list of some of the lessons we have learned:
4.1 Agile Benefits can be Realized Even On Successful Waterfall Projects
Almost two years into our transition, we can now say with confidence that the program has realized significant benefits by embracing Agile practices. That these benefits emerged on a program that had already been successful using waterfall is a testament to what Agile practices confer. For example, we have found that adopting Agile has helped us increase stakeholder collaboration and given us an improved ability to adapt and to reset priorities midstream.
4.2 No Silver Bullet
There is no silver bullet when it comes to migrating legacy projects to Agile. It requires discipline and sustained efforts. You may encounter a unique situation for which there is little or no guidance and have to figure out your own path forward. Don’t despair. Migrating a legacy project to Agile represents a significant ongoing set of challenges. Persistence pays.
4.3 Planned, Incremental Introduction of Agile Practices
Legacy projects may need to introduce specific Agile practices in a different order and combination than new projects. For example, on the project we are undertaking a move from cubicles to open spaces via a series of baby steps starting with traditional cubicles surrounding central open spaces that are equipped with “collaboration stations.” We are tying the introduction of pair programming to brand new PC workstations having large high-resolution monitors. Our (tongue in cheek) message to the team: “with great power comes great responsibility.”
4.4 Start Scrum Even Without Everything In Place
It is not necessary to wait until all technical frameworks are in place before starting Scrum, but the team and the customer should be prepared for the tradeoffs involved as the environment matures over time. The sooner you can start Scrum, the sooner you can start learning, adapting and improving. It would be wasteful to spend significant time up front developing the “perfect technical frameworks” before attempting to use them. For example, we found that over time we developed many unique linkage types between TFS issues that enabled automated reporting; one result was the equivalent of a Requirements Traceability Matrix (RTM) tracing Epics to User Stories to Test Cases. It would not have been possible to imagine all of the possible linkage types in advance. That said, it is probably fair to say that more technical critical mass is required for a legacy project than a new one.
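To make the traceability idea concrete, the sketch below flattens Epic to User Story to Test Case links into RTM rows. It deliberately does not use the TFS API: it assumes the links have already been exported into two plain dictionaries, and the function name `build_rtm` and the work item identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def build_rtm(
    epic_to_stories: Dict[str, List[str]],
    story_to_tests: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Flatten Epic -> User Story -> Test Case links into traceability rows.

    An empty string in a row marks a traceability gap (an epic with no
    stories, or a story with no test cases).
    """
    rows = []
    for epic in sorted(epic_to_stories):
        stories = epic_to_stories[epic]
        if not stories:
            rows.append((epic, "", ""))
        for story in stories:
            tests = story_to_tests.get(story, [])
            if not tests:
                rows.append((epic, story, ""))
            for test in tests:
                rows.append((epic, story, test))
    return rows


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    epics = {"EPIC-1": ["US-10", "US-11"], "EPIC-2": []}
    tests = {"US-10": ["TC-100", "TC-101"]}
    for row in build_rtm(epics, tests):
        print(row)
```

Rows with an empty cell are the useful part for oversight reporting, since they show exactly where a requirement is not yet traced to a test.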
4.5 Adopt a Systems Thinking Approach to Addressing Technical Debt
Legacy projects may have accumulated vast amounts of technical debt. When faced with such large obstacles, it is important to take a step back and avoid premature optimization. We want to view the problem holistically to ensure we are addressing the most important issues first and not “polishing the hubcaps while the engine leaks oil.” Our project contains 18 different software modules, some of which use different designs, libraries, and programming languages. While basic automated unit testing may work well for one module, it may not be appropriate for another. For some modules, the cost of writing high quality unit tests may be sufficiently high that a rewrite may ultimately be more cost effective. In this case, a better approach may be to develop automated graphical user interface-level tests, and wait for a new feature request that is sufficiently large and risky that the incremental cost of a total rewrite may be palatable to the customer.
4.6 It Helps to Have An Expert Tool Smith
If you are going to use electronic tools to manage your Scrum effort rather than sticky notes and manual Kanban boards, and you need to generate reports for project stakeholders, make sure you have someone on the team skilled at customizing the tool. Providing a customized report that is automatically generated from your tool of choice can go a long way to helping your customer feel more confident. Remember that your customer may answer to other stakeholders who are further removed from the project and who may not have bought into Agile. For a skilled tool smith, generating a custom report may represent a relatively modest effort—and the rewards can be significant and ongoing.
I would like to sincerely thank our government customer for making this paper possible. Thanks to each colleague who reviewed and edited this paper. I would like to thank all my team members for their dedication, expertise, and amazing teamwork. It would not have been possible for the program to come this far without the courage, dedication, ingenuity, and willingness to try new things of every team member, both customer and contractor. I have no doubt that this integrated team will continue to learn and adapt and the best is yet to come.