The team’s ability to deploy to production frequently enables business agility. With frequent releases, each release contains fewer changes, and those changes are easier to test. Six months ago our release cycle lasted three days. Each release was an unpleasant duty that teams had to perform. Developers tried to push the release to production as quickly as possible and get back to developing business tasks. As a result, the deployment pipeline was never improved systemically, the tests degraded, and the deployment time got worse. In this experience report we present Stop the Line, a practice that helped us focus on fixing deployment pipeline problems. After just three months we managed to get rid of manual testing, fix flaky tests and make deployment more than 10 times faster. Today, our deployment is fully automated, and the release cycle has been reduced to 4–5 hours.
If your releases are lightning-fast, automated and painless, you are lucky. Unfortunately, this was not the case for us. We used to have a manual, slow and error-prone deployment process. It impaired our ability to deliver business goals. We failed several sprints because we were not able to deploy features fast enough to get them ready before the Sprint Review. We hated our releases. They lasted up to 3-4 days. After we introduced the “Stop the Line” practice, the situation changed. Just 3 sprints (6 weeks) after the practice was introduced, our deployment pipeline had become 10 times faster and much more stable. Let me share the details of how we achieved this.
Nine teams working on a single product can do a lot during the sprint. We have a golden rule: our teams show at the Sprint Review only those features that have been deployed to Production. But if the release cycle is too long and unpredictable, is there any point in producing more features that can’t be released? To reduce release time we introduced a very simple rule: Stop the Line. If a release takes longer than 48 hours, we turn on the flasher and stop working on business features. All teams working on the Product Backlog must stop development and focus on pushing the current release to Production. Focusing on the delayed release and eliminating what caused the delay is more important than creating more business features that we could not deliver anyway. It is forbidden to write any new code, even on separate branches. This behaviour is prescribed in the “Continuous Delivery” article by Martin Fowler: “Your team prioritizes keeping the software deployable over working on new features.”
Not everything worked out right away. We have had to adapt the practice several times. In the first sprint, we Stopped the Line several times and couldn’t work on business goals for three days out of ten. Four out of eight teams failed the sprint goals. Teams complained about losing focus because of context switching between business and technical work. But thanks to this effort we significantly accelerated the deployment pipeline, automated test environment preparation and deployment, stabilized tests and got rid of flaky tests in just three sprints.
Of course, Stop the Line is not a silver bullet, but this practice is an important step towards Continuous Delivery and genuine DevOps.
Dodo Pizza may seem like a fast-growing pizzeria network. But actually Dodo Pizza is an IT company which happens to sell pizza. Our business is based on Dodo IS, a cloud-based web platform that manages all the business processes, from taking and cooking orders to inventory management, people management and making management decisions. Dodo IS is written mostly on the .NET framework, with the UI in React/Redux, jQuery and sometimes Angular. We have iOS and Android applications written in Swift and Kotlin.
Dodo IS accepts 200+ orders/minute in Russia and handles 3000+ requests per second. The total code base is over 1 million lines of code. In seven years we have grown from a couple of developers serving a single pizzeria to 70+ developers supporting 480 pizzerias in 12 countries.
The Dodo IS architecture is a mixture of a legacy monolith and 40+ microservices. All new functionality is developed in separate microservices, which we deliver either on every commit (Continuous Deployment) or on request as often as the business needs, up to every five minutes. But a huge part of our business logic is still implemented in the monolith. The monolith has the slowest deployment rate. It takes time to build the whole system (the build artifact is about 1 GB zipped), run unit and integration tests, and perform manual regression testing before every release. The release itself is also not fast. The monolith is not country-agnostic, so we have to deploy 12 instances for 12 countries.
Continuous Integration (CI) is a developer practice to keep a working system by making small changes and growing the system by integrating at least daily on the mainline, supported by a CI system with lots of automated tests. When multiple teams work on a single product and practice CI, the number of changes in the mainline grows rapidly. The more changes you accumulate, the more hidden defects and potential issues the code contains. That’s why teams prefer to deploy changes frequently, which leads to Continuous Delivery (CD) as the next logical step after CI.
Continuous Delivery (CD) practice lets you deploy the code to production at any time. This practice is based on a Deployment Pipeline — a set of automatic or manual steps that validate the product increment on its way to Production.
Our Deployment Pipeline contains the following steps:
Figure 1. CI build pipeline
When the product increment contains a lot of changes, the Deployment Pipeline takes longer. There are more reasons for tests to fail. There are more manual test cases to run.
3. MY STORY
3.1 The slow releases problem
I had a dream. I dreamed of releasing our Dodo IS monolith several times a day. We already know how to release the site and recently created microservices several times a day, but the monolith part of our system is a huge, malicious and dangerous beast. Since changes are made in common code, any change can break someone else’s logic. Of course, there are autotests, but the code coverage is not enough to fully trust any change. Hundreds of servers, 12 countries. It does not like being touched often.
But I dreamed that we were able to release Dodo IS several times a day. Once a feature is ready, it could immediately be released and shown to the business. If a hypothesis failed or some logic was broken, instead of rolling back the release, we could quickly fix it, run the tests and release the hotfix. Frequent releases enable fast feedback and business flexibility.
There is a golden rule of XP: if something hurts, do it as often as possible. Our releases have always been a pain. We spent several days trying to deploy the test environment, restore the database, run the tests (usually multiple times), deal with the causes of failures, fix the bugs and, finally, release. Often it took more than three days. With a two-week sprint, in order to release before the Sprint Review on Friday, we had to start the release on Monday. That meant we were working on the sprint goal only 50% of the time. If only we could release every day, the productive period of work would grow to 80-90%.
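The arithmetic behind these percentages can be sketched as follows. This is a minimal illustration, assuming a sprint of ten working days and treating every day reserved for the release as lost to the sprint goal; the function name and its parameters are illustrative and not part of our tooling.

```python
def productive_share(sprint_days: int, release_days: int) -> float:
    """Fraction of the sprint left for sprint-goal work.

    Assumes (for illustration only) that every day reserved for the
    release is a day not spent on the sprint goal.
    """
    if not 0 <= release_days <= sprint_days:
        raise ValueError("release_days must be between 0 and sprint_days")
    return (sprint_days - release_days) / sprint_days


# Release started on Monday of the second week: five of ten days reserved.
print(productive_share(10, 5))

# A release that fits into two days, or into a single day.
print(productive_share(10, 2), productive_share(10, 1))
```

Under these assumptions, reserving the whole second week gives a share of 0.5, while a release that fits into one or two days gives 0.9 or 0.8, which matches the 50% and 80-90% figures above.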
When multiple teams work on the same code base, they produce a lot of changes every day. With lightning fast releases it doesn’t cause any problems. But let’s see what happens if releases take more time.
Our average release used to take two to three days. There were six teams working in a common dev branch. Just before the release we create a release branch. This branch is tested and deployed to Production while the teams continue development in the common branch. Before the release branch reaches Production, the six teams write quite a lot of code in the dev branch. Each team wants to roll out one or two new user stories in the next release. During the last year we have grown to nine teams, and we will continue to grow. In the same three days, nine teams will write even more code. This means that if we do nothing, as the number of teams grows, the volume of changes made in three days will also grow. So, there will be more changes in a product increment. The more changes in the increment, the higher the chance that changes made by different teams will affect each other, the more carefully the increment needs to be tested, and the longer it will take to release it to Production. It is a reinforcing loop (see Figure 2). The more changes we have in the release, the longer the regression time. The longer the regression time, the more time passes between releases and the more changes the teams produce for the next release. The following CLD (Causal Loop Diagram) illustrates this dependency:
Figure 2. CLD diagram: long releases lead to even longer releases
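The reinforcing loop can also be sketched as a toy simulation. This is only an illustration under stated assumptions: the linear relation between the number of changes and the release duration, and all coefficients (one change per team per day, one base day per release, 0.1 day of extra regression per change), are invented for the example and were not measured on our pipeline.

```python
def simulate_release_durations(
    teams: int,
    changes_per_team_per_day: float = 1.0,  # hypothetical rate
    base_days: float = 1.0,                 # hypothetical fixed cost of a release
    days_per_change: float = 0.1,           # hypothetical regression cost per change
    releases: int = 8,
) -> list:
    """Return the duration (in days) of successive releases.

    Each release has to absorb the changes that the teams wrote in the
    dev branch while the previous release was still in flight.
    """
    durations = []
    duration = base_days
    for _ in range(releases):
        durations.append(duration)
        # Changes accumulated in dev during the current release.
        accumulated = teams * changes_per_team_per_day * duration
        # A larger increment needs more regression time next time.
        duration = base_days + days_per_change * accumulated
    return durations


for team_count in (6, 9):
    series = simulate_release_durations(team_count)
    print(team_count, [round(days, 2) for days in series])
```

In this model every release is longer than the previous one, and the same pipeline degrades faster with nine teams than with six, which is the behaviour the diagram describes.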
We were looking for ways to solve the problem.
3.2 Regression automation with the help of the QA team
We have long understood that slow releases are our bottleneck. Why are they so long? During the release, we take the following steps:
- Environment setup. We need to restore the Production database (675 GB of data), scramble personal data, and purge RabbitMQ queues. The scramble process itself touches a lot of data and takes about 1 hour.
- Run automated tests. Some UI tests are still flaky so we have to run them multiple times. Fixing flaky tests requires enormous focus and discipline.
- Manual acceptance testing. Some teams prefer to run final acceptance tests before the code goes to Production. This may take several hours. If they find bugs, we give teams two hours to fix them, otherwise they have to revert their changes.
- Deploy to Production. Since we have separate instances of Dodo IS for each country, the deployment process takes some time. After the first country deployment is done we monitor logs for a while, look for errors, then continue with deploying to the remaining countries. The whole process usually takes about two hours, but sometimes may take longer, especially if we have to roll back the release.
At first we decided to get rid of manual regression testing, but the path to this was long and difficult. Two years ago, a manual regression run of Dodo IS lasted a week. We used to have a whole team of manual testers, who checked the same features in 10 countries week after week. Such work is not to be envied.
In June 2017, we formed the QA team. The team’s primary goal was to automate regression testing of the most critical business operations: order acceptance and production. Over the next 6 months, the new QA team of four people covered the critical path. Developers from the feature teams actively helped them. Together we wrote a clear and easy-to-read DSL (Domain Specific Language), which was readable by customers. In parallel with the end-to-end UI tests, the developers covered the code with unit tests. Some components were redesigned with TDD. Once we had enough tests we could trust, we abandoned manual testing. But this only happened 1.5 years after we started the regression automation. After that we disbanded the QA team and its members joined the feature teams.
However, UI autotests have significant drawbacks. Since they depend on real data in the database, the data needs to be set up. One test can spoil data for another test. A test may fail not only because some logic is broken, but also because of a slow network or outdated data in the cache. We had to spend a lot of effort to get rid of flaky tests and make them reliable and repeatable.
3.3 #IReleaseEveryDay initiative
We created a #IReleaseEveryDay community of like-minded people. We brainstormed ideas on how to speed up the deployment pipeline. We made some quick progress at first. For example, we significantly reduced the UI test suite by discarding repetitive and redundant tests. This reduced the test run by several tens of minutes.
We significantly reduced the time to set up the environment by backing up the database and scrambling the data in advance. We prepare a backup at night, and once the release starts, we switch the environment to the backed-up DB in a few minutes. After a couple of months, we had gathered all the low-hanging fruit. We had cut the average release time, but it was still annoyingly long. The time had come for systemic changes. All developers were well aware that frequent releases are good. But how could we achieve this when all the teams were busy developing business features? After all, finding a systemic solution takes time to analyze, and its implementation usually takes quite some time.
3.4 Stop the line. The practice invented by the team.
I remember how we came up with Stop the Line. At the Overall Retrospective, we discussed long releases that prevent us from achieving a sprint goal. One of our developers suggested:
— [Sergey] Let’s limit the scope of release. This will help us test, fix bugs and deploy faster.
— [Dmitry] Let’s limit WIP by the number of tasks. As soon as we have done 10 tasks, we stop development.
— [Developers] But tasks can be different in size; this will not solve the problem of large releases.
— [Me] Let’s introduce a limit based on the duration of the release rather than number of tasks. We will stop development if the release takes too long.
We agreed on the rules: If the release takes longer than 48 hours, the person responsible for the release (the releaseman) writes “Stop the Line” in Slack. From this point on, all teams are forbidden to work on the monolith. It is forbidden to write any new code, even on separate branches. All we can do is eliminate the causes of the delayed release. After the release is out, this restriction is removed. We also introduced the “Stop the Line Kanban Board” on a simple flipchart. When the line is stopped, all the team representatives get together and add items to the Inbox column. The items either help to push the current release or help to eliminate the root causes of the release delay. The team representatives pull a card and work on it until it is done.
When the line stops, an orange flasher blinks in the office. Whoever comes to the third floor, where Dodo IS developers work, sees this visual signal. There is no sound, so it doesn’t drive us crazy, it just irritates. And that’s the point. How can we feel comfortable while release is in trouble?
Figure 3. Stop the Line flasher
At the Overall Sprint Retrospective with all the teams’ representatives we review all the experiments in progress, including Stop the Line. Over the last few retrospectives we’ve made a few rule changes, for example:
- Release channel. All information about the current release is in a separate Slack channel. This channel is also for the releaseman to ask for help.
- Release log. The releaseman tracks a log of his actions. It helps to find the reasons for the delayed release and to detect patterns.
- Five-minute rule. Within five minutes of the announcement of Stop the Line, the team representatives gather around the flasher.
- Stop the Line backlog. There is a flipchart on the wall with the Stop the Line backlog, a list of tasks that teams can do during Stop the Line.
- Do not count the last Friday of the sprint in the 48-hour limit. It is not fair to compare two releases when, for example, one started on Monday and the other on Friday. The releasing team can spend two full days supporting Monday’s release, while there will be multiple distractions on Friday (Sprint Review, Team Retrospective, Overall Retrospective) and the next Monday (Sprint Planning One, Sprint Planning Two), so Friday’s team will have only about half a day for focused release support. Friday’s release has a much higher chance of being stopped compared to Monday’s release. So we decided to exclude the last Friday of the sprint from the equation.
- Eliminate technical debt. Recently, teams decided that during the Stop the Line they are also allowed to work on technical debt, not just on improving the deployment pipeline.
- Stop the Line Owner. One of the developers volunteered to be the Stop the Line Owner. He will dive deep into the reasons for delays and manage the Stop the Line Backlog. While the line is stopped, the Owner can ask any team to help with his backlog items.
- Post mortem. The Stop the Line Owner drives a post-mortem meeting after each stop.
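The core rule and the “last Friday” exception above can be sketched in code. This is a minimal sketch under assumptions that the rules themselves do not state: that the 48 hours are counted as wall-clock time, and that “not counting” the last Friday means subtracting the whole calendar day. The function names are illustrative; in practice the releaseman makes this call by hand and announces it in Slack.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

THRESHOLD = timedelta(hours=48)  # the limit from the Stop the Line rule


def counted_duration(
    start: datetime, now: datetime, excluded_day: Optional[date] = None
) -> timedelta:
    """Release duration so far, not counting the excluded calendar day.

    `excluded_day` stands for the last Friday of the sprint (assumption:
    the whole day, midnight to midnight, is left out of the count).
    """
    elapsed = now - start
    if excluded_day is not None:
        day_start = datetime.combine(excluded_day, time.min)
        day_end = day_start + timedelta(days=1)
        overlap_start = max(start, day_start)
        overlap_end = min(now, day_end)
        if overlap_end > overlap_start:
            elapsed -= overlap_end - overlap_start
    return elapsed


def should_stop_the_line(
    start: datetime,
    now: datetime,
    excluded_day: Optional[date] = None,
    threshold: timedelta = THRESHOLD,
) -> bool:
    """True when the release has been running longer than the threshold."""
    return counted_duration(start, now, excluded_day) > threshold


# A release started on Monday morning and still running on Wednesday.
print(should_stop_the_line(datetime(2019, 2, 4, 10), datetime(2019, 2, 6, 11)))

# A release started on Thursday; the sprint's last Friday is not counted.
print(
    should_stop_the_line(
        datetime(2019, 2, 14, 10),
        datetime(2019, 2, 16, 11),
        excluded_day=date(2019, 2, 15),
    )
)
```

Both releases in the example have been running for 49 wall-clock hours, but only the first one crosses the limit, because the second one gets a full day excluded.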
3.7 Impact on business
How will the Product Owner respond to our Stop the Line practice? After all, because of the stop, the teams will produce less business value. Is this practice too expensive? We don’t know until we try. Any decisions need to be supported by metrics. We measured how much time and money we lost at each stop. But we also measured how much we accelerated the build, stabilized the tests and how much time we saved on every future release.
Nothing good comes for free and Stop the Line is no exception. We pay our price for it and the price is high.
3.8 Cost calculation. Openness. Healthy pressure.
At first we failed many sprint goals because of Stop the Line. Of course, our stakeholders were not impressed (in the least) by our progress and raised a lot of questions at the Sprint Review about why we failed the sprint goals. Respecting transparency, we explained what Stop the Line is and why it is worth waiting for a few more sprints. At every Sprint Review we show our teams and stakeholders how much money we lost due to Stop the Line. The cost is calculated as the total salary of the development teams for the time the line was stopped.
- November $31 500
- December $7 500
- January $18 300
- February $30 000
- March $0
Watching these figures, our teams understand that nothing comes for free and every Stop the Line costs us money. Such transparency creates healthy pressure and motivates teams to resolve the issues preventing us from fast and smooth deployments.
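The cost rule can be written down as a small calculation. This is a sketch only: the monthly figures are the ones listed above, while the headcount and the daily salary in the usage example are hypothetical placeholders, not our real numbers.

```python
# Monthly Stop the Line cost in USD, as listed above.
MONTHLY_COST_USD = {
    "November": 31_500,
    "December": 7_500,
    "January": 18_300,
    "February": 30_000,
    "March": 0,
}


def stop_cost(days_stopped: float, developers: int, daily_salary_usd: float) -> float:
    """Salary paid to the development teams while the line was stopped."""
    return days_stopped * developers * daily_salary_usd


# Hypothetical inputs: a two-day stop, 10 developers, $100 per developer-day.
print(stop_cost(days_stopped=2, developers=10, daily_salary_usd=100))

# Sum of the monthly figures as listed.
print(sum(MONTHLY_COST_USD.values()))
```

The listed entries are likely rounded (the February figure is described elsewhere as “about $30 000”), so this sum need not match a total computed from unrounded data.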
3.9 Pain and resistance
At first Stop the Line was appreciated by all the teams. Like any other new initiative, it was fun at first. Everyone made jokes and posted photos of our flasher. But eventually the fun was gone and the flasher began to annoy us more and more. Once, one of the teams broke the rules and committed to the dev branch during Stop the Line, striving to save their sprint goal. It is easy for people to start violating rules that affect their work. It is a quick and dirty way to appear to make progress on features while ignoring any systemic problem. And the violators will find good excuses for themselves. As a Scrum Master, I couldn’t tolerate those rule violations, so I raised the issue at the Overall Retrospective. We had a difficult conversation. Most of the teams supported me, and we agreed on the rule “Every team has to follow the rules, even if they are not in total agreement.” We also agreed that together we can change the rules without waiting for the next retrospective.
3.10 Stop the Line owner
At the start of this practice, we relied too much on the teams’ self-organization. We expected that when the line was stopped, the teams would get together, discuss how they could help, find the root cause of the delay, generate tasks, distribute the tasks among themselves and solve the problems. Unfortunately, this did not happen. What happened instead after the line was stopped: the teams got together, asked how they could help, and got an answer like “You can’t help much, we just have some flaky tests here and there and I already fixed them.” Then the teams went back to their work and did nothing to find the root cause and make systemic improvements. As a result, we did not reduce the Stop the Line time significantly for a few months. For instance, in February we stopped for seven days out of 20 (30.63%) and lost about $30 000.
So we decided we needed a Stop the Line owner. One of the teams volunteered to become the owner for three to four sprints. They carefully explored the slow and flaky tests, fixed and rewrote them, came up with best practices for writing tests and shared them with the other teams. Thanks to these focused efforts, the release time dropped to four–five hours. In March we did not stop for a minute.
Figure 4. Release length with/without Stop the Line owner
3.11 What was my role?
I am proud that the decision about Stop the Line was not mine; it was made by the teams themselves! They offered it; I supported and helped them to execute it. But long before that, I influenced a culture that allows teams to make strong and risky decisions, and business stakeholders who trust teams to make these decisions.
3.12 What didn’t work out as conceived?
Initially, the developers did not focus on solving system problems with the deployment pipeline. When the release stopped, instead of helping to eliminate the causes of the delay, they preferred to develop microservices that were not covered by the Stop the Line rule. Microservices are fine, but problems with the release of the monolith will not be solved by themselves. In order to solve the problem, we introduced the Stop the Line Backlog.
Some solutions were quick fixes that hid problems rather than solving them. For example, many tests were repaired by increasing timeouts or adding retries. One such test ran for 21 minutes. The delay was caused by looking up an employee in a table without an index. Instead of changing the query logic, a programmer fixed the test by adding a few more retries. As a result, the slow test became even slower. The solution was to introduce the Stop the Line Owner team. During the next three sprints the team managed to speed up our tests by 2–3 times.
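Why extra retries make a slow test slower can be shown with a small model. This is an illustrative sketch, not our test code: `run_with_retries` and `slow_lookup` are hypothetical helpers, the 90-second query time and 60-second timeout are invented values, and time is simulated rather than actually spent.

```python
def run_with_retries(step, attempts: int, timeout_s: float):
    """Run `step` up to `attempts` times; return (succeeded, seconds_spent).

    `step(timeout_s)` is a hypothetical callable that returns the seconds
    it took, or None if it did not finish within `timeout_s`.
    """
    spent = 0.0
    for _ in range(attempts):
        took = step(timeout_s)
        if took is not None:
            return True, spent + took
        spent += timeout_s  # a timed-out attempt costs the full timeout
    return False, spent


def slow_lookup(timeout_s: float, query_seconds: float = 90.0):
    """Stand-in for a query on an unindexed table (90 s is an invented value)."""
    return query_seconds if query_seconds <= timeout_s else None


def fast_lookup(timeout_s: float):
    """Stand-in for the same query after the root cause has been fixed."""
    return 0.2


print(run_with_retries(slow_lookup, attempts=3, timeout_s=60))
print(run_with_retries(slow_lookup, attempts=5, timeout_s=60))
print(run_with_retries(fast_lookup, attempts=5, timeout_s=60))
```

When the underlying step always exceeds the timeout, each added retry only adds another full timeout to the run and the step still fails; fixing the slow step itself makes the retry count irrelevant.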
In one of the sprints Stop the Line slowed us down a lot. We had only two releases in that sprint. Instead of helping with the release, one of the teams broke the rules and continued to work on business features. After discussing this case at the retrospective, all the teams agreed to follow the rules.
3.13 How did the behaviour of teams change after practicing Stop the Line?
Previously, only one team experienced problems with the release — the one that supported the release. The teams tried to get rid of this unpleasant duty as quickly as possible instead of making lasting improvements. For example, if tests did not pass on the test environment, they could be run locally or manually. With Stop the Line, teams were able to focus on stabilizing tests. We rewrote the data preparation code, replaced some UI tests with API tests, removed unnecessary delays. Now all tests are fast and pass on any environment.
Previously, it took about three full days for a team member to support the release. Everyone hated supporting releases. Now it takes only a few hours. Teams are no longer irritated by long release times. They are happier to support releases.
Previously, teams did not deal with technical debt systemically. Now we have a backlog of technical improvements, which we analyze during Stop the Line. For example, we rewrote the tests for .NET Core, which allowed us to run them in Docker. Running tests in Docker allowed us to use Selenium Grid to parallelize tests and reduce test run time further.
The teams started to trust the build results. When tests fail, we know for sure that something is broken. There is no need to restart the tests to double-check whether it is an infrastructure problem or a business logic problem. The more teams trust the tests, the more they tend to maintain them.
Previously, teams were relying on a QA team for testing and an infrastructure team for deployment. Now there is no one but themselves to count on. The teams themselves test and release the code to production. This is genuine, not fake DevOps.
The Stop the Line practice helps us find a balance between business tasks and keeping our deployment pipeline in a healthy state. If we collect many changes, the size of the Product Increment grows. The larger it is, the more likely the release time will also grow, and the higher the chance that we will not release the Product Increment within 48 hours. When that happens, we Stop the Line. Once the line is stopped, instead of producing business features, we focus on accelerating the deployment pipeline and investing in build and test quality. Because we are less focused on business features, the next Product Increment is smaller. Also, the long-term result of investing in the deployment pipeline is a shorter release time in the future. Figure 5 explains how it works.
Figure 5. CLD diagram: Stop the Line balances release time
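The two balancing loops can be illustrated with a toy simulation. Everything numeric here is an assumption made for the sketch (a linear cost per change, a 12-hour base release, a 30% reduction of the per-change cost after each stop); only the 48-hour threshold and the nine teams come from our practice.

```python
def simulate_stop_the_line(
    releases: int,
    teams: int = 9,
    changes_per_team_per_day: float = 1.0,  # hypothetical rate
    base_hours: float = 12.0,               # hypothetical fixed release cost
    hours_per_change: float = 2.5,          # hypothetical regression cost per change
    threshold_hours: float = 48.0,          # the Stop the Line limit
    improvement: float = 0.7,               # hypothetical gain from each stop
) -> list:
    """Return a list of (release_duration_hours, line_was_stopped) pairs."""
    history = []
    previous = base_hours
    for _ in range(releases):
        # Balancing loop 1: feature work halts once the line is stopped,
        # so changes pile up for at most `threshold_hours`.
        feature_hours = min(previous, threshold_hours)
        changes = teams * changes_per_team_per_day * feature_hours / 24
        duration = base_hours + hours_per_change * changes
        stopped = duration > threshold_hours
        if stopped:
            # Balancing loop 2: the stop is invested in the pipeline,
            # which makes every future change cheaper to release.
            hours_per_change *= improvement
        history.append((duration, stopped))
        previous = duration
    return history


for duration, stopped in simulate_stop_the_line(12):
    print(f"{duration:5.1f} h  {'STOP' if stopped else 'ok'}")
```

With these invented coefficients the release time climbs until it crosses the threshold, the line stops, and the release time then settles below the threshold instead of growing without bound. Setting `improvement=1.0` in the same model removes the second loop, and the line then keeps stopping on every later release.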
Effectively, the Stop the Line practice converts a reinforcing loop (Figure 2) into two balancing loops (Figure 5). Stop the Line helps us keep focus on improving the deployment pipeline when it becomes too slow. In just 4 sprints we:
- Deployed 12 stable releases
- Reduced build time by 30%
- Stabilized UI and API tests. Now they pass on all environments and even locally.
- Got rid of flaky tests
- Began to trust our tests
4.1 What did we pay for it?
We stopped business feature development for 20 days during the first four months (November 2018 to February 2019) of using the Stop the Line practice. It was about 20% of all the teams’ capacity. It is a huge time investment. In the beginning, due to Stop the Line, we failed about 50% of the sprint goals.
Figure 6. Stop the Line time by month
The total salary cost was $87 500. Stop the Line also increases the teams’ stress level. Until the deployment pipeline issues are resolved, developers have to switch from business feature development to deployment pipeline improvement tasks multiple times per sprint. This annoys them, and they complain about losing focus.
5. WHAT WE LEARNED
Stop the Line is a vivid example of strong decisions made by the teams themselves. A Scrum Master couldn’t just come up with a brilliant new practice. A practice is best implemented when the teams invent it. Preconditions are surely necessary: an atmosphere of trust and a culture of experimentation. We also need trust and support from the business, which is possible only with full transparency. Process feedback loops, such as a regular Overall Retrospective with all the teams’ representatives, help to invent and adopt new practices.
Stop the line is a self-destructing practice. Paradoxically, over time, the practice of Stop the Line should kill itself. The more we stop the line, the more we invest in the deployment pipeline; the more we invest, the faster and more stable the release becomes; and the faster and more stable the release, the fewer reasons there are to stop. Eventually the line will never stop unless we decide to lower the threshold, for example from 48 to 24 hours. But thanks to this practice we have already improved the pipeline. Also, teams are now fully proficient in the process of delivering value end-to-end, from development to production. This is authentic DevOps.
Focus is the key. Because of focused work on accelerating the deployment pipeline, we never stopped in March and April. Focusing all the teams on the deployment pipeline issues helped us to resolve most of the easy-to-fix issues in the first two months. Issues that were harder to find and fix, such as flaky, dependent and slow tests, required a dedicated team’s work for 3 more sprints (1.5 months).
What’s next? I don’t know. Maybe we will give up this practice soon. The teams will decide. But it is obvious that we will continue to move in the direction of Continuous Delivery and DevOps. One day my dream of releasing a monolith several times per day will come true.
I very much appreciate the help and support I get from my team members. Special thanks to Dmitry Pavlov, our Product Owner, who always supports me and helps improve our processes. Many thanks to Alex Andronov. The changes would not have been possible without constant help and support from my colleague Dasha Bayanova, the first full-time Scrum Master at Dodo.
Special huge thank you to Rebecca Wirfs-Brock who helped me with writing this report. Rebecca, it was a great pleasure working with you. Your help, support and diligence are very much appreciated.
Fowler, Martin. “Continuous Delivery.” https://martinfowler.com/bliki/ContinuousDelivery.html
LeSS.works. “Continuous Integration.” https://less.works/less/technical-excellence/continuous-integration.html