“It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.”
Although widely attributed to him, Charles Darwin never actually made the above statement. However, Darwin still presents us with some interesting concepts we can use in reflecting on our own experiences.
This experience report covers the ongoing, 14-year evolution of an engineering team at a web hosting company that powers a sizable portion of the Internet. It starts with the very first developer climbing out of the “primordial ooze” of a burgeoning company. The timeline progresses to the modern era and the behaviors of a full research and development team discovering and experimenting with Agile methods to provide greater customer satisfaction within a nearly 500-person organization.
In the beginning there was the Internet. It was an uncharted world explored by the likes of academics and the National Science Foundation. Eventually, Mosaic gave us a more capable looking glass in the form of a popular graphical browser in 1993. A few short years later, the World Wide Web was exploding, with growth only continuing to accelerate.
1.1. Brief History
In 1997, a company was born in a Michigan high school student’s basement to offer a way for businesses and individuals to get their own presence on the Internet. That quickly matured into providing professional data center space, backup generator power, and large, redundant connections to one of the largest hubs of all Internet traffic. Staff are now available around the clock to answer support questions and make sure thousands of servers are running properly.
This is the story of a company called Liquid Web. Specifically, it is the experience of its Research & Development (R&D) department over the past approximately 14 years. The authors’ goal is to cover the subsequent ups and downs inevitable in evolving from a one-man outfit to nearly 40 team members today. This includes the more recent shift into various Agile approaches to facilitate the team’s growth.
Four years after the start of Liquid Web, it hired its first developer in 2001 to automate processes and workflows to keep up with demand. In those Precambrian times, programming efforts were frequently interrupted by support requests, system administration, and other unplanned incidents that arise while operating a company in an industry as quickly and constantly shifting as web hosting. Even with all the interruptions, having a developer on hand was a huge aid to the growth of the company.
After some successes and 10 years of operations, the company was able to amass a talented group of 5 developers. As other departments grew and were able to deflect the distractions of earlier days, the focus of this first actual development team became automating server deployments, effectively building their own version of “the cloud.” There was a clear direction and active decision-making from company leadership, along with a sense of urgency to get the product released to stay competitive through a significant industry shift. The initial version was released in just over a year. It was an accomplishment everyone involved says was extremely demanding, but also very rewarding to see the original goal and vision completed.
1.2. About the Authors
Blake Nyquist joined Liquid Web in mid-2012 as a Project Manager. Wade Wachs joined Liquid Web in early 2013 as a Software Testing Manager. Today, they operate as Agile Coaches within the R&D team. Both have been active in the various leadership discussions and decisions affecting the department since joining the team. Having worked very closely with many of the original team members in effecting changes, both have developed an understanding of the history surrounding the department.
1.3. About this Report
This paper will include personal experiences from the authors, as well as stories that have been shared with them from long-term members of the R&D department. Most of the insight derived from early times originates from an ever-shifting oral history, making complete accuracy difficult to achieve.
The stories from the team will be loosely grouped into specific challenges the team faced. While some effort will be made to show the timeline and progression of the team, anecdotes may be offered out of order to promote a more complete view of specific challenges. Any names have been changed in order to protect the innocent and others.
2.1. The Phoenix Operation
“At each period of growth all the growing twigs have tried to branch out on all sides, and to overtop and kill the surrounding twigs and branches, in the same manner as species and groups of species have at all times”
― Charles Darwin, The Origin of Species
With the first taste of success from an entire product line built in-house, and the desire to build new products and continue to expand its business, demand for development activities kept growing. In response to the demand, adding more developers, testers, and system engineers was deemed a priority. In 2010, an office was opened in the greater Phoenix area with a team of 6 brand new hires to assist in growing the team. Having a remote location 2,000 miles from the home office provided the first set of large challenges for the entire team to overcome.
Some might theorize that doubling the size of the team should double the output of the team. Leadership was aware that a new office would not double capacity overnight; however, no one was aware of just how many issues would creep in to slow output even further.
Communication was the first big hurdle that needed to be addressed. Out of sight, out of mind was the rule of thumb. The team in the home office was focused on completing their projects and adding features to the new cloud product that had launched 2 months prior to the new office opening. The team in the new office was stuck trying to familiarize themselves with a new company, a new industry, and a new codebase in the relative vacuum of a 1-week visit and a few infrequent emails.
With that lack of communication came a lack of understanding about what was important to Liquid Web. Not only were priorities difficult to understand for the new members, the company culture was lost on them. The ways in which the original team worked, functioned, and supported the company’s goals were, at best, extremely challenging for the new remote team members to absorb. They struggled greatly to comprehend their place in the company and how to make significant contributions. This led to frustration from the home office when nothing was being accomplished.
2.1.2. Response / Results
The first response to assist in the communication gap was to purchase a video conferencing solution to connect the two offices. This was a relatively simple change, and still provides some improved communication. Being able to see body language and hear tone of voice makes the communication more impactful. However, this tool actually has to be used for it to be helpful, and there was still the problem that neither office was very good at initiating communication.
One attempt at solving that specific issue was to leave the video conference connected all day. The goal was to provide a kind of hallway experience where members of the geographically separate teams could casually chat with each other. The outcome was more of an awkward ‘Big Brother’ kind of experience with odd bits of noise floating across the line along with a feeling that someone was always watching.
A few weeks into leaving the video conference running all day, the practice was abandoned. Eventually the team became conscious enough of the video conferencing tools to begin including members of the remote office in meetings happening at headquarters, but this was far from solving all of the communication and culture problems.
Several months into the new office, team members in the office became frustrated. The video conferencing tools were not adequately addressing the problems of communication and understanding culture. It was decided to send one of the core team members to the new office full time. The goal was to share institutional knowledge and cultural understanding in order to incorporate the new members into the ‘feel of Liquid Web.’
Three of the six developers hired into the new office left the company within nine months. Unfortunately, sending the home office developer to help the situation resulted in him feeling similarly isolated and disconnected from the company. He eventually left the company to work for a competitor.
After years of continually fighting the distance and trying to incorporate the remote team, the most recent response has been to embrace the distance, and focus that team on one common task independent of the tasks in the home office. This was part of a broader experiment of breaking the entire team into smaller squads, which will be discussed in further detail later. Thus far, this approach has been the most effective solution for the problems faced by creating satellite offices.
By the time this paper is published, Liquid Web will have opened a third development office. This undertaking has been significantly different based on the lessons learned in Phoenix. The new office is an hour away from our home office, and is being launched with a handful of long-term members of the team. New members will be brought in at a significantly slower pace, and all members of the team will be available for on-site meetings in the home office much more frequently. Time will tell if this strategy works any better than in the past. Based on the lessons learned, however, it should be off to a much better start.
2.2. Ready Alpha Squadron
“It is necessary to look forward to a harvest, however distant that may be, when some fruit will be reaped, some good effected.”
― Charles Darwin, Voyage of the Beagle
As the size of the team continued to increase in both locations, the capacity of the team increased as well. With more people, the focus of the team was able to split from one major project to handling multiple projects at a time. At first, management was very excited about being able to add more and more work into the pipeline. Over time, however, that excitement and demand outpaced development capacity, creating another set of challenges.
Upper management and other departments had been able to cajole “yes, we’ll work on that” for 30 simultaneous projects, in addition to the constant bugs, fires and mountainous backlog of projects not yet started. There were fewer than 20 people available to work on said projects. With more than 1.5 projects per developer, progress was meager at best. Stakeholders waited for months to see any significant progress on their seemingly small requests. Developers were frustrated from always being switched from project to project without fully completing a task.
2.2.2. Response / Results
There have been many responses to the issues of keeping focus through an increasingly high demand on developer time. This paper will specifically discuss four such attempts, which is only a subset of the responses over the past four years.
2.2.2.1. Iterative Development
The first attempt to focus on this problem was shifting the team into a 3-week release schedule. Developers would write code for 3 weeks, then put their mostly functioning code into a staging environment before starting the next cycle. Testers would then run regression tests against staging for the next 3 weeks. As they found issues, that would steal developer attention to resolve bugs and other problems identified. After the test cycle, code would be pushed to production, and the next build would be pushed into the staging environment. Rinse and repeat ad infinitum.
This solution was in place for several years, but had many holes. The time between a release to production and a new build coming into staging took 2-5 days. The amount of disparate work crammed into any given release was still incredibly high, continuing to split focus across the team. What focus was available was then interrupted by issues found in testing, which pulled developers away from new code to investigate something they had last worked on 3 weeks prior. Releases were typically large and disruptive to other departments due to planned downtime, as well as unforeseen problems that could impact others for as much as 2-4 hours. Everyone felt the pain of this process, including customers.
2.2.2.2. Focus on Finishing
Team leadership was very aware of the negative impact of so many competing priorities for team members. After several months talking about more focus, the decision was made to stop accepting any new projects. Discussions were had with management about selecting the right subset of ongoing work to continue, and what portions could be safely set aside until the current list was finished and released. A line was drawn in the sand, and the whole team was on board with not accepting any new work until the current workload was coded, tested, fixed, released, and functioning in production. The mantra in the team was ‘Focus on Finishing.’
The 3-week cycle was thrown away in favor of finishing the list. Everything would be pushed out in one giant release. Energy was high. People were focused. After two months, the developers had finished many of the parts they were responsible for and testers were buried under a mountain of new code and features. The mantra had slowly faded from ‘focus on finishing’ to ‘no new projects.’ Wishing to capitalize on the momentum, the industrious developers decided to assist in the testing effort. They determined it would be a good time to re-architect the automated test suite to improve speed and reliability.
Over the years, the unit test suite maintained by the developers had become large and unwieldy. It took approximately 8 hours for the whole suite to run, and the results were often unreliable due to development environment issues, timing problems, and poor mocking for some of the more complex subsystems. The tests ran every night, but many of the failures were ignored, due to the known problems. There was certainly room for improvement, and developers were ready and willing to make these improvements under the name of not starting any new projects.
The fact that everyone ripping apart the test suite was in reality a rather large project seemed to elude the team. The effort was started. Every developer not actively working towards finishing the items on the list was engaged in refactoring test classes. The initial plan was to spend 3 days of concentrated effort, then circle back to the rest of the tests over time. Ten days into the project, the big release was almost ready to ship to production, and only about one third of the tests had been ported into the new architecture. The remaining two thirds of the tests were not functional at all due to suboptimal design decisions.
To make a long story fit into these pages, six weeks after the large release shipped, the team had started approximately 25 new projects, and the test suite refactor was still incomplete. The large release had gone out safely, but the goal of trying to focus on a significantly smaller set of projects had failed miserably.
2.2.2.3. One Tool to Rule Them All
One aspect of this lack of focus was evident in the way work was tracked. In 2008, an open source bug tracking tool called FlySpray replaced email as the tool for organizing all work done by the development team. A standard bug database, this tool was used exclusively until 2013, when Trello was introduced as a lightweight Kanban board.
Trello was piloted on a couple individual projects, and became the de facto tool for starting new projects. With each unique project, a new board was created, and cards added to track development work. With the explosion of projects, one individual could have items assigned to them in 4 or 5 different boards in addition to bugs and small requests in FlySpray.
Eventually FlySpray was abandoned, and all work was exported into Trello. This was a step taken to increase focus, and allow developers to have one tool from which to pull actionable work items. This consolidation reduced the number of tools that people had to look for work, but individuals still had work spread across multiple boards and projects.
2.2.2.4. Release the Squads
The leadership in R&D was quite frustrated. To no real fault of the developers, the big push to increase focus backfired in a not so subtle way. Something needed to change.
During the previous discussions around how to create focus, one idea that bounced around was to organize the department, which at this point in mid-2013 consisted of over 30 developers, testers, and systems engineers, into separate cross-functional groups that would each focus on one specific project at a time. With the previous experiment failing to reach the goal of focus, leadership decided to give this new idea a try. Several discussions with the whole department and many 1-on-1 chats with individual team members later, what became known as ‘The Squad Experiment’ was ready to be executed.
In planning the rollout, the number of groups changed from 3 to 4, but the plan was still to create cross-functional ‘squads’ with developers, engineers, and testers all focused on specific goals. These squads would persist project to project, provide leadership for individual projects, and have some freedom to choose the specific methodology by which their work was tracked and completed. This experiment was loosely modeled on the article “Scaling Agile @ Spotify” by Henrik Kniberg and Anders Ivarsson.
The remote office was organized into a self-contained squad and given a single project of focus. Due to the fact that all of the testers were located in the home office, two testers were allocated to the squad, but all other members of the squad were in the same office in Phoenix, and for the first time were all given one specific project of focus. While there are still trials with the distance and being separated from the main culture of Liquid Web, this change seems to have been one of the more effective methods in the 4-year history of the office at keeping team members engaged and productive. At the writing of this paper this experiment has been running for approximately 6 months, but early signs suggest the situation is improving.
The squad experiment has not been without flaws. Keeping focus on the most important tasks still requires vigilance and concerted effort to eschew distractions. Many of the squads are still mastering that ability. The failsafe of squads seems to be that when those distractions happen, the group is small enough to recognize them, and can take steps as a squad to adjust, remove the distractions, and move forward on their main project goal.
This experiment is currently ongoing, and will continue to adjust and change as time goes on. Thus far this experiment seems to be providing a framework for continuous improvement, and has held more staying power than any previous attempts at increasing focus and flow.
Much of the discussion on how to improve focus occurred at the team leadership level. Since the squad experiment, conversations have been pushed much closer to those actually doing the work through retrospectives and other discussions. Including everyone in these deliberations earlier might have accelerated progress towards our current state. Perhaps the most effective part of the squad structure is not so much smaller teams as a better forum for useful conversations.
2.3. Managing Up
“It is always advisable to perceive clearly our ignorance.”
― Charles Darwin, The Expression of the Emotions in Man and Animals
The company’s very first developer remains with the organization, and is an extremely valued and deeply loyal employee. During his tenure, the team of one has become much larger. As the most senior developer and the one with the most institutional knowledge of all our various systems, this developer has been promoted into a management position. He is now responsible for the entire department, which is composed entirely of humans. They behave nothing like the 1’s and 0’s he’s dealt with his entire career.
The first challenge of this section is writing it in such a way that is very clearly not about any perceived shortcomings of a specific individual or set of individuals. Details may be changed to protect the innocent.
Over the years, the company’s first developer grew with the team, taking on more and more managerial responsibilities. The direction and coaching provided on what it meant to handle those new, human-focused responsibilities was minimal at best. A four-year computer science degree in no way prepares someone for a role as manager of other people.
As evidenced already, lack of direction and the plethora of distractions are problems that exist outside of the actions of this manager. However, after years of creating and running the department, it is easy to trace many of those problems back to this individual. In the months since the squad experiment began, this person has provided squads with tasks and distractions that very actively dissuade specific squads from accomplishing their main priority. Looking back, it is impossible for the authors to determine whether the lack of focus in the culture started with this individual, or whether the current distractions are the result of being steeped in a culture that interrupts. Regardless, the situation leaves us with a head of the department actively, though not necessarily intentionally, distracting from priorities set by upper management.
This level of distraction is a perfect segue to the next big challenge. After years of problems and struggles between the R&D department and upper management, there is a sizable rift between upper management and this original developer. The intent here is not to place blame in any direction. The goal is merely to point out challenges faced by the team. This rift is certainly fanned from both sides. However, to maintain focus, this paper will discuss the impact of the R&D team’s leader.
2.3.2. Response / Results
There have been several attempts over the years to respond to this situation. Those responses will be listed in chronological order.
2.3.2.1. Do Nothing
This individual went almost 13 years within the company without receiving any formal feedback on his performance. Individuals all around this person saw the struggle to perform duties as a manager. The simple route was to do nothing. That option was taken for several years.
2.3.2.2. Talk about the problem
The next attempt to address the problem came in an unplanned conversation with the person. Several of the team leads within the R&D department were able to share direct feedback about their perception of their supervisor’s performance. Three hours later, 4 people emerged from the small office, exhausted after a very cautiously worded conversation trying to help the manager see that the situation was as dire as the rest saw it. It didn’t work. The conversations in team leadership discussions got more awkward, and no progress was made on the problems of distraction and damaged relationships.
2.3.2.3. Greener Pastures
After the initial conversation with the team leads had sunk in for a few months, feedback was delivered by executive management indicating expectations were not being met. The current position was not a good fit, and the person needed to find a new position within the company. It was plainly stated that the years of loyalty, institutional knowledge, and other talents had earned him the choice of any other desired role.
A month after the ultimatum, an email was sent to the team announcing the manager’s resignation from the company. At the time of writing, there is one week left until his departure. The effects of this final step are yet to be seen. The only thing certain is that with the end of this particular challenge, others will begin to present themselves.
The biggest takeaway from this section is the importance of open and honest feedback within a team. This individual did not receive feedback for a long time. At some point, feedback turned into something to ignore, if only for self-preservation. Had those feedback channels been open all along, perhaps this story would have a significantly different outcome. Those who remain in R&D are now, and will continue to be, pushing for better feedback to be available to all members of the team.
“Natural selection acts only by taking advantage of slight successive variations; she can never take a great and sudden leap, but must advance by short and sure, though slow steps.”
― Charles Darwin, The Origin of Species
For Liquid Web, experiencing Agile has been a slow succession of variations, many intentional, but not all. Some of those variations have been favorable and continue with the team. Others have been weeded out through natural selection. The processes we follow will continue to undergo minor adaptations and will likely evolve into something completely different from what is done today.
If Darwin were alive today, there is a very good chance he would approve the approach of making small, evolutionary changes within an organization. Such means of adapting to environments have worked for millions of years. Certainly, they can continue on for a few more years.
First and foremost, the authors are hugely indebted to their team’s willingness to participate in and help shape the numerous experiments in our effort to continuously improve. None of the reported successes would be even possible without their participation as the key ingredient. Second, all the stories, teachings, and reports preceding this paper have provided numerous insights and ideas that have fed into Liquid Web’s process of evolutionary change. The community’s penchant for sharing has enabled countless teams, including ours, to adopt their own improvements. Likewise, we thank the track committee for providing an opportunity to share our own stories. With a little luck and a lot of effort, this report meets their expectations for delivering observations and reflections that are of use to others. Finally, much gratitude is owed to our shepherd, Linda Rising. Her insights and experience provided guidance in making this report vastly more meaningful for our own reflections. Hopefully, we did justice to her thoughtful feedback by also writing something worthwhile to other readers.