RESOURCES

Lessons Learned From Agile Testing a Mission Critical and Complex System

About this Publication

In this Agile software development environment, are you working on legacy software that has an impact on our nation’s security? Or are you asking yourself how you can do test-driven development with such a huge legacy code base? Our project has felt these pains, and we would like to share how we’ve overcome some of our biggest challenges.

1.      INTRODUCTION

At Asynchrony we’ve had to learn to crawl, walk, and run over and over as technologies and our customers have changed throughout the development of the Mobile Field Kit (MFK), one of our government projects. Our project is complex, with many moving parts: multiple technical stacks, changing customers, a lot of unusual equipment, and a large team of developers, many of whom have cycled in and out of the project over the years. We feel we have proven ourselves to be very good at changing and adapting. The success of this project has been due in large part to our ability to remain flexible, continuing to meet customer needs as they evolve, and to the creation and refinement of high-quality processes. With all these moving pieces, how do we ensure we are creating a high-quality product and advocating for quality throughout our processes? We are able to do so by combining the best traditional testing practices with unconventional quality processes to help our users succeed in their missions.

2.      Background

The Mobile Field Kit (MFK) puts state-of-the-art tools and technology directly in the hands of field team members, including first responders, civil support teams, and other physical security teams. The MFK allows them to acquire, store, assess, and share information, both within the team and across organizational boundaries. The system was initially created by the U.S. Navy for the Joint Explosive Ordnance Disposal community. Through Department of Defense-sponsored activities, the Army’s 20th Support Command and National Guard Civil Support Teams currently utilize the MFK as a tactical tool for Weapons of Mass Destruction (WMD) defeat missions specifically focused on radiological, nuclear, and chemical threats. The software has been used for everything from search and entry, such as determining whether a suspicious smell coming from someone’s home is potentially threatening, to reconnaissance, with missions ranging from detecting radiation or chemicals in the public water system after a chemical dumping, to providing WMD defeat support for events such as the presidential inauguration, the Super Bowl, and the Boston Marathon.

The system is made up of multiple technologies. The application runs on a Windows machine and on Android. It has interoperability with multiple sensors, radios, and other off-the-shelf applications. The product utilizes a mixture of communication technologies ranging from mobile networks to closed secure portable tactical radios. The MFK software is typically configured in a package that includes the software, hardened tablets and/or laptop computers, Android phones, secure wireless mesh-networking, and hardware facilitating WAN connectivity via cellular, satellite, or other networks. The MFK configuration can be housed in a portable, hardened case that includes everything needed for deployment across the city or halfway around the globe. It’s built to easily integrate a wide array of communication and sensor suites within a field-tested, standards-based platform.

3.         MY STORY

3.1      History

Below is a timeline that shows how the MFK quality process has evolved over the years. As you can see, we have definitely needed agile principles in place to handle the transformation of our users and the application.

At the same time there have been many changes that have impacted testing. We will discuss some of these in detail below.

3.2      Automation

In 2007 we started using some of the principles found in Extreme Programming. However, without someone there to assist us as we started this journey, the team was lackluster: we would only occasionally pair and only sometimes write unit tests. A Quality Assurance (QA) engineer was added to the team to develop a suite of manual user acceptance tests (UATs) for regression testing and to write tests as stories were completed. This introduced a new problem: it was difficult to find sufficient time to run all those manual tests. We decided that rather than running the tests frequently, we would wait until there was a potential release candidate and then set aside time to run them all. This caused some bugs to be found late, which in turn could delay the release. We also ended up with large pieces of code that had not been tested, creating fragile code that would break as future changes were made.

At the beginning of 2008, a new technical lead who insisted on full unit testing was added to the team. We began creating automated acceptance tests for stories as they went through the board. The team initially investigated testing tools and selected a tool called FIT. In FIT, test cases are displayed in HTML tables, which can enable customers to assist with writing the tests. The team went with FIT and ran with it for one to two years, but it was painful. To test with FIT, we had to build into our production code the ability for FIT to inspect objects so it could see their state. We actually created a design pattern called the “violator pattern” to allow this. In retrospect, we wish we had run an experiment using FIT for a month and WindowTester for a month to see which worked better.

In September 2009, after our evaluation, we purchased the required licenses for a tool called WindowTester. Once the developers started using WindowTester, we found it easier and wished we had used it from the beginning. The team continued automating as many of the manual tests as we could. Now we don’t have to spend weeks and weeks doing manual regression testing, and we have confidence that new code is not breaking existing functionality. We monitor the pass/fail status of builds on a big monitor within view of the entire team. When a test fails, the team sees it immediately, so it becomes a high priority and we can address the fix during our daily Kanban flow. As our customers have continued to request new functionality, the code base has continued to grow, and the number of tests has continued to increase.

After some time, we noticed the automated UATs were taking over an hour to execute, so the team implemented a matrix build process to spread out the load. When run linearly, the tests could take more than two hours. Currently, we have 18 build boxes and average 20 minutes for a build. From a testing pyramid perspective, we currently have approximately 12,000 unit tests that take about 6 ½ minutes to run, approximately 1,000 automated user acceptance tests that take about 20 minutes to run, and 200 manual acceptance tests that take two people about three days to execute.
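As an illustration of the idea behind a matrix build, a test suite can be deterministically split into stable slices, one per build box. This is only a sketch; the function name and hashing scheme here are hypothetical, not our actual build configuration:

```python
def shard_tests(test_names, num_shards):
    """Assign each test to one of num_shards buckets so every build box
    runs a stable, roughly even slice of the suite on each build."""
    shards = [[] for _ in range(num_shards)]
    for name in sorted(test_names):
        # A content-based hash keeps the assignment stable across runs,
        # unlike Python's built-in hash(), which is salted per process.
        bucket = sum(ord(c) for c in name) % num_shards
        shards[bucket].append(name)
    return shards

# Example: divide 1,000 acceptance tests across 18 build boxes.
suite = ["AcceptanceTest%04d" % i for i in range(1000)]
shards = shard_tests(suite, 18)
```

Each box then runs only its own shard, which is how a multi-hour linear run can drop to roughly the length of the slowest slice.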

Until recently, an additional concern about our automated testing was our inability to test our integration with sensors in an automated fashion. One of the most noteworthy benefits of the MFK is its ability to quickly integrate with a sensor; sensor integration, along with the MFK communication strategy, provides the most value to our users. Automating tests against a real sensor is difficult in itself, and in many situations we found we might not even have access to an expensive radiological sensor. Initially, the team tried as hard as it could to get each sensor we were going to integrate into the office, but as time went on, there were times when we could only connect to a sensor over the Internet, or could only have it in the office for a couple of days. Unfortunately, this didn’t give us all of the data we would have liked, and we were only able to test a limited amount of the sensor’s capabilities. Fortunately, our sensor integration depends on an IP-based system where most of the data is similar. Team members, seeing all the pain it took to get these sensors in house and how we were unable to test all the necessary functionality, decided to figure out a way to ‘simulate’ a sensor. We received agreement from the team to allocate time and a pair to develop an XML simulator. We took a simplistic approach to parsing the data: the simulator generically reads data from a file and then sends it back over the network. The big win is how generic it is; with most of our data being fairly similar, we can use this simulator to test about 6 of the 10 sensors we currently integrate with.
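The core of such a replay simulator can be sketched in a few lines, assuming a recorded XML capture file and a TCP transport. The function name and details below are illustrative, not our production simulator:

```python
import socket
import threading

def serve_recording(path, host="127.0.0.1", port=0):
    """Replay a recorded sensor data file to every client that connects,
    standing in for an IP-based sensor during automated tests."""
    with open(path, "rb") as f:
        payload = f.read()  # recorded XML messages, read generically
    server = socket.create_server((host, port))

    def run():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # server socket closed; stop serving
            with conn:
                conn.sendall(payload)  # send the recording back over the network

    threading.Thread(target=run, daemon=True).start()
    return server  # callers get the bound port via server.getsockname()
```

A test would point the application’s sensor integration at the simulator’s port instead of real hardware; because most of the sensor data is similar, one generic replayer can cover many devices.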

An additional capability WindowTester has given us is the ability to start and test multiple instances of the MFK that communicate and coordinate with each other. This allows the team to send data from a sensor to the MFK and its nodes and verify that the data is transmitted and displayed correctly. With the integration of the Android platform, we can also verify that data coming in from Android is parsed and displayed correctly within the MFK.

3.3      Field Testing

We try to do functional testing that gives us the most bang for our buck. One way we ensure the quality and usability of our application is by going out with the equipment and testing scenarios that our users might actually participate in. The team originally started going out to Forest Park in St. Louis a couple of weeks before a release to test a build candidate in a more realistic environment with a mesh network. Testing in the park lets us get away from tall buildings, which can cause a lot of interference, and test the application on a more reliable wireless network. We also know that lab testing and real-world testing always show differences: things will work fine in the office but not so well when we go outside. Because our system is dependent on the network, things behave differently at the edge of the network, and odd cases can happen. Another advantage is seeing the application in the sun; since our users are outside using the software, we need to test the user interface with glare on it. This has led us to implement higher-contrast colors and to change buttons and windows to be more visible. These types of tests have always surfaced something, from issues with synchronization when nodes are going in and out of the network to data arriving out of order.

QA saw the benefits of testing in Forest Park and wanted to do this type of testing on a more regular basis to find some of our bugs earlier. We agreed to try to start doing field tests around our office building, but we often ended up canceling because every time we went to do a field test, someone would say, “Let us finish this one story and then we’ll do the field test.” This went on for six months or more, with field testing always being put off until tomorrow for one reason or another. Finally, we got agreement from the team to perform field tests every Tuesday morning no matter what. We might modify the scenarios based upon what’s in the build, but we always test something. The tests are meant to verify that functionality completed in the past week is fit for customer use. We also see benefit because it gets the developers away from the code and actually looking at the application and thinking like their users.

In the beginning, the tests were mostly focused on equipment, most importantly the radios. We struggled many times to make the test about the application rather than the equipment, because frequently it was difficult to get all of the equipment to work. Something that has helped us is using cellular as our network for most of our tests. This enabled us to concentrate more on the software and less on the equipment. On the negative side, though, we again fell into the trap of not seeing the edge cases, which could lead to a support call about someone having an issue with the MFK on a radio network. We recognized we need to test sometimes with cell, sometimes with the Wi-Fi radios, and other times with both. Crawl, walk, run: when we started, we didn’t even know how to crawl; we didn’t know how to perform basic tasks, and we kept pushing it off and pushing it off. Then we said, “You’ve just got to do it,” and we have practiced it, and as we’ve learned, we’ve optimized it.

3.4      Getting to Know Our Users

After the initial success of the project, and as our user base was changing, we ran into challenges obtaining access to our users. Civil Support Teams are busy, and their schedules are more packed than ever before. Our users are highly educated and specialized men and women who are needed on the ground protecting the country. Unfortunately, this has made it hard for them to find time to assist with writing stories or test cases or to spend time at our downtown St. Louis office. Ultimately, our success comes down to our ability to deliver quality to our customers, and part of that assurance process is actually sitting down with our users. In the past we did this with occasional trips and trainings where we could get feedback, which would then come back and get filtered down to the team.

For example, we were testing our application based upon a standard pattern of users going down range a short distance with a wireless mesh network that, due to the terrain, had little interference. As it turned out, the Civil Support Team users were meanwhile struggling to use the system in a more urban area, where tall buildings and concrete infrastructure caused issues with the mesh network. They were also beginning to use the system across long distances. We didn’t realize this until the program manager, a developer, and I went to a CST site to provide training on our latest release. Our program manager decided that in addition to the training, this would be a good time to have the CSTs show us how they were actually using the application. The CSTs immediately showed us a disturbing issue: they would have communication problems when a team went out on a mission. They were frustrated because they couldn’t communicate with the person next to them once they had lost connectivity with the hub of our hub-and-spoke network architecture. The users creatively came up with a new workaround for how our network is set up. Their idea was to send multiple hubs down range, so that everyone in a group near a hub would always be connected. So instead of the usual one hub/command node topology, they were running multiple hubs, one with each team, so they could stay in contact with each other. This solidified for us the importance of tight feedback cycles with end users and customers. We knew we needed a mechanism to build an understanding of our users, to assure the development and QA teams were creating and testing the ‘right’ things. This visit also made us realize the importance of getting QA and developers in contact with our users, and of providing support not only for our application but for the entire mission. We started becoming an integral part of the CSTs’ workflow by providing capabilities and the hardware to run cell networks, allowing them to cover their growing need to support new types of missions.

In 2014 we had a great idea: bring all of our customers together for a “community of interest” meeting, which would also have the secondary benefit of helping us assure quality. Appreciating what we had done, our users agreed to let us host the meeting. All pilot CSTs came to the first User Group meeting in March of 2014. This meeting was the first time we had gotten users together and enabled good interactions between them and our development team. We all learned from each other. There were lots of hands-on activities with the MFK using different aspects of the system. We had the users walk us through their pain points; we took these and identified solutions to improve their missions, and QA became smarter in its testing. This year’s meeting was a little different because the goal was to get feedback from all the CSTs on how they use the system. The teams came together again in May 2015, and for a week they dedicated themselves to sharing how they planned to use the application in upcoming events. The QA team could then take what was learned, prepare realistic testing scenarios, and gain an appreciation of the conditions in which users actually use the application. Working with users who are remote and cannot be embedded with the team or physically present on a regular basis highlights the critical importance of our user group meetings.

3.5      Automated Endurance Test

We only ever do a field test for a couple of hours, but we know some of our users have to use the application for multiple days. We also have a long-standing issue where, if you have multiple tacticals and multiple sensors set up and running for a long time, the system eventually bogs down and becomes unusable. Trying to capture the problem in a way that is repeatable has been challenging. We needed to develop a load testing mechanism that is as automated as possible so that it can run regularly and efficiently; this is exactly what we’re working on now. We had developed a set of manual stress tests that involved setting up a large number of computers connected to various sensors, and this alone could take as much as half a day to assemble. We didn’t want something that would take hours to set up, and we didn’t want to bring back a process that only gets exercised right before a release, potentially causing us to catch bugs late in the game. We needed to be able to run this test as part of our continuous integration so that we test early and frequently. I have two tests in particular that I would like to share.

The first basically took the test mentioned above and put it together using WindowTester, running multiple nodes and sensors. We have set this test up so that it runs continuously until the application breaks, at which point error logs are created to capture any issues. These logs are then reviewed manually by the development team. When we initially started running this test, it would only make it a couple of days before the system failed. After we pinpointed and fixed some of the issues, we were able to run the test for over a week. This is a very unusual test because it runs for a very long time and we intentionally let it run until our application fails. Some of the questions we were trying to answer: At what point does the application break down with x many sensors? How long can we run the application before we start to see it slowing down?
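The shape of a run-until-failure harness like this can be sketched as follows; the scenario hook and log format are placeholders, not our WindowTester suite:

```python
import time
import traceback

def run_until_failure(scenario, log_path="endurance-failure.log"):
    """Repeat a test scenario until it raises, then record how long the
    system survived and the failing traceback for the dev team to review."""
    start = time.monotonic()
    iterations = 0
    while True:
        try:
            scenario()  # e.g. drive the UI, push sensor data, verify state
            iterations += 1
        except Exception:
            elapsed = time.monotonic() - start
            with open(log_path, "w") as log:
                log.write("failed after %d runs, %.0f seconds\n" % (iterations, elapsed))
                log.write(traceback.format_exc())
            return iterations, elapsed
```

The unusual part is that failure is the expected outcome: the interesting output is how many iterations and how much wall-clock time the system survived before it broke.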

The second test verifies that we do not see an increase in the amount of time it takes for the data in our application to sync. For this test, we again put together two MFK nodes and have them lose connectivity for x number of hours. We then monitor the time it takes for them to get their data back in sync, and we plot the sync duration over time to determine whether it is increasing. This test result also gets displayed on our monitor in the MFK area, so that when we do see an increase in sync time we can turn around and fix it quickly.
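Deciding whether sync is getting slower can be reduced to fitting a trend line over the recorded sync durations. The least-squares slope below is a generic sketch of that idea, not our actual monitoring code:

```python
def sync_time_trend(durations):
    """Least-squares slope of sync durations over successive runs.
    A clearly positive slope suggests syncing is getting slower.
    Requires at least two measurements."""
    n = len(durations)
    mean_x = (n - 1) / 2  # mean of the run indices 0..n-1
    mean_y = sum(durations) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(durations))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# A flat series has zero slope; a steadily growing one has a positive slope.
assert sync_time_trend([10, 10, 10, 10]) == 0
assert sync_time_trend([10, 12, 14, 16]) > 0
```

In practice the slope would be compared against a small threshold before raising an alarm, so that normal measurement noise does not light up the team monitor.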

4.      What We Learned

We started with development approaches that lead to high-quality production code, such as test-driven development, pair programming, continuous integration, and continuous improvement. Our quality assurance processes are designed to prevent bugs, not just fix them. Before we write a single line of code, we create tests to validate whether each unit of code will work as intended, and we continuously integrate and test to ensure that the system works exactly as it’s designed to work after each and every change. By putting together a combination of traditional testing practices, automated testing practices, and continuous learning and improvement, we have become the application our users depend on when providing security for events like the Boston Marathon. We feel we have had great success on our project, and the future looks bright.

5.      Acknowledgements

Special thanks to the following who allowed me to ask them a ton of questions and gave me the right answers to assist with the writing of this paper: Mark Dinman, Kevin Pfarr, Jake McCormick, Dan King, and Mark Balbes.

I want to thank my fellow team members: Kyle Ladd, who helped me get this paper started; Danny Elfanbaum, who helped me with my grammar and with putting the story together; and finally Tom Hermann, who helped me fill in some of the blanks.

Special thanks to Jason Tice who is my advocate and pushed me to do a presentation as a ‘growing’ experience.

Thanks to our Government sponsor, Luke Erikson.

Thanks to my shepherd, Robert Nord. Your advice was sound and your patience was masterful! You even came looking for me when I was missing 😉

About the Author

No bio currently available.