Experience Report

Scaling agile development across loosely coupled teams using microservice architecture

About this Publication

We observe that teams from different organizations with different incentive structures tend to focus on different system quality attributes and that this difference can impede collaboration. In this experience report, we present a case study of a microservice architecture-based approach to this problem that we adopted during a collaboration between IBM Watson and IBM Research. We propose that the overall framework described is generally applicable to scaling agile development across teams with different incentive structures.

1. INTRODUCTION

The quality attributes (QAs) of a system, such as performance, accuracy, fault tolerance, reliability, and maintainability, are various factors which affect its behavior, usability, and design across all its components. QAs can have tradeoffs with one another and thus their relative importance to a particular system must be weighed and balanced. If a particular QA affects a software system’s architecture, it can be an architecturally significant requirement (ASR) [1].

A common trope is that of “research code” and refers to code written for research applications, with the subtext that such code can be difficult to maintain or extend, lack reusable abstractions, and have inexhaustive error handling. While it is true that code written in the course of a research project is often not suitable to be shipped in a product, another perspective on the phrase is that research code is intentionally written with emphasis on certain QAs, such as accuracy, over others that are more important to production code, such as maintainability.

The QAs that a development team focuses on often align with the incentives of the organization that team belongs to. In a research organization whose incentives are built around academic publication and furthering the scientific state of the art, an inclination towards focusing on accuracy at the expense of maintainability can graduate to a norm, while within a software engineering organization performance, usability, and fault tolerance will tend to be first concerns. When two teams from different organizations with different incentive structures and reporting chains are tasked to begin collaboration on a product, this difference in habits and norms can impede the development process or lead to tension.

In this experience report, we recount an approach we took towards preventing this tension by hoisting [3] certain QAs into a microservice based architecture so as to lead two teams to efficiently work towards a common goal while simultaneously following their respective organizations’ incentives. In contrast with agile frameworks such as Disciplined Agile Delivery [5], ours is architecture-first perspective that tries to use system design to scale agile processes across teams.

2. Background

IBM Watson and IBM Research are separate organizational units, the former being a software engineering organization first and the latter being a research organization first. They routinely collaborate and often a new product or feature is started with a collaboration between teams from each organization that have never worked together before. The difference in habits and incentives between these organizations needs to be resolved every time a new collaboration is formed, which especially comes to fore when considering what to do with a research prototype.

A research prototype is generally not written with maintainability as a focus, nor often extensibility or modularity. What portion, if any, of the existing code can be reused, where new code should be written, and how the ensuing development process will be conducted needs to be solved every time this situation occurs.

The problem compounds if product development must start while the scientific work is still being developed: should there be code handoff, a common codebase, a prototype/reimplement cycle, or some other method of collaboration? How does one engineer a product and deliver it on time when the algorithm development has not finished, or is not past the conceptual stage? And, as we focus on here, can the collaborative development process be designed to take advantage of, rather than try to work against, differences between team incentives and norms?

3. CASE REPORT

In June 2017, an engineering team from IBM Watson, distributed between Pittsburgh, PA and Denver, CO, and a research team from IBM Research, based in New York City, NY began partnering on the development of an artificial intelligence application built on top of an existing, productized platform to provide state of the art functionality to solve a novel artificial intelligence task. Though the natures of the inputs to and outputs of the desired solution were defined, the nature of the solution itself was not known.

The core development process problem was to find an efficient way for the engineering team to iterate on the system design while allowing the research team to iterate on the science in order to achieve a production quality, state of the art system within an aggressive deadline. Technical problems to be solved included a data processing pipeline, real-time propagation of and reaction to state changes such as new or modified data, task scheduling and status monitoring of training jobs on a high-performance computing cluster, and trained model management.

3.1 Strategy

Our approach was to not only identify specific QAs as ASRs but to go farther by hoisting them into the system design in such a way as to allow each team to focus on the same QAs it normatively does, without having to trade off between them. We used a microservice architecture to facilitate system design built around this principle. An essential auxiliary activity was the recording of relevant system design decisions using architectural decision records (ADRs) [6] to serve as a record of the design process.

3.2 System Design

Given that the types of inputs and outputs of the desired product were known, we chose an ensembling [4] approach in which any number of different microservices can implement an algorithmic solution for the desired task, so long as they implement a common interface (Fig. 1). User input is asynchronously fanned out to all Algorithm backends for their returns to be disambiguated and de-duplicated before being stored and made available for user retrieval. The common interface each Algorithm microservice implements evolved over time, but changes were infrequent and made only as necessary because any changes had implications both for the Ensembler service and for each Algorithm backend.

Figure 1. System Architecture

As the project developed and the various concerns of the overall application became better defined, we abstracted away each of those concerns common to all Algorithm backends behind APIs, leaving each backend itself stateless and only reading from or updating shared state as necessary. References to new or updated data arriving from the user via Ingestion are placed on a message queue for the Document Processor to pull from. Once Document Preprocessor completes and saves the output of its data normalization and annotation tasks to the Processed Document Store, it publishes a list of the consequent state changes, and each Algorithm backend subscribes to notifications of such state changes to create what is essentially a blackboard pattern [2], (pub*/sub* in the figure). Each Algorithm backend then triggers any required new training jobs via the Proxy, which abstracts away task scheduling in the Model Training service and which maintains the blackboard with training job status for the Algorithm backends to monitor. Once any training job completes, the resulting trained model is loaded into memory by the Model Server and made available for use. Finally, the user is able to query the system for inferences made based on the processed data, which are ensembled as described above and stored to the Training Database to be retrieved at will.

Using this architecture, reliability and fault tolerance are hoisted into the system design, leaving accuracy to be the chief concern of the Algorithm backend being developed by the research team and allowing the engineering team to iterate on the usual engineering tasks. The responsibilities for individual components of the system were discussed among and decided upon by both teams to avoid missed requirements and were then recorded in ADRs.

3.3 Summary of Results

Throughout the development process, as the concerns common to each backend became clear, utilizing a microservice architecture allowed the engineering team to iterate on the overall system design and implement individual components to provide the desired QAs, allowing sufficient time for the science produced by the research effort to evolve. For example, one backend was able to provide minimal service quality while initial nontrivial scaling and performance issues in the other backend were characterized and resolved.

This architecture was flexible enough to be adapted to the new business requirements that were added over time and allowed both teams to work with minimal communication and management overhead. In particular, this development process allowed both teams to work towards their own incentives in such a way that the final product had a cohesive design and had all desired properties.

Following this strategy, the teams quickly adapted well to this development process. However, a drawback to that process became apparent upon the discovery that, after two sprints with less communication than there should have been, both teams had identified and approached the same problem with mutually exclusive solutions. These solutions had to be reconciled, delaying development. Furthermore, towards the end of the project there was a significant barrier for one team helping another complete or debug an implementation because of a lack of familiarity with each other’s codebases. Nevertheless, and despite significant new requirements having been introduced midway through the project, the overall system met its functional specifications within a few weeks of the aggressive original deadline.

4. CONCLUSIONS

We found that choosing a microservice architecture can facilitate a development process for teams in different organizations and with different incentives to be able to do long-term work in a decoupled manner and having significantly reduced communication and management overhead. The overall process could be improved by increasing cross-team code reviews, having some minimal formal project management role, and/or making more frequent use of architectural decision records in order to avoid loss of cross-team synchronization.

We have examined here a method for scaling agile development across loosely couple engineering and research teams by hoisting specific quality attributes into system design using a microservice architecture. Although this is one specific application of this approach, we believe that the overall framework described here is generally applicable to scaling agile development across teams with different incentive structures.

5. Acknowledgements

We would like to thank, in no particular order, Gabe Hart, Mark Tyneway, Manjari Akella, Alfio Gliozzo, Nicolas Fauceglia, Gaetano Rossiello, Michael Glass, and Sarthak Dash for their passion and hard work throughout the project described here. We would also like to thank Tim O’Connor for his critical feedback throughout the writing process.

REFERENCES

Chen, Lianping (2013). “Characterizing Architecturally Significant Requirements”. IEEE Software. 30 (2): 38–45. doi:10.1109/MS.2012.174.
Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (1995). “Design Patterns: Elements of Reusable Object-Oriented Software”. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
Fairbanks, George (2014). “Architectural Hoisting. IEEE Software 31 (4): 12-15. doi:10.1109/MS.2014.82
Rokach, L. Artif Intell Rev (2010). “Ensemble-based classifiers”. 33: 1-39. doi:10.1007/s10462-009-9124-7
Scott W. Ambler and Mark Lines. 2012. Disciplined Agile Delivery: A Practitioner’s Guide to Agile Software Delivery in the Enterprise (1st ed.). IBM Press.
Tyree, J. and Akerman, A. (2005). “Architecture decisions: Demystifying architecture.” IEEE software. 22 (2): 19-27. doi:10.1109/MS.2005.27

Add to Bookmarks Remove Bookmark

Add to Bookmarks Remove from Bookmarks

Agile2018

You must be an Agile Alliance member to download the report. Please log in to your account now, or join us to take advantage of all our members-only events and resources.

Your Bookmarks

No favorites to display. You must have cookies enabled to add bookmarks.

Have a comment? Join the conversation

Related Agile Experience Reports

Selling the Investment to Pay Down Technical Debt: The Code Christmas Tree

Extreme programming (XP) techniques and other advances in software development allow for creation of a code base which is more easily understood and maintained. This often leads to faster enhancements, at a lower cost, and with fewer defects. However…

View Session >

Share the Load: Distribute Design Authority with Architecture Decision Records

An architecture decision record (ADR) captures a single design decision in a markdown file stored close to the code relevant to that decision. ADRs are lightweight and easy to use but require practice to use well. This experience report shares lesson…

View Session >

Experimenting with Mob-Programming

Mob-Programming is a programming technique which involves all team members in collective coding sessions. After experimenting with four teams working in different contexts and application domains, we have reached several key takeaways and results. Wh…

View Session >

Cookie	Duration	Description
__cfduid	1 month	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
_csrf	session	This cookie is essential for the security of the website and visitor. It ensures visitor browsing security by preventing cross-site request forgery.
_GRECAPTCHA	5 months 27 days	This cookie is set by Google. In addition to certain standard Google cookies, reCAPTCHA sets a necessary cookie (_GRECAPTCHA) when executed for the purpose of providing its risk analysis.
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-advertisement	1 year	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Advertisement".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
gdpr[allowed_cookies]	1 year	This cookie is set by the GDPR WordPress plugin. It is used to store the cookies allowed by the logged-in users and the visitors of the website.
JSESSIONID	session	Used by sites written in JSP. General purpose platform session cookies that are used to maintain users' state across page requests.
PHPSESSID	session	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
pmpro_visit		The cookie is set by PaidMembership Pro plugin. The cookie is used to manage user memberships.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__atuvc	1 year 1 month	This cookie is set by Addthis to make sure you see the updated count if you share a page and return to it before our share count cache is updated.
__atuvs	30 minutes	This cookie is set by Addthis to make sure you see the updated count if you share a page and return to it before our share count cache is updated.
__jid	30 minutes	Used to remember the user's Disqus login credentials across websites that use Disqus
aka_debug		This cookie is set by the provider Vimeo.This cookie is essential for the website to play video functionality. The cookie collects statistical information like how many times the video is displayed and what settings are used for playback.
bcookie	2 years	This cookie is set by linkedIn. The purpose of the cookie is to enable LinkedIn functionalities on the page.
CONSENT	16 years 8 months 15 days 5 hours	Description Pending
disqus_unique	1 year	Disqus.com internal statistics
lang	session	This cookie is used to store the language preferences of a user to serve up content in that stored language the next time user visit the website.
language		This cookie is used to store the language preference of the user.
lidc	1 day	This cookie is set by LinkedIn and used for routing.
locale	3 days	This cookie is used to store the language preference of a user allowing the website to content relevant to the preferred language.
STYXKEY_aa_signup_visited	session	No description

Cookie	Duration	Description
_gat_UA-17319182-1	1 minute	Set by Google Analytics and Google Tag Manager to enable website owners to track visitor behaviour and measure site performance. These cookies are used to collect information about how you use our website. The information collected includes number of visitors, pages visited and time spent on the website. The information is collected by Google Analytics in aggregated and anonymous form, and we use the data to help us make improvements to the website.
YSC	session	This cookies is set by Youtube and is used to track the views of embedded videos.

Cookie	Duration	Description
_ga	2 years	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_gat_gtag_UA_17319182_1	1 minute	Set by Google Analytics and Google Tag Manager to enable website owners to track visitor behaviour and measure site performance. These cookies are used to collect information about how you use our website. The information collected includes number of visitors, pages visited and time spent on the website. The information is collected by Google Analytics in aggregated and anonymous form, and we use the data to help us make improvements to the website.
_gat_UA-0000000-1	1 minute	Set by Google Analytics and Google Tag Manager to enable website owners to track visitor behaviour and measure site performance. These cookies are used to collect information about how you use our website. The information collected includes number of visitors, pages visited and time spent on the website. The information is collected by Google Analytics in aggregated and anonymous form, and we use the data to help us make improvements to the website.
_gid	1 day	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected including the number visitors, the source where they have come from, and the pages visted in an anonymous form.
eud	1 year 24 days	The domain of this cookie is owned by Rocketfuel. This cookie is used to sync with partner systems to identify the users. This cookie contains partner user IDs and last successful match time.
S	1 hour	domain .google.com
uvc	1 year 1 month	The cookie is set by addthis.com to determine the usage of Addthis.com service.
vuid	2 years	This domain of this cookie is owned by Vimeo. This cookie is used by vimeo to collect tracking information. It sets a unique ID to embed videos to the website.

Membership

Members-only Content

Become an Agile Alliance member!

Agile Conferences

Virtual Events

Community Events

Agile Essentials

Download the Agile Manifesto

Recent Blog Posts

Agile Resources

The NEW Agile Resource Guide

Sustainability Manifesto

MEMBER INITIATIVES

Your Community

Global Development

Global Affiliates

Global Affiliates

OUR POLICIES

ABOUT US

Become a sponsor

Experience Report

Scaling agile development across loosely coupled teams using microservice architecture

About this Publication

1. INTRODUCTION

2. Background

3. CASE REPORT

3.1 Strategy

3.2 System Design

3.3 Summary of Results

4. CONCLUSIONS

5. Acknowledgements

Your Bookmarks

Have a comment? Join the conversation

Related Agile Experience Reports

Discover the many benefits of membership

Thank you to our valued Agile Alliance Annual Partners

Our Cornerstone Corporate Supporting Members