TDD is great for testing code logic and small parts of an application. Most teams are pretty good at this.
However, what happens when your application logic depends on data? Maybe you have a reporting/analytics system or a complex CMS. Business logic varies with the shape of the data. Unit testing alone falls short. Integration and functional testing become more difficult and costly.
To solve this problem, we developed a multi-layer approach to testing our analytics application. We combined unit, integration, and functional tests with multiple test data sets to balance test coverage and maintenance costs.
A brief introduction: my name is John Reuning, and I work for CA Technologies in the Rally Software group. Rally Software was acquired by CA about nine months ago, so I’m part of that organization.
To give you guys some context on what we do, I have a couple of questions for you.
How many of you love data?
Okay, how many of you hate data?
There we go. Yes, me too. I have a love-hate relationship with data, especially when it comes to how the behavior of an application varies with the shape and the nature of the data.
And I’m going to talk a little bit about the Lookback API application within the Rally stack. The Lookback API is something that allows people to access time series data for all of the user stories, defects, tasks, etc. in the Rally tool.
What this means is there’s a huge amount of data in there. There’s an ETL application that takes all of the data from our main product database, where everybody interacts with the user stories, tasks, and defects, and moves it into the Lookback API database, which keeps almost every version of every artifact. So what we have here is kind of an indicator of the flow of the data: it starts off in our main product database, then gets shuffled over through an ETL process into the Lookback API database, and then on into an aggregation and analytics system that sends the data out the door to users based on their requests.
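To make "keeps almost every version of every artifact" concrete, here is a minimal sketch of a versioned store with a point-in-time lookup. It is only an illustration, not the Lookback API's actual schema: the names `Snapshot`, `HistoryStore`, `valid_from`, `valid_to`, and the integer timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """One version of an artifact, valid over [valid_from, valid_to)."""
    object_id: int
    fields: Dict[str, str]
    valid_from: int                 # hypothetical integer timestamp
    valid_to: Optional[int] = None  # None means "still the current version"


@dataclass
class HistoryStore:
    """Append-only store that keeps every recorded version of an artifact."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def record(self, object_id: int, fields: Dict[str, str], at: int) -> None:
        # Close the currently open version of this artifact, if there is one.
        for snap in self.snapshots:
            if snap.object_id == object_id and snap.valid_to is None:
                snap.valid_to = at
        self.snapshots.append(Snapshot(object_id, dict(fields), at))

    def as_of(self, object_id: int, at: int) -> Optional[Dict[str, str]]:
        # Return the fields of the version that was current at time `at`.
        for snap in self.snapshots:
            if snap.object_id != object_id:
                continue
            still_open = snap.valid_to is None or at < snap.valid_to
            if snap.valid_from <= at and still_open:
                return snap.fields
        return None


store = HistoryStore()
store.record(1, {"state": "Defined"}, at=10)
store.record(1, {"state": "In-Progress"}, at=20)
store.record(1, {"state": "Accepted"}, at=30)
```

Asking for `store.as_of(1, 25)` walks the history and returns the version that was open at time 25. This is the kind of data-shaped behavior that a plain unit test of application logic does not exercise.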
So the question is, how do you test something like this? I think the community has more or less figured out testing within an application, with unit tests and integration tests, where the code is the behavior of the application. But what happens when the behavior of the application varies with the shape of the data? You have to include a bunch of data in there. And so I’m going to talk a little bit about how we take a multi-layered approach to testing a data-heavy application stack.
And so what you see here, basically, is that these circles indicate the focus of the different levels of testing. We have very traditional unit and integration tests: we cover methods, we cover classes. With some of the integration tests, though, you start to see … encompass parts of the database. This thing here is an ETL application, and we have Spock-based integration tests that set up various scenarios within just the ETL portion of the application and test the behavior of taking data from one database, transforming it, and putting it into another. It’s all pretty well constrained at the integration test level: we set up the test cases, have the data generated, and then test it.
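The shape of such an ETL integration test can be sketched as follows. The talk's tests are written with Spock. This version uses Python and in-memory SQLite databases purely for illustration, and the table names, column names, `run_etl` function, and the upper-casing transform are all hypothetical stand-ins rather than the real application.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, target: sqlite3.Connection, at: int) -> None:
    """Hypothetical ETL step: copy current artifact rows into a history table."""
    rows = source.execute("SELECT id, type, state FROM artifact").fetchall()
    for artifact_id, artifact_type, state in rows:
        # Transform: normalize the state name before loading it.
        target.execute(
            "INSERT INTO snapshot (object_id, type, state, valid_from) "
            "VALUES (?, ?, ?, ?)",
            (artifact_id, artifact_type, state.strip().upper(), at),
        )
    target.commit()


def test_etl_keeps_each_version() -> None:
    # Given: a source database seeded with generated test data.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, state TEXT)"
    )
    source.execute("INSERT INTO artifact VALUES (1, 'story', ' defined ')")
    source.execute("INSERT INTO artifact VALUES (2, 'defect', 'open')")
    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE snapshot "
        "(object_id INTEGER, type TEXT, state TEXT, valid_from INTEGER)"
    )

    # When: the ETL runs, the story changes state, and the ETL runs again.
    run_etl(source, target, at=1)
    source.execute("UPDATE artifact SET state = 'accepted' WHERE id = 1")
    run_etl(source, target, at=2)

    # Then: the target holds both versions of the story, transformed.
    history = target.execute(
        "SELECT state FROM snapshot WHERE object_id = 1 ORDER BY valid_from"
    ).fetchall()
    assert history == [("DEFINED",), ("ACCEPTED",)]
    assert target.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0] == 4


test_etl_keeps_each_version()
```

The test owns both databases and generates its own data, which is what keeps this layer well constrained: nothing outside the ETL step can change the outcome.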
And the same holds true of the Lookback API and the aggregation piece of this, where data is coming out of another database and then, again, going through various forms of transformation, whether it’s taking fine-grained time series data, aggregating it, and sending it up in summary form, or just taking time series data and sending it straight out the door.
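As a rough illustration of the aggregation case, the sketch below rolls fine-grained rows up into a per-day summary. The row layout of (day, state, estimate), the function name `summarize_by_day`, and the sample values are assumptions made for this example, not the real aggregation system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # hypothetical layout: (day, state, estimate)


def summarize_by_day(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    """Sum the estimates for each state within each day."""
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for day, state, estimate in rows:
        summary[day][state] += estimate
    # Convert the nested defaultdicts to plain dicts for easy comparison.
    return {day: dict(states) for day, states in summary.items()}


rows = [
    ("day-1", "In-Progress", 3),
    ("day-1", "In-Progress", 5),
    ("day-1", "Accepted", 2),
    ("day-2", "Accepted", 10),
]
summary = summarize_by_day(rows)
```

A test at this layer can generate a handful of rows like these and assert on the summarized totals, without involving the upstream ETL at all.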
Now at the outer layer, we’ve put together something that uses JBehave and, basically, a BDD-style scenario approach. Here we use a combination of simulated data and snapshot data from a production environment to go through the entire process: interacting with a web services API to create user stories and defects and put them into the system, and then watching the transformation take place. Then, on the far end of that, we query the data back out and verify that the whole process has worked.
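The given/when/then flow of such an outer-layer scenario can be sketched like this. The talk's scenarios use JBehave against the real stack. Here a small in-memory `FakePipeline` class, which is entirely hypothetical, stands in for the web services API, the ETL, and the query endpoint so that the example runs on its own.

```python
from typing import Dict, List


class FakePipeline:
    """Stand-in for the whole stack: product database, ETL, and query API."""

    def __init__(self) -> None:
        self.product_db: Dict[int, Dict[str, str]] = {}
        self.history: List[Dict[str, str]] = []
        self._next_id = 1

    def create_artifact(self, kind: str, name: str) -> int:
        # In a real stack this would be a call to the web services API.
        artifact_id = self._next_id
        self._next_id += 1
        self.product_db[artifact_id] = {"kind": kind, "name": name}
        return artifact_id

    def run_etl(self) -> None:
        # Copy every current artifact into the history store.
        for artifact in self.product_db.values():
            self.history.append(dict(artifact))

    def query(self, kind: str) -> List[str]:
        # In a real stack this would be a query against the outbound API.
        return [row["name"] for row in self.history if row["kind"] == kind]


def scenario_created_artifacts_are_queryable() -> List[str]:
    pipeline = FakePipeline()
    # Given a user story and a defect created through the front door
    pipeline.create_artifact("story", "Export report")
    pipeline.create_artifact("defect", "Totals are wrong")
    # When the ETL has processed them
    pipeline.run_etl()
    # Then the story can be queried back out at the far end
    return pipeline.query("story")


result = scenario_created_artifacts_are_queryable()
```

The scenario only touches the two ends of the pipeline, creating artifacts going in and querying going out, so it verifies the whole path rather than any single transformation.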
Where does this work, and where does it not? Obviously, it gives pretty good coverage over the transformations, and pretty good coverage over the data model and the data structures involved. The snapshotting is kind of hard; it does tend to be brittle, a little bit like UI testing with Selenium. We occasionally see some problems when a little piece of data changes that was the basis for some assumptions in the test. It’s also a little bit slow, as you might expect, going through full data transforms and interacting with multiple applications.
So that’s pretty much it. I’m open to questions. I also have a bunch of these. These are lean coffee kits; in here is a cup with Sharpies and stickies. If you’re not familiar with lean coffee, find me afterwards and I’ll be happy to explain. We do a lot of lean coffees in our organization. Thank you very much.