The Meaning of Unit Test Data

07. February 2017

The data you use in your unit tests communicates messages you might not mean to communicate. For the sake of this post, unit testing means any tests written to test a specific portion, or unit, of code. It does not mean Test Driven Development specifically. Whether you’re writing tests first or tests after, the data you use is communicating.

Let’s start with a simple example. You have a function named getFullAddress. This function combines the street, city, state and postal code into one string. You might write a test like this:

address.street = "a";
address.city = "b";
address.state = "c";
address.postalCode = "d";
expect(address.getFullAddress()).to.equal("a b, c d");

What does the data convey here? It conveys that the length and content of the data don’t matter. For this function, there’s nothing special about any of the fields; you’re just going to combine them all into one string. You could have just as easily written:

address.street = "2825 Lexington Rd";
address.city = "Louisville";
address.state = "KY";
address.postalCode = "40206";
expect(address.getFullAddress()).to.equal("2825 Lexington Rd Louisville, KY 40206");

This doesn’t really communicate any more information about your code under test, but it might be easier for other developers to read, as that looks much more like an address than “a b, c d”.
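For reference, here is a minimal sketch of how such a function and both versions of the test might look end to end. The implementation of getFullAddress is an assumption (the post never shows it) and is written as a standalone function rather than a method. Node’s built-in assert module stands in for the expect-style assertions above.

```javascript
const { strictEqual } = require("node:assert");

// Hypothetical implementation: the post only describes the output format,
// "street city, state postalCode".
function getFullAddress(address) {
  return `${address.street} ${address.city}, ${address.state} ${address.postalCode}`;
}

// Placeholder data: signals that the content of each field is irrelevant.
const placeholder = { street: "a", city: "b", state: "c", postalCode: "d" };
strictEqual(getFullAddress(placeholder), "a b, c d");

// Realistic data: same check, but the expected string reads like an address.
const realistic = {
  street: "2825 Lexington Rd",
  city: "Louisville",
  state: "KY",
  postalCode: "40206",
};
strictEqual(getFullAddress(realistic), "2825 Lexington Rd Louisville, KY 40206");
```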

Let’s look at another example.

You have a function getActiveUsers that you want to test. Its purpose is to iterate over a list of users and find all the active ones. Your data might look like this:

const users = [{
  id: 1,
  active: 'true'
}, {
  id: 2,
  active: 'false'
}];
// deep.equal compares contents; a plain equal would compare array identity and fail
expect(getActiveUsers(users)).to.deep.equal([{id: 1, active: 'true'}]);

Here you’re conveying to future readers that the active flag isn’t really a boolean; for some reason it’s a string. The API that you’re getting data from might actually return a boolean, but this test is telling future readers to expect a string.
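The difference is not just cosmetic. In JavaScript any non-empty string is truthy, including "false", so string flags can change what a test actually proves. The two implementations below are hypothetical, since the post does not show how getActiveUsers is written; they only illustrate how the choice of test data interacts with the code.

```javascript
const { deepStrictEqual } = require("node:assert");

// Hypothetical implementation A: relies on truthiness.
const getActiveUsersTruthy = (users) => users.filter((user) => user.active);

// Hypothetical implementation B: requires the boolean value true.
const getActiveUsersStrict = (users) => users.filter((user) => user.active === true);

const stringUsers = [
  { id: 1, active: "true" },
  { id: 2, active: "false" },
];
const booleanUsers = [
  { id: 1, active: true },
  { id: 2, active: false },
];

// With string flags, A returns both users (the string "false" is truthy)
// and B returns none (no value is the boolean true).
deepStrictEqual(getActiveUsersTruthy(stringUsers).map((u) => u.id), [1, 2]);
deepStrictEqual(getActiveUsersStrict(stringUsers), []);

// With boolean flags, both implementations agree on the single active user.
deepStrictEqual(getActiveUsersTruthy(booleanUsers), [{ id: 1, active: true }]);
deepStrictEqual(getActiveUsersStrict(booleanUsers), [{ id: 1, active: true }]);
```

Neither implementation would pass the string-based test above as written, which is exactly the kind of detail a reader would infer from the data.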

So far this is all pretty basic. Let’s look at one more test.

You have a function findHomeAddress. This time the function takes a list of addresses and finds the one that has the designation “isHome” set to true. What would you think if you saw the following test data?

const addresses = [{
  id: 1,
  userId: 1,
  isHome: true
}, {
  id: 2,
  userId: 1,
  isHome: true
}];

When I see that data I immediately think one of two things:

1. This system allows a user to have multiple home addresses (both addresses belong to the same person and both have isHome set to true)
2. This code was copied and pasted to create 2 objects

If #1 is true, that’s a great piece of information to have, because I might have assumed that the system didn’t allow it. I would go look to see if I could find a test that specifically called that out, something like “A user should be able to have multiple home addresses.”

If #1 is false, then the best case scenario is that you’ve misled someone who comes along later and reads the code. The worst case is that you have now weakened confidence in the tests. Your readers might ask: Why did the test pass? Can I trust other tests to use the right data? Are these tests deliberate?

Maybe the problem wasn’t intentional. Perhaps point #2 is the real culprit: the code was copied and pasted. If that’s the case, what does this test data communicate? It likely communicates that testing isn’t considered a first-class citizen on this project, and that test cases aren’t treated with the same care as non-test code. Or, if you’re really unlucky, copying and pasting is common throughout the project, which is a bigger code smell than just some haphazard test data.
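By contrast, data written deliberately for this case might vary only the field under test. The sketch below assumes findHomeAddress returns the first address whose isHome flag is true, which is a guess at behavior the post does not spell out.

```javascript
const { deepStrictEqual } = require("node:assert");

// Hypothetical implementation: first address flagged as home, or undefined.
function findHomeAddress(addresses) {
  return addresses.find((address) => address.isHome === true);
}

// Same user, two addresses, exactly one marked as home.
const addresses = [
  { id: 1, userId: 1, isHome: false },
  { id: 2, userId: 1, isHome: true },
];

deepStrictEqual(findHomeAddress(addresses), { id: 2, userId: 1, isHome: true });
```

Putting the home address second also shows that the function is not simply returning the first element of the list.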

Why does data get to this stage? I think it’s because we’re often unclear about why we’re writing tests. We lose sight of the fact that tests communicate how the code functions. If you have a large test suite, it can do a pretty good job of telling you how the code works. (Whether the code works correctly is a different topic, but your tests do tell you how it works.) Because we forget this, we often set up test suites with a lot of data up front and then write tests, instead of asking ourselves “What data does this test case need?” When you start asking that question, you tend to wind up with more meaningful, and more limited, test data.
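One way to act on that question is to build each test’s data right next to its assertion, starting from neutral defaults and overriding only the fields the test cares about. The makeAddress helper and its default values below are hypothetical, as is the findHomeAddress implementation; treat this as a sketch of the idea rather than a prescribed pattern.

```javascript
const { strictEqual } = require("node:assert");

// Hypothetical helper: neutral defaults, overridden per test.
function makeAddress(overrides = {}) {
  return { id: 1, userId: 1, isHome: false, ...overrides };
}

// Hypothetical implementation, assumed for the sake of the example.
const findHomeAddress = (addresses) => addresses.find((a) => a.isHome === true);

// Test: "returns the address flagged as home".
// Only the overridden fields are visible, so the reader knows what matters.
const home = makeAddress({ id: 2, isHome: true });
strictEqual(findHomeAddress([makeAddress(), home]), home);

// Test: "returns undefined when no address is flagged as home".
strictEqual(findHomeAddress([makeAddress(), makeAddress({ id: 2 })]), undefined);
```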

Remember, tests are code, and code conveys meaning to its readers. Just as I try to write my code so that it is easy for other developers to read, I need to write my tests the same way.