As companies look to maintain lean budgets, Matt Heusser explains why it takes full organizational commitment and a clever approach to extract value from all QA resources.
Getting Real Value Out of Your Testing
About This Episode
Matt Heusser is the Managing Director of Excelon Development. Heusser is an award-winning thought leader and speaker. He also writes for many tech publications and authored the Lean Software Testing/Lean Software Delivery methods and courses.
(This transcript has been edited for brevity.)
DAVID CARTY: What is your lot in life? You probably want more than you have, right? Better career, bigger house, faster car, more coffee. That’s human nature. But stoics, like Matt Heusser, believe it’s helpful to take a step back and appreciate what you have.
MATT HEUSSER: At one point in my life, I was having a difficult time, and someone recommended that I read Viktor Frankl’s Man’s Search for Meaning. Viktor Frankl went through the Holocaust and World War II. He was in a concentration camp, and they limited his meals to something like 200 calories a day. He’d basically get broth, and if he was lucky, there’d be a few peas in the bottom. And every day, he had to build gravel roads by hand with picks, axes, wheelbarrows, shovels, that kind of thing. And he found pleasure in things like a sunset or, if he was lucky, a cigarette a week, something like that. And I thought, my problems are nowhere near this bad.
Then I read Admiral Stockdale’s Thoughts of a Philosophical Fighter Pilot. He was a POW for more than seven years during the Vietnam War, and I think he spent four of those years in solitary confinement and two in leg irons. He was tortured, I don’t know, dozens of times. And the message is very similar: focus on what you can control, and don’t worry about what you can’t.
So this fundamental stoic idea of separating what you can control from the externals and then not worrying about your externals is really helpful to me.
CARTY: Stoicism is a school of philosophy that dates back to ancient Greece. While the practice has changed over time, through the Renaissance and into today, its four primary virtues are wisdom, courage, temperance, and justice. The idea, essentially, is that the practice of virtue is all that is required to achieve happiness.
HEUSSER: I don’t mean to be morbid, but what if I didn’t have a roof? What if my furnace broke? What if there were a bank run, as has happened recently, and the companies I invested in all went bankrupt? I’d really be in trouble. What if my house burned down? It makes you appreciate what you have. What if we didn’t have clean water? That’s happening in parts of the country right now; people don’t have access to clean water. Right now, I can just turn on a faucet and get clean, cool, refreshing, healthy water to drink on tap. What if I didn’t have access to that? So by actually sitting in it, imagining the pain of the loss, and then coming back, we appreciate everything all the more.
CARTY: Stoicism also comes in handy as a parent. Matt has discovered that his young daughter can be quite convincing in getting what she wants, such as extra screen time. Those stoic practices of perseverance and behavior over belief can be really useful, whether you’re dealing with a difficult coworker, a defiant child, or a difficult coworker who is acting like a defiant child.
HEUSSER: My challenge in the moment is recognizing, consistently, every single time: right, I’m the parent here. This isn’t my peer. This isn’t an adult. This isn’t someone I negotiate with. We have boundaries, and I need to consistently enforce them.
I think the same things happen in the workplace. There are personalities that will do the exact same thing in the workplace, and it’s the same lesson. If you don’t nip the behavior in the bud, you’re going to see more of it.
CARTY: This is the Ready, Test, Go. podcast brought to you by Applause. I’m David Carty.
Today’s guest is stoic father and software development expert, Matt Heusser. Matt is the managing director at Excelon Development. He’s an award-winning thought leader, author, and consultant in the software development and testing space. Just because you test your software, that doesn’t mean that you test it thoroughly. And even if you test your software thoroughly, that doesn’t mean that you test it efficiently. In today’s economy, every testing dollar must deliver value. But how do you go about measuring that value? Not just at a point in time, but as an ongoing, evolving program. Well, there’s no easy answer to that question, but there is a Matt Heusser.
Let’s start off our discussion by explaining why there’s nuance to this topic in the first place. In our previous call, you said that people like an easy answer. Why is this a problematic idea when it comes to test coverage? Everyone loves an easy answer, right?
HEUSSER: Sure. So there’s an infinite number of combinations of possible things in the whole universe that we could test. Take just a mobile app. Well, what operating system is it on? What’s the form factor? Is it portrait or landscape? How much bandwidth do we have? And that’s just the combination of possibilities for the platform. Then you have the inputs. What if it’s a calculator app? What if I do 1 plus 1? What if I do 1 plus 2? Just adding two numbers together, there’s an infinite number of combinations. Maybe it works the first time you do it, but there’s a memory leak that shows up the second time. So the coverage space is infinite.
What we want to do is narrow the complexity down and simplify it to something that is valuable but understandable. Maybe we have a tool to do that. But when we oversimplify, we get one set of test ideas that we can run in an hour, our smoke test, and then we say the software is good. And that’s our coverage model. Or we measure the goodness of the software by counting the number of bugs. Or we say: testers finding bugs, good; developers having to fix bugs, bad. And we actually create an active conflict between the two, where the developers have an incentive to have no bugs and the testers have an incentive to find a whole bunch. All of those are very, very simplified, easy-to-understand models for software testing. They’re incredibly broken. They actually drive the wrong behaviors.
So Occam’s razor says the simplest solution is usually the correct one, but there’s another saying, I think it’s H. L. Mencken’s: for any problem, there is a solution that is simple, that is intuitive, that makes sense, that is easy to apply, and that is wrong. So our work is creating ever more complex models, and our challenge is to do it in a way that doesn’t become too complex. We want to find the sweet spot between no nuance, where it’s simple and everybody can understand it but it actually creates bad behavior, and really, really complex, where nobody can understand it and it doesn’t make any sense. Comprehensive, but incomprehensible.
CARTY: Right. And part of the challenge here, and part of the nuance to digital quality, is that we have thousands of devices and operating systems, as you mentioned. This is why it’s important to consider the value you get from your different types of testing. You work with clients in a variety of industries, and you’re tapped into how they think and what their action items are. Generally, how good a job do organizations do when it comes to assessing the value of their testing and adapting to whatever those findings are?
HEUSSER: Oh, my goodness. So there’s the state of the practice, and there’s the state of the art, and those are two different things. The answer is so bad that I’m almost embarrassed to talk about it. One of my colleagues used to say that when it comes to actual time on task versus time waiting in queues for software work, the number is so low that it’s hard to show the customer, because it’s hard for them to even believe it.
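To put a rough number on Heusser’s point that the coverage space explodes, here is a minimal Python sketch. The configuration dimensions and their values are hypothetical, chosen only for illustration; the point is that the count is the product of the option counts, so every new dimension multiplies the total.

```python
from itertools import product

# Hypothetical platform dimensions for a mobile calculator app (illustrative only).
DIMENSIONS = {
    "os": ["iOS 16", "iOS 17", "Android 13", "Android 14"],
    "form_factor": ["phone", "tablet"],
    "orientation": ["portrait", "landscape"],
    "bandwidth": ["offline", "3G", "4G", "Wi-Fi"],
}

def count_configurations(dims):
    """Distinct platform configurations: the product of each dimension's option count."""
    total = 1
    for options in dims.values():
        total *= len(options)
    return total

# Enumerate every configuration explicitly to confirm the count.
configs = list(product(*DIMENSIONS.values()))
print(count_configurations(DIMENSIONS))  # 64 (4 * 2 * 2 * 4)
print(len(configs))                      # 64

# Even restricting "1 plus 1, 1 plus 2, ..." to pairs of 32-bit operands,
# every configuration has 2**64 input pairs to try.
input_pairs = (2 ** 32) ** 2
print(count_configurations(DIMENSIONS) * input_pairs == 2 ** 70)  # True
```

And that still ignores state, such as a memory leak that only appears on the second run, which is why the space is effectively unbounded.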
In terms of testing, most organizations do testing as a thing that has to happen so the software can go out. Thus, it should be done really, really quickly. In terms of assessing the value, that’s not really a thing they do. It’s just a blocker for release.
So a few years ago, I was talking to someone at a Wall Street financial trading company. He had eliminated the entire QA department and replaced it with an outsourced group, and all they were really doing was the simplest user interface testing: type in the number, click Submit, see that it shows up. Almost nothing. All the executives cared about was that he had reduced test cost by 93%, because they had no idea how to assess test value. All they had was test cost, so they made that number small.
It reminds me of a company I worked with a couple of years ago that developed an AI/ML feature for their mobile app. It was going to measure how you use the software, then magically make predictions and automatically implement those predictions in their Internet of Things products. It never worked. They spent, I don’t know, $2 to $3 million on an 18-month project with a very large team of contractors. Very expensive. By the time I came in, they had brought me in for a project rescue. I filled out the Jira ticket to take the checkbox out of the user interface, so you just couldn’t turn the feature on, and it just disappeared. It never worked. I went to him and said, hey, if you want software that doesn’t work, that you never use, developed in 18 months, I’ll do it for half a million.
And the reason I tell those stories is that these companies don’t have a real ability to assess the value of testing, so all that’s left is cost. That’s not universal, but I would be really interested, for the people who watch this podcast, if they want to comment: how does your company evaluate testing? I would bet the majority of the answers are either an overly simplistic model that drives the wrong behaviors, or we don’t do it at all.
CARTY: Yeah, I mean, that’s really interesting, Matt. And just to think about this a different way. How do we go about fixing this? I mean, I know that’s not going to be a one-size-fits-all approach. It’s hard to enforce change, let alone an adaptive approach to digital quality, where you’re rolling in that information you get from the feedback loop on your testing value. So how can organizations make themselves more open to a flexible approach that is considering that testing value and optimizing for it?
HEUSSER: I’m a fan of a book called Getting Things Done, about personal process and personal project tracking. Not really from a make-the-checklists-and-fill-out-the-forms perspective, but just figure out everything you’re doing and put it in a box. Look at its effectiveness. Sort it by the things that are important and urgent, then important but not urgent, then urgent but not important, then neither important nor urgent. Chuck stuff out of the box that isn’t valuable, and focus on the things that are. That’s it.
So how would we do that in testing? Where do you start? Let’s look at all of our testing activities. We do usability, accessibility, performance, load, unit, system, integration, outsourced. We do some crowdsourced testing. We do this. We do that. How do you possibly compare all of them? They go in different buckets. They have different managers. They have different budgets. Where do you start?
I would start with the bugs. So go into your bug tracking system, or whatever it is. Maybe you’re a high-functioning Agile team, and you fix bugs the day you find them. Then track them for two weeks, man. It’s two weeks. Get a list, and for each defect, ask: where was it created, and where was it found? You can sort that spreadsheet by where it was found, and then ask where it should have been found. Often, a problem should have been found in design. Any time we have a tester arguing with a developer about what it should do, it’s a design or requirements problem, because we weren’t clear on what it should do from the beginning. Once we have that list, we can start to look at the priorities and effectiveness of our test efforts. We can say, this test effort is not valuable; none of the defects that are emerging are the kind this thing should be checking for.
At one company I worked with, a few years ago, we did all of our testing under HTTP, and then we did all of it again under HTTPS, in a secured environment. But whenever we found an HTTPS bug, it was because the whole thing didn’t work. So I said, what if we did all of our testing under HTTP, then spent 15 minutes under HTTPS and called it good? That was based on the defect data, on the way defects actually showed up in the world. So based on that, you can reprioritize your testing.
Now, we just did a paper together with Applause where we talked about this. I’ve had mixed success with sending everyone a survey, asking them what they think we’re actually doing and what we should be doing, averaging the numbers, and then looking at them. You can also look at the distributions. You might find out there’s a radical difference in how much time people think we’re spending on unit testing, or that everybody agrees we should be doing more of this and less of that. Those aren’t firm numbers. It’s an overly simplistic model, and on its own it doesn’t really have any meaning. But when you aggregate it over enough people and look at it, you can make better decisions about how to invest your time, which I think goes back to the central question: how do we invest our time and limited resources to find the most important bugs early? I think that’s what we want to do. Know your mission, then adapt your investments to your mission. Most of the teams I work with have scarce resources, and they want to find the most valuable bugs.
CARTY: And this gets into the idea, as you say, that we should think about testing as risk management. Why is that? And how does that guide organizations in their decision-making process, particularly as it applies to prioritizing tests and getting the most value out of those efforts?
HEUSSER: Yeah. So if you look at how many organizations do testing, we say we’re doing stories or micro features, whatever you want to call them. For every story or micro feature, we assign someone to do the testing, that person does whatever they do, and then it’s tested. Maybe at the end, we have some regression process that makes sure that changing this didn’t unintentionally break something over there. It’s all tested about the same. All the testing is about this wide. Now, I’m going to talk about some numbers as if it were possible for us to model them, which is very difficult to do. These numbers are going to be approximations. It’s going to feel like you worked them out on the back of a napkin in a bar over a beer, and you’ll be embarrassed, but it’s going to be better than nothing.
So with a risk-managed approach, you look at the impact if the problem happened and the percentage chance that it’s going to be introduced into the world, multiply those two numbers for each of your risks, and then sort. What will happen is, the most likely problems with the highest impact will go to the top. Then you can invest your scarce resources working down that list, so the things at the bottom might get a lot less attention than the things at the top.
What would that mean for testing? Let’s look at these stories. In e-commerce, that’s usually the path to purchase. Search has got to work, or the customer can’t find the product. Add to cart has got to work. Checkout has got to work. Create a profile. If these things don’t work, you can’t buy product, and every minute, you lose money. Product reviews are not really in the same category. And maybe we’re talking about APIs that are consumed all over the site. A lot of sites now are actually composed out of widgets, so these APIs might be used in a bunch of different places, which increases the impact if something goes wrong and increases the chance that some developer accidentally uses one wrong. Or maybe a change goes into version control that impacts the API, which impacts the consumers of that API. So those things should get more attention.
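The two-week defect review can be sketched in Python as below. The phase names, their order, and the defect records are hypothetical; treating the phase where a defect was created as the place it "should have been found" is a simplifying assumption, not a rule from the episode.

```python
from collections import Counter

# Assumed phase order; substitute your own process stages.
PHASES = ["requirements", "design", "coding", "testing", "production"]

# Hypothetical two weeks of defect records (illustrative, not real data).
defects = [
    {"id": 101, "created": "requirements", "found": "testing"},
    {"id": 102, "created": "coding", "found": "coding"},
    {"id": 103, "created": "requirements", "found": "production"},
    {"id": 104, "created": "coding", "found": "testing"},
    {"id": 105, "created": "design", "found": "testing"},
]

def phases_escaped(defect):
    """How many phases a defect slipped past between creation and discovery."""
    return PHASES.index(defect["found"]) - PHASES.index(defect["created"])

# Like sorting the spreadsheet: longest escapes first.
by_escape = sorted(defects, key=phases_escaped, reverse=True)
for d in by_escape:
    print(d["id"], d["created"], "->", d["found"], phases_escaped(d))

# Where are defects being created? A pile-up in requirements or design
# suggests effort belongs earlier than late-stage testing.
print(Counter(d["created"] for d in defects).most_common())
```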
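A minimal Python sketch of the survey aggregation, assuming each respondent gives percentages of test effort for what the team actually does and what it should do. The activities and numbers are hypothetical. The mean shows the overall direction; the population standard deviation flags activities where perceptions diverge sharply.

```python
from statistics import mean, pstdev

# Hypothetical responses: percentage of test effort per activity,
# as each person perceives it ("actual") and would prefer it ("ideal").
responses = [
    {"actual": {"unit": 10, "exploratory": 30, "regression": 60},
     "ideal":  {"unit": 30, "exploratory": 40, "regression": 30}},
    {"actual": {"unit": 40, "exploratory": 20, "regression": 40},
     "ideal":  {"unit": 40, "exploratory": 30, "regression": 30}},
    {"actual": {"unit": 10, "exploratory": 40, "regression": 50},
     "ideal":  {"unit": 20, "exploratory": 50, "regression": 30}},
]

def aggregate(rows, key):
    """Per activity: (mean, population std dev) of the answers under `key`."""
    activities = rows[0][key].keys()
    return {a: (mean([r[key][a] for r in rows]), pstdev([r[key][a] for r in rows]))
            for a in activities}

def gaps(rows):
    """Ideal minus actual mean effort; positive means people want more of it."""
    actual, ideal = aggregate(rows, "actual"), aggregate(rows, "ideal")
    return {a: ideal[a][0] - actual[a][0] for a in actual}

print(aggregate(responses, "actual"))
print(gaps(responses))  # {'unit': 10, 'exploratory': 10, 'regression': -20}
```

In this made-up data, the wide spread on perceived unit-testing effort is the kind of expectation gap worth discussing, even though, as Heusser says, the numbers themselves are not firm.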
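The risk-managed approach, impact times likelihood and then sort, fits in a few lines of Python. The features, the 1-to-10 impact scale, and the probabilities are back-of-the-napkin assumptions, which is the level of precision the method tolerates; the shared pricing API is a hypothetical stand-in for the widely reused APIs described above.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    impact: int        # assumed scale: 1 (cosmetic) to 10 (no one can buy)
    likelihood: float  # assumed probability, 0-1, that a change introduces the problem

    @property
    def exposure(self) -> float:
        return self.impact * self.likelihood

# Napkin estimates for a hypothetical e-commerce site.
risks = [
    Risk("search", 9, 0.2),
    Risk("add to cart", 10, 0.3),
    Risk("checkout", 10, 0.25),
    Risk("create profile", 8, 0.1),
    Risk("product reviews", 3, 0.3),
    Risk("shared pricing API", 9, 0.4),  # reused by many widgets: higher odds of misuse
]

# Highest exposure first: spend scarce testing effort from the top down.
ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
for r in ranked:
    print(f"{r.name:<20} {r.exposure:.2f}")
```

Instead of giving every story the same amount of testing, effort can then be allocated in rough proportion to each item’s exposure.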
But if we just always do what we always did, everything gets about this much testing. And then, we’re missing the opportunity to reprioritize to catch the defects that have a disproportionate impact on our customer base.
CARTY: Right. A bug here is more severe than a bug here, basically.
HEUSSER: Yeah. And look, again, it’s very common not to see that prioritization happen. Why is that? One, there’s no model; nobody’s doing the work. And two, and this is a real area of conflict, we don’t really know what testers have been doing. They’re just doing whatever it is they do. It’s just a thing you have to do before you get to production. And then, if there is a defect, most of the time there’s someone else to point to, to take responsibility. Like, oh, whoopsie, the requirements were unclear. Or, well, everyone made the same mistake: the requirements were unclear, and the developer made a mistake, and I made a mistake, and if we hadn’t all made the mistake, it wouldn’t have happened. Or this is a regression problem: it worked just fine when I tested it, but other things changed, and there are unintended consequences. There’s a strange combination.
So there’s not really a strong incentive to take responsibility and model the whole system. It’s more, I tested my story, my story worked. And what did I do to test it? I used my own judgment to come up with a test plan. I would dare say that’s the status quo for most development organizations. And it’s even worse for legacy systems that still have traditional test cases, because then it’s, I just did what someone who has left the company told me to do. The one thing we know with a new build is that something has changed. That’s a definite. Otherwise, we wouldn’t need to test it; we’d just take the same executable artifact and drop it on the website or on your desktop. So we know something has changed, and with the test case approach, we say, let’s inspect it exactly the same way we did before. That’s what you do when you have batteries on an assembly line. You make a million batteries a day, and you test them all the same way because they’re all supposed to be exactly the same. But on the software assembly line, the continuous integration system, your Jenkins or whatever it is you’re using, every build is different by definition, because that’s what it does. If it weren’t different, it wouldn’t need a new build.
So that’s why I think we need to actually have the capability to customize our test runs between builds. There’s what the humans do to check it, and then there’s a small percentage of that which we institutionalize as tooling and automation that might run.
And so, how much of that tooling should change for this new build so that we can check it correctly? Ideally, in many cases, if we have coverage, we could make our change, run our tests, and see them fail. That’s good. They should fail, because automated tooling is just an expectation of yesterday’s behavior. Then we change the tests so that they pass, rerun them, see them pass, and see that nothing else fails. And we have confidence the system works. I just don’t see people using test tools that way today.
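One common way to customize automated runs between builds is change-based test selection: look at which files changed and run only the checks mapped to those areas, plus a small always-run baseline. This is a swapped-in illustration rather than a technique described in the episode, and the path prefixes, test names, and mapping are hypothetical.

```python
# Hypothetical map from source areas to the automated checks that cover them.
TEST_MAP = {
    "cart/": {"test_add_to_cart", "test_cart_totals"},
    "checkout/": {"test_checkout", "test_tax_rounding"},
    "search/": {"test_search"},
    "reviews/": {"test_reviews"},
}
SMOKE = {"test_login"}  # assumed baseline that runs on every build

def select_tests(changed_files):
    """Return the checks relevant to this build's changes, plus the smoke baseline."""
    selected = set(SMOKE)
    for path in changed_files:
        for prefix, tests in TEST_MAP.items():
            if path.startswith(prefix):
                selected |= tests
    return sorted(selected)

print(select_tests(["checkout/tax.py", "README.md"]))
# ['test_checkout', 'test_login', 'test_tax_rounding']
```

A real setup would also need a fallback, such as running the full suite when a change touches code the map does not cover. The map only covers the automated slice; the human testing would be planned from the same change list.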
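The idea of automation as an expectation of yesterday’s behavior can be illustrated with a hypothetical shipping rule. Assume the business intentionally lowers the free-shipping threshold from $75 to $50: the old check should fail after the change, the updated check should pass, and an unrelated check should keep passing. The function and numbers are invented for illustration.

```python
# Hypothetical rule after today's intentional change:
# free shipping from $50 (yesterday it was $75).
def shipping_cost(subtotal: float) -> float:
    return 0.0 if subtotal >= 50 else 5.99

def check_yesterdays_expectation() -> bool:
    """Encodes yesterday's behavior: a $60 order paid shipping."""
    return shipping_cost(60) == 5.99

def check_updated_expectation() -> bool:
    """The check after we update it to match the new rule."""
    return shipping_cost(60) == 0.0

def check_unrelated_behavior() -> bool:
    """Small orders should still pay shipping; this must not change."""
    return shipping_cost(20) == 5.99

# 1. Make the change, run the old check, and see it fail (as it should).
print(check_yesterdays_expectation())  # False
# 2. Update the check, rerun, see it pass, and confirm nothing else fails.
print(check_updated_expectation(), check_unrelated_behavior())  # True True
```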
CARTY: Even today, that’s surprising to hear, because we live in uncertain economic times. A lot of businesses are trying to operate in a leaner way, or they’re cutting back, and that can be really painful. So are you seeing any kind of shift in the tide in terms of placing extra emphasis on delivering and optimizing for test value, or is this still a really nascent idea that hasn’t taken hold yet?
HEUSSER: I don’t mean to be overly critical. What I’ve seen in the past 10 years is that we’ve collapsed testing inside the team, so the testers are embedded members of the team. And then, often, and this is the Silicon Valley standard now, we collapse testing into development work. There are benefits to that. The tester and the developer are the same person: I’m going to build it, I’m going to check it, it looks like it’s good, and then I send it on.
But how many of those people have read a book on testing? How many of them have done anything other than, if we’re lucky, go to Selenium or some tool vendor’s site and learn enough automation to clickity click, clickity click to make sure it works? And when that happens, most of the time, they’ll get stuck, because there’s some strange, weird user interface that is very difficult to tool up in Puppeteer, or Playwright, or Selenium, or whatever you’re using. It’s awkward. It’s weird: clicking the buttons in the right order, setting up the data. For instance, a very common pattern is: set everything up, click a button, see the order ID, click the radio button for that order ID, click Submit. How do you know what the order ID is? It’s not deterministic. It’s going to be some new number. Maybe we grab it; it’s a nine-digit number at the top of the page. Programmers write these workarounds when they’re writing tooling. But a lot of the time, if nobody’s looking over their shoulder, when they get to the really complex, hard-to-set-up stuff, they just skip that test. And then where’s the bug going to be? It’s going to be in the really hard, complex-to-understand stuff.
So when we find a defect, who knows, three months or six months later, you might not even be on the team anymore. There’s not a strong incentive for you to write that check. In fact, if you do the work to check it, that’s more work. So there’s an immediate, certain, negative incentive: it’s more work for you to set it up.
So when we examine the systems we’re creating, the good news is that technology is a giant business in the United States. IT, tech, software development, software testing: these have been growing at double-digit rates every year for 20 years. The interesting thing about growing at well over 10% a year is that you double every five or six years, which means that at any given point in time, half the people working in the industry have less than five or six years of experience. So we’re relearning over and over again. The things that I’ve described to you are rookie mistakes. Yes, there are companies that do better. But if they’re a growing company, and they’re hiring new graduates, we’re going to have to keep relearning these things over and over again.
I hope I haven’t presented a scenario where it’s all doom and gloom. But what I’m trying to say is that if you’re looking for craft and excellence in software testing, you’re swimming upstream. You get in your canoe, and you’re rowing, and the water is pushing you the other way. It’s going to be work. But my family is from the Pacific Northwest. Salmon, that’s how they survive, man. A few years ago, I did an analysis and talked to someone about it, and he found something like this: IT and software development were growing at 12% to 15% annually, while testing was growing at 3%, 4%, 5%. It depends on how you measure and on whether you count inflation. But what that meant is that there’s always more testing. There are always more testers. Development is just growing faster, so it feels like testing is shrinking, but it really, really isn’t. And I would say that to the extent that testing is, I pick up the test case and do the thing that someone who’s no longer with the company told me to do a year ago, which might or might not be relevant, yeah, I am OK with that role shrinking.
So then, when we take a wider view of testing as risk management, to include performance, usability, accessibility, what the developers do to improve quality earlier in the process, testing the requirements at the beginning, and what roles could do that, we’ve also got another problem: we don’t want to just do this much testing everywhere. We want to do a lot of testing on this, a little bit on that, a lot on that, and a little bit on that. So we want to think of a portfolio of resources to do that work. That portfolio can include contractors, which is a lot of what Excelon does. It can include crowdsourcing, which is a lot of what Applause does. And it can include full-timers, if we can transfer people over. It’s less common than it used to be, but a lot of companies, during crunch time, would transfer people from customer service and other subject-matter-expert roles to do testing. The developers can do a lot of this really neatly: we’re going to have this 18-second test for login, this four-second test for add to cart, this 11-second test for checkout. We’re going to have a bunch of them. We’re going to randomize the data and run them overnight in a random order, with random data, logging the data, and seeing if something breaks, catching input, transformation, and output defects.
But then there’s the kind of thing where this screen looks wrong, or that took a long time. Or say we set up a very, very complex telecommunications bill, where one account has 15 different locations and 200 phones, with a whole bunch of phone calls and some text messages. We generate that bill, and there’s the rounding of taxes, and we want to make sure the rounding worked correctly. You’re going to want other resources to do that. Maybe you could institutionalize some of it. But pure developers, programmers, unless you find one who’s really interested in that, are probably not going to think that way: we’re going to test this really thoroughly, because this telecom bill is a huge part of our cash, and there could be legal obligations if we get it wrong.
So where could those resources come from? It’s a very similar exercise to what we talked about before. We write down all of our possible resources, write down all of the different kinds of testing we could do, and then try to figure out where they align. With crowdsourcing as an option, we want to know how fast we can spool people up, what they can do, and what their limits are. We’re not going to have subject matter experts. We’re going to get a whole bunch of UI specialists, so we can do a lot of platform testing, localization testing, internationalization testing, and bandwidth testing. We want to know what is out there, how much it costs, and on how much notice: 24 hours, 48, over the weekend. We can put it on a credit card and get results on Monday, like magic. I believe that incredibly rapid testing with clear results, spooled up on demand, is like any sufficiently advanced technology: it’s indistinguishable from a test tool. It doesn’t matter. I’ve got a button I can push to get results. Do I care whether there are 500 people all over North America doing it for beer and pizza money, or whether it’s running in the cloud and I’m paying by the CPU hour? What is the quality of that result? How fast can I get results back?
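The usual workaround for a non-deterministic order ID is to capture the value the system just generated instead of hard-coding one. In a real UI test, the confirmation text would come from the page through Selenium, Playwright, or Puppeteer; here it is a hard-coded string so the Python sketch runs on its own, and the "Order #" wording is an assumption about the page.

```python
import re

# Assumed confirmation wording: "Order #" followed by exactly nine digits.
ORDER_ID = re.compile(r"Order\s*#?\s*(\d{9})\b")

def extract_order_id(page_text: str) -> str:
    """Pull the freshly generated nine-digit order ID out of the confirmation text."""
    match = ORDER_ID.search(page_text)
    if match is None:
        raise ValueError("no nine-digit order ID found on the confirmation page")
    return match.group(1)

confirmation = "Thanks! Order #482913007 has been placed."
order_id = extract_order_id(confirmation)
print(order_id)  # 482913007
```

The captured ID would then be used to locate the matching radio button with whatever UI tool the team uses; failing loudly when no ID is found is preferable to silently skipping the step.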
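The growth arithmetic is easy to check. The doubling time at a constant annual rate r is ln 2 / ln(1 + r). At exactly 10% a year, that is a little over seven years; the five-or-six-year figure corresponds to roughly 12% to 15%, the range Heusser cites next. The second function in this Python sketch shows why half the workforce is always new, under the simplifying assumption that growth is steady and nobody leaves the field.

```python
import math

def doubling_time(rate: float) -> float:
    """Years for headcount to double at a constant annual growth rate."""
    return math.log(2) / math.log1p(rate)

def share_hired_within(rate: float, years: float) -> float:
    """Fraction of today's workforce that joined in the last `years` years,
    assuming constant growth and zero attrition (a simplifying assumption)."""
    return 1 - (1 + rate) ** -years

for r in (0.10, 0.12, 0.15):
    t = doubling_time(r)
    print(f"{r:.0%} growth: doubles in {t:.1f} years; "
          f"{share_hired_within(r, t):.0%} joined within that time")
```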
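Testing growing while seeming to shrink is a matter of relative growth rates. Here is a small Python sketch with hypothetical starting headcounts of 80 developers and 20 testers, using 12% and 4% annual growth from the ranges mentioned above.

```python
def project(dev0, test0, dev_rate, test_rate, years):
    """Yearly (tester headcount, testers' share of total) under constant growth."""
    rows = []
    for t in range(years + 1):
        dev = dev0 * (1 + dev_rate) ** t
        test = test0 * (1 + test_rate) ** t
        rows.append((test, test / (dev + test)))
    return rows

rows = project(dev0=80, test0=20, dev_rate=0.12, test_rate=0.04, years=10)
for year in (0, 5, 10):
    testers, share = rows[year]
    print(f"year {year:2}: {testers:5.1f} testers, {share:.1%} of the total")
```

Tester headcount rises every year, yet testers’ share of the total roughly halves over the decade: the "feels like shrinking" effect.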
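The developers’ overnight run, short checks in a random order with random data and logging, might look like the following Python sketch. The check functions are hypothetical placeholders for real login, add-to-cart, and checkout tests. One design choice worth copying: the run logs its random seed, so a failure found overnight can be replayed in exactly the same order with exactly the same data.

```python
import logging
import random

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("overnight")

# Hypothetical placeholder checks; each draws its test data from the shared RNG.
def check_login(rng):
    user = f"user{rng.randint(1, 10_000)}"
    log.info("  login as %s", user)
    return True  # placeholder: drive the real login flow here

def check_add_to_cart(rng):
    qty = rng.randint(1, 99)
    log.info("  add %d items to cart", qty)
    return True  # placeholder: assert the cart shows qty items

def check_checkout(rng):
    cents = rng.randint(1, 1_000_000)
    log.info("  check out an order of %d cents", cents)
    return True  # placeholder: assert the order total matches

CHECKS = [check_login, check_add_to_cart, check_checkout]

def overnight_run(rounds, seed=None):
    """Run every check each round, in a shuffled order with random data.
    Returns the seed, the executed sequence, and any failures."""
    if seed is None:
        seed = random.randrange(2 ** 32)
    rng = random.Random(seed)
    log.info("seed=%d", seed)  # log the seed so the run can be replayed
    executed, failures = [], []
    for i in range(rounds):
        order = CHECKS[:]
        rng.shuffle(order)
        for check in order:
            executed.append(check.__name__)
            if not check(rng):
                failures.append((i, check.__name__))
                log.info("round %d: %s FAILED", i, check.__name__)
    return seed, executed, failures

seed, executed, failures = overnight_run(rounds=3)
```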
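The telecom rounding concern is concrete enough to sketch. With hypothetical numbers, 200 lines each carrying a $1.10 charge and an assumed 7.5% tax, rounding tax per line and rounding once on the total disagree by 50 cents. Which rule is correct depends on the jurisdiction and the billing contract, so this Python sketch does not pick one; it shows why a test has to pin the rule down. It uses `decimal` rather than floats so the cents are exact.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
TAX_RATE = Decimal("0.075")  # hypothetical tax rate

def tax_rounded_per_line(charges):
    """Round the tax on each line item to the cent, then add them up."""
    return sum((c * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP) for c in charges)

def tax_rounded_on_total(charges):
    """Add up the charges first, then round the tax once."""
    return (sum(charges) * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)

charges = [Decimal("1.10")] * 200  # 200 phones, one small charge each
print(tax_rounded_per_line(charges))  # 16.00
print(tax_rounded_on_total(charges))  # 16.50
```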
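The resource-alignment exercise can be sketched as a lookup: list each resource’s capabilities and how much notice it needs, then ask which resources can cover a given kind of testing in the time available. Every resource profile below, including the 48-hour crowd lead time and the two-week contractor lead time, is a hypothetical assumption, not a figure from the episode or any vendor.

```python
# Hypothetical resource portfolio: capabilities and notice needed, in hours.
RESOURCES = {
    "full-time testers": {"skills": {"exploratory", "domain", "regression"}, "lead_hours": 0},
    "developers": {"skills": {"unit", "api", "regression"}, "lead_hours": 0},
    "contractors": {"skills": {"exploratory", "domain", "performance"}, "lead_hours": 336},
    "crowd": {"skills": {"platform", "localization", "bandwidth", "exploratory"},
              "lead_hours": 48},
}

def candidates(kind, notice_hours):
    """Resources that can do this kind of testing within the available notice."""
    return sorted(name for name, r in RESOURCES.items()
                  if kind in r["skills"] and r["lead_hours"] <= notice_hours)

needs = {"localization": 48, "domain": 24, "performance": 24, "exploratory": 72}
for kind, notice in needs.items():
    print(f"{kind:<13} within {notice}h: {candidates(kind, notice) or 'gap'}")
```

An empty result, like performance testing on one day’s notice in this made-up portfolio, is the kind of gap the exercise is meant to surface before crunch time.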
CARTY: Matt, in one sentence, what does digital quality mean to you?
HEUSSER: I expect my software to work the way I expect it to work the first time.
CARTY: Simple enough. If only it was that easy to implement, right?
What will digital experiences look like, five years from now?
HEUSSER: Gosh, don’t we all wish we had a crystal ball?
Quickly, I would say that there’s a chance that Oculus virtual reality takes off more, especially for gaming. We’re going to see more digital experiences. Your automobile is going to continue to become more integrated; it’s going to feel more and more like a laptop computer. And the handheld device is going to continue to integrate all of the things, so we’re going to see more Internet of Things: more Ring doorbells that operate with your phone, home security systems that operate with your phone. Twenty years ago, we were talking about turning the lights on and off in your home with your phone. For the people who want to use Alexa, we’ll see more of the ability to integrate the home experience, the home electrical experience.
And we’re going to have some laggards, where things don’t change a whole lot. But that’s what I see. I see people continuing to go mobile-first, so more and more software is going to be used on your phone instead of the laptop. If the world continues as it’s going, we’ll see more of that.
CARTY: Matt, what is your favorite app to use in your downtime?
HEUSSER: Right now, if you were to ask my computer what I’m spending time on, it’d say Reddit. That’s really a community-oriented site, where you can find people who are interested in the same things you are, talk to them, and share ideas.
I am also on Facebook a fair bit in the meal ideas groups, which are a little crafty corner of the internet where you can experiment and get better and better at something. It excites me.
CARTY: And finally, Matt, what is something that you are hopeful for?
HEUSSER: One thing I’m hopeful for is that we have these ideas, like the book The No Asshole Rule, and Boundaries by Townsend and Cloud: the idea that we can make and keep commitments to each other, so we know where things stand.
And on top of that, I would layer this: so much of our thinking is zero-sum. Say I’m a contractor. I’m going to make these numbers up; they’re not right. If I pay someone $20 an hour and I bill them out at $25, I make a $5 profit. If I pay them more, I make less. And if I charge my customer more, they make less. So, ultimately, at some level, there’s some amount of zero-sum going on. But usually, I find, it doesn’t have to be. Can I offer you flexible hours? What if people are working remote? What if we gave employees a small annual bonus so they could just get out of Dodge: go to a vacation spot on Saturday, have Sunday with their family, work through the week but be done at 5:00 PM, enjoy that, whether it’s family or whatever it is, and then come back on Saturday? So you’re still working the whole week, but you get to experience a retreat, like a corporate retreat. What if we just covered your travel? That’s tax deductible, as opposed to you covering your own travel, which you pay for with after-tax dollars.
What can we do to find ways to make the work more palatable and better for everybody? What’s a win-win outcome? And how can we honestly negotiate with each other in a problem-solving way? This is a problem to be solved: how do we get this software done? That’s instead of a blame- or responsibility-oriented way, where there’s finger-pointing. And part of what going independent did for me was change my position, so that I could say, this engagement isn’t working out for me; I’m done. As an employee, it’s very, very difficult to say, I don’t accept an assignment where I can’t be successful. I can’t be successful here.