
Why Unit Testing Matters in OT (and How to Do It Right)


Stop me if you've heard this one. An auto mechanic is talking to a heart surgeon:

"We do pretty much the same job. I open the car hood, you open the ribcage. We both see what's wrong and fix it. So why do you make 3 times more money than I do?"

The heart surgeon: "I do it while the engine's running."

Let's retell the joke, this time with an Information Tech (IT) guy and an Operations Tech (OT) guy.

The IT guy says, "We do pretty much the same job. We both have integrated systems and automation. We both have clients who stand to lose a lot if something goes wrong. So how come my system is 100% covered by unit tests, and yours isn't?"

If I'm the OT guy, for most of my career, I would have responded,

"Look, I consider it a good day if my client only changes requirements 3 times while I'm driving out to site. Last night, I stayed up 'til 2AM trying to make a green light come on in a PLC cabinet. My system is in a perpetual state of 'just barely working' at all times, so an academic standard on testing may work for text files but doesn't make its way down to the factory floor... Also, what is a unit test?"

The joke falls flat, but the issue is real. Unit testing is something many simply aren't doing in OT. For the unfamiliar, a unit test is a persistent piece of code in our system that verifies that other parts of the system are working. The IT world benefits from a workflow called "Test Driven Development," which elevates these system tests to an even more important level than the code itself. And if we're open, we can learn a lot from them.

OT is Changing

I've seen some trends happening in the OT world that should inspire us to take a serious look at unit testing.

The OT landscape is trending from...

  • Physical hardware to virtual, making it easier to mock test environments.
  • Local to remote.
  • Building intelligence and complex logic into edge devices, to making edge devices as dumb as possible and consolidating intelligence into centralized scripts.

When I started my OT career about five years ago, scripts in an IDE (Integrated Development Environment) were often an afterthought. Now we build entire applications out of them.

This doesn’t make us strictly software engineers. We're still building integrated systems with plenty of dependencies. But the parts of our system that are like software engineering can benefit from unit tests.

I would like, at the very least, for every OT engineer to know how to write a unit test. I would like them to take a page from Test Driven Development (TDD) and learn how to write code test-first. I would like every engineer to work in systems where large sections of code are protected by tests, so they can see the benefit for themselves.

Most of all, I want the quality of our testing to rise to the stakes present in modern-day OT. If something breaks in an IT environment, say, some Flappy Bird knockoff throws a bug, there's an associated bug report. On the other hand, when something breaks in OT, there's a valve that opens and dumps $10,000 in product onto the factory floor! OT deserves unit testing because clients deserve resilient systems, and operators deserve safety.

So let's dive in. I'll start by explaining WHAT a unit test is. Then I’ll move on to WHY we test. Hopefully, the why is compelling enough to carry us through to the HOW.

The WHO of testing is also worth mentioning because the first time you write a unit test in OT, you'll probably do it alone. If your OT culture favors short-term firefighting, band-aid solutions, and "just barely working" over correct procedures, well, expect resistance to TDD. So the who becomes very important.

It usually goes something like this:

  • First, a single engineer gets excited about unit testing.
  • Then, a village of developers discovers they prefer building projects this way.
  • Then, a city of clients learns it's a time- and money-saving proposition.
  • Then, an entire industry demands unit testing as a requirement and leaves unprotected code behind.

I am thoroughly pleased to announce that unit testing, which was previously impossible in OT, is now only very difficult. It is the duty of the present OT generation to make it so that the ones who come after find it easier still. Sound good? Good.

 

What is a unit test?

A "test," broadly, is some code that verifies production systems are working without affecting the production environment. Here is some function, celsiusToFahrenheit(), in a production line that converts a temperature reading from Celsius to Fahrenheit:

[Image: celsiusToFahrenheit() in the production line, with production elements marked in red]

Everything marked in red is part of production. The input to celsiusToFahrenheit() comes from a live temperature transmitter inside a tank. The output goes to an active OPC tag for storage and analysis. We need to devise a repeatable test that will verify the calculation is working, and I'd like to perform this test no matter what's happening in production.

So, how do we do it? For a fair test, we must guarantee that the function we're testing is identical to the one in production. If we templatize our function, then at runtime it can be instantiated once in production and once in testing. We want to avoid having one function for testing and a separate, altered one for production.

If our test function were to produce a production output and write to an OPC tag, this would affect production and be destructive. So, the output of our function must be a native type object like int, float, string, etc. You can’t have the tag write occur as part of your function. You must move it outside the function body.

It is very common during testing to change existing code to make it more testable. After all, the first time you write it, you're likely only concerned with making it work.

Let's continue the decoupling on the input side. You can't rely on a production input to be consistent during testing. Instead, we have to supply our own mocked inputs, and I prefer the function to be deterministic (That is, the same inputs should always produce the same outputs).

[Diagram: the OT production path and the parallel testing path]

We now have a production path and a parallel testing path. On the production path, the function runs normally. As part of the testing path, we create an input, run the function, and compare the output to an expected value. By decoupling the function and making it read and write native types (as opposed to reading and writing production objects), we've given both the testing and production environments independent access to the function. Even if the client's factory were hit by a meteor and disappeared, we could still verify that this function works as expected.
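To make the production side of that picture concrete, here is a minimal sketch in Ignition-flavored Jython. The tag paths and the updateTankTempF() wrapper are hypothetical, and the system.tag.readBlocking/writeBlocking calls assume Ignition 8; the point is simply that the production reads and writes live outside the pure function. The testing path for the same function is written out as code a little further down.

# Pure, unit-testable function: native types in, native types out
def celsiusToFahrenheit(c):
    return c * (9/5.) + 32

# Production path (hypothetical wrapper): the tag read and tag write
# stay outside the function body, so testing never touches them
def updateTankTempF():
    c = system.tag.readBlocking(['[default]Tank/TempC'])[0].value
    system.tag.writeBlocking(['[default]Tank/TempF'], [celsiusToFahrenheit(c)])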

The model is generic enough to cover both IT and OT tests, but it also highlights project scripts as an excellent place to put system functionality. If we're serious about testing, our body of tests (called the "Test Suite") should be located somewhere that is accessible to both production and testing.

I would go even further and say the scripts are where the meat of your system's functionality belongs. You should take all the intelligence and calcs and logic out of screens, out of edge devices, out of tag change scripts, and put them into script library scripts.

People come to me and ask, "How do I test my Ignition project?" And when I open up their script library, I see that it's empty! I know immediately that all of their screens carry complex scripting logic as baggage and that they have 400+ looseleaf tag change scripts. I can't get into those to test them! If a function is worth writing, it is worth saving somewhere it can be reused. These remote systems should be calling script library scripts so that they can be reached by the Test Suite (and by other developers!).

Here is the same Celsius to Fahrenheit test, but instead of drawing it graphically, we write it in Python in Ignition's script console:

def check_celsiusToFahrenheit():

    # Input
    c = 45

    # Expected value
    expected = 113

    # Function execution
    r = celsiusToFahrenheit(c)

    # Pass/fail assertion
    assert r == expected, str(r)


def celsiusToFahrenheit(c):
    return c * (9/5.) + 32


print check_celsiusToFahrenheit()

 

The test conjures an input, runs the function, compares the output to an expected value, and uses an "assert" statement to verify that the function works (the meaning of "assert" is, "verify some conditional, and if it's False, crash and report this string error").

If the function works, the test executes without incident and returns None. If the function doesn't work, the test fails at the assertion error and reports the value it got instead of the expected value. Later, you can collect all your tests into a Test Suite and give yourself a way to trigger them manually. Then you can set a trigger to run the Test Suite automatically at timed intervals, reporting the results.

The C to F test above meets the definition of a "unit test" - it is intentionally small, it tests one thing, and it has zero dependencies. If you add a tag write to the function, or a tag read, or a database write or read, the function is disqualified from being unit testable. You still have to test it as part of your Test Suite, but you can't use a unit test. This is a common obstacle to testing in OT: the high number of dependencies baked into everything our systems do. I'll introduce some tricks for avoiding this problem in the "How" section.

 

Why do we test?

Assigning an automated test to a piece of code makes it resilient, readable, refactorable, and reportable - The four R's of why we test. These benefits apply whether the test is in IT or OT, whether it's small or large, a unit test or some other kind. The single fastest way to improve a piece of code is to add a test. Let's talk about why.

We want our code to be:

Resilient

Something special happens when a piece of code and its test are bound together. They are a matched set, like a pair of gloves or a plane and the runway it's landing on. If someone changes the code, we know immediately because the test fails. Code without a test is vulnerable, compelling us to create one. A test without code is a set of marching orders containing every bit of context necessary for a developer to build the associated functionality. I encourage developers who can do both to write the test prior to writing the code in a process called Test First Development.

Test First Development is comparable to scaling a rock face: You start with both hands on the wall. You run your entire Test Suite to verify all the tests pass, and the system is stable. Then, you reach for the next "hold" and write a test for the next feature. While writing this test, you'll have to answer context questions about where you're going and what it is that you're developing. What does the input look like? What is the expected result? Having answered those questions in the test, you "reach with your other hand" to write the code that makes it pass the test. You'll know the moment you're done - the test will go from failing to passing. Your hands are back together on the wall, the system is stable again, only you're one feature higher than you were before.
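As a tiny illustration of that rhythm (toSlug() is a made-up example function, not something from the project above): first the test, then the code that makes it pass.

# Step 1: write the test first. Running the suite now fails with a NameError,
# which is the "reaching for the next hold" moment.
def check_toSlug():
    r = toSlug('Tank 3 Outlet Temp')
    assert r == 'tank_3_outlet_temp', str(r)

# Step 2: write the code until the test passes. The moment it flips from
# failing to passing, the feature is done and the system is stable again.
def toSlug(name):
    return name.strip().lower().replace(' ', '_')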

Many of my clients express concern that hackers or bad actors will enter the system at night and make unauthorized edits to the code, causing features to stop working. While a valid concern, it doesn't happen frequently. What does happen is that we introduce something new, benevolently, which accidentally breaks a feature we wrote three months ago! To date, I have not found a technique that prevents this better than covering your code with automated tests.

Automated testing has a way of democratizing your functions, from those that run once per second to those that run once per year. All are equal in the eyes of the Test Suite. Each test has a binary opinion of the code - it either "works" or it "doesn't work." Likewise, the whole system "works" if 100% of the tests are passing, and needs attention if just one test fails.

Readable

The most readable parts of your code are the function names. This is why it's a good bit of wisdom to keep functions short, squeezing the maximum number of function names out of the code. The second most readable part of the code is the test. Here's the code for a function called rotateListOfDicts. Can you tell what it does?

 

def rotateListOfDicts(listIn):
    return {key: [d[key] for d in listIn] for key in listIn[0]}

 

Well, the function name doesn't give you a lot of help. And there's some doubly nested list comprehension, which not everyone can read quickly. Let's take a look at the test.

 

def check_rotateListOfDicts():
    listIn = [
        {'a': 1, 'b': 2, 'c': 3},
        {'a': 4, 'b': 5, 'c': 6},
        ]
    expected = {
        'a': [1, 4],
        'b': [2, 5],
        'c': [3, 6],
        }
    r = rotateListOfDicts(listIn)
    assert r == expected, str(r)

 

In the test, we have a much clearer picture of what the function does. It takes in a list of dicts, and it outputs a dict of lists. Without that context, we're stuck guessing what the programmer expected the inputs and outputs to look like.

Notice that in the test, the only reference to the function is a single line. We would use the same test whether the function were complex, simple, or easy to read, even if it were written in a language we don't understand! The function is no longer defined by how hard it is to read, but by whether or not it works.

It helps me to picture all the code in my system as locked inside a series of vaults. The door to each vault is a test, with a red or green light indicating whether the test passes. As long as the light above a door is green, we don't have to open the vault. This is the gift you give a fellow developer when you cover your code with testing - As long as your code works, you have liberated them from having to understand it.

 

Refactorable

One reason software engineers like unit testing is that they qualify systems by how refactorable they are (how easy they are to change), and unit tests excel during the refactoring process. We expect a greenfield project to be very refactorable. But as systems age and technical debt builds up, features appear that are wrong but that can't be changed. The cost is too high.

Unfortunately, these issues tend to compound. Features added later have to work around the offending error, creating even more technical debt. And new developers have to be told why this seemingly obvious error can’t be changed, and so on.

To show the power of refactoring using automated tests, consider the example of two developers, Goofus and Gallant. Let's say they have identical systems. Gallant's functionality is covered by tests, but Goofus's isn't. 

The client wants to change their database from Microsoft SQL to MySQL. I picked two SQL dialects with similar syntax so that some of the SQL commands, stored procedures, and functions they've written will still work, and some won't.

How does Goofus accomplish this task without tests? First, he switches over the database. Then he tries to find every single SQL command in his system and test it manually. Some of them still work. Some of them don't. The ones that don't, he fixes. Later, his manager asks, "Are you finished?" Goofus responds, "I hope so, I think I found everything." He won't know for sure until his changes go into production and break.

Unlike Goofus, Gallant has all the functionality in his system covered by a set of 50 tests. The tests cover not just the SQL-coded parts, but the downstream system features that depend on them. The first thing he does is write a new test. He already has an existing test verifying that the database is connected. But the customer has a new requirement - This database must be MySQL. So, he writes a test that verifies the connected database is MySQL. He could do that with the line,

 

SELECT data FROM someTable LIMIT 1;

 

(In the old database, he would use SELECT TOP 1.) This will fail in the old database but pass in the new one. He runs the test and watches it fail. Then he cuts over the database to MySQL. Now the new test succeeds, but 10 other tests fail. Each of these 10 tests represents a piece of SQL code he must now update. His task is straightforward - he works through the code until every light is green again.
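A minimal version of that requirement test might look like the following. The connection name "plantDB" and the use of Ignition's system.db.runQuery call are assumptions; any query mechanism that hits the configured connection works the same way.

def check_databaseIsMySQL():
    # LIMIT is MySQL syntax. The old Microsoft SQL connection would need
    # SELECT TOP 1 instead and throws a syntax error here, failing the test.
    system.db.runQuery("SELECT data FROM someTable LIMIT 1", "plantDB")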

When a system has test coverage, you just make the change. Then, every dependency that the change touches appears as a failed test. You refactor by fixing all the code until the tests pass again. 

Note that Gallant doesn't have to waste any time looking at working code. In fact, I would go as far as to say Gallant objectively did the least amount of work required to complete this task. So when I hear somebody in OT say they can’t write a test because it’s too much work or because it takes too long, I don’t listen. It’s too much work to make a system without tests.

The Cost of Testing

Let's take a moment to address a question I get asked a lot: How much testing overhead do we have to add to build all these tests?  Well, first, don't call it "testing overhead" around me, because I believe testing is just as much a part of development as naming our variables. Second, here are some factors that will help answer that question:

  • For every new feature, it takes about 15-30% more time to build it with a test. But when you go to change an existing feature, you'll be able to do it about 3x faster. It's imprecise math, but it's a good estimate based on my experience and software engineering standards.
  • If you have a complex system that is built once and then changes frequently, you can expect a net savings from testing, not a net cost. If you're like me, you're used to building these large, integrated systems. We like to think that our applications do a million things, but do they? Or do they actually do 40 things, but the means by which they do them change a million times?
  • The longer ago you wrote the test, the more it should be doing for you.
  • At the end of the project, everyone will wish they had tests. But at the beginning of the project, nobody will want to build them. In the beginning, it sounds like a raw deal because you're asking for guaranteed extra time to build, but offering uncertain, hard-to-predict savings from refactoring.

So how do you budget for this? My suggestion is, for every 6 months of project work you bill, include 2 weeks' worth of hours for automated testing and unit tests. Your client deserves to know you think testing is important, and that you'll be dedicating time to it. Don't just take the total number of hours and add 30%, because the longer the project runs, the more certain it becomes that testing is a net savings rather than a net cost.

Here's another guarantee: at the end of the project, Goofus will say what many readers are probably thinking right now: "I sure do wish my project were covered in a bunch of tests."

 

Reportable

Speaking of Goofus, remember when his manager asked him if he was finished? He had a hard time answering. He does have a definition of done: he's done when "everything works." But he doesn't have a good definition of "everything," and he doesn't have a good definition of "works."

Conversely, Gallant's manager has access to the test logs. He knows Gallant took a customer requirement and turned it into a test, which the manager can then review to make sure he's doing the right thing. He has the timestamp at which the first test failed, indicating when Gallant started working. He knows what he's working on because there are 10 failed tests. He has a percent completion towards Gallant's goal. And he knows when Gallant finishes, without Gallant having to tell him.

So, how many times does Gallant have to interact with his manager? Zero times. How many times does Gallant have to open up Jira or Azure DevOps to track this? Zero times. With TDD, it's possible for all of the material work being done in the system to be tracked automatically through the tests.

 

How do we test in OT?

At this point, OT programmers might be looking skeptically back and forth between my short code examples and the scripts found in their own projects. It's not uncommon for OT scripts to be massive 500+ line monstrosities that make frequent references to dependent systems such as tags and databases.

If code is not unit testable, we must choose between two options: 1) Leave the code alone and make a large and complex test, or 2) break the code up into unit testable pieces. I recommend the second option, and call this process "Function Decomposition."

Here's an example. Consider this pseudocode for a large function:

 

def getKPIs():
    - Read the currently active line from a tag
    - Create a series of KPI tagPaths using that line
    - Read values from each tag
    - Transform and sanitize the values
    - Write the sanitized values to a database

 

The first thing to do is identify those destructive elements which prevent us from testing independently of production. The tag reads and the database writes are such elements. Our goal will be to push those to the outside of smaller functions and get as much unit testability as we can out of the rest.

 

def getKPIs():
    - Read the currently active line from a tag          <-- destructive: tag read
    - Create a series of KPI tagPaths using that line
    - Read values from each tag                          <-- destructive: tag read
    - Transform and sanitize the values
    - Write the sanitized values to a database           <-- destructive: database write

 

Suppose I split the above into five separate functions:

 

def getKPIs():
    activeLine = readTag('[myTags]activeLine')
    tagPaths = getKPIPaths(activeLine)
    tagValues = readTags(tagPaths)
    sanitizedValues = makeKPIs(tagPaths, tagValues)
    writeToKPITable(sanitizedValues)

To be unit testable, a function must take in native type inputs, yield native type outputs, and have no dependencies on a separate system. If we left the large function alone, none of it would be unit testable. Instead, we split it into five functions and came out with two that can be covered by unit tests: getKPIPaths() and makeKPIs().
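Those two functions can now be covered by ordinary unit tests. The tag path format and the shape of makeKPIs()'s output below are hypothetical; a sketch might look like this:

def check_getKPIPaths():
    expected = ['[default]Line2/OEE', '[default]Line2/Downtime']
    r = getKPIPaths('Line2')
    assert r == expected, str(r)

def check_makeKPIs():
    tagPaths = ['[default]Line2/OEE', '[default]Line2/Downtime']
    tagValues = [0.8725, None]    # None stands in for a bad-quality read
    expected = [('OEE', 0.87), ('Downtime', 0.0)]
    r = makeKPIs(tagPaths, tagValues)
    assert r == expected, str(r)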

 

[OT Testing Table 1 - Decomposing Functions]

 

I do not need individual tests to prove I can read every tag that my system reads. I need to write only one test that proves I can read and write in the tag space. This test will verify readTag() and readTags(). For the database, I can set up rollback transactions that ensure my database tests aren't destructive, and, when necessary, I can disable foreign key constraints on individual tables during testing to make conjuring inputs easier.
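Here are sketches of those two dependency tests. They assume Ignition's system.tag and system.db functions, a dedicated scratch tag, and a connection name of my own invention ("plantDB"); the transaction argument passed to writeToKPITable() is likewise an assumption.

def check_tagReadWrite():
    # Round-trip a value through a dedicated scratch tag, never a production tag
    path = '[default]Testing/scratch'
    system.tag.writeBlocking([path], [42])
    r = system.tag.readBlocking([path])[0].value
    assert r == 42, str(r)

def check_writeToKPITable():
    # Write inside a transaction, verify the row landed, then roll back
    # so the production table is left untouched
    # (assumes writeToKPITable accepts an optional transaction id)
    tx = system.db.beginTransaction('plantDB')
    try:
        writeToKPITable([('OEE', 0.87)], tx)
        r = system.db.runQuery("SELECT COUNT(*) FROM kpi", 'plantDB', tx)[0][0]
        assert r > 0, str(r)
    finally:
        system.db.rollbackTransaction(tx)
        system.db.closeTransaction(tx)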

 

[OT Testing Table 2 - Functions]

 

With two unit tests, one tag test, and one database test in my Test Suite, the entire outer function getKPIs() is covered by testing. As a bonus, our system now has check_tagReadWrite(), so the Test Suite will tell us if the tag space ever becomes inaccessible. If another developer wants to do safe database testing, they can use check_writeToKPITable() as a template.

Perhaps the most significant benefit is letting the function makeKPIs() be a stand-in for some intricate and complex data transform spanning hundreds of lines. How many times have we written clever code that needs to be re-used, only to lock it behind some dependency? Other developers now have access to the transforms in makeKPIs() and can use them for their own purposes.

We set out only to make the code more testable, but in doing so, we made it more readable, more scalable, more modular, and more refactorable. Adding a test is the single biggest improvement we can give our code, and all we did to this code to make it testable was break it into pieces.

 

FAQs

1. Why do you use "check_" to prefix your tests?

In IT, test functions are often prefixed by "test_" or "_". I found that in OT, whenever someone sees the word "test", they assume it's junk or some half-finished work. I name my tests after the functions they verify but prefix them with another word, like "check_". My Test Suite is set to execute every function starting with this prefix.

On another point, I am aware that it bothers some people when I use “r” (for result) as a variable name in my tests. During the assertion statement, I need to refer to the function result quickly and frequently, mandating a short name. By design, 90% of my tests end with the same line:

 

assert r == expected, str(r)

 

2. Can I put multiple cases in a single test?

Yes, just make sure that each assert statement's error message clearly conveys which case failed.
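For example, reusing the Celsius-to-Fahrenheit function from earlier, each case carries its input in the error message:

def check_celsiusToFahrenheit_cases():
    cases = [
        (0, 32),       # freezing point
        (100, 212),    # boiling point
        (-40, -40),    # the two scales cross here
        ]
    for c, expected in cases:
        r = celsiusToFahrenheit(c)
        assert r == expected, 'input %s: got %s, expected %s' % (c, r, expected)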

 

3. In what other ways does OT testing differ from IT testing?

In IT, especially when there's a dedicated QA team, the tests are set up remotely and are placed in a folder structure that mirrors the source code. Since I'm trying to educate engineers about testing, I like to put the tests and the code closer together. I put them in the same module as the function they test. This has the added benefit of letting developers use the tests as documentation.

I also find it difficult, in OT, to comply with the IT Test Pyramid, which says that most of our tests should be unit tests. I need my functions tested, even if the presence of dependencies in OT means fewer unit tests. This pattern of ignoring Pyramid-based guidelines extends to another pyramid, the Food Guide Pyramid. 11 servings of bread and pasta?

It’s possible I just have something against pyramids.

 

4. What tools and libraries do you use for testing?

Before I answer that question, I offer an example of a Test Suite:

 

def testSuite():
    check_function1()
    check_function2()
    # ...

 

It will run the tests in the system one after another and stop at the first failure, telling us which test broke. I highlight its simplicity because the best testing tools available don't add much more to this. I prefer to homebrew a Test Suite. Test Suites that are only a little bit better than our simple example might do these things:

  • Automatically detect the tests so we don't have to list them explicitly
  • Execute the tests in a try/except block so we can record multiple failures
  • Make setup and teardown more convenient
  • Log and report the test results

A couple of days of coding can produce a tool that does the above.
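As a rough illustration of the idea (the downloadable testRunner.py below does more than this), a homebrew runner in Ignition-flavored Jython might discover every "check_" function in a script module and run each one inside a try/except so one failure doesn't hide the others:

def runChecks(module):
    # Collect (test name, error message) pairs instead of stopping at the first failure
    failures = []
    for name in dir(module):
        if name.startswith('check_'):
            try:
                getattr(module, name)()
            except Exception as e:
                failures.append((name, str(e)))
    return failures

# Usage sketch: run every check in a project script module named "kpi"
# for failure in runChecks(kpi):
#     print failure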

The attachment below is an Ignition script called testRunner.py that can be used to run tests found in other Ignition modules:

 

[Download: testRunner Ignition script]

 

I recommend developers "build, don't buy" until they understand testing well enough to appreciate which difficulties the tools are attempting to relieve. The hard part is not the test framework; the hard part is writing the actual tests.

 

5. Do clients interact with the tests?

By design, the tests should always maintain a 100% pass rate regardless of what the client is doing in production. There should be nothing a client or operator can do that interferes with a test. That said, you’ve already done the work building your automated test system and have an excellent platform to add client-facing “alerts” that do have production dependencies.

For one of my most successful projects, I created one alert per deliverable on the initial scope of work document and made a “scoreboard screen” that the client could view to see progress. They loved it! We didn’t have to report when we were done with features, and the client had the scoreboard to verify every part of their system was working, even after commissioning.

 

6. What systems are particularly hard to test?

Frontend screen functionality is difficult to test because traditional inputs and outputs are replaced by "user clicks somewhere" and "user sees a change on the screen". Something as simple as verifying that a button navigates to a new window can require complex third-party tooling. However, keeping your screen scripts short and having them reference a library script right away means that at least the backend functionality of screens remains testable.
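As a sketch of that split (myLib, nextWindowPath(), and the screen paths are hypothetical names; the one-line event script simply delegates to the library):

# On the button itself, the event script stays a single delegating call:
#     system.nav.swapTo(myLib.nav.nextWindowPath(currentWindow, userRole))

# In the script library, the logic is native types in, native types out - and testable:
def nextWindowPath(currentWindow, userRole):
    if userRole == 'operator':
        return 'Screens/Overview'
    return 'Screens/' + currentWindow + '/Detail'

def check_nextWindowPath():
    r = nextWindowPath('Line2', 'engineer')
    assert r == 'Screens/Line2/Detail', str(r)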

Another sticking point in testing is any function that has the passage of time as a dependency. We’d like our Test Suite to execute as quickly as possible, but certain operations prevent that, especially ones that act on time series data at intervals. As a rule, I prefer to test the function against historical data instead of real-time. I bristle whenever I see vendors demo products that require real-time streaming input and have no means of testing with mock historical data. It is my hope that a strong culture of testing among integrators will inspire OT vendors to make their tools more testable.
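One way to keep the clock out of the function is to pass the history in as plain data, so the same calculation runs against mock values in a test (the averaging rule below is just an illustration):

def averageOverWindow(samples):
    # samples is a list of (timestamp, value) pairs already pulled from the historian;
    # the function never reads the clock or queries history itself
    values = [v for (t, v) in samples if v is not None]
    if not values:
        return None
    return sum(values) / float(len(values))

def check_averageOverWindow():
    samples = [(0, 10.0), (60, 20.0), (120, None), (180, 30.0)]
    r = averageOverWindow(samples)
    assert r == 20.0, str(r)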

 

7. What should I do next?

You should write a test! To improve your code and your coding skills, write a test. To make your system work better, write a test. You don’t need 100 tests, they don’t have to be unit tests, and you don’t have to put them in a special place. One test is better than zero.

Don’t read a book about testing, don’t watch a video, don’t look for a tool. The only way to learn how to test is by writing tests, period. I hope this article inspires you to give unit testing a try in your own OT projects and to discover its benefits for yourself. And if you're looking for help in developing a test-backed solution for your digital plant, please reach out to someone on our team.