Simulating realistic scenarios under which a real system can demonstrate its resilience is hard work... But everything becomes easier with proper planning and a clear understanding of your options, their costs, and their limitations.
This article describes how to create test scenarios that reproduce the conditions that might endanger your system in production. Have you ever come across a defect that is difficult to reproduce in your test environment? If so, this article may contain some ideas to help you out.
Simple test scenarios can be created by choosing specific input values. However, complex scenarios require careful orchestration of multiple components. This may include the use of realistic data, as well as management of misbehaving components, delays, and slowdowns. When these techniques are properly combined, you can reproduce the conditions under which your software may assert its true strength.
When testing a scenario becomes complicated
To start, you should always remember that complex test scenarios are necessary only for complex software. If you have a chance to keep your designs simple, you might save yourself a lot of time creating the tests.
But what causes test complexity? In an ideal world, all functions would be pure functions. They behave deterministically, have no memory or exclusive resources, and they reset themselves after every call. Testing pure functions is straightforward and repeatable. You can simply replay the inputs and compare the expected outputs.
Testing non-pure functions, on the other hand, quickly becomes a multi-domain effort. Developers, deployment engineers, and testers must work together to separate pure from non-pure components. The former can be tested against exact expectations, while the latter require further separation. For example, imagine a function that generates a random number as part of its calculation. Verifying its final result would be difficult because randomness introduces uncertainty. But if you isolate the randomness in a component that behaves randomly in production but is overridden in test environments, you regain full control over its output. The function becomes predictable and pure. After that, you can either trust that the randomness you overrode will behave correctly when it reappears in production, or you can dissect the function further.
The following code illustrates the situation. To test this function, you either need to call it hundreds of times or find a way to control the output of the randomiser by the means discussed below.
def testme():
    tmp = my_random_uniform(1, 2)  # Non-pure component
    return "rare" if tmp > 1.99 else "frequent"  # Pure component
Making functions pure often requires a lot of effort. Testing without exact expectations, such as accepting any random output, may appear to be an easy way out. There are some clear signs that you need better control over test scenarios:
- Tests do not check precise results. Instead, they accept wide ranges or any value of a type.
- Tests do not cover error cases and stress scenarios.
- Test coverage is low. Interactions with external modules are not tested.
- Tests take a very long time to complete and cannot run in parallel.
If you notice any of these conditions, you should suggest to developers that they make their functions purer.
A fresh start: How to snapshot and reset all memories
As soon as a tested system has its own memory, it becomes non-pure and your tests can no longer be easily repeated. Every request is answered in a stateful way, with potential knowledge of previous interactions. If you want to test a true “first,” you need to roll back all components that your system had contact with. This can become arbitrarily complex.
The first line of contact is usually the database. It is responsible for creating and retrieving the system's memories. In the simplest setup, all data in the database can be replaced with known content. This assumes that all information about past events resides in a local database that can be cleared and filled at will.
The preparation of an initial state can be accomplished by two main strategies: snapshotting and recreation. Snapshotting stores the database state after an initial configuration. Recreation always starts with a clean slate and automates the steps to fill up the required inventory data.
If you choose snapshotting for your test scenarios, consider these aspects early in the software design phase. Snapshots are a great way to recreate real situations that occurred in the past, but care must be taken to keep them usable over time (a small restore sketch follows the list):
- Deployment: As snapshots affect an entire system, make sure you can spawn new instances quickly.
- Sensitive data: Snapshots may contain personal or security-relevant information. Obfuscation can conceal sensitive data while keeping the aspects relevant to your tests.
- Timestamps: An entity might have been new when the snapshot was taken. However, at recovery time, the “aged” data might have a relevant impact on the tested functionality.
- Size: Large entities, such as images or videos, might slow down your restoration time. Consider shrinking or otherwise replacing them.
- Maintenance: Who updates or recreates the snapshots?
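To make the snapshot strategy concrete, here is a minimal restore sketch, assuming a PostgreSQL server and a prepared template database. The names app_test and snapshot_template, as well as the credentials, are placeholders; cloning a template is typically much faster than re-importing a full dump:

import psycopg2  # assumes the psycopg2 driver is installed

def restore_snapshot(test_db="app_test", template="snapshot_template"):
    # Connect to the maintenance database; credentials are placeholders
    conn = psycopg2.connect(dbname="postgres", user="tester", password="secret")
    conn.autocommit = True  # CREATE/DROP DATABASE must run outside a transaction
    with conn.cursor() as cur:
        cur.execute(f'DROP DATABASE IF EXISTS "{test_db}"')
        # PostgreSQL clones the template, restoring the snapshot state
        cur.execute(f'CREATE DATABASE "{test_db}" TEMPLATE "{template}"')
    conn.close()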
An alternative to snapshotting is data recreation. If your data lives in main memory, this is your only option. With recreation, you specify—ideally in automation code—how the initial system state is created from an empty state. Here are some aspects to consider in software design:
- Multitenancy: An ideal software design has separate spaces so that multiple data scenarios can be created simultaneously.
- Automation: To ensure consistency and speed of data creation, use automated scripts, possibly also testing the data entry features that users would choose.
- Bulk uploads: Allow for easy methods to create multiple entities at once, either as part of the regular user interface, your regular API, or as a side feature for testers.
- Object mother: This is a technique that hides the complexities of data creation behind a data creator with a simple and readable API, as sketched below.
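A minimal sketch of an object mother in Python; the Customer entity and the factory methods are illustrative, not taken from a specific library:

from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    country: str = "DE"
    orders: list = field(default_factory=list)

class CustomerMother:
    """Hides the details of data creation behind readable factory methods."""

    @staticmethod
    def plain_customer(name="Alice Example"):
        return Customer(name=name)

    @staticmethod
    def customer_with_open_order(name="Bob Example"):
        customer = Customer(name=name)
        customer.orders.append({"status": "open", "total": 42.0})
        return customer

# In a test, the intent stays readable:
# customer = CustomerMother.customer_with_open_order()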
The outside world: How to reset the unresettable
Once your system under test starts to communicate with outside components, your ability to retain purity is severely challenged. Your choices are to bring all external systems under your control or to replace them with lookalikes, known as mock objects, test doubles, or imposters.
For example, if your application pulls live weather data from a cloud service, you could connect it to a mocked service instead that returns predefined temperatures. This allows testers to verify the correctness of the display without relying on the correctness of the weather service.
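The following sketch shows what such a test double might look like; the single-method interface and the widget function are assumptions for illustration:

class FakeWeatherService:
    """Test double that returns predefined temperatures."""

    def __init__(self, temperatures):
        self._temperatures = temperatures  # e.g. {"Berlin": 21.5}

    def get_temperature(self, city):
        return self._temperatures[city]

def render_weather_widget(service, city):
    # The display code depends only on the interface, not on the cloud service
    return f"{city}: {service.get_temperature(city)} °C"

assert render_weather_widget(FakeWeatherService({"Berlin": 21.5}), "Berlin") == "Berlin: 21.5 °C"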
Mock objects are often hard to build and maintain. This is especially so when the external service evolves over time and the mock has to be adapted to behave in accordance with the latest version. Maintaining the mock can cost a lot of time, so its impact on testability should be considered in the design phase. For example, logical components that require high coverage should never be merged with data retrieval components that have large external dependencies.
Your team may be tempted to put off introducing mocks: initial costs are high, and major refactorings are likely. However, there are compelling reasons why you should use mock objects. If any of the following criteria applies to a component in your application, you should consider building a mock for it:
- The component is slow and causes major delays in your entire test suite.
- The component consumes too much power or resources to justify its use in test execution.
- Stress testing the application makes the component raise conflicts or alarms.
- A test scenario requires a rare response from a component that is not enforceable (for example, the scenario requires the component to be out of service).
- The component is not available at all times.
- The component is nondeterministic, and thus creates unclear expectations for the entire system.
- The component is under development, complicating error attribution when a failure occurs.
Test setups that use mocks can be distinguished by the number of components that are mocked.
- A test in which all but the tested modules are mocked is a component test.
- A setup in which some or most components are replaced with mocks is called an integration test.
- In end-to-end tests, ideally no components are mocked. Real components are used if possible, although often with configurations that differ from those of the other two types.
If you find that your use case requires mocks, then you have several strategies available to build them.
- If the components of your application communicate over standard web protocols (like microservices) then you have access to a plethora of tools, such as MockServer, Mockoon, Mocki, and many more. Everything that communicates over standard protocols is easy to mock. Requests can be recorded in live environments, reviewed, adapted and replayed for the test.
- Structured programming languages allow for a technique called Dependency Injection to swap out components with a minimum of effort. In Java you can use JMock and Mockito to build such replacements, and other languages have similar frameworks.
- With sufficient coding support, any component can be modified to serve as its own mock in tests, such that it evokes the behaviour under scrutiny. Testers need a clear interface to trigger such behaviours. This can be done through dedicated control interfaces, or simply by abusing user data as triggers. For example, a person with the family name "CrashAtStepX" might surreptitiously call for special treatment in tests, as sketched after this list. (Obviously, this feature must be disabled in production.)
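Here is a hedged sketch of the data-as-trigger idea from the last bullet; the environment flag and the processing function are illustrative:

import os

# The trigger must be switched off in production builds
TEST_TRIGGERS_ENABLED = os.environ.get("ENABLE_TEST_TRIGGERS") == "1"

def process_person(person):
    if TEST_TRIGGERS_ENABLED and person["family_name"] == "CrashAtStepX":
        raise RuntimeError("Simulated crash requested by test data")
    # ... regular processing ...
    return "processed"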
In all cases, the mock object will mask a tiny part of your production code, which will then be untestable in this environment.
Slow and fast: How to run tests with faked time
Many applications depend in complex ways on the current time. Tests are intended to run much faster than typical end use in production. This may cause the software to expose its own kind of artefacts or make important features inaccessible. For example, a banking application may process a transaction over the course of days. Expiration times of individual requests are often measured in hours, not seconds.
A straightforward solution to the problem is to refrain from using the system’s built-in time function and replace it with a mockable service. This must of course be applied consistently throughout the tested environment, so that any slow-downs and speed-ups are reflected coherently. Since time is an inherently global property, the impact of time changes grows from unit to integration to end-to-end tests. Making a consistent time warp may not even be possible without unpredictable side effects.
Two different approaches must be considered when changing the system time. Is it a speed-up or a slow-down? If you want the test to run faster than the use case would run in production, there are some possible solutions:
- Use a central time provider: If all time-dependent systems are under your control, you can replace any access to the system time with a call to a time provider instead. There you can set the time to align with your testing schedule (see the sketch after this list). This is difficult, but possible.
- Manipulate timeouts: Expiration times could be reduced by a central factor from days or hours to seconds, such that a negative test can be performed without delay.
- Design your code to be independent of global times as much as possible and allow for faster runs.
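Here is the sketch referenced in the first bullet: a minimal central time provider. Production code calls TimeProvider.now() instead of time.time(), and tests install a controllable clock. All names are illustrative:

import time

class TimeProvider:
    _clock = time.time  # default: the real system clock

    @classmethod
    def now(cls):
        return cls._clock()

    @classmethod
    def install(cls, clock):
        cls._clock = clock

class FakeClock:
    """Controllable clock for tests."""

    def __init__(self, start=0.0):
        self.current = start

    def __call__(self):
        return self.current

    def advance(self, seconds):
        self.current += seconds

# In a test: jump an hour ahead without waiting
clock = FakeClock(start=1_000_000.0)
TimeProvider.install(clock)
clock.advance(3600)
assert TimeProvider.now() == 1_000_000.0 + 3600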
Making your code run in slow motion simulates the short freezes that can strike modern virtualized and networked architectures. Although such hiccups are rare and short, they often lead to substantial failures that are hard to reproduce. Here are some tips to provoke them:
- Simulate slow internet or 3G: For web applications this is possible using the browser's developer tools, but slowdowns can also be simulated on the network level.
- Randomize the timing of all components that may cause race conditions. This slows down your application but reveals one of the most obnoxious error types. Such randomizations can be implemented at the operating-system level with tools like custom Linux schedulers, or directly in code, as sketched below.
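The in-code variant mentioned above could look like this hedged sketch: a decorator that injects a small random delay before each call, making latent race conditions more likely to surface. The jitter range is an arbitrary example value:

import functools
import random
import time

def with_jitter(max_delay=0.05):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(random.uniform(0, max_delay))  # random pause before the call
            return func(*args, **kwargs)
        return wrapper
    return decorator

@with_jitter()
def write_to_shared_state(state, key, value):
    state[key] = value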
For more tests that involve timing, you can switch between time zones, set the date to a leap day, and simulate daylight saving time changes. You will never stop being surprised by the complexities that time can bring.
To wrap up
Simulating realistic scenarios under which a real system can demonstrate its resilience is hard work. Databases accrue all kinds of artefacts over time, connected systems can fail in unpredictable ways, and runtimes can suddenly vary, leading to unforeseeable consequences.
Setting up all these factors for maximum testability requires significant effort. But everything becomes easier with proper planning and a clear understanding of your options, their costs, and their limitations. Here are some considerations to keep in mind:
- Discuss test requirements early. Requirements for test strategies can have a big influence on the software architecture.
- Make sure that components have clear responsibilities and define their simplified behavior in mocked test scenarios.
- Estimate efforts. Some strategies are considerably more difficult to implement than others. Choose carefully.
- Monitor the continued correctness of the mock objects and your assumptions.
Unfortunately, there is no simple solution to create all possible scenarios. Making the relevant cases testable in an automated and rapid fashion is an art that requires creativity and technical insight.
For more information
- The Community's Guide to Mocking - MoT Collection