The Symphony of Blockchains project comes to Bristol

The groundbreaking immersive blockchain experience was presented at the Arnolfini Gallery, Bristol

20 February 2019 Amy Reeve 3 mins read

The Symphony of Blockchains project had its public debut last week at a packed event in Bristol’s Arnolfini Gallery. Over a hundred visitors took the chance to explore the bitcoin blockchain in full virtual reality (VR), at a joint presentation from IOHK creative partners Kuva and IOHK's creative team. The Symphony project is an educational initiative, with the goal of creating new and innovative ways to teach people about blockchain technology. Andy Buchan, founder and creative director of Kuva, described the project as ‘an unprecedented opportunity to visualize new technology, and to establish a vocabulary for the blockchain industry,’ while IOHK creative director Richard Wild spoke of how the project had 'enabled creatives to engage with hard concepts in new ways.’

The exhibition itself spanned two rooms: one with a large, towering monolith in the centre and two VR headsets, and a second room with rows of augmented reality posters and a projector attached to a VR setup.

Symphony Monolith, Bristol
The large, imposing monolith which occupied one room of the exhibition

In the first room, visitors queued to put on VR headsets and watch the monolith come to life while a voiceover explained the fundamentals of blockchain technology. As they listened, bright streaks of light (transactions happening on bitcoin's blockchain in real time) shot past, gravitating towards the heart of the monolith. There they blended into a glowing ball, a melting pot of transactions that viewers could manipulate, expanding the data into the room around them to explore in 3D.

In the second room, another VR experience, single-handedly created by IOHK creative coder Scott Darby, let visitors dive deeper into the raw, living data of the bitcoin blockchain. Here they could traverse the intricate structure of Merkle trees, surrounded by bitcoin transaction crystals that glowed and hummed as they explored. A more in-depth experience, this part of the VR exhibition allowed visitors to travel all the way back to the bitcoin genesis block, generating dynamic audio in real time as they explored the virtual space.

A visitor exploring the blockchain

Creating the Symphony project isn't just a stunning visual achievement for IOHK and Kuva, but a technical one too. Running the VR simulations requires simultaneously managing and syncing multiple data streams, from the audiovisual cues for the headsets to the lidar mapping the room — and of course, live data from the bitcoin blockchain itself.

Augmented Reality Poster
Augmented reality posters in action

Everyone who attended the event seemed impressed, describing the experience as ‘mind-blowing.’ The opportunity to explore the constructs underpinning bitcoin in a tactile, physical way allowed even those with no blockchain experience to grasp fundamental concepts quickly, and visitors as young as twelve expressed a greater understanding of blockchain as a result.

The event in Bristol was just the start for the Symphony project: there'll be an even bigger exhibit at the IOHK Summit in Miami next month, before Symphony goes on tour to educate people around the world about blockchain technology.

How we use Nix at IOHK

6 February 2019 Rodney Lorrimar 10 mins read

IOHK has a history of putting great research into practice. This includes the use of functional programming languages, the adoption of formal methods, and, of course, the implementation of our own peer-reviewed research into blockchain consensus algorithms. We approach the problem of software deployment in a similar vein, using ideas that have emerged from recent academic work: a system that applies the principles of functional programming to software package management and system administration.

In this short technical article I will describe how we leverage Nix at IOHK, and share some of our plans for the future.

What are Nix, Nixpkgs, and NixOS?

NixOS is a Linux distribution built on the Nix package manager.

What I personally like about the Nix package manager is its ability to install multiple versions of the same package, or easily apply a source code modification to a package. For Haskell development in particular, there are binary packages for an impressive number of compilers, tools, and Hackage packages available.

What sets NixOS apart is its configuration system. It includes a large collection of high-quality modules which are configured declaratively in a uniform way. This is a boon for system administrators who need to apply the same configuration to many systems, or integrate a wide variety of software into one system.

Underneath the Nix package manager and NixOS is the Nix language — a domain-specific, purely functional, lazily-evaluated language for software configuration and deployment.
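To give a flavour of the language (this snippet is illustrative and not taken from any IOHK project), here is a tiny expression showing both purity and laziness:

```nix
let
  greet = name: "Hello, ${name}!";  # a pure function: same input, same output
  boom  = abort "never evaluated";  # laziness: this binding is never forced
in map greet [ "Nix" "NixOS" ]      # [ "Hello, Nix!" "Hello, NixOS!" ]
```

Because `boom` is never demanded, evaluation succeeds; this laziness is what lets Nixpkgs describe tens of thousands of packages while only evaluating the ones you ask for.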

The paper NixOS: A purely functional Linux distribution is an excellent introduction. It clearly sets out the problems which NixOS attempts to solve, many of which will be familiar to technical users.

Who is behind the NixOS project?

NixOS was initially created in 2003 at Utrecht University as the research project of Eelco Dolstra, forming the main part of his doctoral thesis. Fifteen years later, the NixOS project is run on GitHub with hundreds of contributors involved, and the code base is currently improving at a rate of about 500 git commits per week.

The community also communicates through an active IRC channel on Freenode, a discussion board, a (roughly) annual conference, and regular meetups.

Moreover, several companies provide Nix consulting services, such as Tweag I/O and Mayflower.

Why Nix?

There are many existing systems for software configuration management, some with far more users than Nix. However, we believe that Nix has the best available implementation of 'Infrastructure as Code', not only in terms of features, but also in its design and potential.

If you treat your deployment as a program written in a functional language, then you gain the advantages of functional programming like referential transparency, safe code reuse, and the ability to unit test your deployment. At its heart, the Nix 'derivation' abstracts details away and allows software modules to be built and used in more or less the same way, regardless of their source language.

A derivation is a function which takes source code, build tools, and library dependencies, and produces an immutable software package; because Nix is pure (in the functional programming sense), if any of the inputs change, then the result is a different software package. Alternatively, if a derivation with exactly the same inputs has already been built elsewhere (by continuous integration (CI), for example), Nix will just download that package, rather than building the derivation itself, reducing duplication of work.
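As a minimal sketch of what a derivation looks like in practice (the package and URL are illustrative, and the hash is a deliberate placeholder, not a real value):

```nix
{ pkgs ? import <nixpkgs> {} }:

pkgs.stdenv.mkDerivation {
  name = "hello-2.10";
  # The source is an input; its cryptographic hash pins the exact bytes.
  src = pkgs.fetchurl {
    url = "mirror://gnu/hello/hello-2.10.tar.gz";
    # Placeholder hash: on first build, Nix reports the expected value.
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
  # Build tools and libraries are inputs too; changing any of them
  # (or the source, or this expression) yields a different output path.
}
```

Change any input, and the resulting store path changes; leave them all alone, and a binary cache can substitute the result without rebuilding.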

The way Nix calculates whether a derivation input has changed is by taking its cryptographic hash. Dependencies are recursive, so this hash is actually a hash of hashes — a powerful concept instantly recognizable to blockchain enthusiasts.

How we use Nix

Here are some examples of how we are using Nix at IOHK.

1. Deployment of the Cardano SL network

We run around 200 compute nodes in AWS for the various testnets, the Cardano SL Byron mainnet, and the Cardano SL staging deployment (an environment used to test upgrades).

These are all NixOS systems deployed with NixOps.

We have written a tool called iohk-ops, which generates a NixOps network from an abstract topology description. The topology description describes the number of Cardano SL relay nodes, which data center they are hosted in, and static routes between peers. iohk-ops calls on NixOps to execute the deployment.

The same tool allows deployment of smaller 'developer clusters' which are used for testing activities. These clusters share the same deployment code, and identical software versions, but are smaller and can be started and stopped as necessary.

The rollback features of NixOS and NixOps have already proved invaluable several times in our staging deployments: the whole network can be redeployed with a single command, or just specific nodes.

With NixOps we have the ability to apply configuration tweaks and code patches, then rebuild at deploy-time, confident that the result will be identical to what comes from CI, and can be undone without leaving a mess behind.

2. Reproducible deployment

Extreme measures such as deploy-time code modification are usually not required, because the deployment environment can be reproduced and tested locally, ensuring any problems are caught well in advance.

As mentioned earlier, a Nix derivation is a pure function. The code can be run locally with exactly the same environment and libraries (up to the Linux kernel) as are used in production.

3. Fast rebuilds of Docker images

Docker's strengths include its immutability, encapsulation, and reproducibility, traits shared by Nix. However, it is severely limited by its data model and restricted DSL.

Software dependencies form a tree, not the sequence of layers that Docker models. A typical annoyance when building Docker images with a Dockerfile is that if a layer near the bottom changes, everything above it must be rebuilt. Furthermore, unless steps are taken to use multiple build stages, intermediate build products from lower layers will be included in the final image.

In contrast, a Nix package will only be rebuilt if its dependencies (or their dependencies) have changed. The minimum closure of a Nix package contains only what is necessary to run that software.

Thus we can generate Docker images from the closure of Nix derivations which are minimal and fast to rebuild. The build process can also be parameterized by defining it as a function in the Nix language.
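A minimal sketch of this approach, using the `dockerTools` library from Nixpkgs (the package chosen here is illustrative, not one of our actual images):

```nix
{ pkgs ? import <nixpkgs> {} }:

# The image contents are exactly the runtime closure of GNU hello:
# no build tools, no intermediate products.
pkgs.dockerTools.buildImage {
  name = "hello";
  config.Cmd = [ "${pkgs.hello}/bin/hello" ];
}
```

Because this is an ordinary Nix function, it can be parameterized (for example, over the package to wrap or the environment to include) and only rebuilds when its inputs change.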

4. Declarative deployment of macOS build slaves

To produce the macOS installer packages of Daedalus, we maintain a small collection of macOS systems (hosted by MacinCloud). These systems were quite tedious to maintain until we started using nix-darwin. Nix-darwin allows declarative configuration of macOS systems in a similar way to NixOS.
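A hypothetical nix-darwin configuration fragment, to show the declarative style (the specific options here are a sketch of the idea, not our production configuration):

```nix
{ pkgs, ... }:

{
  # Packages available on the build machine
  environment.systemPackages = [ pkgs.git ];

  # Run the Nix daemon as a managed launchd service
  services.nix-daemon.enable = true;
}
```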

We wrote a script to set up the Macs over SSH. Everything is managed through Nix code, including the build environment, monitoring, and SSH access.

5. Cross-compiling Cardano SL for Windows

To produce the Windows installer packages of Daedalus, we needed to rely on resource-limited build instances of CI providers, or maintain a small collection of Windows systems. The Windows CI infrastructure has been a constant source of problems for us.

However, we are now able to cross-compile Haskell code for Windows on Linux! The next major version of Daedalus for Windows will ship with a cardano-node.exe that was built with Nix on Linux.

Due to the size of Cardano SL, this is quite an achievement, and the result of several months of hard work.

6. Development environments

With nix-shell, we can provide development environments containing all tools and libraries necessary to build a project, pinned to precisely defined versions.
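A hypothetical `shell.nix` illustrating the pattern (this is a sketch, not the actual file from cardano-sl or daedalus):

```nix
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  # Tools pinned to whatever nixpkgs revision the project imports,
  # so every developer gets identical versions.
  buildInputs = [ pkgs.ghc pkgs.cabal-install ];
}
```

Running `nix-shell` in the project directory then drops you into an environment with exactly these tools on the `PATH`.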

In cardano-sl, the Nix shell supports either the stack workflow, or cabal new-build with Haskell pre-built dependencies downloaded from the IOHK binary cache.

In daedalus, the Nix shell provides yarn, and a method of running a Cardano SL blockchain and wallet locally.

Developers and users who are unfamiliar with language-specific build tools such as Cabal and Stack (Haskell), npm and Yarn (JavaScript), or Cargo (Rust) can still rebuild a project (or download a cached build from CI) just by running nix build.

There are other tools for providing reproducible development environments, such as Apache Maven, or Docker, but these are not easily composable. Because Nix is a functional language, development environments are the application of functions, which can be composed. So with Nix you can write a function that combines development environments, or just the parts that you need.

Future directions

2019 will be an exciting year for IOHK, with several ambitious projects in the pipeline. We anticipate that Nix will play a large part in the success of these projects. Some of the things we have planned are:

Pure Daedalus build

We currently build the Daedalus installers for Windows and macOS using a Haskell script. This is slower than necessary and needs to run on a Windows or macOS system: we would prefer to build on Linux with Nix so that software components can be cached as necessary.


IOHK is splitting the Cardano SL code base into independent modules, which must be integrated into a final product build. We have started a repository of common Nix code, iohk-nix, that can be reused by each Cardano SL subproject.

As with all languages, Nix code can be written in a beautiful or an awful way — and because Nix does not have a static type system, the awful code is really bad. So part of this project will be an effort to improve the legibility and documentation of our Nix code so that it's easy to understand and maintain.

Tools for deploying a node or stake pool

In 2019, Cardano SL will be decentralized. We would like to make it super easy for technical users to deploy one or more Cardano SL nodes, or operate a stake pool.

This will include:

  • Smoothing the rough corners of iohk-ops to make it ready for public use.
  • Providing a variety of build options such as Docker images, or fully statically-linked executables.
  • Using the tools available in NixOS to provide a turn-key monitoring setup for all the necessary application-level metrics.

nix-tools Haskell Builder

The Nix Haskell build infrastructure needed some significant rework in order to cross-compile Haskell to target Windows, mostly to track more information about compiler flags and dependencies.

The result is a more powerful Haskell builder, and a new implementation of stack2nix which runs faster, among other things. This Haskell build infrastructure is called nix-tools. We aim to fill in the missing parts and upstream it to Nixpkgs in the coming year.

This subject is so interesting that it deserves its own blog post, which we are looking forward to posting soon.


This was a quick run-through of how we use Nix to help develop and deploy Cardano.

It's not all excellent of course. Users find that Nix has a steep learning curve, and that the community is relatively small. We also have our own wish-list of features and bug fixes for Nix that are not implemented yet.

If you are interested in learning more about Nix, then visit the NixOS website. If you like coding or system administration, it's very easy to start contributing to the NixOS project. It takes a little while to get oriented with the Nix language, but then you'll find that pretty much everything you need to know is in a single git repository.

At IOHK we regularly contribute back to the NixOS project. These contributions include improvements to packages and NixOS modules, and contracting developers to make larger improvements. Additionally, simply by using NixOS at a large scale in research and commercial projects, we contribute to the ecosystem — by publishing our source code under a free licence, submitting bug reports, or sharing our experiences with the community.

We are hiring

The IOHK team is growing, and we are looking for the right people to join us. If helping develop a blockchain platform more advanced than any other is something that interests you, then check out our job listings and get in touch!

Creative Commons
Mike Beeple

Cardano is secure against fake stake attacks

Peer-reviewed design means Ouroboros is free from a flaw affecting many proof-of-stake blockchains

29 January 2019 Philipp Kant 6 mins read

Cardano is secure against fake stake attacks

Ada is not among the 26 cryptocurrencies identified by US researchers last week as being vulnerable to ‘fake stake’ attacks.

The Cardano blockchain underlying Ada is based on proof-of-stake (PoS), but its Ouroboros protocol uses no bitcoin code and is not affected by the PoSv3 problem.
This is not just good luck, but a consequence of the thorough, formally-verified approach taken during Cardano’s development.

The vulnerability

The vulnerability is explained very well in the original article. In order to understand why Cardano is not affected by it, we will summarise the essence of the vulnerability here.

All the vulnerable systems are using PoSv3, a modification of the bitcoin code that aims to replace hashing power with stake for the purpose of determining who is eligible to create a block. In the original bitcoin code, the decision of who gets to create the next block is based purely on hashing power: whoever manages to find a suitable random number, and thus get a correct hash first, wins. PoSv3, however, adds an additional variable, to simulate the notion of stake.

In a PoS system, the likelihood of getting to create a block is proportional to how much stake a user has in the system: the more stake a user has, the more likely it is that they get to create the next block. To mimic this functionality, PoSv3 allows users to add additional information to their candidate block, in the form of a ‘staking transaction’. The more tokens they have available to use in their staking transaction, the easier it becomes for them to get a correct hash, and thus earn the right to create the next block.

Whilst PoSv3 does successfully tie block creation rights to stake in this way, it also makes block validation more difficult. Not only does the hash of the block itself need to be verified (as in bitcoin), but so does a user’s staking transaction: that is, did the user actually own the tokens they used in their staking transaction? To verify this information, a blockchain node has to be able to refer to the ledger, and – if a block does not simply extend the current chain but introduces a fork – also the history of the ledger. Since that is neither cached nor cheap to calculate, blocks in PoSv3 systems are not validated immediately, but are rather (at least partially) stored in memory or on disk when they pass some heuristics.

The vulnerabilities discussed in the original article can be exploited in a number of ways, but ultimately involve fooling those heuristics and presenting lots of invalid blocks to a node, such that the node runs out of memory and crashes before it can correctly identify that the blocks are invalid.

Why Cardano is different

For Cardano, IOHK took a different approach. Instead of finding a minimal variation of bitcoin, we relied on world-leading academics and researchers to create a new protocol and codebase from scratch, with the requirement that it should provide equivalent (or better) security guarantees than bitcoin, but rely entirely on stake. The result is the Ouroboros protocol, the first provably secure PoS protocol, upon which Cardano is built.

The basic design of Ouroboros is remarkably simple: time is divided into discrete increments, called slots, and slots are grouped into longer periods, called epochs. At the start of each epoch, a lottery determines who gets to create a block for every slot. Instead of this lottery being implicit (i.e., whoever gets a correct hash first wins), the lottery is explicit: a generated random number determines a slot leader for each slot, and the chances of winning for any given slot are proportional to the stake one controls.


In this protocol, validating that a block has been signed by the right stakeholder is also simple: it requires only the leader schedule for the current epoch (which will not change in case of a temporary fork), and the checking of a signature. This can and will be done by each node once they get the block header, in contrast to the PoSv3 systems that are vulnerable to fake stake attacks.

In short: Cardano is secure against fake stake attacks because it’s based on a fundamentally different system. PoSv3 cryptocurrencies run on proof-of-work (PoW) systems, modified to take stake into account in the implicit leader election, and the vulnerability in question is a result of that modification, and the additional complexities it involves.

Not only does Cardano have a fundamentally different foundation, but that foundation is the result of multiple peer-reviewed academic papers, and an unprecedented collaboration between researchers and developers. The formal and semi-formal methods involved in creating the upcoming Shelley release of Cardano ensure that its construction at code level demonstrably matches the protocol described in the peer-reviewed research papers, building in reliability and security by design, and avoiding the problems of PoSv3, which have arisen as a result of modifying an existing protocol instead of creating a thoroughly proven, bespoke protocol like Ouroboros.


1. ‘“Fake Stake” attacks on chain-based Proof-of-Stake cryptocurrencies’ by Sanket Kanjalkar, Yunqi Li, Yuguang Chen, Joseph Kuo, and Andrew Miller of the Decentralized Systems Lab at the University of Illinois at Urbana-Champaign.

2. To be precise, the following discussion is targeted at the upcoming Shelley release of Cardano. The currently deployed Byron release is running in a federated setting, and thereby operationally protected from this kind of attack anyway.

3. There are by now a number of variations of the Ouroboros protocol. We describe only the classic version of Ouroboros here, but the general argument holds for all variants – in particular for Ouroboros Praos, which will be the protocol used in the Shelley release.

4. To be precise, leader election for a given epoch uses the stake distribution at a point in time before the epoch starts, to prevent grinding attacks and a re-calculation of the schedule in case of a temporary fork at the epoch boundary.

Creative Commons
Edan Kwan

An in-depth look at quickcheck-state-machine

28 January 2019 Edsko de Vries 46 mins read

Please note: this post originally appeared on the Well-Typed blog.

Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.

In this blog post we will take an in-depth look at quickcheck-state-machine, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for quickcheck-state-machine tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than we present here.

We will not show the full development in this blog post, and instead focus on explaining the underlying concepts. If you want to follow along, the code is available for download. We will assume version 0.6 of quickcheck-state-machine, which was recently released. If you are using an older version, it is recommended to upgrade, since the newer version includes some important bug fixes, especially in the shrinker.

Introducing the running example

Our running example will be the development of a simple mock file system; the intention is that its behaviour is identical to the real file system, within the confines of what it needs to support. We will represent the state of the file system as

data Mock = M {
    dirs  :: Set Dir
  , files :: Map File String
  , open  :: Map MHandle File
  , next  :: MHandle
  }

type MHandle = Int

emptyMock :: Mock
emptyMock = M (Set.singleton (Dir [])) Map.empty Map.empty 0

We record which directories (folders) exist, the contents of all files on the system, the currently open handles (where mock handles are just integers), and the next available mock handle. To avoid confusion between files and directories we do not use FilePath but instead use

data Dir  = Dir [String]
data File = File {dir :: Dir, name :: String}

As one example, here is the mock equivalent of readFile:

type MockOp a = Mock -> (Either Err a, Mock)

mRead :: File -> MockOp String
mRead f m@(M _ fs hs _)
  | alreadyOpen               = (Left Busy         , m)
  | Just s <- Map.lookup f fs = (Right s           , m)
  | otherwise                 = (Left DoesNotExist , m)
  where
    alreadyOpen = f `List.elem` Map.elems hs

We first check if there is an open handle to the file; if so, we disallow reading this file (“resource busy”); if the file exists, we return its content; otherwise we report a “does not exist” error. The implementation of the other mock functions is similar; the full API is

mMkDir :: Dir               -> MockOp ()
mOpen  :: File              -> MockOp MHandle
mWrite :: MHandle -> String -> MockOp ()
mClose :: MHandle           -> MockOp ()
mRead  :: File              -> MockOp String

Finally, we should briefly talk about errors; the errors that the mock file system can report are given by

data Err = AlreadyExists | DoesNotExist | HandleClosed | Busy

and they capture a subset of the IO exceptions:


fromIOError :: IOError -> Maybe Err
fromIOError e =
    case ioeGetErrorType e of
      GHC.AlreadyExists    -> Just AlreadyExists
      GHC.NoSuchThing      -> Just DoesNotExist
      GHC.ResourceBusy     -> Just Busy
      GHC.IllegalOperation -> Just HandleClosed
      _otherwise           -> Nothing


Typically we are developing some stateful code, and then write a pure (mock) implementation of the same thing to test it, making sure that the stateful implementation and the simpler pure model compute the same things. Here we are doing the opposite: we are adjusting the model (the mock file system) to match what the real file system does. Either way, the process is the same: we write tests, execute them both in the real thing and in the model, and compare results.

If we were writing unit tests, we might write tests such as

  • Write to two different files
  • Write to a file and then read it
  • etc.

However, as John Hughes of QuviQ – and one of the original authors of QuickCheck – likes to point out, as systems grow, it becomes impossible to write unit tests that test the interaction between all the features of the system. So, don’t write unit tests. Instead, generate tests, and verify properties.

To generate tests for our mock file system, we have to generate sequences of calls into the API; “open this file, open that file, write to the first file we opened, …”. We then execute this sequence both against the mock file system and against the real thing, and compare results. If something goes wrong, we end up with a test case that we can inspect. Ideally, we should then try to reduce this test to try and construct a minimal test case that illustrates the bug. We have to be careful when shrinking: for example, when we remove a call to open from a test case, then any subsequent writes that used that file handle must also be removed. A library such as quickcheck-state-machine can be used both to help with generating such sequences and, importantly, with shrinking them.

Reifying the API

It is important that we generate the test before executing it. In other words, the test generation should not depend on any values that we only get when we run the test. Such a dependency makes it impossible to re-run the same test multiple times (no reproducible test cases) or shrink tests to obtain minimal examples. In order to do this, we need to reify the API: we need to define a data type whose constructors correspond to the API calls:

data Cmd h =
    MkDir Dir
  | Open File
  | Write h String
  | Close h
  | Read File

Cmd is polymorphic in the type of handles h; this is important, because we should be able to execute commands both against the mock file system and against the real file system:

runMock ::       Cmd MHandle -> Mock -> (.. Mock)
runIO   :: .. -> Cmd IO.Handle -> IO ..

What should the return type of these functions be? After all, different functions return different things: Open returns a new handle, Read returns a string, and the other functions return unit. To solve this problem we will simply introduce a union type for successful responses:

data Success h = Unit () | Handle h | String String

A response is then either a successful response or an error:

newtype Resp h = Resp (Either Err (Success h))

It is now easy to implement runMock: we just map all the constructors in Cmd to the corresponding API calls, and wrap the result in the appropriate constructor of Success:

runMock :: Cmd MHandle -> Mock -> (Resp MHandle, Mock)
runMock (MkDir d)   = first (Resp . fmap Unit)   . mMkDir d
runMock (Open  f)   = first (Resp . fmap Handle) . mOpen  f
runMock (Write h s) = first (Resp . fmap Unit)   . mWrite h s
runMock (Close h)   = first (Resp . fmap Unit)   . mClose h
runMock (Read  f)   = first (Resp . fmap String) . mRead  f

where first :: (a -> b) -> (a, x) -> (b, x) comes from Data.Bifunctor.

We can write a similar interpreter for IO; it will take a FilePath as an additional argument that it will use as a prefix for all paths; we will use this to run the IO test in some temporary directory.

runIO :: FilePath -> Cmd IO.Handle -> IO (Resp IO.Handle)


Our interpreter for IO takes real IO handles as argument, but we will not have any real handles until we actually run the test. We need a way to generate commands that run in IO but don’t use real handles (yet). Here is where we see the first bit of infrastructure provided by quickcheck-state-machine, references:

data Reference a r = Reference (r a)

where we will instantiate that r parameter to either Symbolic or Concrete:


data Symbolic a = Symbolic Var
data Concrete a = Concrete a

In other words, a Reference a Concrete is really just a wrapper around an a; indeed, quickcheck-state-machine provides

reference :: a -> Reference a Concrete
concrete  :: Reference a Concrete -> a

However, a Reference a Symbolic is a variable:

newtype Var = Var Int

An example of a program using symbolic references is

openThenWrite :: [Cmd (Reference IO.Handle Symbolic)]
openThenWrite = [
      Open (File (Dir []) "a")
    , Open (File (Dir []) "b")
    , Write (Reference (Symbolic (Var 0))) "Hi"
    ]

This program corresponds precisely to our example from earlier: “open this file, open that file, then write to the first file we opened”. Commands can return as many symbolic references in their result values as they want; in our simple example, only Open creates a new reference, and so Var 0 refers to the handle returned by the first call to Open.

When we execute the test, those variables will be instantiated to their real values, turning symbolic references into concrete references. We will of course not write programs with symbolic references in them by hand; as we will see later, quickcheck-state-machine provides infrastructure for doing so.

Since we will frequently need to instantiate Cmd and Resp with references to handles, we will introduce some special syntax for this:

newtype At f r = At (f (Reference IO.Handle r))
type    f :@ r = At f r

For example, here is a wrapper around runIO, which we will need below, that executes a command with concrete references:

semantics :: FilePath -> Cmd :@ Concrete -> IO (Resp :@ Concrete)
semantics root (At c) = (At . fmap reference) <$>
                          runIO root (concrete <$> c)

This is really just a call to runIO, with some type wrapping and unwrapping.

Relating the two implementations

When we run our tests, we will execute the same set of commands against the mock implementation and in real IO, and compare the responses we get after each command. In order to compare, say, a command “write to this MHandle” against the mock file system to a command “write to this IOHandle” in IO, we need to know the relation between the mock handles and the IO handles. As it turns out, the most convenient way to store this mapping is as a mapping from references to the IO handles (either concrete or symbolic) to the corresponding mock handles.

type HandleRefs r = [(Reference IO.Handle r, MHandle)]

(!) :: Eq k => [(k, a)] -> k -> a
env ! r = fromJust (lookup r env)

Then to compare the responses from the mock file system to the responses from IO we need to keep track of the state of the mock file system and this mapping; we will refer to this as the model for our test:

data Model r = Model Mock (HandleRefs r)

initModel :: Model r
initModel = Model emptyMock []

The model must be polymorphic in r: during test generation we will instantiate r to Symbolic, and during test execution we will instantiate r to Concrete.

Stepping the model

We want to work towards a function

transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r

to step the model; we will gradually build up towards this. First, we can use the model to translate from commands or responses in terms of references to the corresponding commands or responses against the mock file system:

toMock :: (Functor f, Eq1 r) => Model r -> f :@ r -> f MHandle
toMock (Model _ hs) (At fr) = (hs !) <$> fr

Specifically, this can be instantiated to

toMock :: Eq1 r => Model r -> Cmd :@ r -> Cmd MHandle

which means that if we have a command in terms of references, we can translate that command to the corresponding command for the mock file system and execute it:

step :: Eq1 r => Model r -> Cmd :@ r -> (Resp MHandle, Mock)
step m@(Model mock _) c = runMock (toMock m c) mock

In order to construct the full new model however we also need to know how to extend the handle mapping. We can compute this by comparing the response we get from the “real” semantics (Resp :@ r) to the response we get from the mock semantics (from step), and simply zip the handles from both responses together to obtain the new mapping. We wrap all this up into an event:

data Event r = Event {
    before   :: Model  r
  , cmd      :: Cmd :@ r
  , after    :: Model  r
  , mockResp :: Resp MHandle
  }

and we construct an event from a model, the command we executed, and the response we got from the real implementation:

lockstep :: Eq1 r
         => Model   r
         -> Cmd  :@ r
         -> Resp :@ r
         -> Event   r
lockstep m@(Model _ hs) c (At resp) = Event {
      before   = m
    , cmd      = c
    , after    = Model mock' (hs <> hs')
    , mockResp = resp'
    }
  where
    (resp', mock') = step m c
    hs' = zip (toList resp) (toList resp')

The function we mentioned at the start of this section is now easily derived:

transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
transition m c = after . lockstep m c

as well as a function that compares the mock response and the response from the real file system and checks that they are the same:

postcondition :: Model   Concrete
              -> Cmd  :@ Concrete
              -> Resp :@ Concrete
              -> Logic
postcondition m c r = toMock (after e) r .== mockResp e
  where
    e = lockstep m c r

We will pass this function to quickcheck-state-machine to be run after every command it executes to make sure that the model and the real system do indeed return the same responses; it therefore does not need to be polymorphic in r. (Logic is a type introduced by quickcheck-state-machine; think of it as a boolean with some additional information, somewhat similar to QuickCheck’s Property type.)

Events will also be very useful when we label our tests; more on that later.

Constructing symbolic responses

We mentioned above that we will not write programs with symbolic references in them by hand. Instead what will happen is that we execute commands in the mock file system, and then replace any of the generated handles by new variables. Most of this happens behind the scenes in quickcheck-state-machine, but we do need to give it this function to construct symbolic responses:

symbolicResp :: Model Symbolic
             -> Cmd :@ Symbolic
             -> GenSym (Resp :@ Symbolic)
symbolicResp m c = At <$> traverse (const genSym) resp
  where
    (resp, _mock') = step m c

This function does what we just described: we use step to execute the command in the mock model, and then traverse the response, constructing a new (fresh) variable for each handle. GenSym is a monad defined in quickcheck-state-machine for the sole purpose of generating variables; we won’t use it anywhere else except in this function.

Generating commands

To generate commands, quickcheck-state-machine requires a function that produces the next command given the current model; this function will be a standard QuickCheck generator. For our running example, the generator is easy to write:

generator :: Model Symbolic -> Maybe (Gen (Cmd :@ Symbolic))
generator (Model _ hs) = Just $ QC.oneof $ concat [
      withoutHandle
    , if null hs then [] else withHandle
    ]
  where
    withoutHandle :: [Gen (Cmd :@ Symbolic)]
    withoutHandle = [
          fmap At $ MkDir <$> genDir
        , fmap At $ Open  <$> genFile
        , fmap At $ Read  <$> genFile
        ]

    withHandle :: [Gen (Cmd :@ Symbolic)]
    withHandle = [
          fmap At $ Write <$> genHandle <*> genString
        , fmap At $ Close <$> genHandle
        ]

    genDir :: Gen Dir
    genDir = do
        n <- QC.choose (0, 3)
        Dir <$> replicateM n (QC.elements ["x", "y", "z"])

    genFile :: Gen File
    genFile = File <$> genDir <*> QC.elements ["a", "b", "c"]

    genString :: Gen String
    genString = QC.sized $ \n -> replicateM n (QC.elements "ABC")

    genHandle :: Gen (Reference IO.Handle Symbolic)
    genHandle = QC.elements (map fst hs)

A few comments on the generator:

  • When we generate paths, we choose from a very small set of directory and file names. We are not really interested in testing that, for example, our implementation is Unicode-safe here; by picking from a small known set we generate tests that are more likely to contain multiple references to the same file, without having to add special provisions to do so.
  • We cannot, of course, generate handles out of nowhere; fortunately, the model tells us which handles we have available, and so genHandle just picks one at random from that. We do not limit the generator to picking open handles: it is important to also test the behaviour of the system in the error case where a program attempts to write to a closed handle.
  • The elements function from QuickCheck, which picks a random element from a list, is partial: it will throw an error if the list is empty. We must be careful not to generate any commands requiring handles when we have no handles yet.
  • The reason for the Maybe in the type signature is that in some circumstances a generator might be unable to generate any commands at all given a particular model state. We don’t need to take advantage of this feature and therefore always return Just a generator.
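
To get a feel for just how small this search space is, we can enumerate it (a quick throwaway calculation, not part of the test suite): with directory names drawn from three letters at depth at most three, and three file names, the generator can only ever produce 120 distinct files, so independently generated commands collide on the same path often.

```haskell
-- all directories genDir can produce: length-n sequences over
-- {x, y, z} for n = 0..3; sequence over the list monad
-- enumerates all 3^n combinations of length n
dirs :: [[String]]
dirs = concat [ sequence (replicate n ["x", "y", "z"]) | n <- [0 .. 3] ]

main :: IO ()
main = do
    print (length dirs)      -- 1 + 3 + 9 + 27 = 40 directories
    print (length dirs * 3)  -- times 3 file names = 120 distinct files
```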

We will also need a shrinker for commands; for now, we can start with a trivial shrinker that says that individual commands cannot be shrunk:

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []

Putting it all together

In order to actually start generating and running tests, quickcheck-state-machine wants us to bundle all of this functionality up in a single record:

sm :: FilePath -> StateMachine Model (At Cmd) IO (At Resp)
sm root = StateMachine {
      initModel     = initModel
    , transition    = transition
    , precondition  = precondition
    , postcondition = postcondition
    , invariant     = Nothing
    , generator     = generator
    , distribution  = Nothing
    , shrinker      = shrinker
    , semantics     = semantics root
    , mock          = symbolicResp
    }

We have defined all of these functions above, with the exception of precondition. When quickcheck-state-machine finds a failing test, it will try to shrink it to produce a smaller test case. It does this by trying to remove commands from the program and then checking if the resulting program still “makes sense”: does the precondition of all commands still hold?

The precondition should be as liberal as possible; for instance, the precondition should not require that a file handle is open before writing to it, because we should also check that errors are handled correctly. Typically, the only thing the precondition should verify is that commands do not refer to non-existent handles (i.e., if we remove a call to open, then subsequent uses of the handle returned by that call to open simply cannot be executed anymore). Thus, we will define:

precondition :: Model Symbolic -> Cmd :@ Symbolic -> Logic
precondition (Model _ hs) (At c) =
    forall (toList c) (`elem` map fst hs)

Aside. Although we do not need it for our running example, one other thing a precondition might do is rule out test cases that we explicitly don’t want to test. For example, if our mock file system throws an error in some cases because a particular combination of arguments to a call is simply not supported, then we don’t want to flag this as a bug. A clean way to implement this is to extend the error type with a field that marks an error as an intentional limitation; then the precondition can simply rule out any commands that (in the model) would return an error with this mark. This keeps the definition of the precondition itself simple, and the logic of what is and what isn’t an intentional limitation lives in the implementation of the model itself.

Running the tests

Apart from some additional type class instances (see the code), we are now ready to define and execute the actual test:

prop_sequential :: FilePath -> QC.Property
prop_sequential tmpDir =
    forAllCommands (sm rootUnused) Nothing $ \cmds ->
      QC.monadicIO $ do
        tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
        let sm' = sm tstTmpDir
        (hist, _model, res) <- runCommands sm' cmds
        prettyCommands sm' hist
          $ checkCommandNames cmds
          $ res QC.=== Ok

rootUnused :: FilePath
rootUnused = error "root not used during command generation"

All functions prefixed QC are standard QuickCheck functions. The others come from quickcheck-state-machine:

  • forAllCommands uses QuickCheck’s forAllShrink and instantiates it with the command generation and shrinking infrastructure from quickcheck-state-machine.
  • runCommands then executes the generated commands, validating the postcondition at every step.
  • prettyCommands renders those commands in case of a test failure.
  • checkCommandNames adds some statistics about the distribution of commands in the generated tests.

We can run the test in ghci:

> quickCheck (prop_sequential "./tmp")
+++ OK, passed 100 tests:
56% coverage

 3% [("MkDir",1),("Read",1)]
 3% []
 2% [("MkDir",1),("Open",1),("Read",1)]
 2% [("MkDir",1)]
 1% [("Close",1),("MkDir",1),("Open",5),("Read",4),("Write",2)]

It tells us that all tests passed, and gives us some statistics about the tests that were executed: in 3% of the cases, they contained a single MkDir and a single Read, 3% were completely empty, 2% contained one call to MkDir, one call to Open, one call to Read, and so on.


At this point you might conclude that you’re done. We have the real implementation, we have the mock implementation, they return identical results for all tests, what else could we want?

Let’s think back to those unit tests we were on the verge of writing, but stopped just in time because we remembered that we should generate unit tests, not write them:

  • Write to two different files
  • Write to a file and then read it

How do we know that our generated tests include these two cases (and all the other unit tests that we would have written)? We get some statistics from quickcheck-state-machine, but it’s not terribly helpful. For example, the first line above tells us that 3% of our test cases contain one call to MkDir and one call to Read; but we know that that call to Read must fail, because if these are the only two commands we execute, there aren’t any files for that Read to read.

The solution is to label tests. For example, we might introduce labels, or tags, that correspond to the two unit tests above:

data Tag = OpenTwo | SuccessfulRead

We then need to write a function that looks at a particular test case and checks which labels apply. It is important to realize that this does not mean we are bringing back the same unit tests under a different guise: programs that will be labelled OpenTwo must write to two different files, but may also do a whole bunch of other things.

We can use the foldl package to write such a labelling function in a convenient manner. The labelling function will look at the Events and produce Tags. To check if we open (at least) two different files, we keep track of all the successful calls to open; for SuccessfulRead we simply remember if we have seen a call to Read with a non-error result:

tag :: [Event Symbolic] -> [Tag]
tag = Foldl.fold $ catMaybes <$> sequenceA [
      openTwo
    , successfulRead
    ]
  where
    openTwo :: Fold (Event Symbolic) (Maybe Tag)
    openTwo = Fold update Set.empty extract
      where
        update :: Set File -> Event Symbolic -> Set File
        update opened ev =
            case (cmd ev, mockResp ev) of
              (At (Open f), Resp (Right _)) ->
                Set.insert f opened
              _otherwise ->
                opened

        extract :: Set File -> Maybe Tag
        extract opened = do
            guard (Set.size opened >= 2)
            return $ OpenTwo

    successfulRead :: Fold (Event Symbolic) (Maybe Tag)
    successfulRead = Fold update False extract
      where
        update :: Bool -> Event Symbolic -> Bool
        update didRead ev = didRead ||
            case (cmd ev, mockResp ev) of
              (At (Read _), Resp (Right _)) ->
                True
              _otherwise ->
                False

        extract :: Bool -> Maybe Tag
        extract didRead = do
            guard didRead
            return SuccessfulRead

(For a read to be successful, we must have created the file first – this is a property of the semantics, we don’t need to enforce this in the labeller.)

The commands we get back from the forAllCommands function of quickcheck-state-machine are of type Commands. This is a simple wrapper around a list of Command; Command in turn bundles a command (Cmd) along with its response (and the variables in that response). We can therefore easily turn this into a list of events:

execCmd :: Model Symbolic
        -> Command (At Cmd) (At Resp)
        -> Event Symbolic
execCmd model (Command cmd resp _vars) =
    lockstep model cmd resp

execCmds :: Commands (At Cmd) (At Resp) -> [Event Symbolic]
execCmds = \(Commands cs) -> go initModel cs
  where
    go :: Model Symbolic
       -> [Command (At Cmd) (At Resp)]
       -> [Event Symbolic]
    go _ []       = []
    go m (c : cs) = e : go (after e) cs
      where
        e = execCmd m c

We can then define a variant on prop_sequential above that replaces checkCommandNames with our own labelling function (you could also use both, if you wanted to):

prop_sequential' :: FilePath -> QC.Property
prop_sequential' tmpDir =
    forAllCommands (sm rootUnused) Nothing $ \cmds ->
      QC.monadicIO $ do
        tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
        let sm' = sm tstTmpDir
        (hist, _model, res) <- runCommands sm' cmds
        prettyCommands sm' hist
          $ QC.tabulate "Tags" (map show $ tag (execCmds cmds))
          $ res QC.=== Ok

Here tabulate is the standard QuickCheck function for adding multiple labels to a test case. If we now re-run our tests, we get

> quickCheck (prop_sequential' "./tmp")
+++ OK, passed 100 tests.

Tags (57 in total):
63% OpenTwo
37% SuccessfulRead

The numbers here are a bit awkward to interpret: they say that across all tests a total of 57 labels were found, and that of those 57 labels, 63% were OpenTwo and 37% were SuccessfulRead. In other words, 63% of 57 ≈ 36 tags were OpenTwo (36 tests out of 100 were labelled as OpenTwo), and 37% of 57 ≈ 21 tags were SuccessfulRead (21 tests out of 100 were labelled as SuccessfulRead). Note that it is perfectly possible for these two sets of tests to overlap (i.e., a single program can be tagged as both OpenTwo and SuccessfulRead).

Inspecting the labelling function

So we have a generator, and we have labels for the unit tests that we didn’t write; have we covered our bases now? Well, how do we know that our labelling function is correct? We might seem to descend into infinite regress here, testing the tests that test the tests… but it would be good to at least look at some example test cases and how they are labelled. Fortunately, this is functionality that QuickCheck provides out of the box through a function called labelledExamplesWith. We can define a simple wrapper around it that gives us the possibility to specify a particular seed (which will be useful once we start working on shrinking):

showLabelledExamples :: Maybe Int -> IO ()
showLabelledExamples mReplay = do
    replaySeed <- case mReplay of
                    Nothing   -> getStdRandom $ randomR (1, 999999)
                    Just seed -> return seed

    putStrLn $ "Using replaySeed " ++ show replaySeed

    let args = QC.stdArgs {
            QC.maxSuccess = 10000
          , QC.replay     = Just (QC.mkQCGen replaySeed, 0)
          }

    QC.labelledExamplesWith args $
      forAllCommands (sm rootUnused) Nothing $ \cmds ->
        repeatedly QC.collect (tag . execCmds $ cmds) $
          QC.property True

Instead of tabulate we must use collect to label the examples here; collect takes a single label at a time though, so we use my all-time favourite combinator to repeatedly call collect for all tags:

repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
repeatedly = flip . List.foldl' . flip

(I wonder if I can patent this function?)
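
To see what repeatedly does, we can specialize it to a pure fold (a small sketch, independent of QuickCheck): it lifts a function that applies one value into a function that applies a whole list of values, left to right.

```haskell
import Data.List (foldl')

-- lift a one-at-a-time application to a list of applications
repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
repeatedly = flip . foldl' . flip

main :: IO ()
main =
    -- consing each element in turn pushes the list,
    -- reversed, onto the accumulator
    print (repeatedly (:) [1, 2, 3] [0 :: Int])  -- prints [3,2,1,0]
```

In showLabelledExamples, the role of (:) is played by QC.collect and the accumulator is the property being built.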

When we run this, we see something like

> showLabelledExamples Nothing
Using replaySeed 288187
*** Found example of OpenTwo
Commands [
    .. Open (File (Dir []) "b")
  , .. Open (File (Dir []) "a")
  ]

*** Found example of SuccessfulRead
Commands [
    .. Open (File (Dir []) "b")
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Read (File (Dir []) "b")
  ]

(I’ve tidied up the output a bit and removed some redundant information.)

This is looking good: both of these look like the kind of examples we would have written ourselves for these tags.

Standard shrinking

In fact, if we look at those test cases at the end of the previous section, you might be thinking that those examples look surprisingly good: not only are they indeed instances of those tags, but they are very small test cases: they are pretty much the unit tests that we would have written if we were so misguided as to think we need to write unit tests. Were we simply lucky?

No, we were not. QuickCheck’s labelledExamples not only searches for labelled test cases, it also tries to shrink them when it finds one, and continues to shrink them until it can shrink them no further without losing the label. The shrinker itself is implemented in quickcheck-state-machine; if we disable it altogether and re-run the search for labelled examples, we might find an example such as the following for SuccessfulRead, where for clarity I’ve marked all failing commands with an asterisk (*):

Commands [
    .. Open (File (Dir ["z", "x", "y"]) "c")    (*)
  , .. MkDir (Dir ["z", "x"])                   (*)
  , .. Read (File (Dir ["z", "y", "y"]) "c")    (*)
  , .. Read (File (Dir []) "c")                 (*)
  , .. MkDir (Dir [ "x" , "x" ])                (*)
  , .. Read (File (Dir ["x", "z", "x"]) "b")    (*)
  , .. MkDir (Dir ["y"])
  , .. MkDir (Dir [])                           (*)
  , .. Open (File (Dir ["z"]) "b")              (*)
  , .. Open (File (Dir []) "a")
  , .. MkDir (Dir [])                           (*)
  , .. Open (File (Dir ["x"]) "b")              (*)
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Open (File (Dir ["x", "y"]) "b")         (*)
  , .. Open (File (Dir []) "b")
  , .. MkDir (Dir ["y"])                        (*)
  , .. Read (File (Dir []) "a")
  , .. MkDir (Dir ["z"])
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 1)))
  , .. Open (File (Dir ["z" , "z" , "y"]) "a")  (*)
  , .. Open (File (Dir ["x" , "x" , "z"]) "c")  (*)
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Open (File (Dir ["y", "y"]) "b")         (*)
  , .. Read (File (Dir ["x", "y", "x"]) "a")    (*)
  ]

This is looking significantly less ideal! If there was a bug in read, then this would certainly not be a very good minimal test case, and not something you would want to debug. So how does quickcheck-state-machine shrink tests? The basic approach is quite simple: it simply removes commands from the program. If the resulting program contains commands whose precondition is not satisfied (remember, for our running example this means that those commands would use handles that are no longer created) then it discards it as a possible shrink candidate; otherwise, it reruns the test, and if it still fails, repeats the process.
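
The candidate generation behind that removal step can be sketched in a few lines (a simplification for exposition; the real shrinker is more sophisticated, for instance also trying to remove several commands at once):

```haskell
-- all programs obtained by deleting exactly one command;
-- each candidate is then checked against the preconditions
-- and, if still valid, re-run
removeOne :: [a] -> [[a]]
removeOne xs =
    [ before ++ after
    | i <- [0 .. length xs - 1]
    , let (before, _ : after) = splitAt i xs
    ]

main :: IO ()
main = print (removeOne "abc")  -- prints ["bc","ac","ab"]
```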

The large percentage of commands that are unsuccessful can easily be removed by quickcheck-state-machine:

Commands [
    .. MkDir (Dir ["y"])
  , .. Open (File (Dir []) "a")
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Open (File (Dir []) "b")
  , .. Read (File (Dir []) "a")
  , .. MkDir (Dir ["z"])
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 1)))
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Close (Reference (Symbolic (Var 0)))
  ]

Both calls to MkDir can easily be removed, and the resulting program would still be tagged as SuccessfulRead; and the same is true for repeated closing of the same handle:

Commands [
    .. Open (File (Dir []) "a")
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Open (File (Dir []) "b")
  , .. Read (File (Dir []) "a")
  , .. Close (Reference (Symbolic (Var 1)))
  ]

At this point the shrinker cannot remove the second call to Open because the second call to Close depends on it, but it can first remove that second call to Close and then remove that second call to Open, and we end up with the minimal test case that we saw in the previous section:

Commands [
    .. Open (File (Dir []) "a")
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Read (File (Dir []) "a")
  ]


Improving shrinking

Unfortunately, this does not mean that we can depend on quickcheck-state-machine to solve all our shrinking problems. Consider this run of showLabelledExamples:

> showLabelledExamples (Just 166205)
Using replaySeed 166205
*** Found example of OpenTwo
Commands [
    .. MkDir (Dir ["x"])
  , .. Open (File (Dir [])    "c")
  , .. Open (File (Dir ["x"]) "a")
  ]

This is indeed an example of a program in which we open at least two files; however, why is that call to MkDir still there? After all, if there was a bug in opening more than one file, and the “minimal” test case would include a call to MkDir, that would be a red herring which might send the person debugging the problem down the wrong path.

The reason that quickcheck-state-machine did not remove the call to MkDir, of course, is that without it the second call to Open would fail: it tries to create a file in directory x, and if that directory does not exist, it would fail. To fix this, we need to tell quickcheck-state-machine how to shrink individual commands; so far we have been using

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []

which says that individual commands cannot be shrunk at all. So how might we shrink an Open call? One idea might be to shrink the directory, replacing for instance /x/y/a by /x/a or indeed just /a. We can implement this using

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
    case cmd of
      Open (File (Dir d) f) ->
        [At $ Open (File (Dir d') f) | d' <- QC.shrink d]
      _otherwise ->
        []

If we use this shrinker and run showLabelledExamples a number of times, we will find that all the examples of OpenTwo are now indeed minimal… until it finds an example that isn’t:

> showLabelledExamples (Just 980402)
Using replaySeed 980402
*** Found example of OpenTwo
Commands [
    .. MkDir (Dir ["x"])
  , .. Open (File (Dir [])    "a")
  , .. Open (File (Dir ["x"]) "a")
  ]

In this example we cannot shrink the directory to [] because the resulting program would try to open the same file twice, which is not allowed (“resource busy”). We need a better way to shrink this program.

What we want to implement is “try to replace the file path by some file in the (local) root”. It is important to realize however that a shrinker such as

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
    case cmd of
      Open _ -> -- BAD EXAMPLE
        [At $ Open (File (Dir []) f') | f' <- ["t1", "t2"]]
      _otherwise ->
        []

is a bad idea. This shrinker tries to replace any file path with either /t1 or /t2. However, this means that shrinking now never stops:

Open (File (Dir []) "t1")

can be shrunk to

Open (File (Dir []) "t2")

which can then be shrunk back to

Open (File (Dir []) "t1")

and QuickCheck will loop when trying to shrink the test case. It is important that there is a clear direction to shrinking.
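
The cycle is easy to see in a toy model of this bad shrinker, using plain strings for the file names (illustrative code only):

```haskell
-- the bad shrinker above, reduced to file names: it always
-- offers both t1 and t2, regardless of its input
badShrink :: String -> [String]
badShrink _ = ["t1", "t2"]

main :: IO ()
main =
    -- t1 offers t2 as a shrink, and t2 offers t1 back:
    -- a cycle, so shrinking never terminates
    print ("t2" `elem` badShrink "t1" && "t1" `elem` badShrink "t2")
```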

An approach that does work is the following: any file path that doesn’t start with a t can be replaced by path /t100; moreover, any /tN (for some number N) can be replaced by /tN' for some number N’ < N:

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
    case cmd of
      Open (File (Dir []) ('t' : n)) ->
        [openTemp n' | n' <- QC.shrink (read n)]
      Open _ ->
        [openTemp 100]
      _otherwise ->
        []
  where
    openTemp :: Int -> Cmd :@ Symbolic
    openTemp n = At $ Open (File (Dir []) ('t' : show n))


With this shrinker in place, the earlier example

Commands [
    .. MkDir (Dir ["x"])
  , .. Open (File (Dir [])    "a")
  , .. Open (File (Dir ["x"]) "a")
  ]

can shrink to

Commands [
    .. MkDir (Dir ["x"])
  , .. Open (File (Dir []) "a")
  , .. Open (File (Dir []) "t100")
  ]

at which point the call to MkDir can be removed; this will eventually shrink down to

Commands [
    .. Open (File (Dir []) "t0")
  , .. Open (File (Dir []) "t1")
  ]

Dependencies between commands

It’s been a long road, but we are almost there. The last thing we need to discuss is how to shrink programs with dependencies. The “open at least two” example above was relatively easy to shrink because we could shrink one command at a time. Sometimes however there are dependencies between commands. For example, consider this “minimal” example of “successful read”:

> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
    .. MkDir (Dir [ "z" ])
  , .. Open (File (Dir ["z"]) "b")
  , .. Close (Reference (Symbolic (Var 0)))
  , .. Read (File (Dir ["z"]) "b")
  ]

As before, we have a call to MkDir which should ideally be removed. However, if we tried to change the path in the call to Open, the call to Read would fail: both of these commands should refer to the same path. But shrinking can only change one command at a time, and this is important to keep the computational complexity (runtime cost) of shrinking down. What to do?

The problem is that we want that call to Read to use the same path as the call to Open, but we have no way to express this. The solution is to make this expressible. After all, we can already express “the handle returned by that call to Open”; all we need to do is introduce a second kind of reference: a reference to a path.

The language of commands changes to

data Cmd fp h = Read (Expr fp) | -- rest as before

Cmd gets an additional type argument fp to record the types of paths, and instead of a File, Read now takes an Expr as argument:

data Expr fp = Val File | Var fp

We can of course still use a concrete path, as before, but we can also use a variable: “use the same path as used in that call to open”. This means that Open must return that reference, so Success and Resp get adjusted accordingly:

data Success fp h = Handle fp h | -- rest as before
newtype Resp fp h = Resp (Either Err (Success fp h))

Just as was the case for handles, by the time we actually run the code all variables have been resolved, so the interpreter isn’t any more difficult:

runMock :: Cmd File MHandle -> Mock -> (Resp File MHandle, Mock)
runMock (Open f) = first (Resp . fmap (Handle f)) . mOpen f
runMock (Read e) = first (Resp . fmap String)     . mRead (eval e)
-- rest as before

eval :: Expr File -> File
eval (Val f) = f
eval (Var f) = f

The IO interpreter is modified similarly. Most of the rest of the changes to the code are minor and mostly automatic. For example, At must now instantiate both parameters

newtype At f r = At (f (Reference File r) (Reference IO.Handle r))

the model must record the mapping from file references now too

type FileRefs r = [(Reference File r, File)]
data Model    r = Model Mock (FileRefs r) (HandleRefs r)

See Version2.hs in the example package for details.

Crucially, we can now take advantage of this in the shrinker: when we see a call to Read with a file path that we have seen before, we can shrink that to use a variable instead:

shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker (Model _ fs _) (At cmd) =
    case cmd of
      Read (Val f) ->
        [At $ Read (Var r) | r <- mapMaybe (matches f) fs]
      -- other cases as before
  where
    matches :: File -> (r, File) -> Maybe r
    matches f (r, f') | f == f'   = Just r
                      | otherwise = Nothing

This means that the problematic example


> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
    .. MkDir (Dir [ "z" ])
  , .. Open (File (Dir ["z"]) "b")
  , .. Close (Reference (Symbolic (Var 1)))
  , .. Read (Val (File (Dir ["z"]) "b"))
  ]

can now shrink to

Commands [
    .. MkDir (Dir [ "z" ])
  , .. Open (File (Dir ["z"]) "b")
  , .. Close (Reference (Symbolic (Var 1)))
  , .. Read (Var 0)
  ]

At this point the shrinking for Open that we defined above can kick in, replacing /z/b with /t100, making the call to MkDir redundant, and the example can ultimately shrink to

Commands [
    .. Open (File (Dir []) "t0")
  , .. Close (Reference (Symbolic (Var 1)))
  , .. Read (Var 0)
  ]


Writing unit tests by hand is problematic for at least two reasons. First, as the system grows in complexity, the number of interactions between its various features becomes impossible to capture in handwritten unit tests. Second, unit tests can only exercise the system in ways that their author foresees; often, bugs arise precisely when the system is exercised in ways that were not foreseen. It is therefore important not to write unit tests, but rather to generate them. We have seen how a library such as quickcheck-state-machine can assist in generating and shrinking meaningful sequences of API calls.

We have also seen why it is important to label test cases, how to inspect the labelling function, and how to use labelling to improve shrinking. Without writing a shrinker for individual commands, the standard quickcheck-state-machine shrinker already does a really good job at removing redundant commands. However, if we are serious about producing minimal test cases without red herrings that might lead debugging efforts astray (and we should be serious about that), then we also need to put some thought into shrinking individual commands.

Finally, we have seen how we can improve shrinking by making dependencies between commands explicit. This also serves as an example of a language with multiple types of references; the approach we put forward in this blog post scales to an arbitrary number of reference types without much difficulty.

We have not talked about running parallel tests in this blog post. This is an interesting topic in its own right, but it affects only the outermost level of the test: prop_sequential would get an analogue prop_parallel, and nothing else would change. The quickcheck-state-machine documentation shows how to do this, and the original paper by John Hughes and others, describing the approach in Erlang, is also well worth a read. Finally, quickcheck-state-machine is not the only Haskell library providing this functionality; in particular, hedgehog, an alternative to QuickCheck, provides it too.

The way we set up the tests in this blog post is not the only way possible, but it is one that we believe leads to a clean design. The running example is a simplified version of a mock file system that we use in the new Ouroboros consensus layer, where it lets us simulate file system errors when testing a blockchain database. The tests for that blockchain database in turn follow the design patterns laid out in this blog post, and we have also used those patterns in a test we added to quickcheck-state-machine itself to test its shrinker. Of course, being Haskellers, we would prefer to turn design patterns into actual code; we believe this to be possible, but it may require some more advanced type system features. This is something we want to investigate further.


1. Mapping from `IllegalOperation` to `HandleClosed` is a bit too coarse, but it suffices for the purposes of this blog post.

2. This has a somewhat unpleasant untyped feel to it. However, if the design patterns proposed in this blog post are used, this barely affects any code that we write: we never (or almost never) have to pattern match on that `Success` type.

3. I’m eliding a `Typeable` constraint here.

4. Technically, `quickcheck-state-machine` uses a function `Resp :@ Symbolic -> [Var]` to find all the references “bound” by a command.

5. The `Close` call now refers to `Var 1` instead of `Var 0`, because `Open` now creates _two_ variables: the reference to the path and the reference to the handle.
