Testing is of course critically important to a cryptocurrency because the correctness and robustness of the system are what you rely on to keep your money safe, and ensure that you can spend it when you need to. So as you would expect, as our development team is getting ready for the Cardano mainnet release, testing is one of the main things that is on our minds.
There are many different ways in which we test Cardano and in this post we will talk about several.
Testing can be divided into two main kinds: functional and non-functional:
Functional testing is about checking that all the system's components meet their specifications.
Functional testing is done with components on their own, in which case we call it unit testing or component testing. It is also done with all the components together, in which case we call it integration testing or system testing.
These kinds of tests are typically of the form: given some scenario, and certain inputs, the component or system produces the correct output or takes the correct next action.
Non-functional testing reveals "how" the system behaves, including the performance of the system, the resources it uses and how the system behaves when under great load or attack.
We have a few major parts of the Cardano system: the core, the wallet backend and the Daedulus frontend. Different parts of the system are appropriate to test in different ways.
The most visible form of testing is of course the public testnets where we ask users to try the system out. This is a kind of beta testing. This is just the tip of the iceberg compared to all the testing we do internally, but it is still very useful because it covers a different set of problems compared to our internal tests.
Users have a huge variety of desktop computers, both in hardware and configuration. It is impossible for us to test all the combinations that our users have. So having lots of real users try out the system really helps to find those strange combinations where something does not work well, and gives us the confidence that we will not bump into similar problems for mainnet.
A testnet release helps us test usability of the system: our websites, the installers, the Daedulus interface and how many cryptocurrency concepts people need to know to use the system.
There is no escaping the fact that a public testnet release is in some ways more "real" than any test situations we can construct artificially. Though we can certainly push the system to breaking point using our internal stress tests, there are complexities of a real world deployment that are hard to replicate in an artificial test.
Finally, it also helps our team practice making public releases, which helps us work out the kinks in our processes so that we can avoid problems during the mainnet release or later updates. And it's not just our developers and technical operations teams, a successful launch also depends on our communications and support teams. The very process of getting questions, feedback and problem reports from users during the testnet phases helps us to make sure that our support teams have the right procedures in place so that we can be confident that they can help everyone effectively during and after the mainnet launch.
Automatic unit and component testing
We have an increasing collection of fully automatic functional tests that cover various important parts of the logic in the core and wallet backend. These are functional tests in that they check that each component meets its specification.
These tests are run automatically by our continuous integration system, which means they are run before any change to the code is accepted into our master branch. This helps to protect us against introducing regressions.
Wherever possible we make use of property based testing, rather than simple individual unit tests. Classic unit tests for a component tend to simply use a specific set of inputs and check that a specific output is produced. To comprehensively test a component in this style often requires a large number of specific pairs of input and expected output. This is laborious and tends to miss corner cases that programmers do not think of. By contrast, property based testing involves taking the component's high level specification and reformulating the specification as an executable property. That means that for any specific inputs the property can actually be executed to check that the property is true for those inputs. These properties are expressed in the same programming language as the code being tested. The technique then involves checking the property on hundreds or thousands of test inputs. The technique is to use systematic random generation to produce test inputs. This means that programmers do not have to think of lots of test inputs and it avoids human bias. So it tends to give much better test coverage with less effort.
Specifically, we use the QuickCheck system for property based testing. Perhaps the greatest advantage is that it makes developers think in terms of the specification and properties of their code, rather than individual inputs and outputs. This is a much higher level way of thinking about code and helps to produce simpler more reliable code.
System level tests and performance tests
While all the unit and component testing gives us confidence that each part of the system works ok on its own, system level tests are to check if the parts all work together as a whole.
For this we have to set up a cluster of machines and configure them to run the blockchain protocol together. The main functional test that we use works like this: we have a special transaction generator program that constructs tens of thousands of transactions and submits them to nodes in the cluster. The code is instrumented to record certain key events in a log file, such as when each transaction reaches each node. We let this run for around an hour. At the end of the run we have a tool that analyses the blockchain and the log files from all the nodes. This checks that all the transactions that were submitted did make it into the blockchain. It also checks if there were any unexpected forks in the blockchain or missing blocks. In normal conditions there will be no forks or missing blocks.
We can use the same basic approach to test the system when we deliberately attack it, such as taking out nodes, or preventing nodes from talking to each other for a while. In this case we expect temporary forks or missing blocks, but we can check that the system recovers properly.
We use the same basic approach for non-functional performance tests. We adjust the transaction generator to submit transactions at a higher rate to stress the system and see how high we can push the throughput before it hits a bottleneck. We can also check that even though the system has hit its maximum capacity it continues to function in a stable way.
Throughput, meaning transactions per second, is important but so is latency. By latency we mean how long it takes for a transaction to get into the blockchain. Our analysis tool can also determine the distribution of latency. A low latency with little variance shows us that transactions are flowing smoothly to the nodes that create blocks and that those nodes are creating blocks on time.
Our Daedalus frontend team have a fully automated set of tests that cover every function of the user interface. In turn this also tests every interaction between the wallet frontend and wallet backend. So this also gives us an automatic integration test for the combination of the wallet frontend and backend.
Frontend testing is a bit different from most other testing. Most testing works by a test program directly using a program interface, whereas frontend testing requires interacting with an actual human interface. User interface testing frameworks simulate what a real user does: clicking buttons and typing in web form boxes.
The result is actually rather fascinating to watch: it's as if an invisible robot is sitting at the computer typing and clicking very quickly to set up accounts, send transactions and all the other things.
Counterintuitively, when it comes to cryptography and security -- which cryptocurrencies of course rely on completely -- testing is in fact not very effective. Testing usually shows us that the expected things do work, but it's hard to use normal testing to show that unexpected things cannot happen. And showing that some hacker cannot subvert the system is just the kind of thing that is hard to test for.
The solution is not testing but auditing by experts in cryptography and security. This means experts carefully reviewing the designs to check that the arguments for why the system should be safe are sound, and also reviewing the code to make sure the code matches up with the design.
Of course, the basic design for the proof of stake blockchain used in Cardano has already been peer reviewed by academic cryptographers. There are other parts of the system that we have had to develop in the last year -- beyond just the blockchain -- and the most security critical parts of those have been reviewed by our research team, and also by an external security audit team. Additionally, the security audit team have reviewed many of the most important parts of the code to check that the code matches the design.
A cryptocurrency system is a surprisingly complex piece of software and it has to work correctly, be robust to deliberate attacks and have good performance. Of course Cardano is a new from-scratch cryptocurrency, not based on any existing system, so all of it has to be carefully tested or reviewed.
Hopefully this post has given you some insight into how much is involved in testing Cardano, and how serious we are about security, robustness and performance.
Developing a secure proof of stake algorithm is one of the big challenges in cryptocurrency, and a proposed solution to this problem won the attention of the academic community this week in California. Several hundred cryptographers from around the world arrived at the University of California Santa Barbara on Sunday for the flagship annual event of their field, Crypto 2017. Over several days, they present cutting edge research for the scrutiny of their peers, while in the evenings they continue discussions with friends and colleagues over dinner on the university campus, with the inspiring backdrop of the Santa Ynez mountains meeting the Pacific ocean behind them.
Ouroboros, developed by a team led by IOHK chief scientist Aggelos Kiayias, made it through a tough admission process for the prestigious conference. This year, 311 papers were submitted and of those 72 were accepted. Only three papers at the conference were on the subject of blockchain. All three papers were supported by IOHK funding.
Speaking after his presentation, Professor Kiayias said: "We’re very happy that we had the opportunity to present Ouroboros at the conference. The protocol and especially its security analysis were very well received by fellow cryptographers."
"Our next steps will be to focus on the next version of the protocol, Ouroboros Praos which improves even further the security and performance characteristics of the protocol."
The Ouroboros protocol stands out as the first proof of stake algorithm that is provably secure, meaning that it offers security guarantees that are mathematically proven. This is essential for a protocol that is intended to be used in cryptocurrency, an infrastructure that must be relied on to carry billions of dollars worth of value. In addition to security, if blockchains are going to become infrastructure for new financial systems they must be able to comfortably handle millions of users. The key to scaling up is proof of stake, a far more energy efficient and cost effective algorithm, and as such this research represents a significant step forward in cryptography. Ouroboros also has the distinction of being implemented – the protocol will be an integral part of Cardano, a blockchain system currently in development.
There were two other papers presented at the bitcoin session on Monday. The Bitcoin Backbone Protocol with Chains of Variable Difficulty, was produced by a team of three researchers and included Prof Kiayias. It is a continuation of previous research into Bitcoin, which was itself the first work to prove security properties of its blockchain.
A third paper on the subject of bitcoin was presented, Bitcoin as a Transaction Ledger: A Composable Treatment.
Other notable talks at the conference included a presentation by John Martinis, an expert on quantum computing and former physics professor at the University of California Santa Barbara, who is now working at Google to build a quantum computer.
Leading cryptographers at the conference included Whitfield Diffie, pioneer of the public key cryptography that made Bitcoin possible, and Ron Rivest, Adi Shamir, and Leonard Adleman, who came up with the RSA public-key cryptosystem that is widely used for secure data transmission.
Mantis – Ethereum Classic Beta Release
A command line interface client for the ETC community
8 August 2017 4 mins read
We are excited to announce that there is now an Ethereum (ETH) client built specifically for the Ethereum Classic (ETC) community. The release of this beta client, Mantis, will take place today and is the culmination of seven months of work by the Grothendieck Team, the IOHK developers dedicated to Ethereum Classic. There are three reasons for the client. First, IOHK wants to demonstrate that it has the technical competency and culture to be a leader for the development of ETC. Second, IOHK wants to dispel the myth that ETC is a “copy and paste” coin that uses other people's code, and show that it is an independent and viable alternative to Ethereum. Third, the client is built in Scala, which is a functional programming language that offers security guarantees that other languages do not.
This release is comprised of the four functional milestones we have been working on since January.
- Blockchain download
- Transaction execution
- Command and query interface
- Mining integration
However, please be aware that this is an early release – the important thing is that we get the Mantis client into the hands of ETC community members who can provide valuable feedback. Tell us what you think, and how we can improve Mantis. We want to stress that this is not yet production ready and has not been optimized for performance, so there will be bugs. **Anyone using the beta release of the Mantis client should be using it on a testnet only, please do not use the Mantis client with actual funds.**
These are some of the features that made it into the beta release for the Mantis client:
Connect the Mist browser to the Mantis client over HTTP.
We have tested the application on recent versions of Linux (16.02), Mac OS (El Capitan, Sierra) and Windows (10, 8).
Testnet and Private Chain Support
The client supports synchronizing with the Morden testnet and also creating private chains.
Our client uses neatly formatted configuration files in the “conf” folder to configure the client, all the keys and values have descriptions to help the user optimize the client's utility.
We are also able to include a “Fast Sync” feature in this release. From start-up, the Mantis client (using default settings) will attempt to discover existing ETC nodes on the internet and fast sync the ETC chain from them. Fast Sync is fantastic feature for a blockchain client because it downloads a recent snapshot of the blockchain and this speeds up the process of setting up a properly functioning full node. It also downloads the entire blockchain history to have this available to other peers on request. Fast Sync is faster, and more convenient than downloading all of the blocks from peers, although this is also supported and can be switched using a flag in the configuration file.
Although Fast Sync is quicker, it is still slow by today's internet standards. For those who would like to get a node synchronized as fast as possible a “bootstrap database” has been provided. This database contains the whole chain up until August 2nd 2017. Users can download this large file, unzip it in their data folder and then start the Mantis client.
The Mantis client is now being passed into the hands of technically savvy community members. Enthusiasts, who are comfortable with a command line interface and are willing to install code that has not been fully tested, will have a lot of fun using the Mantis client and can provide us with valuable feedback. We would like to encourage anyone with the necessary technical skills to try out the client and report any bugs to the ETC Slack channel.
We will have more updates and news coming soon, and will share our progress with the community in the upcoming weeks. Please stay tuned for more details!
Cardano: Resilient and Scalable by Design
System performance engineering so DevOps may soundly sleep
3 August 2017 10 mins read
What we expect from traditional providers of financial services such as banks is both security (my money is safe) and responsiveness (I can move my money in and out at will in a timely fashion). The days when banks delivered such services using legions of clerks writing in double-entry ledgers are long gone – nowadays it’s all done by software, and so the security and performance of such systems is critical to a bank’s reputation. Banks invest heavily in their computing and network infrastructure and personnel to mitigate this risk (for example, we happen to know the Head of Unix System Engineering at a major international bank is paid a LOT more than any of us!). Customer expectations are high, tolerance of poor performance is low, and it is the poor DevOps who end up dealing with the consequences of any deficiencies and emergent instabilities. Another advantage that traditional banks have is periods when they aren’t expected to be fully operational – after local markets close, for example, and during public holidays. As global infrastructure, Cardano needs to run both continuously and indefinitely. Its resulting performance needs to be acceptable even in the presence of hardware or software failures and cyber-attacks – and it must do all this without constant maintenance from a large DevOps team.
This requires the application of the emerging discipline of distributed systems performance engineering to anticipate and mitigate the issues associated with long-term, continuous and scalable operation. This combines failure-mode effects analysis with stochastics (which uses probability and randomness over real-time) to model the impact of both resource-sharing (for example in packet networks and virtualised infrastructure) and the possibility of failures and exceptions. Of course, it’s natural to ask: if the software correctly implements the specification, how can there be failures? What have we done wrong? The answer is that we haven’t done anything wrong, it’s just that we’re not operating in a closed environment. The real world is an open environment, many elements of which are not under our control. Messages between components of Cardano may be lost or corrupted, either by accident or malicious intent; VMs running such components may crash or be starved of resources; and DoS attacks may exhaust resources. Even if our code is perfect, the world in which it runs is not.
Performance engineering can be used in a post-hoc way to assess the expected performance and resource consumption of an existing system, but for Cardano we’ll use it to help guide design decisions in the re-implementation of the network layer. In a previous blog, Duncan Coutts of Well-Typed, and Cardano's Director of Engineering, talked about how formal methods can help to ensure that a design decision doesn’t break the top-level specification of what the system should (and should not) do; what performance engineering adds is an assessment of whether such a design decision moves us closer to (or further away from) meeting the resilience, performance and scalability targets for the eventual deployment.
Current state of the art
With the exception of “hard real time” systems such as anti-lock brake controllers, it’s rare to see performance, resilience and robustness treated as first-class citizens in the software development lifecycle (SDLC). Even where such properties are considered, this typically occurs late in the SDLC. Performance, in particular, is regarded as something that will “emerge” after the design, and much, if not all, of the implementation has been done. Although robustness, resilience and performance are closely linked, let’s focus on performance, as this is the most widely misunderstood.
In the academic world, system steady-state performance has been widely studied using queueing theory. Approaches tend to take a resource-centric view of the system, focusing on the utilisation/idleness of individual components. Where job/customer performance is considered (such as in mean-value analysis or Jackson/BCMP networks) it is in the context of “steady-state” and “averages”. Thus these methods cannot deliver metrics such as the distribution of the system’s response time to a stimulus, or the probability that such a response will occur within a specified time.
Meanwhile, in today’s customer experience-centric, performance-critical service-delivery environments, such metrics are essential. An end customer doesn’t care how efficient the system is, only how long it takes to process her transaction; and an acceptable average does not compensate for the disappointment of a particular transaction taking a hundred times too long!
This has led academic research into the characterisation of “passage-times”, i.e. the time taken for a system to follow a particular path to a state, that path being characteristic of an outcome. Such a style of analysis has been combined with stochastic/probabilistic algebras to generate tools that can be applied in the SDLC, such as PEPA and PRISM. These are retrospective validation tools, operating on fully specified systems, that will give probabilistic measures of outcomes for certain classes of system under steady-state assumptions.
However, constructing a large-scale system such as Cardano is expensive, and no-one wants to iterate large parts of the design just because the required performance is not achieved and/or the resources consumed are uneconomic. Mitigating that risk requires an approach that supports both prospective and retrospective validation and verification. It needs to be able to capture performance requirements, to validate them, to construct performance properties/invariants that “witness” those requirements, and to support the reification and abstraction of such properties/invariants throughout the SDLC. In other words, the analysis approach needs to be composable at all points in the SDLC, a property which all of the other approaches above lack with respect to performance.
A composable approach
Composability is the key to managing complexity in the SDLC. The principle of composability is as follows: the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. For composable properties, what is “true” about small subsystems (e.g. their timeliness, their resource consumption) tells us what is “true” about their (appropriately constructed) combination. Conversely, it means that there is an invariant that must hold (e.g. timeliness, aspects of functional correctness) over the reified components of the system.
This is the same as checking functional correctness by breaking down a top-level specification into a number of component specifications and proving that the combination meets the top-level spec.
Engagement with the general notion of composability, and the associated improvement in productivity, can be seen in the increasing tendency of leading ICT practitioners (e.g. Google, Facebook, WhatsApp, leading banks’ real-time trading systems – and of course, Cardano) to use functional/declarative programming approaches such as Haskell for their key systems. Such approaches are improving the verification and validation (V&V) of functional aspects of software systems; composable performance engineering represents a similar step-change in the V&V of the “non-functional” aspects of performance and resource consumption.
PNSol has developed a framework around a composable measure of performance that we call “∆Q”. This enables a new development process that is composable with respect to both performance hazards (i.e. time to complete and probability of non-completion/divergence) and aspects of resource consumption (e.g. CPU time, network/interconnect capacity). PNSol represent the operational semantics with a stochastic process algebra (using a combination of improper random variables and serial-parallel flow graphs), to capture both communication and computational behaviour. This approach has a supporting software library/API that PNSol has been using for more than 10 years in consultancy engagements, which supports both symbolic and concrete representations of the metrics of interest, helping to capture design and operational uncertainties as part of the SDLC.
This approach also helps to pinpoint performance sensitivities, i.e. to reveal which parameters have the most impact on the eventual system performance. The DevOps team can then know what to measure, track, and trend in order to have early warning of performance or resource consumption problems, and hence can get some sleep from time to time!
Applying the ∆Q Framework
To apply this in practice we need to first establish some “quantifiable intent”, that is to say, to set bounds on the performance of some observable behaviour of the system. An observable is something that starts and finishes; in Cardano we might think about submitting a transaction and seeing it embedded in the blockchain, although a simpler and more familiar example would be clicking a button on a web page and getting some response. The quantified intent for that web server response might be something like: 50% of the time it should take less than 2s; 95% of the time it should take less than 5s; 99.9% of the time it should take less than 20s. Note that we allow the possibility that it might fail altogether – technically this means we represent the observable with an “improper random variable” – which is very important for dealing with the real world, in which things can (and do) fail. Taking proper account of this allows us to design systems that degrade gracefully rather than collapsing apparently arbitrarily. The next step is to extract from the design what other observables the initial one depends on, typically transfers of information across a network and computations using that information (each of which also consumes some resources). Given the way these observables are causally related (called the serial-parallel flow graph, SPFG), the ∆Q framework allows us to combine their performance distributions to calculate the resulting distribution for the original observable, and to check whether it meets the original intent (here is a worked example). If it doesn’t meet this intent, we may need to tighten the distributions for some of the component observables, or change the design to alter the SPFG. Note that we can either apply this approach top-down (as a set of performance “budgets”) or bottom-up (some elements such as network delays may not be changeable), or use a combination of the two. We can also treat a whole subsystem as delivering a single observable, and then break the delivery of that observable into its constituent parts, thus iterating the whole process – this makes the approach composable, as discussed above.
At the same time we can add up the distributions of resource consumption to obtain not merely averages but also probabilities of thresholds being exceeded. Once the relationship between delivered performance and resource consumption is properly modelled, it is straightforward to address issues of scalability, exception/failure propagation and/or mitigation, and the impact of correlations in demand (discussed in more detail here).
Applying this to something as complex as Cardano-SL will be a challenging project, but will enable us to address the issues of robustness, resilience and performance with our eyes wide open – resulting in an economic and appropriately scaled solution.