Enter the Hydra: scaling distributed ledgers, the evidence-based way
Learn about Hydra: the multi-headed ledger protocol
26 March 2020 Prof Aggelos Kiayias 10 mins read
Scalability is the greatest challenge to blockchain adoption. By applying a principled, evidence-based approach, we have arrived at a solution for Cardano and networks similar to it: Hydra. Hydra is the culmination of extensive research, and a decisive step in enabling decentralized networks to securely scale to global requirements.
What is scalability and how do we measure it?
Scaling a distributed ledger system refers to the capability of providing high transaction throughput, low latency, and minimal storage per node. These properties have repeatedly been touted as critical for the successful deployment of blockchain protocols in real-world systems. In terms of throughput, the VISA network reportedly handles an average of 1,736 payment transactions per second (TPS), with the capability of handling up to 24,000 TPS, and is frequently used as a baseline for comparison. Transaction latency should be as low as possible, with the ultimate goal of appearing instantaneous to the end user. Other applications of distributed ledgers have a wide range of requirements in terms of these metrics. When designing a general-purpose distributed ledger, it is natural to strive to excel on all three counts.
Deploying a system that provides satisfactory scaling for a certain use case requires an appropriate combination of two independent aspects: adopting a proper algorithmic design and deploying it over a suitable underlying hardware and network infrastructure.
When evaluating a particular algorithmic design, considering absolute numbers for specific metrics can be misleading. The reason is that such absolute quantities must refer to a particular underlying hardware and network configuration, which can blur the advantages and disadvantages of particular algorithms. Indeed, a poorly designed protocol may still perform well enough when deployed over superior hardware and networking.
For this reason, it is more insightful to evaluate the ability of a protocol to reach the physical limits of the underlying network and hardware. This can be achieved by comparing the protocol with simple strawman protocols, in which all the design elements have been stripped away. For instance, if we want to evaluate the overhead of an encryption algorithm, we can compare the communication performance of two end-points using encryption against their performance when they simply exchange unencrypted messages. In such an experiment, the absolute message-per-second rate is unimportant; what matters is the relative overhead added by the encryption algorithm. Moreover, if the overhead approaches zero for some configuration of the experimental setup, we can conclude that the algorithm approaches the physical limits of the underlying network’s message-passing ability for that particular configuration, and is hence optimal in this sense.
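The comparison comes down to a ratio. The sketch below computes the relative overhead of a protocol against its strawman baseline. The function name and the message rates are hypothetical and serve only to illustrate the calculation. Both rates are assumed to come from the same hardware and network configuration.

```python
def relative_overhead(protocol_rate: float, baseline_rate: float) -> float:
    """Fraction of the baseline's throughput lost by the protocol under test.

    Both rates must be measured on the same hardware and network
    configuration; the absolute values then matter only through their ratio.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - protocol_rate) / baseline_rate


# Hypothetical measurements (messages per second) on one fixed setup.
plain = 50_000.0      # end-points exchanging unencrypted messages
encrypted = 48_500.0  # the same end-points with encryption enabled
print(f"overhead: {relative_overhead(encrypted, plain):.1%}")  # overhead: 3.0%
```

An overhead close to zero for a given configuration is the signal described above: the protocol is operating near the limits of the underlying network for that configuration.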
Hydra – 30,000-foot view
Hydra is an off-chain scalability architecture for distributed ledgers, which addresses all three of the scalability challenges mentioned above: high transaction throughput, low latency, and minimal storage per node. While Hydra is being designed in conjunction with the Ouroboros protocol and the Cardano ledger, it may be employed over other systems as well, provided they share the necessary salient characteristics with Cardano.
Despite being an integrated system aimed at solving one problem – scalability – Hydra consists of several subprotocols. This is necessary because the Cardano ecosystem itself is heterogeneous and consists of multiple entities with differing technical capabilities: the system supports block producers with associated stake pools and high-throughput wallets as used by exchanges, but also end users with a wide variety of computational performance and availability characteristics. It is unrealistic to expect that a one-size-fits-all, single-protocol approach is sufficient to provide overall scalability for such a diverse set of network participants.
The Hydra scalability architecture can be divided into four components: the head protocol, the tail protocol, the cross-head-and-tail communication protocol, and a set of supporting protocols for routing, reconfiguration, and virtualization. The centerpiece is the 'head' protocol, which enables a set of high-performance and high-availability participants (such as stake pools) to very quickly process large numbers of transactions with minimal storage requirements by way of a multiparty state channel – a concept that generalizes two-party payment channels as implemented in the context of the Lightning Network. It is complemented by the 'tail' protocol, which enables those high-performance participants to provide scalability for large numbers of end users who may use the system from low-power devices, such as mobile phones, and who may be offline for extended periods of time. While heads and tails can already communicate via the Cardano mainchain, the cross-head-and-tail communication protocol provides an efficient off-chain variant of this functionality. All this is tied together by routing and configuration management, while virtualization facilitates faster communication by generalizing head and tail communication.
The Hydra head protocol
The Hydra head protocol is the first component of the Hydra architecture to be publicly released. It allows a set of participants to create an off-chain state channel (called a head) wherein they can run smart contracts (or process simpler transactions) among each other without interacting with the underlying blockchain, in the optimistic case where all head participants adhere to the protocol. The state channel offers very fast settlement and high transaction throughput; furthermore, it requires very little storage, as the off-chain transaction history can be deleted as soon as its resulting state has been secured via an off-chain 'snapshot' operation.
Even in the pessimistic case where any number of participants misbehave, full safety is rigorously guaranteed. At any time, any participant can initiate the head's 'closure', with the effect that the head's state is transferred back to the (less efficient) blockchain. We emphasize that the execution of any smart contract can be seamlessly continued on-chain. No funds can be generated off-chain, nor can any single, responsive head participant lose any funds.
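To make the optimistic path concrete, here is a toy model of a single head's local state. It is a sketch under strong simplifying assumptions and not the Hydra implementation: the class and method names are hypothetical, there is no networking or multi-signature, and transactions are plain transfers between string-labelled outputs. It illustrates two points from the text. First, a transaction that would create or destroy value is rejected. Second, once a snapshot is taken, the transaction history it covers can be discarded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyHead:
    """Hypothetical, heavily simplified model of one head's local ledger state."""

    utxo: Dict[str, int]                              # output reference -> amount
    history: List[str] = field(default_factory=list)  # tx ids since the last snapshot
    latest_snapshot: Optional[Dict[str, int]] = None  # last confirmed state

    def apply(self, txid: str, inputs: List[str], outputs: Dict[str, int]) -> None:
        """Validate a transfer against the current UTxO set, then apply it."""
        spent = set(inputs)
        if len(spent) != len(inputs) or not spent.issubset(self.utxo):
            raise ValueError(f"{txid}: spends a missing, spent, or duplicated output")
        if sum(self.utxo[ref] for ref in spent) != sum(outputs.values()):
            raise ValueError(f"{txid}: inputs and outputs do not balance")
        for ref in spent:
            del self.utxo[ref]
        self.utxo.update(outputs)
        self.history.append(txid)

    def snapshot(self) -> None:
        """Record the current state and discard the history it supersedes.

        In the real protocol, head participants multi-sign snapshots;
        this toy performs no cryptography at all.
        """
        self.latest_snapshot = dict(self.utxo)
        self.history.clear()


head = ToyHead(utxo={"genesis#0": 100})
head.apply("t1", ["genesis#0"], {"t1#0": 60, "t1#1": 40})
head.apply("t2", ["t1#0"], {"t2#0": 60})
print(head.history)             # ['t1', 't2']
head.snapshot()
print(head.utxo, head.history)  # {'t1#1': 40, 't2#0': 60} []
```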
The state channels implemented by Hydra are isomorphic in the sense that they make use of the same transaction format and contract code as the underlying blockchain: contracts can be directly moved back and forth between channels and the blockchain. Thus, state channels effectively yield parallel, off-chain ledger siblings. In other words, the ledger becomes multi-headed.
Transaction confirmation in the head is achieved in full concurrency by an asynchronous off-chain certification process using multi-signatures. This high level of parallelism is enabled by use of the extended UTxO model (EUTxO). Transaction dependencies in the EUTxO model are explicit, which allows for state updates without unnecessary sequentialization of transactions that are independent of each other.
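An EUTxO transaction names exactly which outputs it consumes. As a result, whether two transactions depend on each other can be read directly from their inputs. The sketch below assumes a stripped-down transaction format with no scripts, datums, or amounts, and its helper names are hypothetical. It only illustrates why independent transactions need no ordering between them.

```python
from typing import FrozenSet, NamedTuple, Tuple

OutRef = Tuple[str, int]  # (id of the creating transaction, output index)


class Tx(NamedTuple):
    txid: str
    inputs: FrozenSet[OutRef]  # outputs this transaction consumes


def spends_output_of(a: Tx, b: Tx) -> bool:
    """True if a consumes an output created by b, so a must come after b."""
    return any(creator == b.txid for creator, _ in a.inputs)


def independent(a: Tx, b: Tx) -> bool:
    """True if a and b share no input and neither consumes the other's outputs."""
    double_spend = bool(a.inputs & b.inputs)
    return not (double_spend or spends_output_of(a, b) or spends_output_of(b, a))


t1 = Tx("t1", frozenset({("g", 0)}))
t2 = Tx("t2", frozenset({("g", 1)}))
t3 = Tx("t3", frozenset({("t1", 0)}))
t4 = Tx("t4", frozenset({("g", 0)}))
print(independent(t1, t2), independent(t1, t3), independent(t1, t4))  # True False False
```

Here t1 and t2 touch disjoint outputs, so a head could confirm them in parallel. By contrast, t3 must wait for t1, and t4 competes with t1 for the same output, so at most one of those two can be confirmed.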
Experimental validation of the Hydra head protocol
As a first step towards experimentally validating the performance of the Hydra head protocol, we implemented a simulation. The simulation is parameterized by the time required by individual actions (validating transactions, verifying signatures, etc.), and carries out a realistic and timing-correct simulation of a cluster of distributed nodes forming a head. This results in realistic transaction confirmation time and throughput calculations.
We see that a single Hydra head achieves up to roughly 1,000 TPS, so by running 1,000 heads in parallel (for example, one for each stake pool of the Shelley release), we should achieve a million TPS. That’s impressive and puts us miles ahead of the competition, but why should we stop there? 2,000 heads will give us 2 million TPS – and if someone demands a billion TPS, then we can tell them to just run a million heads. Furthermore, various performance improvements in the implementation can improve the 1,000 TPS single head measurement, further adding to the protocol’s hypothetical performance.
So, can we just reach any TPS number that we want? In theory, the answer is a solid yes, and that points to a problem with the dominant use of TPS as a metric for comparing systems. While it is tempting to reduce the complexity of assessing protocol performance to a single number, in practice this is an oversimplification. Without further context, a TPS number is close to meaningless. To interpret it properly, and to make comparisons, you should at least ask about:
- the size of the cluster (which influences the communication overhead);
- its geographic distribution (which determines how long it takes information to travel through the system);
- how the quality of service (transaction confirmation times, providing data to end users) is affected by a high rate of transactions;
- how large and complicated the transactions are (which affects transaction validation times, message propagation time, requirements on the local storage system, and composition of the head participants);
- what kind of hardware and network connections were used in the experiments.
Changing the complexity of transactions alone can change the TPS by a factor of three, as can be seen in the figures in the paper (refer to Section 7 – Simulations).
Clearly, we need a better standard. Is the Hydra head protocol a good protocol design? What we need to ask is whether it reaches the physical limits of the network, not what its TPS number is. Thus, for this first iteration of the evaluation of the Hydra head protocol, we used the following approach to ensure that the data we provide is meaningful:
- We clearly list all the parameters that influence the simulation: transaction size, time to validate a single transaction, time needed for cryptographic operations, allocated bandwidth per node, cluster size and geographical distribution, and limits on the parallelism in which transactions can be issued. Without this controlled environment, it would be impossible to reproduce our numbers.
- We compare the protocol’s performance to baselines that provide precise and absolute limits of the underlying network and hardware infrastructure. How well we approach those limits tells us how much room there would be for further improvements. This follows the methodology explained above using the example of an encryption algorithm.
We use two baselines for Hydra. The first, Full Trust, is universal: it applies to any protocol that distributes transactions amongst nodes and insists that each node validate transactions one after the other – without even ensuring consensus. This yields a limit on TPS by simply adding the message delivery and validation times. How well we approach this limit tells us what price we are paying for consensus, without relying on comparison with other protocols. The second baseline, Hydra Unlimited, yields a TPS limit specifically for the head protocol and also provides the ideal latency and storage for any protocol. We achieve that by assuming that we can send enough transactions in parallel to completely amortize network round-trip times and that all actions can be carried out when needed, without resource contention. The baseline helps us answer the question of what can be achieved under ideal circumstances with the general design of Hydra (for a given set of values of the input parameters) as well as evaluate confirmation latency and storage overhead against any possible protocol. More details and graphs for those interested can be found in our paper (again, Section 7 – Simulations).
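As a rough illustration of how a Full Trust-style limit turns parameters into a TPS figure, the sketch below adds per-transaction delivery and validation times and inverts the sum. This is a deliberately simplified, hypothetical model. The paper's baselines account for further parameters listed above, such as cluster size and geographic distribution. The input values here are illustrative, not measurements.

```python
def full_trust_tps(tx_size_bytes: float, bandwidth_bps: float, validation_s: float) -> float:
    """Hypothetical Full Trust-style throughput bound for one transaction stream.

    Per transaction, the time to deliver it over the allocated bandwidth and
    the time to validate it are simply added; no consensus work is counted.
    """
    delivery_s = tx_size_bytes * 8 / bandwidth_bps
    return 1.0 / (delivery_s + validation_s)


def fraction_of_bound(measured_tps: float, bound_tps: float) -> float:
    """How close a measured protocol comes to the baseline limit (1.0 = at the limit)."""
    return measured_tps / bound_tps


# Illustrative inputs only: 250-byte transactions, 10 Mbit/s, 0.4 ms validation.
bound = full_trust_tps(250, 10e6, 0.4e-3)
print(f"bound: {bound:.0f} TPS")                                         # bound: 1667 TPS
print(f"a 1,000 TPS run reaches {fraction_of_bound(1000, bound):.0%}")  # a 1,000 TPS run reaches 60%
```

The second number is the one the methodology cares about. It shows how much of the available headroom a protocol uses, independently of how fast the particular hardware happens to be.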
What comes next?
Solving the scalability question is the holy grail for the whole blockchain space. The time has come to apply a principled, evidence-based approach in designing and engineering blockchain scalability solutions. Comparing scalability proposals against well-defined baselines can be a significant aid in the design of such protocols. It provides solid evidence for the appropriateness of the design choices and ultimately leads to the engineering of effective and performant distributed ledger protocols that will provide the best possible absolute metrics for use cases of interest. The Hydra head protocol has been implemented and tested; in time, we will release the rest of the Hydra components following the same principled approach.
As a last note, Hydra is the joint effort of a number of researchers, whom I'd like to thank. These include Manuel Chakravarty, Sandro Coretti, Matthias Fitzi, Peter Gaži, Philipp Kant, and Alexander Russell. The research was also supported, in part, by EU Project No. 780477, PRIVILEDGE, which we gratefully acknowledge.
In a proof-of-stake (PoS) blockchain protocol, the ledger is maintained by the stakeholders that hold assets in that ledger. This allows PoS blockchains to use less energy than proof-of-work (PoW) or other types of blockchain protocols. Nevertheless, this requirement imposes a burden on stakeholders: a good number of them must be online and maintain network connectivity good enough that they can collect transactions and have their PoS blocks reach the others without substantial network delays. It follows that any PoS ledger would benefit from reliable server nodes that hold stake and focus on maintenance.
The argument for stake pools
Wealth is typically distributed according to a power law such as the Pareto distribution, so running reliable nodes executing the PoS protocol may be an option only for a small, wealthy subset of stakeholders, leaving most without the ability to run such services. This is undesirable; it would be better if everyone had the ability to contribute to ledger maintenance. One approach to rectifying this problem is to allow the creation of stake pools. Specifically, this refers to the ability of stakeholders to combine their stake and form a single entity, the stake pool, which can engage in the PoS protocol using the total stake of its members. A pool will have a manager who will be responsible for running the service that processes transactions. At the same time, the pool manager should not be able to spend the stake that their pool represents, while members who are represented by the pool should be free to change their mind and reallocate their stake to another pool if they wish. Finally, and most importantly, any stakeholder should be able to aspire to become a stake pool manager.
Participating in PoS ledger maintenance incurs costs. These are certainly not as high as in the case of a PoW protocol but are nevertheless significant. As a result, it is sensible for the community of all stakeholders to incentivize, in some way, those who support the ledger by setting up servers and processing transactions. This can be achieved by a combination of contributions from those that use the ledger (in the form of transaction fees) and inflation of the circulating supply of coins (by introducing new coins into circulation to be claimed by those engaged in the protocol).
In the case of Bitcoin, we have both the above mechanisms, incentivization and pools. On the one hand, mining is rewarded by transaction fees as well as a block reward that is fixed and diminishes over time following a geometric series. On the other hand, pools can be facilitated by dividing the work required for producing blocks among many participants and using ‘partial’ PoWs (which are PoWs that are of smaller difficulty than the one indicated by the current state of the ledger) as evidence of pool participation.
It is straightforward to apply a similar type of incentivization mechanism in the PoS setting. However, one should first ask whether a Bitcoin-like mechanism (or any mechanism, for that matter) would converge to a desirable system configuration. Which brings us to the important question: what are the desirable system configurations? If the only consideration is to minimize transaction processing costs in a failure-free environment, the economically optimal configuration is a dictatorial one: one of the parties maintains the ledger as a service while all the others participate in the pool created by this party. This is clearly an undesirable outcome because the single pool leader also becomes a single point of failure in the system, which is exactly the type of outcome that a distributed ledger is supposed to avoid. It follows that the coexistence of many pools, in other words decentralization, should be a desirable characteristic of the ledger incentivization mechanism.
Reward-sharing schemes for PoS
So what would a reward-sharing scheme look like in a PoS setting? Rewards should be provided at regular intervals and pool maintenance costs should be retained by the pool manager before distributing the remaining rewards among the members. Given that it is possible to keep track of pool membership in the ledger itself using the staking keys of the participants, reward splits within each pool can be encoded in a smart contract and become part of the ledger maintenance service. First things first, pool managers should be rewarded for their entrepreneurship. A pool creation certificate posted on the ledger will declare a profit margin to be shaved off the pool’s rewards after subtracting operational costs, which should also be declared as part of the pool creation certificate. The cost declaration should be updated frequently to absorb any volatility that the native token of the system has with respect to the currency that denominates the actual costs of the pool manager. At the same time, the pool creation certificate, backed up by one or more staking keys provided by stakeholders, can declare a certain amount of stake that “stands behind” the pool and can be used either as an indication that the pool represents the genuine enterprise of one or more stakeholders or as collateral guaranteeing compliance with correct protocol behavior.
Given the above setup, how do Bitcoin-like mechanisms fare with respect to the decentralization objective? In Bitcoin, assuming everyone follows the protocol, pool rewards are split in proportion to the size of each pool. For example, a mining pool with 20% of the total hashing power is expected to reap 20% of the rewards. This is because rewards are proportional to the number of blocks obtained by the pool and the number of blocks is in turn proportional to the pool’s mining power. Does this lead to a decentralized system? Empirical evidence seems to suggest otherwise: in Bitcoin, mining pools came close to (and occasionally even exceeded) the 50% threshold that is the upper boundary for ensuring the resilience of the ledger. A simple argument can validate this empirical observation in the framework of our reward-sharing schemes: if pools are rewarded proportionally to their size and pool members proportionally to their stake in the pool, the rational thing to do would be to centralize to one pool. To see this, consider the following. At first, it is reasonable to expect that all players who are sufficiently wealthy to afford creating a pool will do so by setting up or renting server equipment and promoting it with the objective of attracting members so that their share of rewards grows. The other stakeholders that are not pool managers will join the pool that maximizes their payoff, which will be the one with the lowest cost and profit margin. Pool competition for gaining these members will compress profit margins to very small values. But even with zero profit margin, all other pools will lose to the pool with the lowest cost. Assuming that there are no ties, this single pool will attract all stakeholders. Finally, other pool managers will realize that they will be better off joining that pool as opposed to maintaining their own because they will receive more for the stake they possess. Eventually, the system will converge to a dictatorial single pool.
Figure 1 shows a graphical representation of this. It comes from one of the numerous simulations our team has conducted in the process of distilling effective reward sharing schemes. In the experiment, a number of stakeholders follow a reactive process where they attempt to maximize their payoff based on the current system configuration. The experiment leads to a centralized single pool, validating our theoretical observations above for Bitcoin-like schemes. From a decentralization perspective, this is a tragedy of the commons: even though the participants value decentralization as an abstract concept, none of them individually wants to bear the burden of it.
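The centralizing dynamic can be reproduced in a few lines of code. The toy below is our own illustration and not the simulation behind Figure 1. Its assumptions are:
- members all hold equal stake and myopically move to whichever pool pays them most right now;
- the scheme is Bitcoin-like, so pool rewards are proportional to pool stake;
- profit margins are zero;
- pool costs are hypothetical.

```python
def member_reward_linear(total_rewards, pool_stake, member_stake, cost, margin=0.0):
    """Bitcoin-like scheme: a pool's rewards are proportional to its stake."""
    after_costs = max(total_rewards * pool_stake - cost, 0.0)
    return after_costs * (1 - margin) * member_stake / pool_stake


def myopic_best_response(costs, n_members, total_rewards=1000.0, max_rounds=100):
    """Each member in turn moves to the pool that pays it most right now.

    Returns the final pool index chosen by each member. Every member holds
    an equal share of the stake; managers' own stake is ignored.
    """
    stake = 1.0 / n_members
    choice = [i % len(costs) for i in range(n_members)]  # start spread round-robin
    for _ in range(max_rounds):
        moved = False
        for i in range(n_members):
            sizes = [0.0] * len(costs)
            for p in choice:
                sizes[p] += stake

            def payoff(p):
                # Pool size as it would be with member i in pool p.
                size = sizes[p] if choice[i] == p else sizes[p] + stake
                return member_reward_linear(total_rewards, size, stake, costs[p])

            best = max(range(len(costs)), key=payoff)
            if payoff(best) > payoff(choice[i]):
                choice[i] = best
                moved = True
        if not moved:
            break
    return choice


final = myopic_best_response(costs=[25.0, 30.0, 36.0], n_members=30)
print(sorted(set(final)))  # [0]
```

The members start evenly spread over three pools and all end up in pool 0, the lowest-cost pool. This mirrors the argument above. With proportional rewards and zero margin, a member's payoff in a pool with cost c and stake σ is proportional to R − c/σ, so bigger and cheaper pools always win.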
A better reward sharing scheme
Clearly we have to do better than a dictatorship! A first observation is that if we are to achieve decentralization, linearity between rewards and size should taper off after a certain level. This is because, while linearity is attractive when the pool is small and wants to attract stakeholders, after a certain level it should be diminished if we want to give smaller pools an opportunity to be more competitive. Thus, we will divide the behavior of the reward-sharing scheme into two stages depending on the size of the pool: a growth stage, when linearity is to be respected, and a stabilization stage, when the pool is large enough. The point where the transition happens will be called the saturation point, and a pool that has passed this point will be called saturated. We can fix rewards to be constant after the saturation point, so that if the saturation point is 1%, two pools with total stakes of 1% and 1.5% will receive the same rewards.
To appreciate how the dynamics work from the perspective of a single stakeholder, consider the following example. Suppose there are two pools, A and B, managed by Alice and Bob, with operational costs of 25 and 30 coins respectively, each one with a profit margin of 4%. Suppose further that the total rewards to be distributed are 1,000 coins and the saturation point of the reward-sharing mechanism is 20%. At a given point in time, Alice’s pool has 20% of the stake, so it is at the saturation point, while Bob’s pool is at 19%. A prospective pool member, Charlie, holds 1% of the stake and considers which pool to join. Joining Alice’s pool will bring its total stake to 21%, and because it has exceeded the saturation point the reward will be 200 coins (20% of the total rewards). Deducting operational costs will leave 175 coins to be distributed between Alice and the pool members. After removing Alice’s 4% profit margin (leaving 168 coins) and considering Charlie’s relative stake in the pool (1/21), he will receive 8 coins as a reward. If Charlie joins Bob’s pool, its stake will be exactly 20%, so the pool’s rewards will also be 200 coins, or 170 coins after removing the operational costs. Given that Charlie’s stake is 5% (1/20) of the pool, he will receive 8.16 coins after Bob’s margin is deducted, 2% more than if he had joined Alice’s pool. So Charlie will join Bob’s pool if he wants to maximize his rewards.
Now, let us see what happens in the case that Charlie is facing the same decision at a hypothetical earlier stage of the whole process when Alice’s pool was already at 20% of the total stake, while Bob’s pool was only at 3%. In this case, Bob has a very small pool and the total rewards available for its members are much less compared with the previous case. As a result, if Charlie did the same calculation for Bob’s pool, his 1% stake would result in a 4% total stake for the pool but, if one does the calculations, he would receive a mere 30% of the rewards that he would have obtained had he joined Alice’s pool. In such a case, the rational decision is to join Alice’s pool despite the fact that his membership will make Alice’s pool exceed the saturation point. Refer to Table 1 below for the exact figures.
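The figures in both scenarios can be checked with a small calculation. The sketch assumes the simplified scheme used in this example:
- a pool's rewards grow linearly with its stake up to the saturation point and stay constant beyond it;
- operational costs are deducted first, then the profit margin is taken;
- the remainder is split in proportion to stake.

The function names are hypothetical, and this is not Cardano's full reward formula.

```python
def pool_rewards(total_rewards, saturation, pool_stake):
    """Linear in stake up to the saturation point, constant beyond it."""
    return total_rewards * min(pool_stake, saturation)


def member_reward(total_rewards, saturation, pool_stake, member_stake, cost, margin):
    """Reward of a member holding member_stake in a pool of total stake pool_stake."""
    after_costs = max(pool_rewards(total_rewards, saturation, pool_stake) - cost, 0.0)
    return after_costs * (1 - margin) * member_stake / pool_stake


R, Z, CHARLIE = 1000.0, 0.20, 0.01
alice = member_reward(R, Z, 0.21, CHARLIE, cost=25, margin=0.04)      # Alice at 20%
bob_near = member_reward(R, Z, 0.20, CHARLIE, cost=30, margin=0.04)   # Bob at 19%
bob_small = member_reward(R, Z, 0.04, CHARLIE, cost=30, margin=0.04)  # Bob at 3%

print(f"Alice: {alice:.2f}, Bob (19%): {bob_near:.2f}, Bob (3%): {bob_small:.2f}")
# Alice: 8.00, Bob (19%): 8.16, Bob (3%): 2.40
print(f"ratios: {bob_near / alice:.2f}, {bob_small / alice:.2f}")
# ratios: 1.02, 0.30
```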
Being far-sighted matters
The above appears to be contradictory. To understand what Charlie needs to do, we have to appreciate the following fact. The choice of Charlie to join Alice’s pool in the second scenario is only rational in a very near-sighted (aka myopic) sense. In fact, Charlie is better off with Bob’s pool, as demonstrated by the first scenario, as long as Bob’s pool reaches the saturation point. Thus, if Charlie believes that Bob’s pool will reach the saturation point, the rational choice is to support it. Other stakeholders will do the same, and thus Bob’s pool will rapidly reach the saturation point, making everyone who participated in it better off while also supporting the ideal of decentralization: Alice’s pool, instead of constantly growing larger, will stop at the saturation point, and other pools will be given the ability to grow to the same size. This type of strategic thinking on behalf of the stakeholders is more far-sighted (aka non-myopic) and, as we will see, has the ability to help parties converge to desirable decentralized configurations for the system.
It is worth noting that the system will unavoidably reach pivotal moments in its evolution where it will be crucial for stakeholders to exercise far-sighted thinking, as in the scenario above where Alice’s pool reaches the saturation point while other pools are still quite small. The reason is that, due to the particular circumstances of each stake pool manager, operational costs will vary across the stakeholder population. As a result, starting from a point zero where no stake pools exist, the pool with the lowest operational cost can be expected to be the first to grow. This is natural, since low operational costs leave a higher level of rewards to be split among the pool members. It is to be expected that the system will reach moments like the second scenario above, where the most competitive pool (Alice’s, with operational cost 25) has reached the saturation point while the second-most competitive (Bob’s, with operational cost 30) is still at a small membership level.
One might be tempted to consider long-term thinking in the setting of a Bitcoin-like reward-sharing scheme and believe that it can also help the system converge to decentralization. Unfortunately, this is not the case. In a Bitcoin-like scheme, contrary to our reward-sharing scheme with a saturation point, there is no point in the development of Alice’s and Bob’s pools when Bob’s pool will become more attractive in Charlie’s view. Indeed, without a saturation point, Alice’s bigger pool will always offer more rewards to Charlie: this stems from the fact that Alice’s operational costs are smaller and hence leave more rewards for all the stakeholders. This will leave Bob’s pool without any members, and eventually, as discussed above, it will be the rational choice for Bob also to dissolve his pool and join Alice’s, making Alice the system’s dictator.
Going back to our reward-sharing scheme, we have established that non-myopic strategic thinking promotes decentralization; nevertheless, there is an important point still open. At a pivotal moment, when the non-myopic stakeholder Charlie rationally decides to forgo the option to join Alice’s saturated pool, he may have a number of aspiring pools to choose from. For instance, together with Bob’s pool that has operational costs of 30 and profit margin 4%, there could be a pool by Brenda with operational cost of 33 and profit margin 2%, and a pool by Ben with operational cost of 36 and profit margin 1%. The rational choice would be to go with the one that will reach the saturation point; is there a way to tell which one would be the best choice? In our full analysis paper we provide an explicit mechanism that orders the pools according to their desirability and, using the information recorded in the ledger about each stake pool, it can assist stakeholders in making the best possible choice at any given moment. In our example, it is Brenda’s pool that Charlie should join if he wants to maximize his rewards (see Table 1). To aid Cardano users, the pool-sorting mechanism will be built into Daedalus (and other Cardano-compatible wallets) and will provide a visual representation of the best choices available to stakeholders using the information in the ledger regarding pool registrations.
So how does our reward scheme fare with respect to decentralization? In the full analysis paper we prove that there is a class of decentralized system configurations that are “non-myopic Nash equilibria.” An equilibrium strategy here means that stakeholders have a specific way to create pools, set their profit margins and/or delegate to other pools, so that no stakeholder, thinking for the long term, is better off following a different strategy. Moreover, we demonstrate experimentally that reactive play between stakeholders with non-myopic thinking converges to this equilibrium in a small number of iterations, as shown in Figure 2.
A characteristic of our approach is that the number of pools is only part of the description of the reward-sharing scheme and thus is in no way enforced by the system on the stakeholders. This means stakeholders are free to experiment with pool creation and delegation of stake without having to conform to any predetermined system architecture. This is in contrast to other approaches taken in PoS systems such as EOS, where the number of participants is a hardcoded parameter of the consensus system (specifically, 21 pools). At the same time, our approach allows the whole stakeholder set to express its will by freely joining and leaving pools, receiving guaranteed rewards for their participation while witnessing how their actions have a quantifiable impact on the management of the PoS distributed ledger, no matter the size of their stake. This is in contrast to other approaches taken in PoS systems such as Ethereum 2.0, where ledger maintenance is performed by registered validators on the basis of a collateral deposit without a built-in process of vetting by the stakeholder set.
So what would be a sensible choice for the number of pools that should be favored by the reward scheme for Cardano? Given that decentralization is our main objective, it is sensible to set this parameter to be as high as possible. Our network experiments showed that the system can still operate effectively with as many as 1,000 running pools. Choosing a saturation threshold for our reward-sharing scheme based on this number will make having a stake pool profitable even if the total stake delegated in them is as little as 0.1% of the total circulation of Ada.
Looking ahead – Sybil attacks
Given that decentralization can be achieved by a large number of independent stake pools, it is also important to see whether some decentralized system configurations are preferable to others. As described so far in this post, our reward-sharing scheme will lead rational stakeholders towards promoting the stake pools that will incur the smallest total cost. Even though this maximizes rewards and minimizes costs, it is not necessarily the most desirable outcome. The reason is that at the equilibrium point one may see a set of stakeholders promoted as stake pool managers who collectively possess a very small stake themselves. This imbalance, in which a small total stake represents the total stake of the system, can be detrimental in many ways: stake pool managers may be prone to corruption or bribery or, perhaps even worse, a large stakeholder may register many stake pools in the hope of controlling the whole ecosystem, thereby performing a Sybil attack that would hurt decentralization. For this reason, the reward-sharing scheme as presented in our full analysis paper is suitably modified to be sensitive to the stake backing the pool, so that this type of behaviour is mitigated. We will delve deeper into this aspect of Cardano reward-sharing in the next blog post.
Artwork, Mike Beeple
TL;DR In response to recent discussions on social media, we give a brief comparison of the Ouroboros and Casper proof-of-stake protocols.
Ouroboros is a formally specified and analysed protocol with mathematically proven security guarantees based on clearly specified assumptions. The protocol description, models and proofs are all public. Hence, the underlying assumptions, the target protocol properties, and the respective correctness proofs can be publicly scrutinised. Ouroboros offers stake-based finality with the strongest possible guarantees in terms of the amount of stake backing up honest operation. It also provides a solid foundation over which services such as near instant finality of transactions can be offered in optimistic network conditions.
Regarding Casper, we are not aware of any currently published source that sufficiently describes the protocol's mode of operation or provides any provable guarantees about it. Still, from what has been presented about Casper until now, we can safely conclude that, compared to Ouroboros, Casper provides much weaker guarantees in terms of how much stake the adversary needs to control in order to disrupt the protocol. Below, we compare the two protocols along several dimensions. For lack of proper documentation, many properties of Casper have to be assumed to the best of our knowledge.
In response to these discussions, we give a brief comparison of the Ouroboros proof-of-stake (PoS) protocol and Casper PoS. For Ouroboros, we refer to the original version underlying the Cardano Settlement Layer (published at Crypto 2017). However, most of our comments apply to the later versions, Ouroboros Praos and Ouroboros Genesis, as well. For Casper, we primarily refer to the Casper Friendly Finality Gadget (FFG) as described in its white paper. It is the most recent Casper proposal that is sufficiently descriptive to allow a full comparison (other references include Ethereum Mauve, Casper+Sharding v2.1, FFG-RPJ, and Casper TBG/CBC).
Any PoS ledger consensus protocol should satisfy two fundamental properties: persistence and liveness. The first ensures that the ledger is final and immutable. The second ensures that transactions broadcast by honest parties are eventually included in the (immutable) ledger.

Such properties typically cannot be proven unconditionally; they rely on certain conditions. Some of these are cryptographic, e.g., that digital signatures cannot be forged. Others relate to the behaviour of the participants, e.g., that the players who follow the protocol control a majority of the stake. There are other desirable properties that a PoS protocol should satisfy (such as that executing the protocol as prescribed is the only rational strategy for the participants). Still, persistence and liveness as defined above are the bare minimum pair of properties necessary for ledger consensus.
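In analyses in the style of GKL15 (cited below), persistence is usually derived from a "common prefix" property. If we prune the last k blocks from any honest party's chain, what remains must be a prefix of every other honest party's chain.

The sketch below checks that condition for two chain snapshots. Blocks are represented by hypothetical string identifiers, and the depth k is a free parameter. It illustrates the property only; it is not an Ouroboros implementation.

```python
from typing import Sequence


def satisfies_common_prefix(chain_a: Sequence[str], chain_b: Sequence[str], k: int) -> bool:
    """True if each chain, with its last k blocks pruned, is a prefix of the other."""

    def pruned_is_prefix(x: Sequence[str], y: Sequence[str]) -> bool:
        pruned = list(x[:max(len(x) - k, 0)])   # drop the k most recent blocks
        return list(y[:len(pruned)]) == pruned  # the remaining blocks must agree

    return pruned_is_prefix(chain_a, chain_b) and pruned_is_prefix(chain_b, chain_a)


# Two honest views that agree up to block b2 and then fork.
alice = ["genesis", "b1", "b2", "x3"]
bob = ["genesis", "b1", "b2", "y3", "y4"]
print(satisfies_common_prefix(alice, bob, k=2))  # True
print(satisfies_common_prefix(alice, bob, k=0))  # False
```

A transaction buried deeper than k blocks therefore lies in the part that all honest parties agree on. This is what makes it persistent.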
Let us now discuss some of the differences between the two protocols and their analyses.
Execution Model and Falsifiability of Claims
The Ouroboros protocol is analyzed in a model that is fully described: it unambiguously defines all the participants’ programs, their execution and interactions, their communication – including network properties – and the potential corruption by an adversarial entity of any set of parties controlling a minority of the stake. Such a model allows the formulation of mathematically precise security guarantees satisfied by any execution, such as the persistence and liveness properties proven for Ouroboros. In particular, the formal modeling of Ouroboros permits precise, quantitative statements about stake bounds and settlement times; see below. This makes all the claims we make about Ouroboros entirely concrete; there is nothing left up to interpretation or reader perspective. Without such a model (notably missing in the Casper FFG white paper or in any other available sources related to Casper), it is impossible to prove the correctness of any claims about the protocol. Consensus protocols, in general, are complex objects; designing them without the development of rigorous mathematical arguments that establish the required properties can prove to be precarious as prior practice in secure systems design has shown. Good design intuition and best effort are just not sufficient when a ledger consensus protocol is supposed to carry assets worth billions.
A comprehensive solution to PoS ledger consensus
Given the above, it is important to appreciate that the Ouroboros protocol is proven to provide persistence and liveness under clearly defined assumptions, such as an honest stake majority, which is the bare minimum assumption needed in the PoS setting. On the other hand, Casper FFG, as described in the white paper, is an enhancement on top of a pre-existing “block proposal mechanism”, e.g., a PoW blockchain (namely Ethereum). In particular, its security guarantees as a ledger consensus protocol depend on the security of this proposal mechanism. As the authors of Casper FFG observe, “a wholly compromised block proposal mechanism will prevent Casper from finalizing new blocks”. Hence an honest majority of hashing power is still necessary for Casper FFG’s liveness. Similarly, other versions of the Casper protocol, such as Casper FFG-RPJ, are incomplete and/or not accompanied by any proofs of security.
Ouroboros is proven to achieve persistence and liveness under the assumption of an honest majority of all stake in the system, even if significant portions of stakeholders are not participating in the protocol (see, e.g., Theorem 1 in the Ouroboros Genesis paper for the most comprehensive statement on Ouroboros security). In contrast, Casper requires a ⅔-fraction of deposited stake to be controlled by honest parties (see section 2.1 of the white paper).

Deposited stake is locked and cannot be used for other purposes in the meantime, so it is reasonable to assume that it will be a small fraction of the total stake in the system. Naturally, larger amounts of stake are more difficult to control. Basing security on the total stake in the system, as in Ouroboros, is therefore the more prudent choice.

As a concrete example, consider the current sharded version of Ethereum (Ethereum Mauve paper or Casper+Sharding chain v2.1). It requires a minimum of 32 ETH per validator, with 100-128 validators per shard depending on the reference, and no other restriction. Suppose the total deposited stake among all prospective validators turns out to be small and is not otherwise restricted. Then just a few thousand ETH would be enough to register a set of Sybil validators that could disrupt the ledger consensus security properties.
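The "few thousand ETH" estimate can be checked with simple arithmetic. The sketch below makes three simplifying assumptions that are not stated in the references:
- every validator deposits exactly the 32 ETH minimum;
- voting weight equals the deposit;
- the honest-⅔ assumption fails as soon as honest validators drop strictly below two thirds of a shard's committee.

```python
MIN_DEPOSIT_ETH = 32  # per-validator minimum quoted above


def min_adversarial_validators(n: int) -> int:
    """Smallest number of validators a whose control leaves honest ones below 2/3 of n.

    (n - a) / n < 2/3  <=>  3 * a > n, so the answer is floor(n / 3) + 1.
    """
    return n // 3 + 1


for n in (100, 128):
    a = min_adversarial_validators(n)
    print(n, a, a * MIN_DEPOSIT_ETH, n * MIN_DEPOSIT_ETH)
# 100 34 1088 3200
# 128 43 1376 4096
```

Under these assumptions, about 1,100 to 1,400 ETH is enough to break the ⅔ assumption in a shard. Registering an entire committee of Sybil validators costs 3,200 to 4,096 ETH.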
Though the notion is not formally defined in the Casper FFG white paper, it is easy to see that “stake-based finality” is subsumed by persistence. Persistence is the property that ensures that transactions become permanently part of the public immutable ledger. The adjective “stake-based” in Casper FFG means that the condition for attaining finality is based on stake, as opposed to, e.g., a hashing power assumption.

As mentioned above, no protocol can be deemed to solve the ledger consensus problem without providing persistence (and hence finality). In fact, all PoS protocols provide such properties only with high probability. If for no other reason, cryptography can always fail with (very) small probability (for example, someone may guess your key). We do in fact know that Bitcoin and (pre-Casper) Ethereum provide finality assuming an honest majority of computational power (as shown by the works GKL15, GKL17 and PSS17). So does Ouroboros, assuming an honest majority of stake, as shown in KRDO17, DGKR18 and BGKRZ18.
Put simply, Ouroboros provides stake-based finality, and it does so with the strongest possible guarantee in terms of stake: it holds against a malicious coalition controlling any fraction of the total stake in the system, as long as that fraction is below 50%. In the Casper FFG white paper, where Casper operates over the Ethereum blockchain, stake-based finality is provided every 100 blocks under the assumption that ⅔ of the deposited stake is honest.

Now consider the same window of time, which is a little over half an hour in our current deployment. From our formal analysis we can derive that, within it, Ouroboros will offer finality against, say, a 10% stake adversary with probability of error less than 2^(-44). This is less than 1/10,000,000,000,000, i.e., one in ten trillion.

To appreciate such small numbers, consider that a large asteroid is expected to hit the Earth about once every 100 million years (Scientific American). Thus, it is more than 10,000 times more likely that a big asteroid will hit the Earth next month than that Ouroboros will reorganise its chain to drop a particular transaction after it has been included in the ledger for about half an hour.
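The comparison above can be reproduced with a few lines of arithmetic. The sketch makes three assumptions:
- the half-hour window corresponds to 100 slots of 20 seconds (the slot length mentioned later in this post);
- the asteroid rate is one impact per 100 million years, spread evenly over months;
- 2^(-44) is used as the error bound.

```python
SLOT_SECONDS = 20
WINDOW_SLOTS = 100                        # assumed: one block-equivalent per slot

window_minutes = WINDOW_SLOTS * SLOT_SECONDS / 60
error_bound = 2.0 ** -44                  # reversal probability bound vs. a 10% adversary
asteroid_per_month = 1 / (100e6 * 12)     # one large impact per 100 million years

ratio = asteroid_per_month / error_bound
print(f"{window_minutes:.1f} min")        # 33.3 min
print(f"{error_bound:.2e}")               # 5.68e-14
print(f"{ratio:,.0f}")                    # 14,660
```

The ratio comfortably exceeds the 10,000-fold figure used in the text.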
Eventual Consensus vs. (near-)Instant finality
Blockchain protocols like Bitcoin and Ouroboros are called eventual-consensus protocols, since the irreversibility of a block increases gradually with the number of blocks added on top of it. This means that finality is more nuanced than a simple true or false value. It is quantified by the probability of reverting a transaction, as a function of the strength of the adversary and the time that has passed since the block containing that transaction was added.

This design enables these protocols to work in the strongest possible adversarial settings and still be very efficient in terms of the number of messages that need to be exchanged. Furthermore, the recipient of a transaction can decide for herself how important a transaction is and adjust her own notion of stability per transaction. Their downside is that they do not provide near-instant finality, in other words, a fast assurance that the transaction will be finalised. This is a potential advantage of classical BFT protocols, which have inspired the design of Casper FFG as well as other protocols in the space, including Algorand.
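To make "finality as a probability" concrete, the sketch below implements Nakamoto's calculation from Section 11 of the Bitcoin paper. It estimates the chance that an attacker with a fraction q of block-production power ever overtakes a chain in which the transaction is buried z blocks deep.

This is the Bitcoin proof-of-work estimate, used here only as a familiar illustration of eventual consensus. It is not the Ouroboros bound, which comes from the formal analysis cited above.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Nakamoto's estimate that an attacker with power fraction q catches up from z blocks behind."""
    p = 1.0 - q
    if q >= p:
        return 1.0                      # a majority attacker eventually wins
    lam = z * (q / p)                   # expected attacker progress while z honest blocks appear
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam)        # Poisson probability of exactly k attacker blocks
        for i in range(1, k + 1):
            poisson *= lam / i
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


# The recipient picks the depth z that matches how much the transaction matters to her.
for z in range(0, 11, 2):
    print(z, f"{attacker_success_probability(0.1, z):.7f}")
```

The probability drops rapidly with depth and grows with the attacker's share. Nakamoto's own table reports roughly 0.0009 at q = 0.1 and z = 5. This is the sense in which each recipient can choose her own notion of stability.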
However, near-instant finality typically comes with significant downsides in terms of the security model. These include a much higher requirement of honest stake or, perhaps more importantly, a high degree of guaranteed online presence from the participants following the protocol. This hurts the dynamic availability of the participants (see below), which is one of the hallmarks of the Bitcoin era of consensus protocols.

On the other hand, near-instant finality can be built as a service on top of Ouroboros, and this is something that we will be releasing in due course. Moreover, we argue that this is the best possible way forward. The Ouroboros eventual-consensus protocol is secure under the strongest possible stake-based guarantees, so it should serve as the solid foundation over which services such as near-instant settlement in optimistic network conditions can be safely built.
Incentives and dynamic availability
Casper FFG is inspired by standard pre-Bitcoin-era BFT consensus protocols. As such, it cannot handle uncertainty in the number of participating entities once the set of validators becomes fixed. This means that the protocol cannot operate in the “sleepy” and “dynamic availability” settings, where a significant number of parties that are supposed to act in the protocol are unavailable due to network conditions, hardware failure or simply lack of interest. This is a significant concern in a decentralized setting, where the execution of the protocol is not meant to be left in the hands of a few centralized-power actors but is rather distributed proportionally among a great number of smaller players.

The Casper FFG white paper acknowledges this as the “Catastrophic Crash” scenario and observes that in this case “no future checkpoints can be finalized”. The authors propose a mitigation in the form of the so-called “inactivity leak”. This idea is only described informally as draining “the deposit of any validator that does not vote for checkpoints, until eventually its deposit sizes decrease low enough that the validators who are voting are a supermajority.” Unfortunately, this modification would in turn negate any potential advantage Casper can claim in the face of network splits, as the authors also recognise: “The inactivity leak introduces the possibility of two conflicting checkpoints being finalized without any validator getting slashed.”

This also affects the incentives for running the protocol. Ouroboros allows for a natural, incentive-driven aggregation of stake into stake pools, performed over a period of time using our stake pool reward mechanism, without forcing the behaviour of stakeholders onto a predetermined structure. Casper, by contrast, has to impose preset numbers of block validators.
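Since the inactivity leak is only described informally, the sketch below uses a hypothetical version of it. Each epoch, the deposits of validators that a given side sees as silent shrink by a fixed fraction (5% here, an arbitrary assumption). The leak stops once the voting validators hold at least two thirds of the remaining deposits.

The example models a network split with 60% of the deposits on one side and 40% on the other. Each side sees only its own validators voting.

```python
from typing import Optional

SUPERMAJORITY = 2 / 3


def epochs_until_supermajority(voting: float, silent: float, leak_rate: float,
                               max_epochs: int = 10_000) -> Optional[int]:
    """Epochs of a hypothetical leak until voting deposits reach a 2/3 supermajority."""
    epochs = 0
    while voting / (voting + silent) < SUPERMAJORITY:
        if epochs >= max_epochs:
            return None
        silent *= 1.0 - leak_rate       # drain the deposits of non-voting validators
        epochs += 1
    return epochs


# Each side of the partition treats the other side as inactive.
side_a = epochs_until_supermajority(voting=0.6, silent=0.4, leak_rate=0.05)
side_b = epochs_until_supermajority(voting=0.4, silent=0.6, leak_rate=0.05)
print(side_a, side_b)  # 6 22
```

Both sides eventually regain a supermajority within their own view. That is exactly how two conflicting checkpoints can both be finalized without anyone being slashed.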
The original Ouroboros protocol does not use verifiable random functions (VRFs) to generate protocol randomness. Instead, it uses a guaranteed-output-delivery coin-tossing protocol based on verifiable secret sharing. The follow-up versions, Praos and Genesis, do use VRFs, for performance gains. The VRFs proposed for use in Ouroboros Praos and Genesis are proven secure under standard cryptographic assumptions (such as the Computational Diffie-Hellman assumption). Separately, the security analysis we have performed ensures Ouroboros’ resilience to randomness manipulation (see Ouroboros Praos and Ouroboros Genesis).
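In Praos-style protocols, a party evaluates its VRF on the slot number. It is a slot leader if the output falls below a stake-dependent threshold φ_f(α) = 1 − (1 − f)^α, where α is the party's relative stake and f is the "active slot coefficient".

The sketch below illustrates that rule under explicit assumptions:
- SHA-256 over a hypothetical secret stands in for the VRF. Unlike a real VRF, it yields no publicly verifiable proof.
- The value f = 0.05 is illustrative.

```python
import hashlib


def leader_probability(alpha: float, f: float) -> float:
    """Praos-style threshold phi_f(alpha) = 1 - (1 - f) ** alpha."""
    return 1.0 - (1.0 - f) ** alpha


def is_slot_leader(secret: bytes, slot: int, alpha: float, f: float) -> bool:
    """Leader check with a keyed hash standing in for a VRF (no proof, illustration only)."""
    digest = hashlib.sha256(secret + slot.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest, "big") / 2 ** 256    # pseudorandom value in [0, 1)
    return value < leader_probability(alpha, f)


# Splitting stake across identities does not change the chance of leading a slot:
# 1 - phi(a + b) == (1 - phi(a)) * (1 - phi(b)).
f = 0.05
print(abs((1 - leader_probability(0.3, f))
          - (1 - leader_probability(0.1, f)) * (1 - leader_probability(0.2, f))) < 1e-12)  # True
```

This "independent aggregation" property is why the threshold takes an exponential form rather than being linear in α.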
Ouroboros is analysed in the “partially synchronous” setting. There, messages are delivered to the majority of the parties executing the protocol within a time window upper-bounded by a network delay Δ that is unknown to the parties. The order of messages is adversarial, and it is not guaranteed that two honest parties will receive messages in the same order. The adversary is allowed to inject arbitrary messages selectively to any of the parties. Casper makes no explicit claims about the network setting it operates in. Nevertheless, when describing defenses against long-range revisions, it alludes to a similar type of model.
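The delivery guarantee in this model can be illustrated with a toy simulation. The adversary may pick any delay up to Δ for each message and each recipient, so honest parties can see messages in different orders. The simulation makes three simplifications:
- a seeded random generator stands in for the worst-case adversary;
- every message reaches every party;
- message injection is not modelled.

```python
import random
from typing import Dict, List, Tuple


def deliver(messages: List[Tuple[str, float]], parties: List[str], delta: float,
            rng: random.Random) -> Dict[str, List[Tuple[float, str]]]:
    """Deliver each message to each party after an adversarially chosen delay in [0, delta]."""
    inbox: Dict[str, List[Tuple[float, str]]] = {p: [] for p in parties}
    for msg, sent_at in messages:
        for p in parties:
            # The delay is chosen per recipient, so arrival order can differ between parties.
            inbox[p].append((sent_at + rng.uniform(0.0, delta), msg))
    for p in parties:
        inbox[p].sort()                 # each party processes messages in arrival order
    return inbox


rng = random.Random(7)
msgs = [("m1", 0.0), ("m2", 0.5), ("m3", 1.0)]
for party, arrivals in deliver(msgs, ["alice", "bob"], delta=2.0, rng=rng).items():
    print(party, [m for _, m in arrivals])
```

The parties never learn `delta`. They only benefit from the fact that such a bound exists, which mirrors the unknown Δ in the analysis.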
This property refers to the ability of a database or ledger consensus protocol to scale its processing power as more nodes (or processing capacity) enter the system, ideally with a linear speedup in the number of nodes added. Ouroboros Hydra, the scalable version of Ouroboros, is in development. It will be released in due time following our usual mode of discourse, i.e., the release of a full paper containing:
- complete mathematical formulations of the problem that we solve;
- a full description of our protocol solution;
- concrete statements about the protocol’s properties, accompanied by all necessary proofs.

At present, the version of Casper that enables sharding (Casper+Sharding v2.1) is incomplete even in terms of protocol description, and as such it does not admit any proof of security.
Team effort is a hallmark of IOHK research and this blog post is no exception. I am grateful to Christian Badertscher, Matthias Fitzi, Peter Gaži, Alexander Russell, Jeremy Wood, and Vassilis Zikas for various suggestions, comments, and corrections to the above text.
A blog post on the Steemit website appeared recently, making a number of claims regarding Ouroboros. The article contains several factual inaccuracies. For instance, it claims that “DPOS” in the Ouroboros paper stands for “delegated proof of stake”, while in fact DPOS means “dynamic proof of stake”. It also claims that the protocol requires a “2/3+” ratio of parties being honest. In reality, it only requires an honest majority, i.e., the stake controlled by parties following the protocol must be more than half of the total stake.

For the benefit of those who are interested in the Ouroboros protocol and who appreciate its general philosophy, we feel it is appropriate to respond to this article here, making a few broader points along the way. While pointing out the inaccuracies in the blog post, we take the opportunity to highlight some of the general approaches followed in the design of Ouroboros and in the related research efforts currently underway at IOHK.
Ouroboros is a proof-of-stake (PoS) protocol that uses delegation in the spirit of the PoS idea as discussed in the Bitcoin forum starting from 2011. The references that influenced its design are listed in our paper. PoS is a powerful concept that has inspired a number of other efforts before, concurrently with, and after the first Ouroboros paper. Among all implemented PoS blockchain systems that carry real assets, Ouroboros is unique in that it was designed in tandem with a formal security model and a mathematical proof that it implements a robust transaction ledger. This marks a fundamental shift in the methodology of blockchain system design.
Blockchain systems are in a period of transition from curiosities to critical infrastructure. As such, the all-too-typical software industry approach of releasing a “minimum viable product” as early as possible and then fixing bugs as they appear is not appropriate. Failures of critical infrastructure have a significant impact on people’s lives and thus require rigorous engineering discipline to the highest possible standards. Dependability, rather than maximum performance according to some arbitrarily chosen metric, is the primary goal. Performance is important, of course, but the performance required is a function of the ultimate application domain. From the point of view of dependability, it is the worst-case performance that is important, not the ideal-scenario peak rate.
Like all other protocols in the blockchain space, Ouroboros requires some degree of synchronisation. The block production interval has to be consistent with the likely time to complete the required information exchanges. The 20-second slot time in Ouroboros represents a conservative choice for a block of transactions to traverse the diameter of a peer-to-peer network, where the peers may be significantly geographically distributed, the system is operating at peak transaction load and the interconnection is significantly less than perfect. It is improbable for a block of transactions to consistently traverse a global network much faster than that, and as a result any solution that does significantly better (or claims to do significantly better) is either wrong, or provides a weaker level of decentralisation or security, i.e. it solves an easier problem than Ouroboros. There is a tradeoff between achieving a robust, global, participatory service that delivers sustained effective performance even under an adversarial attack, and creating a high performance, limited participation (in geographical scope or network resource requirement) solution that makes overly optimistic assumptions on network stability.
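The reasoning about slot length can be made concrete with a back-of-the-envelope store-and-forward model. In this model, each hop on the path across the network waits for the whole block, validates it, and then relays it.

Every parameter below is a hypothetical assumption chosen for illustration, not a measurement of the Cardano network:
- 6 hops;
- 0.2 s latency per hop;
- a 2 MB (16 Mbit) block;
- a 10 Mbit/s link;
- 0.5 s validation per hop.

```python
def propagation_time(hops: int, latency_s: float, block_mbit: float,
                     bandwidth_mbit_s: float, validation_s: float) -> float:
    """Store-and-forward estimate: every hop receives, validates, then relays the full block."""
    per_hop = latency_s + block_mbit / bandwidth_mbit_s + validation_s
    return hops * per_hop


SLOT_SECONDS = 20
t = propagation_time(hops=6, latency_s=0.2, block_mbit=16.0,
                     bandwidth_mbit_s=10.0, validation_s=0.5)
print(f"{t:.1f} s of a {SLOT_SECONDS} s slot")  # 13.8 s of a 20 s slot
```

Doubling the network diameter or halving the bandwidth on a congested path already pushes such an estimate past 20 seconds. This is why a slot length chosen for worst-case conditions looks conservative when measured against ideal ones.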
Irreversibility, the property that transactions persist and are immutable in a blockchain protocol, has to be presented as a function of adversarial strength. This is true in Nakamoto’s Bitcoin paper and also in the Ouroboros paper; see Section 10.1 of the latter for the actual time needed to confirm transactions. Thus, one should be very wary of statements about irreversibility that do not quantify the level of adversarial power. For instance, Ouroboros will confirm a transaction with 99.9% assurance in just five minutes against an adversary holding 10% of the total stake. At today’s market capitalisation of Cardano, that stake would amount to more than two billion dollars.

Byzantine agreement protocols can provide a more “black and white” irreversibility. In other words, the protocol can be guaranteed to be irreversible within a certain time window, provided an honest majority or supermajority exists, depending on the protocol. Nevertheless, the performance and decentralisation penalty is very high if the level of adversity is allowed to come close to the 1/2 barrier, which is the level of adversity that Ouroboros can withstand.
The issue of possible dominance of the consensus process by a small group of stakeholders holding a large proportion of the stake is important, but it is not applicable to the current release of the Cardano system (the Byron release). What we have proved for Ouroboros is that it can facilitate a “fair” transaction ledger. Here, fairness means that the ledger can fairly record all significant actions performed by the protocol participants despite the presence of an adversary. This enabled us to neutralise a number of rational protocol deviations (e.g., the equivalent of selfish mining attacks in the PoS setting). It also enabled a Nash equilibrium argument showing how the protocol can support many different types of mechanisms for incentivising participant behaviour.

Currently, IOHK Research is actively working to finalise the incentive structure that will be incorporated in the Shelley release of Cardano. In that release, stake pools will be supported and delegation behaviour will be properly incentivised so that it offers effective decentralisation of power. The crux of our methodology is the engineering of a novel reward mechanism that gives rational participants appropriate incentives to partition their delegation rights. The objectives are:
- to avoid concentration of power in a small group of participants, as could happen under a naïve reward mechanism in a Pareto-distributed stakeholder population;
- to provide appropriate incentives to ensure a desired number of delegates.

We are very excited about this work; it will be the first of its kind in the area. As before, we will be disseminating it widely, including full technical details, as well as submitting it for peer review.
This brings us to the final distinguishing advantage of the philosophy of Cardano. Scientific peer review has been refined over centuries. Crypto (the International Cryptology Conference), where Ouroboros was presented, and the other top conferences in the area implement it in a way that strives to remove conflicts of interest and produce the highest level of objectivity. The method of reviewing is known as “double blind”, i.e., papers are submitted anonymously and reviewers are experts who also remain anonymous to the authors. The committee of experts that reviews submitted papers each year is formed by two program co-chairs. The co-chairs are appointed by the International Association for Cryptologic Research, the pre-eminent organisation of cryptology research, founded in 1982.
Being invited to serve on the committee as an expert is an important recognition of an individual’s long-term commitment to the area of cryptography (and even a precise count of how many times one has served is maintained). Blockchain protocols fit perfectly within the cryptography scientific literature, and thus scientific peer review is to be done by this community. Of course, we welcome reviews from anyone. That is why we make public very detailed whitepapers with precise and specific claims that leave no uncertainty about what is being claimed, and we appreciate any factual discussion about any of these claims. We strongly encourage other projects to submit their work for scientific peer review as well. They will enjoy the benefits of thorough, well-founded and objective critique, and they will have the opportunity to showcase any advantages and novelty that their approach possesses.
Prof Aggelos Kiayias