Blog > 2018 > August

IOHK releases Icarus to the Cardano community

Developers can now build their own light wallets

15 August 2018 Brian McKenna 7 mins read

IOHK releases Icarus to the Cardano community

Today IOHK releases Icarus, a reference implementation for a lightweight wallet developed by the IOHK engineering team. We hope that this code base will be used as a point of reference to enable developers to create their own secure light and mobile wallets for Cardano. Icarus is a fully open source code base that will be the first step in a range of open source initiatives to provide developers with a suite of tools for Cardano.

Icarus was born out of a series of proof of concepts which began back in March of this year. A small section of the IOHK engineering team were interested to find out if they could demonstrate that it would be possible to create a lightweight Cardano wallet with all the features of Daedalus but that was easy to use and fast to set up. Whilst we are improving Daedalus synchronization speeds all the time, notably in the recent 1.3 release, we wanted to see if we could build something fast for Ada users who might not require the full features of Daedalus, or might not have the bandwidth or machine requirements to easily run Daedalus.

Therefore, investigating whether it would be possible to build a wallet where the user did not have to download the whole blockchain – and could run in a browser or on a mobile device – was worth the effort of a small dedicated team.

To build a wallet like this, we would need to prove that we could safely and securely store private keys and execute client-side cryptographic operations in the browser. In tandem, we would need to communicate with the Cardano nodes to provide users with their current UTxO state. If this could be accomplished, it would be no mean feat, hence the name Icarus – from the beginning we knew we would be flying close to the sun.

The team set out in at the beginning of March with ambitious goals to see if, within a month, we could build a skeleton Chrome extension application and verify that Cardano cryptography could be run in the browser using WebAssembly compiled from Rust. The Rust library of Cardano primitives has been developed by Vincent Hanquez and Nicolas Di Prima, IOHK Specialised Cryptographic Engineers, and has already been used for the paper wallet certificate feature in Daedalus.

To build this Chrome extension, we would need to successfully demonstrate we could import and track a wallet balance. Of course, we would have to do all this without sacrificing the IOHK engineering principles of quality and security.

The demo at the end of March went well and produced a functional prototype that we could develop. Once each demo had been given, the wider IOHK engineering team had a chance to review, critique and provide feedback about the design decisions the Icarus project team was taking, which proved invaluable to the process. After proof of concept 1, it was felt that good progress was being made and another month’s effort from the team would be worthwhile.

Proof of concept 2 was delivered in mid-April. The Rust engineers had spent the intervening time extending the Rust library to support the Cardano primitives for the creation, signing, and broadcast of transactions, and providing an API so that these could be run in the browser. On the application side, we wanted to see if we could reuse the UX/UI components of Daedalus to provide a smooth user experience. Luckily, the IOHK Daedalus development team has maintained a high-quality, portable UI framework for React, called React-Polymorph, which we found to be easily portable to the Chrome extension.

Proof of concept 3 in late May involved making Icarus fully interoperable with the Daedalus wallet. The team worked to develop a new Hierarchical Deterministic (HD) address scheme that Daedalus will use in future and will ensure ongoing compatibility. One important feature we built at this point was to allow the user to enter their Daedalus wallet recovery phrase in Icarus, and for their Ada in Daedalus to be transferred to the Icarus wallet. In effect, this allows users to retrieve their Ada without using the Daedalus wallet. We also optimised wallet restoration times. Finally, after only three months and three demo’s we had a fully functional prototype lightweight Cardano wallet!

Before we could ensure this was a reference implementation we could release to the community, we wanted to ensure that it performed at scale. This, along with some code clean-up, was the main task of the final proof of concept 4 in early June. We called upon the experience of Philipp Kant, in IOHK benchmarking, and Neil Davies, leading networking, and successfully conducted a series of rigorous stress and failover tests on the architecture.

The code base has been quality assured by Allied Testing, a leading quality assurance and testing company. We also engaged Kudelski Security, a cybersecurity consultancy, to perform a full security audit of the code – their report will be published soon.

emurgo trip
L-R: Nicolás Arqueros, Alan Verbner, Brian McKenna, Sebastien Guillemot

We knew that Emurgo, the organisation that supports new Cardano ventures, was interested in releasing the first implementation of Icarus to the community. To that end, we invited two Emurgo staff – Nicolás Arqueros, chief technology officer, and Sebastien Guillemot, technical manager – to Buenos Aires to meet lead Icarus developer Alan Verbner and his team in July. The goal of this trip was to see if the code could be understood and deployed by open source community members. Emurgo provided feedback on how we could make the reference implementation ready to release as a product and they wrote a technical specification for the code base. We are excited that Emurgo will soon be launching their implementation of Icarus, the Yoroi wallet, and look forward to seeing how they carry through their vision for the product.

In mid-July, Hayley McCool Smith, IOHK Product Marketing Manager, visited Emurgo at their offices in Tokyo. One of the purposes of the trip was to take part in a naming workshop which would help Emurgo bring their product to life. After spending a day working through a plethora of contenders that Emurgo had shortlisted, it was decided that “Yoroi” was the perfect fit. In Japanese, Yoroi means “great armour” and is a prominent example of the type of secure armament that Samurais would wear to protect themselves. With the name decided, it was up to the team to create a logo that would reflect a new lightweight wallet, while also incorporating the traditional Samurai meaning of the word.

emurgo tokyo
Emurgo team in Tokyo

The Rust library that was used to bring the Cardano cryptography into the browser has spawned another IOHK project, the Cardano Rust project. (This has been known as Project Prometheus internally.) IOHK will be releasing more information on this in due course. The Cardano Rust project will maintain the open source spirit of Icarus and further extend the toolbox of Rust modules. The project will be made available to the open source community to easily build high-quality applications using Cardano. The first product of the project will be a full command line interface wallet, which you can expect to see in September.

The segmented development team and rapid iteration approach to software development has worked well on Project Icarus and we will be employing this strategy again. We are happy that Ada holders will have the ability to store their Ada in the really cool Yoroi wallet and that developers have a high-quality reference implementation on which to base their own new light and mobile wallets for Cardano. The project has also given rise to Project Prometheus which is the natural evolution of the spirit of Icarus.

We feel that we have developed, in quite a short time, a very useful quality assured and security audited reference implementation for a lightweight Cardano wallet. We encourage the open source community to fork the Icarus code base, compile it, and maybe even build your own wallet for Cardano. We welcome contributions and hope that this effort will benefit the entire community.

This blog has been amended to update the name of the Cardano Rust project from Project Prometheus.

Artwork, Creative Commons Mike Beeple

How does Casper compare to Ouroboros?

Differences between the proposed Ethereum protocols and Cardano’s consensus algorithm

9 August 2018 Prof Aggelos Kiayias 13 mins read

How does Casper compare to Ouroboros?

TL;DR In response to recent discussions in social media, we give a brief comparison of the Ouroboros and Casper proof-of-stake protocols.

Ouroboros is a formally specified and analysed protocol with mathematically proven security guarantees based on clearly specified assumptions. The protocol description, models and proofs are all public. Hence, the underlying assumptions, the target protocol properties, and the respective correctness proofs can be publicly scrutinised. Ouroboros offers stake-based finality with the strongest possible guarantees in terms of the amount of stake backing up honest operation. It also provides a solid foundation over which services such as near instant finality of transactions can be offered in optimistic network conditions.

Regarding Casper, we are not aware of any currently published source that sufficiently describes the protocol's mode of operation nor any provable guarantees about it. Still, from what has been presented about Casper until now, as compared to Ouroboros, we can safely conclude that Casper provides much weaker guarantees in terms of how much stake the adversary needs to control in order to disrupt the protocol. Below, we compare the two protocols along several dimensions; for lack of proper documentation, many properties of Casper have to be assumed to the best of our knowledge.


In response to a discussion here and here, we give a brief comparison of the Ouroboros proof-of-stake (PoS) protocol and Casper PoS. For Ouroboros, we refer to the original version underlying the Cardano Settlement Layer (published at Crypto 2017), however most of our comments apply to later versions Ouroboros Praos and Ouroboros Genesis as well. For Casper, we primarily refer to the Casper Friendly Finality Gadget (FFG) as described in the white paper, being the most recent Casper proposal that is sufficiently descriptive to draw a full comparison (other references include Ethereum Mauve, Casper+Sharding v2.1, FFG-RPJ, Casper TBG/CBC).

Any PoS ledger consensus protocol should satisfy two fundamental properties: persistence and liveness. The first ensures that the ledger is final and immutable. The second ensures that transactions broadcasted by honest parties are eventually included in the (immutable) ledger. Such properties, typically, cannot be proven unconditionally: they will rely on certain conditions, some of them cryptographic, e.g., that digital signatures cannot be forged, while others are related to the behaviour of the participants, e.g., that the players who follow the protocol control a majority of the stake. There are other desirable properties that a PoS protocol should satisfy (such as that executing the protocol as prescribed is the only rational strategy for the participants), but persistence and liveness as defined above constitute the bare minimum pair of fundamental properties necessary for ledger consensus.

Let us now discuss some of the differences between the two protocols and their analyses.

Execution Model and Falsifiability of Claims

The Ouroboros protocol is analyzed in a model that is fully described: it unambiguously defines all the participants’ programs, their execution and interactions, their communication – including network properties – and the potential corruption by an adversarial entity of any set of parties controlling a minority of the stake. Such a model allows the formulation of mathematically precise security guarantees satisfied by any execution, such as the persistence and liveness properties proven for Ouroboros. In particular, the formal modeling of Ouroboros permits precise, quantitative statements about stake bounds and settlement times; see below. This makes all the claims we make about Ouroboros entirely concrete; there is nothing left up to interpretation or reader perspective. Without such a model (notably missing in the Casper FFG white paper or in any other available sources related to Casper), it is impossible to prove the correctness of any claims about the protocol. Consensus protocols, in general, are complex objects; designing them without the development of rigorous mathematical arguments that establish the required properties can prove to be precarious as prior practice in secure systems design has shown. Good design intuition and best effort are just not sufficient when a ledger consensus protocol is supposed to carry assets worth billions.

A comprehensive solution to PoS ledger consensus

Given the above, it is important to appreciate that the Ouroboros protocol is proven to provide persistence and liveness under clearly defined assumptions such as honest stake majority which is the bare minimum assumption needed in the PoS setting. On the other hand, Casper FFG, as described in the white paper, is an enhancement on top of a pre-existing “block proposal mechanism”, e.g., a PoW blockchain (namely Ethereum); in particular, its security guarantees as a ledger consensus protocol depend on the security of this proposal mechanism. As the authors of Casper FFG observe, “a wholly compromised block proposal mechanism will prevent Casper from finalizing new blocks”, hence the honest-majority-of-hashing power assumption is still necessary for Casper FFG’s liveness. Similarly, other versions of the Casper protocol, such as Casper FFG-RPJ, are incomplete and/or not accompanied by any proofs of security.

Stake Assumptions

Ouroboros is proven to achieve persistence and liveness under the assumption of honest majority of all stake in the system, even in the case that some significant portions of stakeholders are not participating in the protocol (see e.g., Theorem 1 in the Ouroboros Genesis paper for the most comprehensive statement on Ouroboros security). In contrast, Casper requires a ⅔-fraction of deposited stake to be controlled by honest parties (see section 2.1 of the white paper). Since the deposited stake is blocked and cannot be used for other purposes in the meantime, it is reasonable to assume that the deposited stake will be a small fraction of the total stake in the system. Naturally, larger amounts of stake are more difficult to control so that basing security on the total stake in the system, as in Ouroboros, is a more prudent choice. As a concrete example, in the current sharded version of Ethereum (Ethereum Mauve paper or Casper+Sharding chain v2.1), a minimum of 32 ETH per validator is required with 100-128 validators per shard depending on the reference, without any other restriction. It follows that if the total deposited stake among all prospective validators turns out to be minimal and is not otherwise restricted then just a few thousand ETH would be enough to register a set of sybil validators that could disrupt the ledger consensus security properties.

Finality

Though the notion is not formally defined in the Casper FFG white paper, it is easy to see that the property of “stake-based finality” is subsumed by persistence, the property that ensures that transactions become permanently part of the public immutable ledger; the stake-based adjective on finality used in Casper FFG refers to the fact that the condition under which finality is to be attained is based on stake as opposed to, e.g., a hashing power assumption. As mentioned above, no protocol can be deemed to solve the ledger consensus problem without providing persistence (and hence finality). In fact, all PoS protocols provide such properties only with a high probability – if for no other reason, cryptography can always fail with (very) small probability (for example, someone may guess your key). We do in fact know that Bitcoin and (pre-Casper) Ethereum provide finality (shown by the works of GKL15, GKL17 and PSS17) assuming honest majority of computational power), and so does Ouroboros, assuming honest majority of stake as shown in KRDO17, DGKR18, BGKRZ18.

Put simply, Ouroboros provides stake-based finality and it does so with the strongest possible guarantee in terms of stake: against a malicious coalition controlling any amount of the total stake existing in the system as long as it is bounded below 50%. In the Casper FFG white paper, where Casper operates over the Ethereum blockchain, stake-based finality is provided every 100 blocks under the assumption that ⅔ of the deposited stake is honest. As a concrete example, in the same window of time, which is a little over half an hour in our current deployment, we can derive from our formal analysis that Ouroboros will offer finality against, say, a 10% stake adversary with probability of error less than 2^(-44). This is less than 1/10000000000000, one over ten trillion. To appreciate such small numbers, consider that it is expected to have one large asteroid hit the earth once every 100 million years (Scientific American). Thus, it is 10 thousand times more likely that a big asteroid will hit the earth next month than that Ouroboros will reorganise its chain to drop a particular transaction after it has been included in the ledger for about half an hour.

Eventual Consensus vs. (near-)Instant finality

Blockchain protocols like Bitcoin and Ouroboros are called eventual-consensus since they ensure that the irreversibility of a block increases gradually with the number of blocks that are added on top of it. This means that finality is more nuanced than just a true or false value, and is quantified by the probability of reverting a transaction as a function of the strength of the adversary and the length of time that has passed since the block containing that transaction was added. This design enables these protocols to work in the strongest possible adversarial settings and still be very efficient in terms of the number of messages that need to be exchanged; furthermore, they have the feature that the recipient of a transaction can decide for herself how important a transaction is and adjust her own notions of stability per transaction. Their downside is that they do not provide near-instant finality, or in other words, a fast assurance that the transaction will be finalised. This may be a potential advantage of classical BFT protocols that have inspired the design of Casper FFG as well as other protocols in the space including Algorand.

However, near-instant finality typically also comes with significant downsides in terms of the security model such as a much higher requirement of honest stake or, perhaps more importantly, a high degree of guaranteed online presence that must be offered by the participants following the protocol. This hurts the dynamic availability of the participants (see below) which is one of the hallmarks of the bitcoin era of consensus protocols. On the other hand, near-instant finality can be built as a service on top of Ouroboros and this is something that we will be releasing in due course. Moreover, we can argue that this is the best possible way forward: use the Ouroboros eventual consensus protocol which is secure under the strongest possible stake-based guarantees as the solid foundation over which services such as near-instant settlement in optimistic network conditions can be safely built.

Incentives and dynamic availability

Casper FFG is inspired by pre-Bitcoin era standard BFT consensus protocols and as such it cannot handle uncertainty in terms of the number of participating entities once the set of validators becomes fixed. This means that the protocol cannot operate in the “sleepy setting” and “dynamic availability” setting, where a significant number of parties that are supposed to act in the protocol are unavailable due to network conditions, hardware failure or simply lack of interest. This is a significant concern in a decentralized setting where the execution of the protocol is not meant to be left in the hands of a few centralized-power actors, but is rather distributed proportionally among a great number of smaller players. The Casper-FFG white paper acknowledges this as the “Catastrophic Crash” scenario and observes that in this case “no future checkpoints can be finalized”. The authors propose a mitigation in the form of the so-called “inactivity leak.” This idea is only described informally as draining “the deposit of any validator that does not vote for checkpoints, until eventually its deposit sizes decrease low enough that the validators who are voting are a supermajority.” Unfortunately, this modification would in turn negate any potential advantage Casper can claim in face of network splits, as the authors also recognise: “The inactivity leak introduces the possibility of two conflicting checkpoints being finalized without any validator getting slashed.” This also affects the incentives running the protocol. Ouroboros allows for a natural and incentive-driven aggregation of stake into stake pools that will be performed over a period of time using our stake pool reward mechanism, without forcing the behaviour of stakeholders onto a predetermined structure, while Casper has to impose preset numbers of block validators.

Randomness

While the original Ouroboros protocol does not use VRFs to generate protocol randomness (instead it uses a guaranteed-output-delivery coin-tossing protocol based on verifiable secret-sharing), the follow-up versions Praos and Genesis do so for performance gains. The VRFs proposed for use in Ouroboros Praos and Genesis are proven secure under standard cryptographic assumptions (such as the Computational Diffie Hellman assumption) while the security analysis we have performed ensures Ouroboros’ resilience to randomness manipulation (see Ouroboros Praos and Ouroboros Genesis).

Network Assumptions

Ouroboros is analysed in the “partially synchronous” setting where messages are delivered to the majority of the parties executing the protocol within a time window upper bounded by a network delay Δ which is unknown to the parties. The order of messages is adversarial and it is not guaranteed that two honest parties will receive messages in the same order. The adversary is allowed to inject arbitrary messages selectively to any of the parties. Casper makes no explicit claims about the network setting it operates in, nevertheless, when describing defenses against long range revisions it alludes to a similar type of model.

Sharding

This property refers to the ability of a database or ledger consensus protocol to scale its processing power as more nodes (or processing capacity) enter the system, ideally with a linear speedup in the number of nodes added. Ouroboros Hydra, the scalable version of Ouroboros is in development and will be released in due time following our usual mode of discourse, i.e., the release of a full paper containing complete mathematical formulations of the problem that we solve, a full description of our protocol solution, as well as concrete statements about the protocol’s properties that are accompanied by all necessary proofs. At present, the version of Casper that enables sharding, (Casper+Sharding v2.1), is incomplete even in terms of protocol description, and as such, it cannot allow any proof of security.

Learn more about Ouroboros.

Team effort is a hallmark of IOHK research and this blog post is no exception. I am grateful to Christian Badertscher, Matthias Fitzi, Peter Gaži, Alexander Russell, Jeremy Wood, and Vassilis Zikas for various suggestions, comments, and corrections to the above text.

Artwork, Creative Commons Mike Beeple

From free algebras to free monads

7 August 2018 Marcin Szamotulski 29 mins read

From free algebras to free monads

In universal algebra freeness is a well defined algebraic property. We will explore equational theories which are tightly connected to free algebras. We will consider free monoids. Then we'll explain how monads can be brought into the picture in the context of monoidal categories. This will lead to a precise definition of a free monad as a free monoid.

This post requires familiarity with some very basic Category Theory and does not assume any knowledge on universal algebra. Most mathematical notions will be introduced but you might want to dig into literature for some more examples; though most of the books are quite lengthy and not suited for non-mathematicians - you've been warned ;). Knowing this I tried to bring all the required definitions, together with some very basic examples. As you read you may want to read about semigroups, monoids, groups, $G$-sets, lattices, Boolean or Heyting algebras from Wikipedia articles or try to find info on nCatLab (though this is is a heavy resource, with mostly with higher categorical approach, so probably better suited for more familiar readers).

Preliminaries

We will need some preliminary definitions. Let's begin with a definition of algebra. For a set $A$ we will denote $A^n$ the $n$th cartesian product of $A$, i.e. $A^2=A\times A$.

Definition Algebra
An algebra $\underline{A}$ is a set $A$ together with a finite set of operations $f_i^{\underline{A}}:A^{j_i}\rightarrow A$ ($i=1,\ldots,n$), usually simply written as$\underline{A}=(A, (f_i^{\underline{A}})_{i=1,\ldots,n})$. For operation $f_i^{\underline{A}}$ the natural number $j_i$ is called its arity, which is a finite natural number. The set $A$ is usually called universum of the algebra $\underline{A}$. The set of symbols $f_i$ is called type of the algebra $\underline{A}$.

Examples includes many classical algebraic structures, like semigroups, where there is only a single operation of arity 2, monoids which in addition have one operation of arity $0$ - the unit element of multiplication. Other source of examples are Boolean algebras with two 2-ary operations $\wedge$ and $\vee$ or more generally lattices, Heyting algebras. Also rings, modules, fields, vector spaces and countless other structures. Universal algebra has a very general theory describing common concepts but also deals with very special cases of some of more esoteric algebras.

Definition Homomorphism
A homomorphism between two algebras $\underline{A}=(A, (f_i^{\underline{A}})_{i=1,\ldots,n})$ and $\underline{B}=(B, (f_i^{\underline{B}})_{i=1,\ldots,n})$ (of the same type) is a map $h:A\rightarrow B$ with the property that for every $i$: $$ h(f^{\underline{A}}_i(a_1,\ldots, a_{i_j})) = f^{\underline{B}}_i(h(a_1),\ldots,h(a_{i_j}))$$

This means that homomorphism preserve operations. For example a homomorphism of monoids is a map that preserves the multiplication and unit. For boolean algebras, it means that a homomorphism preserves the $\vee$ (also called join) and $\wedge$ (usually called meet) operations, etc.

It is an easy observations that homomorphism are closed under composition and since the identity map is always a homomorphism this leads to well defined categories, e.g. category of monoids, category of boolean algebras, category of rings, ...

Free algebras and Equational theories

Free Algebra
An algebra $\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\dots,n})$ is free in a class of algebras $\mathcal{C}$ over a subset $S\subset A$ if for every $\underline{B}=(B,(f_i^{\underline{B}}){i=1,\dots,n})\in\mathcal{C}$ every map $S\rightarrow B$ uniquely extends to a homomorphism $\underline{A}\rightarrow \underline{B}$. The set $S$ is called the set of generators.

As you can see the definition of a free algebra requires a context, this interesting in its own! There are free monoids in the class of all monoids and there are free commutative monoids in the class of commutative monoids (i.e. monoids in which $m\cdot n=n\cdot m$ for each elements $m,n$).

Many theories allow free algebras. Let's see some other examples:

  • The monoid of natural numbers $\mathbb{N}$ with addition and $0$ as its unit element is a free monoid generated by $\{1\}$. It is both free in the class of all monoids and in the class of commutative ones. The $n$-th cartesian product $\mathbb{N}^n$ is a free commutative monoid generated by the set $\{(1,0,\ldots,0),(0,1,0,\ldots,0),\ldots,(0,\ldots,0,1)\}$, but it's not a free monoid in the class of all monoids.

  • The additive group of integers $\mathbb{Z}$ is a free group with one generator, it is also free in the class of commutative groups. As in monoids: $\mathbb{Z}^n$ is a free commutative group with $n$ generators.

  • A free group with two generators can be pictured as the Cayley graph (which is a fractal) (note that its first quarter is the free monoid with two generators).

  • Every vector space is free, since every vector space admits a basis.

  • In the class of $G$-sets, free $G$ sets are exactly all the cartesian products of $G^n$.

  • In the class of rings, polynomial rings with integer coefficients, usually denoted by: $\mathbb{Z}[X]$ or $\mathbb{Z}[X_1,\dots,X_n]$ for polynomials with many variables) are free (you likely have learned quite a lot about them in school, you just haven't been told the really interesting part ;)). This example was the motivation for terms, their algebra and term functions which we will discover next.

    This is also true for semi-rings. You might have used this fact when using purescript validation library. A free semiring generated by a type `a` has type `[[a]]`; for example `[[()]]` is isomorphic to $\mathbb{N}[X]$, since (please excuse mixing Haskell and mathematical notation): $$[[()]]\simeq[\mathbb{N}]\simeq\mathbb{N}[X]$$

Free algebras play an essential role in a proof of beautiful and outstanding Birkhoff theorem. It states that a class of algebras $\mathcal{C}$ is an equational theory if and only if the class is closed under cartesian products, homomorphic images and subalgebras. Equational theories are classes of algebras which satisfy a set of equations; examples includes: semigroups, monoids, groups or boolean or Heyting algebras but also commutative (abelian) semigroups / monoids / groups, and many other classical algebraic structures.

We need to be a little bit more precise language to speak about equational theories in the full generality of universal algebra, which we are going to introduce.

Terms, term functions and their algebra

Definition Term

Let’s consider an algebra type $(f_i)_{i=1,\ldots,n}$. Then the set of terms on a set $X$ (set of variables) is inductively defined as:

  • each $x\in X$ is a term (of arity $0$)

  • each $f_i(x_1,\dots ,x_{j_i})$ is a term of arity $j_i$ for $x_1,\dots ,x_{j_i}\in X$

  • if $g_1,\dots g_n$ are terms of arities $j_1$ to $j_n$ respectively, and $g$ is a term of arity $n$ then $g(g_1(x_{11},\dots,x_{1j_1}),\dots, g_n(x_{n1},\dots,x_{nj_n}))$ is a term of arity $j_1+\dots+j_n$ with $x_{kl}\in X$.

We will denote the set of terms on $X$ by $\mathsf{T}^{(f_i)_{i=1,\dots,n}}(X)$ or simply $\mathsf{T}(X)$.

For example in groups: $x^{-1}\cdot x$, $x\cdot y$ and $1$ (the unit of the group) are terms. Terms are just abstract expressions that one can build using algebraic operations that are supported by the algebra type. Each term $t$ defines a term function on every algebra of the given type. In groups the following terms are distinct but they define equal term function: $x^{-1}\cdot x$ and $1$; on the other hand the two (distinct) terms $(x\cdot y)\cdot z$ and $x\cdot (y\cdot z)$ define equal term functions. The two terms $x\cdot y$ and $y\cdot x$ define distinct term functions (on non commutative groups or commutative monoids). Another example comes from boolean algebras (or more broadly lattice theory) where the two terms $x\wedge (y\vee z)$ and $(x\wedge y)\vee(x\wedge z)$ define equal term functions on Boolean algebras (or more generally distributive lattices). If $t$ is a term then the associated term function on an algebra $\underline{A}$ we let denote by $\tilde{t}^{\underline{A}}$. Term functions are natural to express equalities within a theory. Now we are ready to formally define equational classes of algebras.

Definition Equational Theory
A class of algebras $\mathcal{C}$ is an equational theory if and only if there exists a set of pairs of terms $\mathbf{E}\subset\mathsf{T}(X)^2$ such that the class consists exactly of algebras $\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\dots,n})$ for which the following condition is satisfied: for each pair of terms $(t, s)\in \mathbf{E}$ two corresponding term functions $\tilde{t}^{\underline{A}}$ and $\tilde{s}^{\underline{A}}$ are equal.

For example the class of monoids is an equational theory for $$\mathbf{E}=\bigl\{(1\cdot x,\, x),\; (x\cdot 1,\, x),\; \bigl((x\cdot y)\cdot z,\, x\cdot (y\cdot z)\bigr)\bigr\}$$ i.e. all the algebras with two operations: one of arity 0 (the unit) and one of arity 2 (the multiplication), such that the $1$ is the unit for multiplication $\cdot $ and multiplication is associative. The class of commutative monoids is also an equational theory with one additional equation $(x\cdot y,\, y\cdot x)$. Groups, Boolean or Heyting algebras, lattices are also equational theories.

Coming back to free algebras: it turns out that the set of terms $\mathsf{T}^{(f_i)}(X)$ on a given set of variables $X$ has an algebra structure of type $(f_i)_{i=1,\dots,n}$: it is given by the inductive step in the definition of terms: if $t_i\in \mathsf{T}^{(f_i)}(X)$ for $i=1,\dots,j_i$ then $$ f_j^{\underline{\mathsf{T}^{(f_i)}(X)}}(t_1,\ldots,t_{j_i}) := f_j(t_1,\ldots,t_{j_i})\in \mathsf{T}(X) $$ Furthermore $\underline{\mathsf{T}^{(f_i)}(X)}$ is a free algebra over $X$ in the class of all algebras of the given type $(f_i)_{i=1,\dots,n}$. An extension of a map $h:X\rightarrow\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\ldots,n})$ can be build inductively following the definition of terms and using the homomorphism property: $$ h(f_i(t_1,\ldots,f_{i_j})) := f_i^{\underline{A}}(h(t_1),\ldots,h(t_{i_j})) $$ The map $h$ is indeed a homomorphism: $$ \begin{array}{ll} h\bigl(f_i^{\underline{\mathsf{T}(X)}}(t_1,\ldots,t_{i_j})\bigr) & = h(f_i(t_1,\ldots, t_{i_j}) \\\\ & = f_i^{\underline{A}}(h(t_1),\ldots, h(t_{i_j})) \\\\ \end{array} $$ Note that the class of algebras of the same type is usually very broad, but this is the first approximation to build free algebras in an equational theory. This is just the equational theory for the empty set $\mathbf{E}$.

Let’s see this on an example and let us consider algebras of the same type as a monoid: with one nullary operation (unit $1$ or `mempty` if you like) and one 2-ary operation (multiplication / `mappend`). Let $X$ be a set of variables. Then $1$ is a valid term, and also if $t_1$ and $t_2$ are terms on $X$ then also $t_1\cdot t_2$ is a term, but also $t_1\cdot 1$ and $1\cdot t_2$ are valid and distinct terms. $\mathsf{T}(X)$ resembles a monoid but it isn't. It is not associative and the unitality condition is not valid since $t\cdot 1\neq t\neq 1\cdot t$ as terms. We still need a way to enforce the laws. But note that if you have a map $f:X\rightarrow M$ to a monoid $M$ which you'd like to extend to a homomorphism $\mathsf{T}(X)\rightarrow M$ that preserves $1$ (which is not the unit, yet) and multiplication (even though it is not associative), you don’t have much choice: $\mathsf{T}(X)\rightarrow M$: $t_1\cdot t_2$ must be mapped to $f(t_1)\cdot f(t_2)\in M$.

We need a tool to enforce term equations. For that one can use

Definition Congruence relation
Let $\underline{A}=(A,(f^A_i)_{i=1,\dots,n})$ be an algebra and let $\sim$ be an equivalence relation on $A$, i.e. a subset of $A\times A$ which is:
  • reflexive: for each $a\in A$: $a\sim a$
  • symmetric: for each $a,b\in A$: if $a\sim b$ then $b\sim a$
  • transitive: for each $a,b,c\in A$: if $a\sim b$ and $b\sim c$ then $a\sim c$
An equivalence relation is a congruence relation if for all operations $f_i$ and any $a_1,\dots,a_{i_j}\in A$ and $b_1,\dots,b_{i_j}\in A$ the following implication holds: $$ a_1\sim b_1,\dots,a_{i_j}\sim b_{i_j}\Rightarrow f_i^{\underline{A}}(a_1,\dots,a_{i_j})\sim f_i^{\underline{A}}(b_1,\dots,b_{i_j}) $$
If you have an equivalence relation $~$ on a set $A$ then you can always construct the quotient set $A/\sim$. An equivalence class of $a\in A$ is the set $[a]:={x\in A:\; x\sim a}$, then $A/\sim$ is just the set of equivalence classes. However if you have a congruence then the quotient $A/\sim$ carries algebra structure which turns the quotient map $A\rightarrow A/\sim$ into a homomorphism.

Equivalence relations and congruences form complete lattices (partial ordered which have all suprema and minima, also infinite). If you have two equivalence relations (congruences) then their intersection (as subsets of $A^2$) is an equivalence relation (congruence).

The set of equations that defines the class of monoids generates a congruence relation on the term algebra $\underline{\mathsf{T}^{f_i}(X)}$ (i.e. an equivalence relation which is compatible with operations: $x_1\sim y_1$ and $x_2\sim y_2$ then $(x_1\cdot y_1) \sim (x_2\cdot y_2)$). One can define it as the smallest congruence relation which contains the set $\mathbf{E}$. Equivalence relation on a set $A$ is just a subset of the cartesian product $A\times A$ (which satisfy certain axioms), so it all fits together! One can describe this congruence more precisely, but we'll be happy with the fact that it exists. To show that, first one need to observe that intersection of congruences is a congruence, then the smallest congruence containing the set $\mathbf{E}$ is an intersection of all congruences that contain $\mathbf{E}$. This intersection is non empty since the set $A\times A$ is itself a congruence relation.

The key point now is that if we take the term algebra and take a quotient by the smallest congruence that contains all the pairs of terms which belong to the set $\mathbf{E}$ we will obtain a free algebra in the equational class defined by $\mathbf{E}$. We will leave the proof to a curious reader.

Free monoids

Let’s take a look on a free monoid that we can build this way. First let us consider the free algebra $\underline{\mathsf{T}(X)}$ for algebras of the same type as monoids (which include non associative monoids, which unit does not behave like a unit). And let $\sim$ be the smallest relation (congruence) that enforces $\mathsf{T}(X)/\sim$ to be a monoid.

Since monoids are associative every element in $\underline{\mathsf{T}(X)}/\sim$ can be represented as $x_1\cdot( x_2\cdot (x_3\cdot\ldots \cdot x_n))$ (where we group brackets to the right). Multiplication of $x_1\cdot( x_2\cdot (x_3\cdot\ldots \cdot x_n))$ and $y_1\cdot( y_2\cdot (y_3\cdot\ldots \cdot y_m))$ is just $x_1\cdot (x_2\cdot (x_3\cdot\ldots\cdot(x_n\cdot (y_1\cdot (y_2\cdot (y_3\cdot\ldots\;\cdot y_m)\ldots)$. In Haskell if you’d represent the set $X$ as a type $a$ then the free monoid is just the list type $[a]$ with multiplication: list concatenation and unit element: the empty list. Just think of

-- A set with `n` elements corresponds
-- to a type with `n` constructors:
data X = X_1|⋯|X_n

Free Monads

It turns out that monads in $\mathcal{Hask}$ are also an equational theory. Just the terms are higher kinded: $*\rightarrow*$ rather than $*$ as in monoids. The same construction of a free algebra works in the land of monads, but we need to look at them from another perspective. Let us first take a mathematical definition of view on monads.

Definition Monad
A monad is an (endo) functor `m` with two natural transformations:
class Monad m where
return :: a -> m a
join   :: m(m a) -> m a

which is unital and associative, i.e. the following law holds:

-- | associativity
join . join == join . fmap join
-- | unitality
join . return  = id = join . fmap return

These axioms are easier to understand as diagrams:

and

It is a basic lemma that this definition a monad is equivalent to what we are used to in Haskell:

class Monad m where
return :: a -> m a
>>=    :: m a -> (a -> m b) -> m b

Having `join` one defines `>>=` as

ma >>= f = join $ f <$> ma

and the other way, having `>>=` then

join = (>>= id)`

Not only these two constructions are reverse to each other, but also they translate the monad laws correctly.

Monoids in monoidal categories

To define a monoid $M$ in the category $\mathcal{Set}$ (of sets) one needs the product $M\times M$. Abstraction of this structure leads to monoidal categories.

Definition Monoidal Category
Category $\mathcal{C}$ with a bifunctor $-\otimes-:\mathcal{C}\times\mathcal{C}\rightarrow\mathcal{C}$ is called strict monoidal category if `\otimes` is associative and unital, i.e. for all $a,b,c\in\mathcal{C}$ $(a\otimes b)\otimes c = a\otimes (b\otimes c)$ and there exists a unit object $1$ such that $1\otimes a=a=a\otimes 1$.

Most examples of monoidal categories are not strict but are associative and unital up to a natural transformation. Think of $(A\times B)\times C\simeq A\times(B\times C)$ in $\mathcal{Set}$ (or any category with (finite) products, like $\mathcal{Hask}$). Let me just stress out that since $\otimes$ is a bifunctor, for any two maps $f:\;a_1\rightarrow b_1$ and $g:\;a_2\rightarrow b_2$ we have a map $f\otimes g: a_1\otimes a_2\rightarrow b_1\otimes b_2$, and moreover it behaves nicely with respect to composition: $(f_1\otimes g_1) \cdot (f_2\otimes g_2) = (f_1\cdot f_2)\otimes(g_1\cdot g_2)$ for composable pairs of arrows $f_1,\;f_2$ and $g_1,\;g_2$.

Now we can generalise a definition of a monoid to such categories:

Definition Monoid in a Monoidal Category
A monoid in a monoidal category $\mathcal{C}$ with monoidal product $-\otimes-$ and a unit $1$ is an object $m$ with a pair of morphisms $$ \mathrm{mappend}:\;m\otimes m\rightarrow m\quad\mathrm{mempty}:\;1\rightarrow m $$ such that

and

The main point of this section is that these diagrams have exactly the same shape as associativity and unitality for monads. Indeed, a monoid in the category of endo-functors with functor composition as a monoidal product $\otimes$ and unit the identity functor is a monad. In category theory this category is strict monoidal, if you try to type this in Haskell you will end up with a non strict monoidal structure, where you will need to show penthagon equation.

These consideration suggest that we should be able to build a free monad using our algebraic approach to free algebras. And this is what we will follow in the next section.

Free monads in $\mathcal{Hask}$

Firstly, what should replace the set of generators $X$ in $\mathsf{T}(X)/\sim$? First we generalised from the category of sets $\mathcal{Set}$ to a monoidal category $(\mathcal{C},\otimes, 1)$: its clear that we just should pick an object of the category $\mathcal{C}$. Now since our category is the category of (endo) functors of $\mathcal{Hask}$ the set of generators is just a functor. So let's pick a functor `f`.

To get a free monad we need to decypher $\mathsf{T}(f)/\sim$ in the context of a monoid in a monoidal category of endofunctors. Note that here $\mathsf{T}(f)$ and $\mathsf{T}(f)/\sim$ are functors! To simplify the notation, let $\mathsf{Free}(f):=\mathsf{T}(f)/\sim$. So what is a term in this setting? It should be an expressions of a Haskell's type: $$ \begin{equation} \begin{array}{c} \bigl(\mathsf{Free}(f)\otimes\mathsf{Free}(f)\otimes\ldots\otimes \mathsf{Free}(f)\bigr)(a) \\\\ \quad\quad = \mathsf{Free}(f)\bigl(\mathsf{Free}(f)\bigl(\ldots (\mathsf{Free}(f)(a)\bigr)\ldots\bigr) \end{array} \end{equation} $$ In our setup the monoidal product $-\otimes-$ is just the functor composition, thus $\mathsf{Free}(f)(a)$ must be a type which (Haskell's) terms are of Haskell's types:

a, f a, f (f a), f (f (f a)), ...

The monadic `join` will take something of type $\mathsf{Free}(f)\;(\mathsf{Free}(f)\;(a))$, e.g. $f^n(b)=f\;(f\;(\dots f\;(b)\dots)$ (by abusing the notation $f^n$) where $b$ has type $f^m(a)=(f\;(f\;(\dots(f\;(a)\dots)$ and return something of type $\mathsf{Free}(f)(a)$ and it should be quite clear how to do that: just take the obvious element of type $f^{n+m}(a)$. Altogether, this is a good trace of a monad, so let us translate this into a concrete Haskell type:

data Free f a
= Return a
-- ^ the terms of type a
| Free (f (Free f a))
-- ^
-- recursive definition which embraces
-- `f a`, `f (f a)` and so on

instance Functor f => Functor (Free f) where
fmap f (Return a) = Return (f a)
fmap f (Free  ff) = Free (fmap (fmap f) ff)

`Free f` is just a tree shaped by the functor `f`. This type indeed embraces all the terms of types: `a, f a, f (f a), ...` into a single type. Now the monad instance:

instance Monad (Free f a) where
return = Return
join (Return ma) = ma
-- ^ stitch a tree of trees into a tree
join (Free fma) = Free $ join <$> fma
-- ^ recurs to the leaves

As you can see, takes a tree of trees and outputs a bigger tree, that's what join does on the Return constructor.

Before formulating the next result let's describe morphisms between monads. Let `m` and `n` be two monads then a natural transformation `f :: forall a. m a -> n a` is a homomorphism of monads iff the following two conditions are satisfied:

f . return == return
join . f == f . fmap f . join

Note that this two conditions are satisfied iff f is a monoid homomorphism in the category of (endo)functors of $\mathcal{Hask}$.

Proposition
Let `f` be a functor, then `Free f` then there exists a morphism:
foldFree :: Functor f => (forall x. f x -> m x) -> (Free f a -> m a)

which restricts to an isomorphism of natural transformations on the left hand side and monad homomorphisms on the right hand side, and thus Free f is rightly colled free monad..

Proof
Let start with a defintion of `foldFree`.
foldFree :: Functor f => (forall x. f x -> m x) -> (Free f a -> m a)
foldFree _ (Return a) = return a
foldFree f (Free ff)  = join $ f $ foldFree f <$> ff

It's inverse is:

liftF :: Functor f => (forall x. Free f x -> m x) -> (f a -> m a)
liftF f fa = f $ Free $ Return <$> fa

First let's check that foldFree f is a morhpism of monads:

foldFree f (Return a)
-- | by definition of (foldFree f)
= return a
  
foldFree f (join (Return a))
= foldFree f a
-- | by monad unitality axiom
= join $ return $ foldFree f $ a
-- | by definition of (foldFree f)
= join $ foldFree f (Return $ foldFree f a)
-- | by definition of functor instance of (Free f)
= join $ foldFree f $ fmap (foldFree f) $ Return a

foldFree f (join (Free ff)
-- | by definition of join for (Free f)
= foldFree f (Free $ fmap join $ ff)
-- | by definition of foldFree
= join $ f $ fmap (foldFree f) $ fmap join $ ff
= join $ f $ fmap (foldFree f . join) $ ff
-- | by induction hypothesis
= join $ f $ fmap (join . foldFree f . fmap (foldFree f)) $ ff
= join $ f $ fmap join $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | f is natural transformation
= join $ fmap join $ f $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | monad associativity
= join $ join $ f $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | by definition of (foldFree f)
= join $ foldFree f $ Free
$ fmap (fmap (foldFree f)) $ ff
-- | by functor instance of (Free f)
= join $ foldFree f $ fmap (foldFree f) $ Free ff

And we have

foldFree . liftF :: (forall x. Free f x -> m x) -> (Free f a -> m a)
(foldFree . liftF $ f) (Return x)
-- ^ where f is a morphism of monads
= foldFree (liftF f) (Return x)
= return x
= f (Return x) -- since f is assumed to be a morphism of monads

(foldFree . liftF $ f) (Free ff)
-- ^ where f is a morphism of monads
= foldFree (liftF f) (Free ff)
= join $ liftF f $ fmap (foldFree (liftF f)) $ ff
-- | by induciton hypothesis
= join $ liftF f $ fmap f $ ff
-- | by definition of (liftF f)
= join $ f $ Free $ fmap Return $ fmap f $ ff 
-- | by functor instance of (Free f)
= join $ f $ fmap f $ Free (Return ff)
-- | since f is a morphism of monads
= f $ join $ Free (Return ff)
= f $ Free ff
liftF . foldFree :: (forall x. f x -> m x) -> (f a -> m a)
(liftF . foldFree $ f) fa
-- ^ where f is a natural transformation
= liftF (foldFree f) $ fa
-- | by definition of liftF
= (foldFree f) $ Free $ fmap Return $ fa
-- | by definition of (foldFree f)
= join $ f $ fmap (foldFree f)  $ fmap Return $ fa
= join $ f $ fmap (foldFree f . Return) $ fa
-- | by defintion of (foldFree f)
= join $ f $ fmap return  $ fa
-- | since f is a natural transformation
= join $ fmap return $ f fa
-- | by monad unitality axiom 
= f fa

foldFree corresponds to foldMap which is defined in a very similar way

foldMap :: Monoid m => (a -> m) -> [a] -> m
foldMap _ [] = mempty
foldMap f (a : as) = mappend (f a) (foldMap f as)

Note that foldMap is an isomorphism onto monoid homomorphisms with an inverse

g :: Monoid m => ([a] -> m) -> a -> m
g f a = f [a]

Furthermore, if we had polymorphic functions over monoidal categories in our type system, `foldMap` and `foldFree` would be specialisations of the same function!

Some examples of free monads

Let us study some simple examples of free monads
  • First let us consider the constant functor:
data Const a b = Const a

Then Free (Const a) is isomorphic to Either a

toEither :: Free (Const a) b -> Either a b
toEither (Return b) = Right b
toEither (Free (Const a)) = Left a
    
fromEither :: Either a b -> Free (Const a) b
fromEither (Right b) = Return b
fromEither (Left a)  = Free (Const a)

Since Either () is isomorphic with Maybe also Maybe is a free monad.

  • `Free Identity` is isomorphic to:
  • data Nat = Zero | Succ Nat
    
    newtype Writer m a = Writer { runWriter :: (m, a) }
    deriving Functor
    
    toFree1 :: Free Identity a -> Writer Nat a
    toFree1 (Return a)           = Writer (Zero, a)
    toFree1 (Free (Identity fa)) = case toFree1 fa of
    Writer (n, a) -> (Succ n, a)
    
    fromFree1 :: (Nat, a) -> Free Identity a
    fromFree1 (Writer (Zero,   a))
    = Return a
    fromFree1 (Writer (Succ n, a))
    = Free (Identity (fromFree1 (Free1 n a)))

    Note that Nat is the free monoid with one generator (Nat$\simeq$[()]) in the cateogry $\mathcal{Hask}$, and so is Free Identity but in the monoidal category of endofunctors of $\mathcal{Hask}$!

  • If you take a functor with two constructors
  • data F2 a = FX a | FY a
    deriving Functor

    . Then we have

    data S2 = SX | SY
    
    toFree2 :: Free F2 a -> Writer [S2] a
    toFree2 (Return a) = Writer ([], a)
    toFree2 (Free (FX fa)) = case toM2 fa of
    Writer (m, a) -> Writer (SX : m, a)
    toM2 (Free (FY fa)) = case toM2 fa of
    Writer (m, a) -> Writer (SY : m, a)
    
    fromFree2 :: Writer [S2] a -> Free F2 a
    fromFree2 (Writer ([], a))
    = Return a
    fromFree2 (Writer (SX : xs, a))
    = Free (FX (fromM2 $ Writer (xs, a)))
    fromFree2 (Writer (SY : xs, a))
    = Free (FY (fromM2 $ Writer (xs, a)))

    toFree2 and fromFree2 are isomorphisms. I think you see the pattern: if you take a functor with $n$ constructors you will end up with a writer monad over a free monoid with $n$ generators. You might ask if all the monads are free then? The answer is no: take a non free monoid m then the monad Writer m is a non free monad. You can prove your self that the writer monad Writer m is free if and only if the monoid m is a free monoid in $\mathcal{Hask}$.

    Final remarks

    I hope I convinced you that monads are algebraic constructs and I hope you'll find universal algebra approach useful. In many cases we are dealing with algebraic structures which we require to satisfy certain equations. Very often they fit into equational theories, which have a very clear description and which allow free objects. Freeness is the property that lets one easily interpret the free object in any other object of the same type. In the monad setting they are really useful when writing DSLs, since you will be able to interpret it in any monad, like IO or some pure monad.

    References


    Note: This post originally appeared here.

    Artwork, Creative Commons Mike Beeple