This article is inspired by my recent visit to a blockchain technology conference and my discussions with colleagues about ideas to improve blockchain. Most of the conference speakers were from big Russian banks and their talks were about blockchain use cases, mainly as databases or smart contract platforms. However, none of the speakers were able to answer the question, ‘why did they really need blockchain?’. The correct answer for this question recently came from the R3 CEV consortium: "no blockchain because we don't need one". Blockchain is not needed for banks, it is needed instead of banks. It is required for decentralized systems only, applications with a trusted party will always be more efficient, simple, etc. The meaning of decentralization has been widely discussed (see, for example, this post from Vitalik Buterin and it is the only real purpose of blockchains. In this blog post I'm going to discuss the degree of centralization of existing cryptocurrencies and the reasons leading to it.
Governance and development centralization
Let's start with a non-technical topic. It is nice to think that no-one controls blockchain, ie that network participants (miners) act as a decentralized community and chose the direction of development. In fact, everything is much worse.
The first source of centralization here, is the source of the protocol for improvement. Only a small group of core developers is able to accept changes to the source code or even understand some protocol improvement proposals. No-one works for free and the organization that pays money to the core team in fact controls the cryptocurrency’s source code. For example, Bitcoin development is controlled by Blockstream, and this organization has its own interests. A treasury system like the one for Dash or the one proposed for Ethereum Classic may be the solution here. However, a lot of questions are still open (for example, the 78 pages of ETC treasury proposal are quite complicated, while the Dash treasury system was developed without any documentation).
The next centralization risk in governance is the cult of personality. While Vitalik tells us in his post, that no-one controls cryptocurrencies, his opinion is so important for the Ethereum community, that most of its members accepted the bailout of the DAO, which breaks the basic immutability principle of cryptocurrencies.
Finally, there are a lot of interested parties behind cryptocurrencies, and the opinion of some of them (for example the end users of the currency) is usually ignored. Anyway, the development of cryptocurrencies is a social consensus, and it is good to have a manifesto, declaring its purpose from the start.
One of the biggest problems with existing cryptocurrencies is the centralization of services. Blockchain processing is heavy (eg Ethereum processing from the genesis block may take weeks) and regular users that just want to send some coins have to trust centralized services. Most Bitcoin users trust blockchain.info, Ethereum users trust myetherwallet and so on. If these popular wallets were to be compromised, users’ funds would be lost.
Moreover, most users trust in blockchain explorers and never check that the blocks in it are correct. What is the meaning of the "decentralized" social network Steemit, if almost none of its users download a blockchain and believe that the data on Steemit.com is correct? Or imagine that blockchain.info was compromised: an attacker could steal all the users’ money from their wallets, hide the criminal transactions and show user-created transactions in blockchain explorer, and the attack could go unnoticed for a long time. Thus, trust in centralized services produces a single point of failure, allows censorship and puts user coins in jeopardy.
With popular cryptocurrencies, hardware requirements are high even just for blockchain validation. Even if you own modern hardware able to process blocks fast, your network channel may not be wide enough to download the created blocks fast enough. This leads to a situation where only a few high-end computers are able to create new blocks, which leads to mining centralization. Being open by design, Bitcoin mining power is now concentrated in a limited group of miners, which could easily meet and agree to perform a 51% attack (or just censor selected transactions or blocks). Mining pools worsen the situation, for example, in Bitcoin just five mining pools control more than 50% of the hash rate (at least if you believe blockchain.info. Another option for miners is to skip transaction processing and produce empty blocks, which would also make blockchain meaningless.
Proof-of-Stake is usually regarded as more hardware friendly, however, a really popular blockchain requires a wide network channel to synchronize the network anyway. Also, it is usually unprofitable to keep a mining node in PoS and just a small percentage of coins are online that makes the network vulnerable. This is usually fixed by delegating mining power to someone else, that also leads to a small amount of mining nodes.
Centralization as a solution
The most scary point of all this is that more and more often, centralization is regarded as a solution for some problems of cryptocurrencies. To fix scalability issues, cryptocurrencies propose to use a limited number of trusted "masternodes", "witnesses", "delegates", "federations" and so on to "fix" the too-large amount of mining nodes in the network. The number of these trusted nodes may vary, but by using this method to fix scalability issues, developers also destroy the decentralized nature of the currency. Eventually this would lead to a cryptocurrency with only one performing node, that processes transactions very efficiently, without confirmation delays and forks, but suddenly a blockchain is not needed, as in R3's case.
Unfortunately, most users are not able to determine the lie in cryptocurrencies and like these centralized blockchains more and more, because for sure, the centralized way is (and will always be) more simple and user-friendly.
We are going to see more centralized cryptocurrencies and that will inevitably lead to mass disappointment in blockchain technology, because it is not needed for centralized solutions. It is still a user choice, whether to believe a beautiful and fast web interface or to use trustless, but harmful software, requiring you to download blockchain data and process it.
Most centralization risks may be fixed, if trustless full-nodes, wallets, explorers are cheap to launch and easy to use. I'm not going to propose a solution in this paper, but I hope it is coming soon!
Authenticated Dynamic Dictionaries, with Applications to Cryptocurrencies
20 February 2017 3 mins read
Our paper "Improving Authenticated Dynamic Dictionaries, with Applications to Cryptocurrencies" will appear at the Financial Cryptography 2017 conference in Malta in April. It was also presented at the Real World Crypto 2017 conference in New York and I highly recommend watching the impressive presentation from Leonid Reyzin, professor of computer science at Boston University and one of the four authors of the paper.
Some background. Previously I worked for the Nxt platform which has assets and many more cool features. The problem is, the blockchain processing becomes incredibly heavyweight (considering the pretty low number of transactions, in comparison with Bitcoin) with new features added. The same problem with Ethereum these days - after the attacks in autumn, it is nearly impossible to wait until processing being finished on an ordinary laptop.
The problem is in a state (e.g. UTXO set in Bitcoin) persistence. Once it hits a secondary storage (HDD or SSD), processing becomes very slow.
Thus two considerations behind our work on AVL+ trees and a proposed scheme for cryptocurrencies:
It should be feasible to run a full-node (maybe not a mining node) on commodity hardware
Initial blockchain processing, and then block processing must use RAM only
As commodity hardware is pretty limited in RAM, the idea is not to store the state for full-nodes at all. The scheme is as follows:
The state is authenticated with the help of a 2-party dynamic authenticated dictionary.
A mining node is storing the whole state. When packing transactions into a block, it generates proofs of the authenticated state transformations and announces a new root hash after the transformations being done in a blockheader. Proofs are to be included into the block.
A full-node receiving the block checks that 1) Transactions are correct (format and signatures are correct etc) 2) State transformation operations derived from the transactions are corresponding to the proofs 3) Proofs are correct 4) Resulting roothash (a verifier is getting it just by processing proofs) is the same as the announced one. Thus the node is checking everything, but without holding the state (e.g. UTXO set).
Then the paper is about to find a most efficient structure out of many candidates (and the winner is custom-tailored authenticated AVL+ trees).
Not mentioned in the paper but worth mentioning is that proofs in a block could be authenticated themselves (with the help of a Merkle tree which is perfect for static data) with a root hash included in a blockheader. Then if node is holding the state it could skip downloading proofs from the network, also there is possibility to prune them in the future (this scheme reminds me of the SegWit proposal for Bitcoin).
Proofs are adding a significant burden regarding block size (actually a proof can be longer than the corresponding transaction), so decreased throughput is to be considered seriously.
The code had been released on GitHub during RealWorldCrypto – see the section on authenticated data structures. There are some possible further minor optimizations (possibly reducing proof size by few percent in total) we are now discussing.
Log-Structured-Merge trees (LSMT) are a good fit for modern SSD storage and offer good performance and reliability. LSMT are also a good fit for blockchain storage requirements (snapshots, consistency, proof of existence). This blog post describes a database designed specifically for blockchain storage, inspired by existing LSMT implementations (RocksDB, COLA tree).
The current state-of-the-art LSMT implementation is probably RocksDB, with in-memory write buffers, parallel compaction and snapshots. Another similar algorithm are COLA tree. That is a btree-like structure where each node has separate write buffer. Finally there is SSTable from Cassandra which is fairly simple, but offers great write performance.
For Scorex and other projects, we designed a storage with codename IODB, which combines features from all the above implementations. It has log-structured storage with fast binary search and multilevel-multithreaded compaction. It will also have snapshots, MVCC isolation, concurrent transactions, online backups etc...
Compared to RocksDB IODB it will have better concurrency, lesser write amplification and smaller disk overhead during compaction. IODB also runs on JVM, is easier to deploy and its functionality can be extended with plugins.
RocksDB is probably the current state-of-the-art LSMT implementation. It offers great performance and great concurrency. However it also has some trade-offs.
RocksDB (and other LSMT implementations) is using compaction to reclaim disk space, and to remove the old version of data from logs. The compaction process is not optimal:
- To perform full compaction all files have to be rewritten.
- Full compaction requires extra disk space to create new files, twice more than the data size.
- Compaction process operates over large files. It can take a long time before a single compaction task finishes. This makes it hard to temporarily pause the compaction process or close the database quickly.
- Write amplification is too big, the data entries are moved too many times during compaction process, even if they were not modified.
To solve that IODB took an inspiration from COLA tree (also known as Fractal Tree), which offers great performance under random updates. A COLA tree is a BTree like structure, where each node has its own write buffer. If the buffer becomes too large its content is propagated to a lower level by compaction process. A COLA tree is complex, but the basic idea of separate writer buffers is simple and great.
A COLA tree is difficult to implement; the code for compaction and rebalancing is complex, almost impossible in a concurrent environment. IODB avoids that by sharding data into non-overlapping intervals, rather than hierarchical BTree like nodes. Self-balancing code is simple; if an interval becomes too large (over 64MB), it is sliced into smaller intervals by the compaction process. If an interval becomes too small, it is merged into its neighbours.
Each interval in IODB is compacted with multi-level merge, the same way as RocksDB or any other LSMT implementation. So in practice IODB is composed of several small LSTM, one for each interval.
Background compaction is composed of several small atomic tasks. Each task operates on a single interval with limited size (bellow 64MB). Small tasks are easier to tune and run better in multiple threads. The limited size requires very little space overhead. Finally it is fast to close store or temporary pause compaction.
Multi source pre-sorted merge
Every commit (or write batch) is saved into a separate file. Over time you get multiple files sorted by time. To find an entry one starts from the newest file and traverses all files, until the required key is found. The compaction process merges those multiple files together.
RocksDB compaction can merge only two files at a time. IODB compaction can merge more than two files at a time. If the source data are presorted, the multi merge requires only one single pass over the source data. Creating the new merged file takes approximately
O(N*log(M)) time, where N is the total number of entries in interval, and M is the number of source files.
The number of source files (M) and the interval size (N) can be tuned. Large M means less IO and more CPU usage. Larger N reduces IO and CPU usage at the expense of larger memory usage. Configuring those parameters should make IODB usable in various workloads.
There is a number of optimizations. For example to reduce CPU overhead from comparing keys, IODB can be configured so that all keys in one interval share the same prefix. Only the key suffix is then compared (this trick comes from Cassandra).
Binary search over sorted table
Scorex uses fixed size keys. We can eliminate a file offset translation layer, and perform binary search directly over a memory-mapped file. This is very disk cache friendly and much faster than traditional hierarchy based structures, such as BTrees.
IODB is designed to be very easy to deploy. It is pure-Java and can run on any Java8 enabled JVM.
It is also written in a way which allows to integrate it into J2EE container or Spring Dependency Injection. It is a set of independent components wired together with the constructor parameters.
It also uses common Java components such as
ScheduledExecutorService to run background tasks. IODB can share thread pools with other Java libraries etc..
Finally IODB takes some features from RocksDB. It will support incremental backups and snapshots, created with hard file links...
MapDB will not use IODB directly, but will get a storage component based on IODB design. MapDB needs a different set of features (variable key size, 8-byte recids, less random data) and is written in a different language (Scala vs Kotlin), so it will have to get separate implementation.
MapDB will get append-only storage engine based on IODB design. Recids (keys) in MapDB store are 8-byte longs, which allows many optimizations by using primitive arrays and collections.
SortedTableMap could also use compaction similar to that of IODB design. So we will have a writable
on top of the append-only storage. It will use
SortedTableMap storage format for a single file, but it will support
updates, snapshots, compaction and all the features of IODB.
When you starting a Bitcoin node it is downloading all the transactions for more than 7 years in order to check them all. People are often asking in the community resources whether it is possible to avoid that. In a more interesting formulation the question would be “can we get fullnode security without going from genesis block”? The question becomes even more important if we consider the following scenario. Full blocks with transactions are needed only in order to update a minimal state, that is, some common state representation enough to check whether is arbitrary transaction is valid(against the state) or not. In case of Bitcoin, minimal state is unspent outputs set (we call this stateminimal as a node could also store some additional information also, e.g. historical transactions for selected addresses, but this information is not needed to check validity of an arbitrary transaction). Having this state (with some additional data to perform possible rollbacks) full blocks are not needed anymore and so could be removed.
In Bitcoin fullnodes are storing all the full blocks since genesis without a clear selfish need. This is the altruistic behavior and we can not expect nodes to follow it in the long term. But if all the nodes are rational how a new node can download and replay the history?
The proposal recently put on Arxiv trying to solve the problems mentioned with a new Proof-of-Work consensus protocol. I very lightly list here modifications from Bitcoin, for details please read the paper.
- State is to be represented as an authenticated data structure(Merkle tree is the simple example of such a data structure) and a root value of it is to be included into a blockheader. It is the pretty old idea already implemented in Ethereum(and some other coins).
- We then modify a Proof-of-Work function. A miner is choosing uniformly k state snapshot versions out of last n a (sufficiently large) network stores collectively. In order to generate a block miner needs to provide proofs of possession for all the state snapshots. On a new block arrival a miner updates k+1 states, not one, so full blocks (since minimal value in k) are also needed.
The RollerChain fullnode is storing only sliding window of full blocks thus storing disk space, also less bandwidth is needed in order to feed new nodes in the network and so bandwidth saved could be repurposed for other tasks.