One-Block Irreversibility for Delegated Proof-Of-Stake (DPOS)
Today we bring you a look behind the scenes in Hive development with a glimpse into the thought process of the lead Hive developer as he analyses new potential approaches for Hive.
Read and comment on Hive: https://hive.blog/hive-139531/@blocktrades/one-block-irreversibility-for-delegated-proof-of-stake-dpos
One-block Irreversibility (OBI) is a proposed protocol change whereby a block will be considered irreversible as soon as a sufficiently large supermajority of the currently scheduled block producers vote that the block in question should become the next valid block in the blockchain. The goal of this protocol change is to enable most Hive transactions to become irreversible within a few hundred milliseconds (before a second block has even been produced, hence the name “One-block Irreversibility”).
Despite the name, the OBI protocol doesn’t guarantee that EVERY block will become irreversible in one block. Indeed, in the general case, it might take N blocks for irreversibility, which we could refer to as an OBI-N case, but in the overwhelmingly common case, we can expect irreversibility to happen in one block (i.e. OBI-one) because of the force inherent in this protocol.
As a side note, my understanding is that there have been proposals for somewhat-related, but not yet implemented, protocols to speed up block finality by Ethereum devs (Casper?) and EOS devs (DPOS 3.0+BFT), but for various reasons I didn’t review either of those proposed protocols in sufficient depth, so I won’t be comparing and contrasting them to this protocol.
Before going into detail on the proposed protocol change, let me first explain what is meant by an irreversible block, and why it is an important concept for DPOS-based blockchains such as Hive.
Similarities between Irreversible (DPOS terminology) and Fully Confirmed (Proof-of-work terminology)
An irreversible block in DPOS is similar to a fully confirmed block in a proof-of-work blockchain.
Both concepts are used as a way to be confident that a crypto transaction (for example, a money transfer to your wallet) has been accepted by the network, and that enough block-producing nodes have agreed the transaction happened for it to be safe to assume that the transaction (in the example case, your payment) can’t be reversed by a fork.
For example, if you operate a store that accepts bitcoin payments, you might not want to let your customer leave the store with their items until their bitcoin transaction has fully confirmed. For bitcoin, a block is generally considered fully confirmed when 6 further blocks have been built on top of the block (each subsequent block can be viewed as a “vote of confidence” in the original block). With an average block production time of 10 minutes, this means you could be waiting about an hour (6 * 10 minutes) to be sure of your payment.
Obviously waiting one hour for a payment wouldn’t be practical for most retail stores, and this has led to many workarounds (including perhaps most famously, the Bitcoin Lightning Network).
An important difference: irreversible blocks cannot be automatically reversed
Theoretically, even “fully confirmed” blocks can automatically be reverted by bitcoin nodes, but it is generally assumed that such an eventuality is so extremely unlikely in practice that it is safe to rely on the payment.
So, while they are similar concepts, there is an important difference between DPOS irreversible blocks and POW’s fully confirmed blocks: irreversible blocks cannot be reversed automatically, but fully confirmed blocks can be.
In other words, unlike transactions on the bitcoin network, the transactions in irreversible blocks are irreversibly confirmed and can no longer be reverted from the node’s internal financial ledger due to a fork unless the node operator manually intervenes by popping the most recent blocks from the block history of the node and then replaying the blocks in the blockchain.
So if two nodes in a DPOS network end up on different forks with irreversible blocks in the two forks, those two nodes can never switch to a common fork (an irreversible split in the blockchain) without manual intervention by at least one of the node operators. This is undesirable, so it is best to be conservative when choosing a heuristic for determining when a node should treat a block as irreversible.
Irreversibility under current Hive DPOS protocol
Right now, a block becomes irreversible in Hive once 3/4ths of the witnesses (currently scheduled block producers) have “voted” on including the block into the blockchain.
Under the current DPOS protocol, a block producer votes for a previous block by linking to it when it produces its own block. For example, if block producer 1 (bp1) produces block A, the next block producer (i.e. bp2) can create a block B that links back to A (by including the hash of block A in block B). This can be viewed as bp2 voting for the fork that includes block A. If the next block producer (bp3) builds off block B, this is yet another vote for the fork that includes block A.
Once 15 of the 21 block producers (3/4*21=15.75 rounded down to 15) have built off a block, it becomes irreversible. The basic idea behind this is that if ¾ of the block producers are on the same fork, it would be extremely unlikely that the remaining ¼ of the block producers could create a longer chain.
Another thing that further makes such a possibility unlikely is that Hive block producers, by default, are configured to not generate blocks if the “participation rate” drops below 33%. The participation rate is a metric used by block producers to see how many other block-producing nodes they are directly or indirectly in contact with via the peer-to-peer network (they measure this by tracking if they receive the most recent blocks produced by these block producers).
For example, imagine a network split happens between North America and Europe due to an ocean cable being cut, with 3/4ths of the block producers connected via the European side of the split, and the remaining 1/4th connected on the North American side of the split. The block producers on the European side would continue to produce blocks (because the participation rate would be ¾ = 75%) but the North American block producers would stop producing blocks entirely after the participation dropped below 33% (it would rapidly drop to ¼ = 25%) and only the chain fork on the European side would continue to add new blocks. This is generally beneficial, because it makes it difficult to launch a double-spend attack during the time the network is split.
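The network-split example can be checked with a few lines of Python. This is only a simplified sketch: the function names are made up for illustration, and it treats the participation rate as the plain share of block producers a node is still hearing from, which glosses over how hived actually measures it.

```python
# Simplified sketch; names are hypothetical and not taken from hived.
MIN_PARTICIPATION_PERCENT = 33  # default cutoff quoted in the text


def participation_percent(producers_heard_from: int, total_producers: int) -> float:
    """Share of block producers whose recent blocks this node has received."""
    return 100.0 * producers_heard_from / total_producers


def should_produce(producers_heard_from: int, total_producers: int) -> bool:
    """A producer keeps generating blocks unless participation drops below the cutoff."""
    rate = participation_percent(producers_heard_from, total_producers)
    return rate >= MIN_PARTICIPATION_PERCENT


# European side of the split: three quarters of the producers are reachable.
print(should_produce(3, 4))  # True  (75%)
# North American side: only one quarter is reachable.
print(should_produce(1, 4))  # False (25%)
```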
So how does irreversibility play out in such a case? If all 3/4ths of the nodes on the European side were successfully producing blocks, blocks would still become irreversible, because a block would eventually get 75% of the block producers to build off of it. But if one of these block producers stopped producing because of some computer outage, even the European fork would no longer have enough block producers to mark the new blocks as irreversible.
Since the current irreversibility algorithm requires 15 blocks to be built off a block before that block becomes irreversible, the fastest time a block can become irreversible in Hive now is 45 seconds (15 blocks * 3 seconds /block).
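The arithmetic behind these figures can be written out as a small Python sketch. The function and constant names are made up for illustration and are not taken from the hived source; the values (21 producers, a 3/4 threshold rounded down, 3-second blocks) come from the text above.

```python
# Illustrative arithmetic only; names are hypothetical, not hived identifiers.
SCHEDULED_PRODUCERS = 21      # block producers per round (from the text)
BLOCK_INTERVAL_SECONDS = 3    # one block every 3 seconds (from the text)


def irreversibility_threshold(num_producers: int = SCHEDULED_PRODUCERS) -> int:
    """3/4 of the producers, rounded down: 15.75 becomes 15 for 21 producers."""
    return (num_producers * 3) // 4


def fastest_irreversibility_seconds(
    num_producers: int = SCHEDULED_PRODUCERS,
    block_interval: int = BLOCK_INTERVAL_SECONDS,
) -> int:
    """Minimum wait when each vote can only be cast by producing a block in turn."""
    return irreversibility_threshold(num_producers) * block_interval


print(irreversibility_threshold())        # 15
print(fastest_irreversibility_seconds())  # 45
```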
We can see that the current delay in finalizing a block occurs because each block producer can only “vote” by creating his scheduled block, and these blocks are produced sequentially, once every 3 seconds. But what if all the scheduled block producers could vote immediately after they receive a block, instead of having to wait their turn to vote?
The OBI protocol in action
The distinguishing feature of the OBI protocol is that each block producer will broadcast a “valid vote” to the p2p network for each block it receives, immediately after it has successfully validated the block and made it the new head block for its local copy of the blockchain (instead of just waiting its turn to implicitly vote for the block when it produces its own block).
This new mechanism allows for block producers to reach consensus on the validity of a block much faster than the existing mechanism (in a well-connected network, a block should typically become irreversible before the next block is even produced).
Here’s a simple example of how this works in practice:
- Block producer 1 (the block producer scheduled to produce the next block) generates and broadcasts a block to the p2p network.
- Other nodes receive this block and temporarily apply it as the next block in their local copy of the blockchain to test if the block is valid. If the block is valid, the node’s local state will be updated with the transactions contained in the block. If the block is invalid, the node will roll back the changes made by the block to its local state. So far, this is how the DPOS protocol currently works.
- New OBI step: If the node is one of the scheduled block producers, the node signs and broadcasts a new type of p2p message called a block_validity_vote if it considered the block valid and made it the new head block for its local copy of the blockchain. This message, signed with the block producer’s signature, contains the block producer’s name and the block id of the newly applied block.
- New OBI step: Each node will keep a temporary buffer of the valid block_validity_votes it receives (and also propagate these votes to its peers using the normal p2p rules for message propagation). If a node receives the required ¾ majority of distinct block producer votes for a block, that block can be marked as irreversible and written to its block_log.
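The two new OBI steps can be sketched in Python as a small in-memory vote buffer. This is a minimal illustration, not the hived implementation: the class names are hypothetical, signature verification and p2p propagation are left out, and the threshold of 15 is assumed to follow the same 3/4-of-21 rule described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockValidityVote:
    """Hypothetical stand-in for the block_validity_vote message (signature omitted)."""
    producer: str   # block producer's name
    block_id: str   # id of the block the producer applied as its head block


class VoteBuffer:
    """Temporary buffer of votes, keyed by block id."""

    def __init__(self, scheduled_producers, threshold):
        self.scheduled = set(scheduled_producers)
        self.threshold = threshold
        self.votes = {}            # block_id -> set of distinct producer names
        self.irreversible = set()  # block ids that reached the threshold

    def receive(self, vote: BlockValidityVote) -> bool:
        """Record a vote; return True if the voted block is now irreversible."""
        # Assumption: votes from producers outside the schedule are ignored.
        if vote.producer in self.scheduled:
            voters = self.votes.setdefault(vote.block_id, set())
            voters.add(vote.producer)  # a set, so repeat votes count once
            if len(voters) >= self.threshold:
                self.irreversible.add(vote.block_id)
        return vote.block_id in self.irreversible


producers = [f"bp{i}" for i in range(1, 22)]   # 21 scheduled producers
buffer = VoteBuffer(producers, threshold=15)
results = [buffer.receive(BlockValidityVote(p, "block-A")) for p in producers[:15]]
print(results.count(True))  # 1: only the 15th vote tips the block over
```

A real node would also have to check each vote's signature before buffering it and relay valid votes to its peers, as described in the steps above.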
In a normally well-connected Hive p2p network, this should result in most blocks becoming irreversible on a node within a second or less after they are produced. The exact time required depends on the message latency between nodes and number of network hops between the node and the block producers.
As a side note, recent optimizations to the p2p network code have reduced the time for messages to traverse hops between nodes (and also made it easier for nodes to cheaply maintain direct connectivity to more peers and thus reduce the number of hops between nodes, but I think the current default of 20 peers will be more than sufficient for most use-cases).
Faster irreversibility without blockchain bloat
At the inception of the idea, the design behind One-block Irreversibility included storing the approval vote messages into the next block as a means of proving irreversibility of the prior block. But this adds unnecessary bloat to the size of blockchain, because the existing mechanism for proving a block is irreversible already works well for all but the most recent blocks.
Instead, to prove that recent blocks are irreversible, nodes can keep around the block_validity_votes that they receive to mark a block as irreversible until they have received a sufficient number of follow-on blocks that build off the block. At that point, the block_validity_votes for that block can be discarded.
So one of the nice aspects of the OBI protocol is that it doesn’t increase the amount of blockchain storage, because the block_validity_votes are only kept temporarily in memory (and only a small amount of memory is required).
New votes by a block producer override its old votes at a given block number
Nodes employing the OBI protocol only track the most recent vote cast by each block producer. So if a block producer switches to a different fork, all the votes it cast for blocks that will be discarded during the fork switch will be “overwritten” by the votes it casts for the new blocks at those block positions. In other words, at any given time, every node will only consider one vote by a specific block producer at a specific block position.
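One way to picture this bookkeeping is a table indexed by block position and producer, where a new vote simply replaces the old entry. The sketch below is an assumption about how such a table could look; the class and method names are hypothetical and do not come from the OBI code.

```python
class LatestVotes:
    """Keeps only the most recent vote per producer at each block number."""

    def __init__(self):
        self._votes = {}  # block_num -> {producer name: block_id}

    def record(self, producer: str, block_num: int, block_id: str) -> None:
        # Assigning to the same (block_num, producer) slot overwrites any
        # earlier vote that producer cast at this block position.
        self._votes.setdefault(block_num, {})[producer] = block_id

    def count(self, block_num: int, block_id: str) -> int:
        """Number of producers whose current vote at block_num is block_id."""
        current = self._votes.get(block_num, {})
        return sum(1 for voted_id in current.values() if voted_id == block_id)


votes = LatestVotes()
votes.record("bp1", 100, "fork-A-block")
votes.record("bp2", 100, "fork-A-block")
votes.record("bp1", 100, "fork-B-block")   # bp1 switched forks
print(votes.count(100, "fork-A-block"))    # 1
print(votes.count(100, "fork-B-block"))    # 1
```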
Better monitoring of the state of the P2P network and blockchain
Another interesting aspect of the OBI protocol is that it allows for much better monitoring of the status of the Hive P2P network when it is experiencing connectivity problems than was previously possible. Every node in the network (block producer or regular hived node) is tracking the current head block of every block producer it is connected to, effectively knowing how many block producers are connected to its network and which forks they are on.
Improving irreversibility by counting blocks as implicit votes
Blocks that build off a block are also votes for a block’s irreversibility. So rather than simply counting block_validity_votes, nodes will also count the number of subsequent blocks that build off the block (to the extent that these new “block votes” don’t overlap with block producer votes they have received). This is basically how DPOS 1.0 treated subsequent blocks as votes for the previous blocks.
In a well-connected network, this should not result in any speedup in the time it takes for a block to become irreversible, because votes will be cast much more rapidly than new blocks will be built off the block, but it should improve irreversibility time when the network is experiencing outages that cause some votes to be missed.
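The non-overlap rule amounts to taking the union of two sets of producers: those that sent an explicit vote and those that produced a later block on top of the block in question. The sketch below assumes a threshold of 15 of 21 producers; the function name is hypothetical.

```python
def has_supermajority(explicit_voters, later_block_producers, threshold=15):
    """Count each producer once, whether it voted explicitly or by building a block."""
    supporters = set(explicit_voters) | set(later_block_producers)
    return len(supporters) >= threshold


# 13 explicit votes arrive; the rest were lost during an outage.
explicit = [f"bp{i}" for i in range(1, 14)]
print(has_supermajority(explicit, []))                         # False
# Three later blocks build on the block; bp13 also voted explicitly.
print(has_supermajority(explicit, ["bp13", "bp14", "bp15"]))   # True (15 distinct)
```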
Longer witness scheduling to be able to determine the next block producers
For optimal performance, the OBI protocol should maintain constant knowledge of the next 21 scheduled block producers, so that nodes know which block producers are casting legal votes.
In the current code, the next set of block producers is selected once every 21 blocks. So, for example, after 18 blocks have been produced in the round, only the next 3 block producers are known. If a node only knows the next 3 block producers, it can’t get a ¾ majority of the next 21 block producers.
To enable all blocks to be potentially irreversible without waiting for more blocks to be generated, the OBI code also modifies the witness scheduling algorithm by scheduling a 2nd round beyond the round that is currently producing blocks. This means that nodes will know at least the next 21 block producers at all times, and as many as the next 42 block producers, so any given block can potentially receive a sufficient number of votes to become irreversible without waiting for another block to be generated.
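The effect of scheduling an extra round can be seen with a simple counting model. This is not the hived scheduling code, and the function name is made up; it only counts how many upcoming producers a node knows at a given point in a 21-block round.

```python
ROUND_LENGTH = 21  # blocks (and producers) per round, from the text


def known_upcoming_producers(blocks_produced_in_round: int, extra_rounds: int) -> int:
    """Producers still to come in this round plus any fully scheduled later rounds."""
    if not 0 <= blocks_produced_in_round <= ROUND_LENGTH:
        raise ValueError("blocks_produced_in_round must be between 0 and 21")
    remaining_in_round = ROUND_LENGTH - blocks_produced_in_round
    return remaining_in_round + extra_rounds * ROUND_LENGTH


print(known_upcoming_producers(18, extra_rounds=0))  # 3  (current code)
print(known_upcoming_producers(18, extra_rounds=1))  # 24 (with a 2nd round scheduled)
print(known_upcoming_producers(0, extra_rounds=1))   # 42 (start of a round)
```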
Longer schedules to reduce chance for irreversible forks during a big voting slate change
Another benefit of the above change is that votes to replace the current set of block producers will take longer to take effect and this helps reduce the chance for an irreversible fork.
In the current protocol, it can take between 1 and 21 blocks before a witness vote change affects which block producers are elected to produce a block. For example, let’s assume that the nodes have just scheduled the next 21 witnesses to produce blocks. New witnesses can be voted on, but the currently scheduled witnesses will still be the ones to produce the next 21 blocks. Only after the round is finished will any new witness be able to produce a block. So, in the shortest case, a bunch of witness votes could be included into the last block in the round, and the entire slate of top 20 witnesses could be replaced in the next block. This could potentially lead to problems for One-block Irreversibility if such votes were cast in the last block of a round and the network forks during this time, leaving a split network where some nodes received the vote transaction(s) that changed the witness slate (and therefore start using a different set of witnesses to determine a ¾ majority than the other side of the fork). In such a problem case, both forks could get a ¾ majority of “their” witnesses, and the two sides of the split wouldn’t be able to regain consensus without manual intervention (an irreversible split).
Fortunately, to address problems of this type, the OBI protocol has a longer set of scheduled witnesses, so there will always be a guarantee of at least 21 blocks produced by the currently scheduled witnesses before newly-voted-in witnesses that were elected in the current head block can produce blocks.
As far as I’m aware, the One-block Irreversibility protocol will put Hive at the forefront in terms of transaction confirmation time.
Hive already had one of the fastest average blockchain confirmation times at 45 seconds, so you may be wondering why I think it’s so important to speed it up further. The biggest benefit comes for 2nd layer apps:
First, 2nd layer apps can now be more interactive with their users, since they have faster guarantees of irreversibility.
And second, HAF-based apps will benefit in terms of better performance, because HAF table views stitch together two types of tables (tables for irreversible data and reversible data). When most of the data is in the irreversible table, a HAF server will operate faster (because there’s more overhead required to maintain reversible tables).
Indeed, with OBI in play, it would not be surprising if many HAF apps simply elect to rely strictly on the irreversible data, since blocks will normally become irreversible within one second or less. And this will really speed up the performance of SQL queries for such apps, because the stitching together of data from two different tables is no longer required.
So, all-in-all, I believe the incorporation of One-block Irreversibility will have profound benefits for the scalability of Hive apps (and the potential growth rate of the entire ecosystem).