Although we are all busy with coordicide, there is always time for a bit of theory. The ideas here were developed during many good chats over beer with other Iota researchers.
What is a clock
A tempo is a state machine whose state are enumerated by \mathbb{N} and state i can only transition to state i+1 (note, this is my own terminology and not a standard definition). Each state change is effectively “beat” or a “tick”. For example, consider a watch counting seconds since a certain point in time. Each time the watch ticks, the state changes.
Generally, a tempo can be very irregular. For example, the number of super bowl wins by the Green Bay Packers is a tempo. The tempo moved from 0 to 2 in the 60s, and then to 3 in the 90s and then to 4 in 2010. If the Packers do not improve their defence, the tempo will stay at 4 for a long time. More generally, imagine a tempo which records the number of flashes of a light. The light can flash 1000 in one hour, and then remain off for a century. The point is that tempos need not be regular.
We call a tempo a clock when it changes state regularly (again this definition is not from any scientific literature). Since this definition is not rigorous, we give a few examples. First, a watch counting seconds is most certainly a clock. Second, the flashing light tempo discussed earlier is not a clock. However, this tempo would be a clock if the flashes were a poisson point process, i.e. the light flashes randomly but at a particular rate with high probability.
Clocks thus are a tempo that corresponds to real time, or more generally a clock is a machine for measuring time. Defining a clock more precisely requires healthy doses of physics and philosophy. Although I will now move on, the reader may ruminate over this point with a pint.
Lamport Clocks
In Lamport’s landmark paper on distributed systems, he discusses a way of partially ordering messages in a distributed system. Each processor counts the number of messages they have seen. He refers to this number as a Lamport clock. Then, each message may include a Lamport Timestamp which is the time on the Lamport clock when the message was created.
The Lamport timestamps give a partial ordering. The order is partial because multiple messages can have the same timestamp. Lamport shows that these timestamps have a very important property: if message B depends on message A, then the Lamport timestamp of A is smaller than B. However, this only is true in an honest setting because the timestamps are unenforced.
Now a Lamport clocks is a tempo but not generally a clock in our terminology established above. Messages can be generated at any variable rate especially if one of the processors is byzantine. Thus the tempo can be completely irregular.
Clocks and DLTs
Every DLT needs to control the rate that information is added to the ledger, otherwise it can be spammed and overwhelm all the processors in the network. However, the word rate is meaningless without some sense of time. Thus, any DLT needs a clock! We will justify this statement further but first we give some examples of clocks featured in DLTS.
- Proof of Work Clock: Any proof of work block chain is a clock. Indeed, the bitcoin white paper describes a blockchain as a timestamping algorithm. Any time a block is discovered, the clock “ticks”. Indeed, the difficulty is adjusted so that the discovery of blocks is always a poisson point process of constant rate.
- Committee clock: Suppose we have a “committee” of n processors who each release a message every few seconds. Any time k< n committee member release a message, the clock “ticks”. Since k is less than n, the entire committee does not need to be honest. This type of clock is used in hash graph and also in the Aleph 0 project. Actually, our DRNG is also a clock by this reasoning.
- Coordinator: Each time the coordinator releases the milestone, the “coordinator clock” ticks. Indeed, the milestones are used as a timestamp in many places in the current network. This is technically a committee clock with n=k=1.
- Normal clocks: Some proof of stake block chains effectively use normal clocks for their rate control. Consider EOS: blocks are supposed to be issued at certain intervals. After each block is issued, the following blocks are essentially “votes” on whether or not the block was issued at the time.
Various DLTs use these clocks to control the rate messages (aka blocks or transactions) are created and distributed in the network. In the current protocol, the below max depth criterion uses milestones to prevent large bursts of transactions which cannot be validated quickly.
The rate messages are created need to be bounded, otherwise the network will be congested or nodes will not be able to process all the information. On the other hand, a minimum number of messages must be created otherwise the network is effectively down. Thus message creation must be correlated in someway with real time.
Let me rephrase using our earlier terminology: in any DLT (with an effective rate control system) each node’s Lamport clock tempo becomes a real clock (by our earlier definition). Moreover, DLT Lamport clocks are byzantine resistant.
Many DLTs go a step further and find find ways to essentially enforce Lamport timestamps. Consider block chains. In theory, the block number is effectively the number messages, or blocks, that a miner has seen up to that point. Miners are effectively incentivised/forced to put their block at the end of the longest chain. But this is equivalent to issuing a block with the correct Lamport timestamp.
Clocks and Coordicide
This might all be very interesting, but who cares? Well, let us examine how clocks are used in coordicide. First, we cap each nodes’ message processing rate by \nu. We use a combination of scheduling algorithms and back off notifications to ensure that large backlogs do not form causing large latencies or message drops even during an attack. Thus each node’s Lamport clock will tick away roughly at rate \nu with no problems. This is great news!
Coordicide also has timestamps measured in seconds which are enforced through FPC voting. These timestamps replace many of the roles that milestones provide in the current protocol, such as the below max depth criterion and snapshots. Also, seconds are used to measure difference in arrival times of conflicts, per the FCOB rule. Thus coordicide essentially uses two clocks: a normal clock and a Lamport clock.
I recommend however that we eventually only use Lamport clocks, and use Lamport timestamps instead of real time timestamps. Furthermore, I think delays between conflicts should also be measured in terms of the Lamport time. Note, Hans had a very similar idea last spring when we first discussed timestamps.
Besides natural elegance, here are some reasons why this would be a very good idea:
- Currently, if no one issues any messages for 30ish minutes, the network will die because no one can issue a transaction with a valid timestamp that satisfies the below max depth rule. Now this would only happen during some global internet outage, and the community can come up with an ad hoc solution in this case. However, using Lamport time, the protocol would easily survive these outages because everyone’s clocks would simply pause during a network outage.
- Currently DDOS attacks complicate certain aspects of the protocol. Indeed, we take these delays into account when setting levels of knowledge and judging timestamps conflicts. However, DDOS attacks would not affect a Lamport clock: a node will simply pause their clock till they come online again. Assuming the node receives all the mesages in the right order, each node would measure the correct arrival times and delays in Lamport time even durring a DDOS attack. As a result, the timestamp window could shrink to a much smaller value.
Let me be clear: we would only use Lamport time in a future version of the protocol and not the first version of coordicide. Indeed, we currently do not have time to sort out all the necessary research questions to implement this now. For instance we would need to know the following:
- What is the network delay in “Lamport” time?
- Are my claims here actually true? If a node is DDOSed, will it still measure the correct delays in Lamport time upon recovery?