Adaptive PoW without sequence numbers

The collision problem

In its original formulation, the adaptive PoW algorithm (aPoW) requires adding a sequence number to each transaction. In this way, nodes can verify whether the issuing node has performed the appropriate PoW. The details of the algorithm can be found in this IOTA Café post and in the Coordicide white paper.

Sequence numbers, however, bring up the problem of collisions, where a node creates several different transactions with the same sequence number. While detecting such collisions is trivial, recovering from the damage they cause is not: an attacker may temporarily exceed its throughput quota, potentially leading to DoS attacks or to discrepancies between the nodes’ ledgers. In short, this attack may have serious effects on the network.

aPoW without sequence numbers

In this post, we propose to get rid of sequence numbers. In this approach, a transaction is (not uniquely) identified by the pair {ID, timestamp}, where the timestamp is in the signed part of the transaction and declares when the transaction was created. Our claim is that a node can correctly determine the amount of PoW required without the need for sequence numbers.

Let’s consider the following example:

  • assume you have 4 transactions labeled 14, 15, 16 and 17 according to the PoW they carry. These transactions have been created at times 1, 2, 3 and 4, respectively.
  • assume the time window has length 5.
  • assume that for every transaction in the time window, the algorithm asks to increase the PoW by one (i.e., \gamma = 1); the rule is stated compactly below.
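For concreteness, one way to write the rule assumed in this example (this compact form is ours, not taken from the white paper; here d_0 = 14, \gamma = 1, w = 5, and ties between equal timestamps are broken by transaction hash, as in scenario 4 below) is:

d(x) = d_0 + \gamma\cdot\big|\{y \mbox{ from the same node} : \mbox{timestamp}(x) - w \le \mbox{timestamp}(y) \le \mbox{timestamp}(x),\ y \ne x\}\big|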

Now, you can fall into one of the following four scenarios (a code sketch of the verification logic follows the list):

  1. Transactions are received in the correct order.
  • 1st arrival tx 14: PoW needed 14 - OK
  • 2nd arrival tx 15: PoW needed 15 - OK
  • 3rd arrival tx 16: PoW needed 16 - OK
  • 4th arrival tx 17: PoW needed 17 - OK
  2. Transactions are received in reverse order.
  • 1st arrival tx 17: PoW needed 14 - OK (PoW higher than what is needed)
  • 2nd arrival tx 16: in this case the PoW needed depends on the timestamp of the received tx. In fact, nodes keep track of all txs sent by each node and sort them according to their timestamps. Since tx 16 has been generated at time 3 (< 4), the PoW needed is still 14 - OK
  • 3rd arrival tx 15: same analysis as for the previous case - OK
  • 4th arrival tx 14: PoW needed 14 - OK
  3. Attacker issues txs with invalid PoW.
  • If a received tx carries an invalid PoW (i.e., lower than needed), this is trivially detectable and the tx is immediately dropped.
  • Let’s assume that a node receives a tx with PoW exactly equal to the PoW needed. The node cannot know (immediately) whether older txs were issued before the timestamp of this tx, which would make its PoW insufficient. In this case, in order not to slow down the network, the node will forward the tx anyway to the next processing steps (e.g., congestion control, below max depth criterion, signature validation). If it later receives txs with older timestamps, the node will invalidate the tx.
  4. Attacker issues two transactions with the same timestamp.
  • We consider these as two different transactions. In order to determine the correct PoW needed, we sort them by tx hash.
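To make the bookkeeping in the scenarios above concrete, here is a minimal sketch (class and field names are illustrative and hypothetical, not taken from any IOTA node implementation):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tx:
    tx_hash: str
    node_id: str
    timestamp: float  # signed issuance time declared by the issuer
    pow_score: int    # PoW difficulty the transaction actually carries


class ApowVerifier:
    """Per-node bookkeeping for the scenarios above (illustrative sketch)."""

    def __init__(self, d0=14, gamma=1, window=5.0):
        self.d0, self.gamma, self.window = d0, gamma, window
        self.seen = defaultdict(list)  # node_id -> list of (timestamp, tx_hash)

    def required_pow(self, tx):
        # Count txs from the same node that fall inside the window ending at
        # tx.timestamp and precede it; ties on the timestamp are broken by
        # comparing hashes (scenario 4).
        key = (tx.timestamp, tx.tx_hash)
        earlier = sum(1 for entry in self.seen[tx.node_id]
                      if tx.timestamp - self.window < entry[0] and entry < key)
        return self.d0 + self.gamma * earlier

    def receive(self, tx):
        if tx.pow_score < self.required_pow(tx):
            return False  # insufficient PoW: drop immediately (scenario 3)
        self.seen[tx.node_id].append((tx.timestamp, tx.tx_hash))
        # A tx accepted now may still be invalidated later if txs with older
        # timestamps arrive and raise its required PoW above pow_score
        # (scenario 3, second bullet). Pruning of old entries is omitted here.
        return True


# Reverse-order arrival (scenario 2): txs carrying PoW 17, 16, 15, 14, created
# at times 4, 3, 2, 1, all pass with a required PoW of 14 on arrival.
verifier = ApowVerifier()
for ts, score in [(4, 17), (3, 16), (2, 15), (1, 14)]:
    assert verifier.receive(Tx(tx_hash=f"h{ts}", node_id="node-A",
                               timestamp=ts, pow_score=score))
```

Here the lexicographic comparison of (timestamp, hash) pairs is the only tie-breaking rule, matching scenario 4; everything else (e.g., when to re-check already-forwarded txs) is left out of the sketch.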

Time window choice

The choice of the time window is crucial for the correct functioning of the algorithm. Our claim is that the time window must be kept small for two main reasons:

  1. Transaction bursts can be captured.
  2. Implementation is easier: faster sorting, smaller buffers.

However, it is fundamental to keep this time window larger than the network delay.

Drawback of this approach

While this approach definitely solves the collision problem, it leads to temporary inconsistencies where some txs could bypass, for a short time, the rate control filter. However, the network is able to correctly identify the valid transactions within one network delay. We strongly believe that the damage from this temporary inconsistency is negligible.

How do you treat the situation where somebody issues a new transaction with a lower timestamp than a previously accepted transaction that has enough PoW, but whose arrival would invalidate the earlier (but, according to the timestamp, later) tx?

Couldn’t you roll back earlier txs like this?

I actually have a similar question to Hans. The timestamp window is 30 minutes. So if I issue transaction x with timestamp t that has just the right amount of proof of work, then I have 29ish minutes to issue a transaction y with timestamp t-1. Transaction y would cause x to be invalidated.

Oh wait, I think I understand now. So we have the timestamp window W, but also the adaptive proof of work window w. The proof of work of a transaction is a function of the number of transactions that arrived within time w and whose timestamp is less than the transaction’s.

In other words, fix a function f\colon \mathbb{N}\to [0,2^{256}], and then for a transaction x, set
A_x=\{y|t_{\mbox{arrival}}(x)-w\le t_{\mbox{arrival}}(y)\le t_{\mbox{arrival}}(x)\}
B_x=\{y|\mbox{timestamp}(y)\le\mbox{timestamp}(x)\}

And then the proof of work difficulty of x is given by f(|A_x\cap B_x|)

In this case, we only need to wait time w for a transaction to be invalidated.
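A direct, illustrative translation of this rule (the helper names are hypothetical, not from any IOTA codebase):

```python
# The difficulty of x is f(|A_x ∩ B_x|), where A_x collects txs received within
# the last w (by local arrival time) and B_x collects txs whose signed
# timestamp is at most x's.
def difficulty(x, received, f, w):
    """x and the elements of `received` are dicts with 'arrival' and 'timestamp' keys."""
    a_and_b = [y for y in received
               if x["arrival"] - w <= y["arrival"] <= x["arrival"]
               and y["timestamp"] <= x["timestamp"]]
    return f(len(a_and_b))


def f(n):
    # Example difficulty function: base 14, increased by 1 per counted tx (gamma = 1).
    return 14 + n


print(difficulty({"arrival": 10.0, "timestamp": 4.0},
                 [{"arrival": 9.5, "timestamp": 3.0}], f, w=5.0))  # prints 15
```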

It can still happen that some people would not see the transactions within that time window and therefore discard different transactions.

Billy’s reasoning is correct. I will make sure to clarify it better - this was meant to be explained briefly in point 3.

It can still happen that some people would not see the transactions within that time window and therefore discard different transactions.

Yes, I agree. However, the algorithm is based on the assumption that network delay is bounded. If this is true, every node will receive all transactions within this time window. If we set w > network delay, then - in principle - this should not be a problem.

But that is not true: nodes can go offline for a moment, be DDoSed, or just have a longer delay because of a bad connection.

We should really stop using the assumption that the network delay is bounded at all times.

That’s a broader discussion, with which I also kinda agree. However, we’re building most of the Coordicide modules on the bounded network delay assumption (e.g., FPC), so it would be difficult to get rid of it at this stage. Also, it’s not trivial to evaluate the actual reliability of this assumption. I believe the best thing we can do is to keep going in this direction and deal separately with the scenarios where the assumption is not met: for instance, what to do in case a node falls out of sync? How to resync the node? How to detect that a node is DDoSed?

Well I guess to allay Hans’s fear, the window would have to be a few minutes, but that is problematic for other reasons.

Here is another suggestion: the baseline proof of work could be fairly high, so that this mechanism will be difficult to trigger. Thus, it will be harder to knock a DDoSed node out of sync.

As mentioned in the post, having a large time window may lead to other unwanted side effects (bursts not captured, large buffers).
An effective solution would indeed be to play with the algorithm parameters:

  • the base difficulty d_0 sets the time needed for a casual user to perform the PoW;
  • the increase rate \gamma sets the throughput: this can be translated into an optimization problem where we want to find \gamma such that throughput(\mu_{max}) - throughput(\mu_{min}) is minimized. From some quick calculations, we can get closed-form mathematical expressions to model this; a rough model is sketched below.
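For instance, here is one possible back-of-the-envelope model (our assumption, not a derivation from the white paper): if solving a PoW of difficulty d takes expected time proportional to 3^d/\mu for a device with hashing power \mu, then the number of transactions n that a node can issue within a window of length w satisfies

\sum_{k=0}^{n-1}\frac{3^{d_0+\gamma k}}{\mu}\lesssim w \quad\Rightarrow\quad n\approx\frac{1}{\gamma}\log_3\!\left(1+\frac{\mu w\,(3^{\gamma}-1)}{3^{d_0}}\right)

so the difference in transactions per window between devices with hashing powers \mu_{max} and \mu_{min} is roughly \frac{1}{\gamma}\log_3(\mu_{max}/\mu_{min}), which could serve as the objective of the optimization above.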

A potential improvement to the algorithm may be to use a progressively diminishing \gamma. Intuitively, this should further reduce the gap between fast and slow devices.