Mana epochs and confirmations

william.sanders · April 4, 2022, 11:11am

Problem

One of the main goals of coordicide is to have a dynamically available, partition tolerant core protocol. However, as recent conversations has shown, the current notion of confirmation is not partition tolerant. Indeed, if we have two forks of the tangle, the order the forks are downloaded affects the notion of confirmation.

Suppose we have forks A and B. Suppose that before the fork that nodes N_1,\dots, N_n have all the mana. The consider that in fork A, the mana is transferred to nodes M_1,\dots,M_m. Now consider the hypothetically extreme case where the N_i and M_j are distinct, and assume that nodes M_i operate only in the fork A, and very few, if any, N_i operate in A. Currently, mana is booked on confirmation, in a “current mana vector”. In fork A, the mana transfers will be confirmed (because the protocol is dynamically available, and confirmation will not halt), and the messages in A will be confirmed by the nodes M_j. Meanwhile, in fork B, the mana will be held by the N_i which will confirm the messages in B.

Now, when the forks merge, the nodes in B will not confirm the mana transfers to the M_j, and thus none of the messages in fork A will appear to be confirmed. Similarly, the nodes in A, will not see the messages in B confirmed. Therefore the fork will not be resolved, since no one can agree on which one is correct.

Recall, that partition tolerant means that the protocol can always choose between forks, not that these forks will be merged.

Proposed Solution

I propose that we return to mana based epochs. Specifically, the confirmation for each message in a certain epoch would be judged by the mana as computed with respect to a predefined epoch. Let me explain more precisely.

Fix a message M. Recall the following the vectors (recall that each indexed by the space of node IDs)

\overrightarrow{\rm{supp}}(M) is the supporters of M
\overrightarrow{\rm{mana}} is the mana vector
\overrightarrow{\rm{act}} is the list of active nodes computed using Sebastian’s activity window
\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}} active mana vector: is the pointwise product of these vectors. A value is 0 if the node is non active, and for all active nodes, the value indicates the amount of mana the node has.

Then, we have
\rm{AW}(M)=\rm{comp}_{\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}}}(\overrightarrow{\rm{supp}}(M))=\frac{\overrightarrow{\rm{supp}}(M)\cdot (\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}})}{||\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}}||}
where the dot is the normal dot product. We mark a message as confirmed when this percentage exceeds a certain threshold

I propose that we modify the formula as follows:
\rm{AW}(M)=\rm{comp}_{\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}}(M)}(\overrightarrow{\rm{supp}}(M))=\frac{\overrightarrow{\rm{supp}}(M)\cdot (\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}})}{||\overrightarrow{\rm{act}}\times\overrightarrow{\rm{mana}}(M)||}
where \overrightarrow{\rm{mana}}(M) is the mana vector of the computed at the end of epoch i-N where i is the epoch of M, and N is some parameter.

I claim that this notion of confirmation should be t invariant under the order that messages were received. This seems a bit complicated to prove, and so I omit the proof and hope that my claim is true, leaving this as an open problem

Problem 1
The first central problem (besides that the above claim needs verified) is the computability. We used something similar to last spring, but it was computationally impractical because of the exponential moving average.

However, I think this can be fixed by simply removing the exponential moving average. Specifically I propose the following general algorithm:

When we initialize the mana vector for Epoch n+1, we copy the mana vector from Epoch n
When a message in epoch i becomes confirmed, we iterate through all mana vectors j\ge i and apply the mana changes

So is this practical from an implementation point of view?

Problem 2

One of the strength’s of this solution is that it allows us to “unconfirm” confirmed transactions. This can happen when the mana vector changes during an epoch. This presents some complications to the protocol.

First, how do we do the unconfirmation? Everytime a transaction is confirmed, do we have to iterate through all the messages to see where the confirmation flag changes?

Second, if we unconfirm a message what do we do? I suppose

Un-mutate the changes in the ledger state
Revert the changes in the mana vectors
Change the eligibility status of messages in future conte.

The second is potentially expensive: for every mana change, we have to search for confirmation status’s to change, and which causes new changes in mana, which causes new changes in confirmation…

Number 3 presents a challenge to TSA and the scheduler: how does the fluctuating eligibility affect these modules.

These issues however might be endemic to any solution with reversible confirmations.

sebastian.mueller · April 6, 2022, 6:58am

The proposal seems conceptually similar to our initial proposal in considering the exponential moving average (imagine that the main weight comes from the last epoch). Any smoothing reduces the chances to have a situation that you call “Confirmation problem”.

sebastian.mueller · April 6, 2022, 7:03am

The impossibilty of merging the forks is not clear to me. I think it depends on the way that you try to fork. For instance, if you reset the mana vector to the most common ancestor of the two forks, and re-run the the Tangle from there, all nodes should come eventually to a the same conclusion, due to the CRDT property of the Tangle.

william.sanders · April 6, 2022, 11:00am

That is exactly what we propose to do here: look at the latest epoch where the mana differs, and then rerun the mana to the present. Currently, in goshimmer we do not do this.

piotr.macek · April 11, 2022, 7:22am

As discussed, the proposed solution has some problems. I was thinking that maybe it would be possible to keep track of all the forks of the Epoch Commitment Chain (ECC) and for each of those forks we would store a separate set of:

Active nodes
Mana vector
Tangle time (confirmed time or median time)
Tips (to easily attach to the correct fork)

A node also marks messages as “confirmed” within the fork if a message meets the criteria to be “confirmed” in relation to the active nodes and mana vector on that fork. This doesn’t mean that the message is considered confirmed by that node.

A node considers “truly confirmed” only the messages that belong to the fork with the most weight. A node shall attach only to tips belonging to the fork with the most weight (what does that mean?).

There are two scenarios that should be considered:

Forks that are created in an environment without partitions
Those forks should die quickly as they would most likely be a result of either malicious behavior or some honest nodes not receiving all the messages yet. Those forks should be orphaned quickly as majority will quickly pick up the correct one. But before they are orphaned, a node will keep track of all the currently active nodes, mana vectors, tangle time and tips on that fork.
Forks that are created in an partitioned network.
This scenario is a little more tricky, but doesn’t require any special actions. Suppose that the network is split in 2 separate partitions for some time, during which multiple commitments were issued and confirmed locally on each partition. Then partitions are merged and nodes from each partition start solidifying and booking. When solidifying the other part subtangle, which contains different epoch commitments, the node recognizes this as a ECC fork and starts to keep track of all the active nodes, tangle time, tips and mana vector on that fork. We don’t need to walk the other part of subtangle as we use the replay property of the tangle as we use during regular syncing.

The only difference between the scenarios above, is that in the partitioned scenario, messages arrive with a significant delay and multiple commitments were issued. However the approach for the scenarios is the same.

Some things I’m not sure about:

How easily can we recognize forks? The proposed solution should work under the assumption that ECC forks can be recognized quickly to be able to quickly start keeping track of all the things for each fork.
What does it mean that a fork has the most weight? It’s possible that a malicious node is aware of more than 1 partition and issues messages to both of them. We should consider him a supporter of which branch?

In conclusion, this resembles the OTV voting on conflicts where we also allow multiple parallel realities to exist at the same time and only accept the one with enough weight. It seems that commitment conflicts could be looked at the same way as we look at regular double spend conflicts.