Double Voting protocol

Currently two main voting protocols are primarily discussed for the voting layer in the Coordicide solution: CA and FPC. We propose a method to combine these two in serial to draw on the benefits of both.


Cellular Consensus, also called Cellular Automata (CA) :

We assume the CA holds the opportunity to achieve high convergence rates and fast convergence time. However, due to the rigid structure of the voting network it is possible that the voting gets stuck in meta-stable states, where sub-groups of nodes (pockets) hold a different opinion than the rest of the network.

The following figure illustrates this: Nodes in the main cluster voted blue (true or 1), while the ones in the pocket voted red (false or 0). This state is stable because all nodes in red have more neighbors to within the pocket, than to the main cluster.

Several counter measures have been proposed in order to overcome this potential barrier, including applying “magnetic” fields or annealing. Initial simulations show that meta-stable states are very unlikely to occur, which may be e.g. due to asymmetric message propagation. However, the risk may be increased in the presence of well-positioned malicious actors. More specifically, the rigid structure of the underlying voting network may increase the possibility of eclipse attacks.


Fast Probabilistic Consensus (FPC) :

In the FPC protocol nodes apply a random majority consensus. This requires that nodes can query at random a sufficiently large subgroup of the network (quorums). Due to the random sampling it is more difficult for an attacker to achieve an eclipse attack. FPC also introduces an additional mechanism which requires a distributed random number generator (dRNG). This dRNG supplies the nodes with a common but random threshold and, thereby, increases the resilience of the protocol against splitting attacks. On the other hand, FPC may require a noticeably increased message overhead compared to CA in order to facilitate the quorums for a sufficiently large amount of rounds if all nodes participate.


Aims :

  • Querying a random quorum at the conclusion of the CA protocol, nodes may detect that their current opinion does not agree with the rest of the network.
  • FPC offers higher eclipse protection and it could be performed in addition. I.e. only nodes with the wrong opinion continue with FPC to overcome the local meta-stable states or eclipse attacks → small increase of message complexity.
  • Remove the requirement for CA to come to agreement between all nodes, since it may be sufficient to come to a pre-consensus, i.e. achieve a super-majority or super-minority in the network.

Risks :

  • it is not clear, whether CA has a lower or higher likelihood for an integrity failure. If an integrity failure would have occured already in the CA protocol level, the FPC would automatically inherit this failure.
  • Message complexity increased
  • Could inherit attacks from both protocols. Unknown additional attack vectors due to combination.

Protocol :

Assuming that the positive effects from a combination of the two protocols CA and FPC outweigh the negative impacts, we can combine those two protocols in series. The currently suggested method is the following:

  • Node triggers or receives information about double-voting on a voting object.

  • CA is conducted until the maximum CA-round (or maximum heartbeat).

  • The node waits till m_1 FPC-rounds passed. In this time window the node will not respond to queries, since early queries may indicate malicious behavior. This is to reduce the likelihood for nodes to start voting on FPC while other nodes are still voting with CA, which may happen due to asynchronicity.

  • Within the next m_2 FPC-rounds the node triggers FPC if it receives an FPC query. m_1 and m_2 can potentially be combined into one parameter.

  • After the m_2 FPC-rounds, if the FPC vote is not triggered through the query from another node, the node triggers FPC. This is an additional buffer to overcome asynchronicity issues.

  • According to subjective stopping rules the node concludes with a finalized opinion. We add one additional rule: If, for a given node, the measured mean opinion in the first FPC-round exceeds a certain threshold the protocol concludes. Otherwise the FPC protocol gets activated in full, i.e. the node needs to run FPC for at least another L-1 rounds.

Some details on the synchronization of the protocol are yet unclear to me:

  1. How is the CA started? Its safety strongly relies on the heartbeat function. What is the heartbeat function if not every neighbor is taking part in the CA? Does it make sense to introduce an initial step, where you inform your neighbors that you want to vote on something?
  2. When does the FPC starts? After a node has seen m_1 CA rounds?
  3. In the second last point, I don t see how a FPC is triggered at a node that did not take part in CA nor received a FPC query. This will happen with a very small probability (and only if it is lost in space) and probably we could just assume that the node does not vote at all.

We could do some simulations now. Essentially everything is already implemented in the FPC simulator. We just have to choose the network topology accordingly. A candidate for a worst case scenario is to do CA on a ring lattice and then do FPC where each node has a list of (1-\alpha n) random nodes.
A (hopefully) more realistic scenario is to replace the ring lattice with a random regular graph.
One might add message loss in the CA part and the MVS in the FPC part.

I have the impression that the nice property of the CA of not being able to lie is lost in the step from CA to FPC. Should we add that the opinions in the FPC have to be justified as well. For instance, in the first FPC round, a node has to justify the last CA round. The protocol could improve if in FPC you always include your neighbors (like in addition to your k neighbors you ask 2k random nodes). In this way your neighbors keep an eye on you.

It’s unclear if this would help: how would you know if my neighbors were chosen in a “truly random” way, or just hand-picked? I can always chose a set of my “friends” (or simply nodes with some given opinion) to “justify” anything.

Re 1.) good question, i am not sure if there is a document where this is written out in more detail?
Re 2.)

My understanding is that since in CA every node continues till the max heartbeat, all nodes more or less finish at a similar time. m_1 is in this proposal measured in FPC-rounds because the dRNG sets the pace. The important bit is that the FPC should start after most if not all nodes finished CA. Hence this delay between (m_1,m_1+m_2]
Re 3. ) If a node missed CA it has not participated in creating the pre-consensus. Maybe this could be the case in times of congestion and when the node has not solidified a tx in time. But in this case the node might have different concerns than the opinion on this tx. E.g. what do neighbors in CA do if you do not respond for the full CA voting time max_heartbeat? Maybe you would get dropped?

In addition to the clusters that could form to hide detection, as Serguei describes, I am worried this effectively removes or weakens the eclipse protection that is offered in FPC by choosing a new neighborhood every round.

I have two answers for this. (1) This is a problem (see my thread). (2) When anode sends out its 1st round opinion, it is implicitly requesting other nodes to send their opinion as well.

Overall, it seems here that we inherent the weaknesses of both CA and FPC. First, we do not have the speed of CA. Second, when it comes to liveness and integrity, this protocol seems to default to the worse one. And for safety, the protocol is no more safe than FPC.

Thus, it seems that if the above protocol works, then FPC would work as a stand alone.

I would tentatevly disagree with the above points for the following reasons.:

  • Re. the speed, FPC is only started for nodes that are opposed to the super-majority in the system (possibly eclipsed or errors in CA). Only in the (possibly unlikely) case of a large proportion of nodes coming to a different opinion in CA, the FPC is started in full and for many nodes… so indeed in most scenarios the protocol should be much faster than FPC for most nodes…
  • Re. integrity : indeed FPC is weak if there is no super-majority, which intuitively seems to get resolved by the CA protocol. (Assuming the network cannot be split easily through a well-positioned adversary - but then agreement failure is the bigger worry anyways).
  • Re. safety / agreement: Simulations on FPC showed that the agreement failure depends mainly on the proportion of the adversary and little on the initial mean opinion. However, if CA concludes with a super-majority, nodes are less likely to have a \eta<\tau_0. It would be interesting to study FPC if a proportion p_0^* of the p_0 nodes that have initially 1 is fixed. I would indeed assume it will perform better.