FCoB implementation spec

This is the spec I’ve details for my FCoB simulation. Source in github:
https://github.com/iotaledger/res-sims-ds/tree/fcob

Please comment for any clarifications or corrections.

Spec

While this is a multi-node multi-strategy environment, the specifications here are described in terms of FCoB only.

High level flow

  • A simulation starts by initializing several FCoB nodes.
  • Each node issues transactions at rate lambda.
  • Communication between nodes is delayed according to values of an H matrix.
  • Every once in a while, the simulation groups a few transactions as coming from the same address.
  • Whenever a node receives a transaction, it:
    • Check if the transaction is solid.
    • Check if the transaction solidifies existing transactions (recursively).
    • Waits c time, then adds the transaction into its voting pool.
  • Every votingRoundTime, a central entity sends out a global random number to all nodes (with zero delay).

FPC voting

  • Whenever a transaction is added to the node’s voting pool, it receives its initial like status, which is true if the node has not seen another transaction from the same address.
  • Whenever a node is queried with a list of transactions, it returns a list of the same length of like statuses - true, false, or `undefined if it has not seen the transaction, or has not yet added it to its voting pool.
  • Whenever a node receives a global random number, it concludes the previous voting round, and initializes a new one. For readability, this is presented in the reverse order:
    • Initialization of a voting round:
      • An askingRatio is set to 99.9% if there are less than 10 nodes in the simulation, and to 10 / numberOfNodes otherwise.
      • The node sends to each node, with probability askingRatio, the list of transactions in its voting pool.
      • The node collects responses for this round (old responses for previous rounds are discarded).
    • Conclusion of a voting round:
      • For each transaction included in the round:
        • A cutoff value is calculated from the new global random number, in the range of [a, b] for the initial round, and [beta, 1-beta] otherwise.
        • The new like status for the transaction is set to cutoff < t / (t + f), where t and f are the amounts of true and false votes, respectively. (Note that in JavaScript x < 0/0 is false, therefore if there are no true or false responses about a transaction, it is deemed unliked).
        • If the transaction has goes through m0 + l rounds, and had the same status for l rounds, the status is deemed final, and the transaction is removed from the node’s voting pool.

Tip selection

Whenever a tip selection is needed:

  • The genesis is set as the current candidate.
  • In a loop:
    • A set is calculated, of all approvers of the candidate which are liked, solid, and has solidification time at most D time apart from the candidate’s.
    • If that set is empty, the candidate is the selected tip, and the process terminates.
    • If that set includes only one transaction, that transaction is set as the candidate (This is not required, but improves efficiency).
    • If that set has more than one transaction:
      • Cumulative weight is calculated for each transaction in the set.
      • The probability of each transaction in the set is calculated to be g^siblingNumber * e^(-a*CW - b*solidificationDiff) (normalized).
      • A weighted random memeber of the set is picked as the candidate.

Where does this come from? I thought “number of nodes to query” was a parameter of FPC (although it’s possible I made that up)

Pretty sure it wasn’t an FPC parameter, so I’ve just thrown that in out of nowhere. Since most simulations I’ve ran had less than 10 nodes, in practice it means that almost always all nodes are queried. This will change according to any spec we’ll come up with, or if we’ll want to run bigger simulations in which that number matters.

Quoting from scripture:

In the prototype we decided to not send any information about the voting round to which an opinion belongs to. I.e. any opinion received is considered the latest one. I am not sure if this is what is meant by this. Would you see any issues with this ?

For simulation purposes it might make sense to stick closer to the mathematical model, i.e. apply a uniform random selection. This would mean to include self query and multiquery of the same node.

In the prototype we decided to not send any information about the voting round to which an opinion belongs to. I.e. any opinion received is considered the latest one. I am not sure if this is what is meant by this. Would you see any issues with this ?

Would an attacker have motivation to delay their responses? On round 1, the attacker does not respond, round 2, the attacker does not respond, but then on round 3, he sends the responses from round 1,2, and 3. And then he repeats this pattern. On rounds = 3 (modulo 3), the attacker would a much larger affect. But perhaps this is a silly idea.

Of course, a node shouldn’t take into account eventual responses from earlier rounds (since the whole idea is that an opinion is issued before the corresponding X-value becomes known). So, we need to handle the situation when, in a given round, not all queries are responded in time. One can, for example, query a bit more than k nodes and use only k responses, or just average the responses that were effectively received before the deadline, or use a combination of these methods (I don’t think it matters much).

This seems like a drastic change. Does this mean that with high likelihood (approx 50%) you will ask less than k nodes? I think the FPC paper suggests to query exactly k nodes. I am not sure if the theorems hold if this condition is not fullfilled.

As mentioned today on the call, I’ll be modifying the FPC paper to deal with the situation when not all queries may be responded. But it can be dealt with in several ways; please don’t take the FPC paper as a dogma.

How do you generate this H matrix ? do you generate a randomly connected graph of the nodes ? This could become a topic of interest, when the delay by the H-matrix becomes an issue in relation to the length of the voting round.
Edit: actually the delay H is not related to the network of the nodes, since the query for opinion does not happen via gossip.

@alon.gal Ahhh… That’s not the scripture I’ve based my work on. The derivative document by both Popov and Clara was it.

A couple. It might enable several kinds of attacks.

You do have to validate that answer you accept relate directly to queries you have sent, and even if you do that, enabling late answers enables an attacker to pick a strategic time to send their respond. This could mainly make eclipse attacks easier, but maybe enable other types.

Yeah, that’s an even better attacking method than the one I’ve mentioned in my last comment!

And it’s not silly. An attacker would do that while replacing 3 with l - the number of consecutive rounds required. After l rounds it would vote l times yes, and after the other l times it would vote l times no, getting a much better chance to get you stuck without a final opinion.

It would be in a real implementation. In a naive 100% honest simulation as the one I’m running, there isn’t much difference.

I would, however, change it before producing proper results, but that would also require some better specification of that k, which is not a constant parameter of the protocol, but should be some more thought out value that results from a few things, including a node’s “guess” as to the size of the network, and the actual network topology around it (something that does not exist in a sim without peer discovery).

Let me understand what the difference here is in our definitions.
The way I intended to suggest to deal with a delay is simply to ignore an opinion if it misses the round where the node is querried.
Assume we select a quorum of nodes (in our models of size k) and only consider their opinion in the round we query them. Any response from a node that is not part of the quorum of the current round is not considered (it could be malicious behaviour so it shouldn’t be considered). Also the node should only consider the last opinion of a node. If by any chance you querried the same node 3 times and all opinions arrive in the last round you would only consider the last opinion. If a node does not reply in time the opinion is formed on the basis of the remaining answers in this round. This way the rounds can be isolated to a certain extend from each other.
However the discussion here is we assume that we would count the opinion of the same node several times? What would be the pro/cons of applying this rule ?