Side Tangle Attack

Problem
A large side tangle is created and then stitched to the main tangle at some point by a transaction x.
When getTransactionsToApprove is called and a tip is selected that approves transaction x, the transactions of the side tangle are also approved indirectly.
In order to validate that the ledger is in a consistent state, the entire side tangle must be traversed. It is important to observe that the network itself continues to relay transactions; only the tip selection mechanism is being blocked.

Possible Solutions

  1. During the consistency check, if after a predetermined number of transactions we didn't reach the end of the subtangle or didn't encounter a milestone, just quit the calculation and declare the subtangle inconsistent (a rough sketch follows this list).
    The disadvantage of this approach is that we may be disqualifying valid transactions.
    If a strong node actually did the work of verifying the big subtangle and confirmed it with an honest transaction x, other nodes will orphan x and its subtangle even though they are perfectly valid.

  2. The reason we normally don't do long calculations is that we have checkpoints (aka milestones :slight_smile: ). Milestones trigger IRI to create internal snapshots of the state of the tangle's ledger, so when we check for consistency we can stop analyzing transactions below the latest milestones.
    So the idea is to use checkpoints that are not milestones. The new checkpoints won't confirm anything, but they will create internal ledger states (only the state confirmed by milestones will be the "real" state). With this approach we will stop analyzing the side tangle once we reach the first checkpoint. Then we can check consistency by comparing the internal states.
    This approach doesn't have the disadvantage of the first solution.
    The disadvantage here is that it is far more complicated than solution (1). There are also many open questions I have in my head about this approach, and probably more questions will come in the future.
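
To make solution (1) concrete, here is a minimal sketch of a bounded backwards walk that gives up once a fixed transaction budget is exhausted. This is not IRI code: the Tx type, the milestone flag and the budget handling are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of solution (1): abort the consistency walk once a fixed
// transaction budget has been spent. Types and fields are illustrative,
// not IRI's actual model classes.
class BoundedConsistencyWalk {

    // Hypothetical transaction node: a hash, its approvees (trunk and branch)
    // and a flag telling us whether it is a milestone.
    static class Tx {
        final String hash;
        final List<Tx> approvees;
        final boolean isMilestone;

        Tx(String hash, boolean isMilestone, Tx... approvees) {
            this.hash = hash;
            this.isMilestone = isMilestone;
            this.approvees = Arrays.asList(approvees);
        }
    }

    /**
     * Walks backwards from {@code tip}. Returns true if the walk terminates
     * within {@code maxTraversedTxs} visited transactions; returns false,
     * i.e. "declare the subtangle inconsistent", once the budget is exhausted.
     */
    static boolean completesWithinBudget(Tx tip, int maxTraversedTxs) {
        Deque<Tx> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(tip);
        int visited = 0;

        while (!stack.isEmpty()) {
            Tx tx = stack.pop();
            if (!seen.add(tx.hash)) {
                continue;                  // already counted this transaction
            }
            if (++visited > maxTraversedTxs) {
                return false;              // budget exhausted: give up early
            }
            if (tx.isMilestone) {
                continue;                  // milestone reached, stop descending on this path
            }
            for (Tx approvee : tx.approvees) {
                stack.push(approvee);
            }
        }
        return true;                       // finished the walk within the budget
    }
}
```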

Another solution, which Gal and I think should be discarded, but I'll mention it here for completeness' sake, was proposed by mstx on GitHub. The idea was to have gTTA spawn a 'slower' background process once it reaches a certain number of transactions, which would then take over; basically a timeout that spawns another process. The issue is that it's still possible to DoS the network by making nodes run a ton of these / a lot of nodes run these. It possibly just moves the bottleneck from one node resource to another.

Maybe I'm oversimplifying this, but, as far as I understand it, the computational overhead comes from these challenges:

  1. finding the latest referenced milestone when walking
  2. ledger consistency checks.

IRI currently only stores the milestone that approves a transaction (i.e. checkpoint).
1 is significantly easier to fix than 2.

About 1.

Looking at transactions that IRI has accepted already (i.e. PoW is correct), they have three separate states:

  • valid
  • unprocessed
  • inconsistent bundle (i.e. signatures are incorrect) / inconsistent ledger state (i.e. double spends or whatever)

Both inconsistent states carry forward to all of their approvers.

I propose changing the IRI database scheme:

  • add a referencedMilestone field (= max(milestone referenced via branch, milestone referenced via trunk))
  • add a consistencyState field (boolean, but we should use a bitmask here so as not to waste as much storage and to allow for future optimisations).

referencedMilestone allows for fast belowMaxDepth calculation at the expense of 8 bytes per tx.

We currently do have the validity field on a transaction. However, this does not carry forward: it only reflects the atomic bundle (i.e. whether the bundle hash & signatures are valid). (It's also not used anywhere outside of BundleValidator.)

By adding this metadata to a transaction in the local database, we can abort the graph traversal significantly earlier.
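
As a rough sketch (field names follow the proposal above; the enum, the store-level representation and the depth arithmetic are assumptions, not IRI's schema), the early-abort decision could be as cheap as this:

```java
// Sketch of an early-abort check driven by the proposed per-transaction
// metadata. Field names follow the proposal above; the enum and the depth
// arithmetic are assumptions, not IRI's schema.
class MetadataEarlyAbort {

    enum ConsistencyState { VALID, UNPROCESSED, INCONSISTENT }

    // Hypothetical view of a stored transaction carrying the two new fields.
    static class TxMeta {
        long referencedMilestone;          // max(milestone via branch, milestone via trunk)
        ConsistencyState consistencyState;
    }

    /**
     * Returns true when the walk can stop at this transaction without
     * descending further: either a recorded inconsistency carries forward,
     * or the referenced milestone is already too old (belowMaxDepth).
     */
    static boolean canAbortAt(TxMeta tx, long latestMilestoneIndex, long maxDepth) {
        if (tx.consistencyState == ConsistencyState.INCONSISTENT) {
            return true;   // inconsistency propagates to all approvers, no need to re-walk
        }
        if (latestMilestoneIndex - tx.referencedMilestone > maxDepth) {
            return true;   // belowMaxDepth decided from one stored field, no traversal needed
        }
        return false;      // recent enough and not known to be inconsistent: keep walking
    }
}
```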

Who sets these fields?

There are two separate cases here:

  • a tx arrives that is connected to the tangle that we already know (without 'holes' in between)
  • a tx arrives whose connection to the tangle has holes

In the former, we can simply infer the state from trunk/branch.
In the latter, we need to write this information the next time we walk the tx.
The biggest challenge here is that walks are currently triggered by scheduled processes and not driven by the network.
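
For the first case (no holes), the inference is cheap; a minimal sketch, with illustrative types rather than IRI's actual model classes:

```java
// Sketch of inferring the proposed fields from trunk and branch at arrival
// time (the case without "holes"). The Tx type and its fields are
// illustrative, not IRI's actual transaction model.
class ArrivalInference {

    static class Tx {
        long referencedMilestone;
        boolean inconsistent;   // stands in for the proposed consistencyState bitmask
    }

    /** Populate the new transaction's metadata from its two approvees. */
    static void inferFromApprovees(Tx newTx, Tx trunk, Tx branch) {
        // referencedMilestone = max(milestone referenced via branch, via trunk)
        newTx.referencedMilestone = Math.max(trunk.referencedMilestone,
                                             branch.referencedMilestone);
        // Both inconsistent states carry forward to all of their approvers.
        newTx.inconsistent = trunk.inconsistent || branch.inconsistent;
    }
}
```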

About 2.

This ties into 1.; however, there are further optimisations possible which warrant quite a bit of thought:
instead of just storing trunk and branch, the transaction/bundle should also store a reference to the next trunk & branch that change value (or rather: the tip of the relevant bundle that contains the next value change).

This would optimise against zero-value side tangles, but be fairly useless against side tangles that are consistent & move funds around.
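
A sketch of what such "skip pointers" could look like and how a ledger check could hop over zero-value chains (all names are illustrative, not IRI's schema):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the "skip pointer" idea: besides trunk and branch, a transaction
// also stores the nearest value-changing bundle reachable via each side, so a
// ledger check can hop over zero-value chains. All names are illustrative.
class SkipPointerSketch {

    static class Tx {
        String hash;
        String nextValueTxViaTrunk;    // tail of the next value bundle via trunk, or null
        String nextValueTxViaBranch;   // tail of the next value bundle via branch, or null
    }

    /** Collects only value-moving transactions in the past cone of {@code tip}. */
    static Set<String> collectValueTxs(Tx tip, Map<String, Tx> store) {
        Set<String> visited = new HashSet<>();
        Deque<Tx> stack = new ArrayDeque<>();
        stack.push(tip);
        while (!stack.isEmpty()) {
            Tx tx = stack.pop();
            for (String next : new String[] {tx.nextValueTxViaTrunk, tx.nextValueTxViaBranch}) {
                if (next != null && visited.add(next)) {
                    Tx nextTx = store.get(next);
                    if (nextTx != null) {
                        stack.push(nextTx);        // jump straight to the next value bundle
                    }
                }
            }
        }
        return visited;
    }
}
```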

We currently have the snapshot variable in each tx that keeps track of the index of the milestone that approved it.

I don't really understand the consistencyState field completely. Suppose I have two subtangles that were never merged. Now I put a tx on the first subtangle that is a double spend according to the second subtangle; what will the value of the consistencyState field be for this new transaction, and how will it be calculated?

By the way, your idea of storing the next trunk&branch that change value can help with the local snapshot project (out of scope for here, you can discuss it with Hans in private).

Yes. But you don't have a variable that tells you which milestone is the latest a tx references. However, you need this for belowMaxDepth.

This state field just tells you whether the state referenced via trunk/branch and this tx is consistent or not.
(i.e. looking into the past not the other way round)

Not really, because this doesn't prove that no transactions were left out.

I am getting (and I think even liking) the referencedMilestone field. It will enable us to use belowMaxDepth :slight_smile:

I am not quite attached to the consistencyState field. I think it will just make nodes do more calculations that they don't need to be doing. Currently checkConsistency is only called during tip selection or when checkConsistency is called via the API.

I don't know if we should change this behavior. If you want to change it, we should discuss this some more.
I also don't think it will help much when you attach 2 subtangles.

The wallet calls this quite regularly as well. At least the old one did; not sure about Trinity.

Depends on whether you attach this subtangle all at once, I guess :thinking: but yes, it warrants further thought & discussion.

@gal.rogozinski @anon45419615 What do you think about validating signatures proactively rather than as a reaction to API calls? If I understand correctly, this is where the process freezes; if all the signatures were already verified, the computation would not be so expensive.

This could be done upon receiving a new tx, like PoW verification, or in a separate worker thread.
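
A minimal sketch of the worker-thread variant, assuming a placeholder Bundle type and a validateSignatures() stub (this is not IRI's BundleValidator API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of eager signature validation in a dedicated worker thread.
// The Bundle type and validateSignatures() are placeholders, not IRI's
// BundleValidator API.
class SignatureValidationWorker implements Runnable {

    static class Bundle {
        String hash;
        volatile Boolean signaturesValid;   // null = not checked yet
    }

    private final BlockingQueue<Bundle> incoming = new LinkedBlockingQueue<>();

    /** Called from the receive path: enqueue instead of validating lazily later. */
    void submit(Bundle bundle) {
        incoming.offer(bundle);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Bundle bundle = incoming.take();
                // Spread the CPU cost out over time instead of paying it all at
                // once during tip selection / checkConsistency.
                bundle.signaturesValid = validateSignatures(bundle);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private boolean validateSignatures(Bundle bundle) {
        return true;   // placeholder for the actual bundle signature check
    }
}
```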

Yeah, that's pretty much what I meant by this statement.
It would move IRI from a lazy to a more eager computation, with the benefit of spreading out CPU usage more evenly…

If I understand correctly there are 2 suggestions:

  1. Expand the definition of belowMaxDepth such that lazy transactions are blocked. A lazy transaction is one which does not reference a recent milestone.
  2. Proactively validate signatures to avoid dealing with a CPU burst during tip selection.

I personally vote for 1.

I also want to suggest a small variation: instead of determining laziness by comparing to the current system clock, the condition should look at the difference between the two tips. For example, if the first tip's newest referenced milestone is #100, the second tip's NRM must be in the #75-#125 interval. This removes the dependency on the system clock.
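
The check itself would be tiny; a sketch (names and the allowed gap are placeholders):

```java
// Sketch of the proposed tip-pair laziness check: compare the two tips'
// newest referenced milestone (NRM) indexes instead of the system clock.
class TipPairDepthCheck {

    /**
     * With firstTipNrm = 100 and allowedGap = 25, the second tip's NRM must
     * lie in [75, 125]; otherwise the pair is considered lazy and rejected.
     */
    static boolean tipsCompatible(long firstTipNrm, long secondTipNrm, long allowedGap) {
        return Math.abs(firstTipNrm - secondTipNrm) <= allowedGap;
    }
}
```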

As far as I see it, the tip's referencedMilestone propagates to an approver tx upon attaching: the newest referencedMilestone index between branch and trunk is inherited by the new tip. After the walk, when tracking back and checking the consistency of the selected tip, if at any step towards the milestone the gap between referencedMilestone indexes is > maxDepth, the tip is considered lazy and therefore invalid.

I understand that this is out of scope for the discussion of the current patch, but: post COO, how could we evaluate if a stitching transaction joins two tangles that have been branched too far back in the past? In other words: how can we detect a time delta in the Tangle's terms? Timestamping transactions is clearly not an option.
An idea would be to recur over a limited (dynamic) number of "Past Set" transactions (to use a terminology analogous to the Future Set) until finding a common approvee. This value would change dynamically as a function of lambda, and possibly even depend on the node's resource availability.

post COO, how could we evaluate if a stitching transaction joins two tangles that have been branched too far back in the past?

You could use the concept of "virtual milestones" I proposed in the local snapshot topic.

A virtual milestone is a random transaction that starts (or ends) with a defined number of characters (like five 9s) in its tx hash. Since the tx hash is equally distributed in the space of possible hashes, we should see a transaction like this every n transactions (depending on how many characters we choose).

Once this transaction becomes "confirmed" (by the tips) it becomes a virtual milestone that all honest nodes can agree on (even without a Coo).

This would not affect the consensus protocol, but it would allow us to have regular referenceable points in the tangle that can be used as a metric for the age of transactions, the same way milestones are used right now for some of the algorithms.
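
For illustration, the candidate check is trivial; assuming hashes are uniform over the 27-symbol tryte alphabet, a suffix of k nines should appear roughly once every 27^k transactions (five nines would be roughly once per ~14 million transactions). The class and threshold below are just placeholders:

```java
// Sketch of recognising a "virtual milestone" candidate: a transaction whose
// tryte hash ends with a fixed number of 9s. The threshold here is arbitrary.
class VirtualMilestoneCheck {

    static boolean isCandidate(String txHashTrytes, int requiredTrailingNines) {
        if (txHashTrytes.length() < requiredTrailingNines) {
            return false;
        }
        for (int i = txHashTrytes.length() - requiredTrailingNines; i < txHashTrytes.length(); i++) {
            if (txHashTrytes.charAt(i) != '9') {
                return false;   // a non-9 tryte in the suffix disqualifies the hash
            }
        }
        return true;
    }
}
```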

A virtual milestone is a random transaction that starts (or ends) with a defined number of characters (like five 9s) in its tx hash. Since the tx hash is equally distributed in the space of possible hashes, we should see a transaction like this every n transactions (depending on how many characters we choose).

Cool idea! Only problem I have with it is that FPGA attackers might try to alter the "sense of time" of nodes by solving this virtual milestone PoW.

You could maybe adjust the length of the required character sequence by making it adapt to the tangle activity, by keeping track of how "often" we see those milestones (aka how many virtual milestone candidates appear in comparison to non-candidates, or how long the paths between those virtual milestones are). TL;DR: if too many virtual milestone candidates appear, we increase the difficulty.
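
A sketch of that adaptation (the window size and the factor of two are arbitrary assumptions):

```java
// Sketch of adapting the required suffix length to observed tangle activity:
// if candidates appear much more often than a uniform hash would suggest,
// raise the difficulty; if much less often, lower it. The window size and the
// factor of two are arbitrary assumptions.
class VirtualMilestoneDifficulty {

    private long candidateCount = 0;
    private long totalCount = 0;
    private int requiredTrailingNines = 5;

    void observe(boolean isCandidate) {
        totalCount++;
        if (isCandidate) {
            candidateCount++;
        }
        if (totalCount >= 100_000) {                 // re-evaluate once per window
            double observedRate = (double) candidateCount / totalCount;
            double expectedRate = Math.pow(27, -requiredTrailingNines);   // uniform tryte hash
            if (observedRate > 2 * expectedRate) {
                requiredTrailingNines++;             // too many candidates: harder
            } else if (observedRate < expectedRate / 2 && requiredTrailingNines > 1) {
                requiredTrailingNines--;             // too few candidates: easier
            }
            candidateCount = 0;
            totalCount = 0;
        }
    }

    int requiredTrailingNines() {
        return requiredTrailingNines;
    }
}
```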

I don't see how this could possibly work in practice: if you increase the difficulty of the virtual milestone PoW, then regular nodes won't be able to issue virtual milestones anymore due to resource constraints. They will take FPGA milestone clock ticks as a reference of time. The attacker just needs to emit an intermittent / inconsistent hashing power stream to constantly alter nodes' perception of time, thereby rendering the mechanism void. It would be interesting to simulate how fast a node's perception of time could be altered from an attacker's perspective, though: if nodes could adjust their perception of time gradually, it would be possible to absorb these fluctuations and deal with side tangles.

Dynamic MWM is already on the discussion table, but all proposed solutions imply a limit on txs per unit of time to deal with the finite scalability of IOTA, due to network bottlenecks of nodes for instance. I fail to see a virtual milestone solution that doesn't constrain the PoW to a maximum per unit of time either.

It depends what you want to use those snapshots for. If they're used for defining the maxDepth of some algorithms (i.e. where you stop calculating), those algorithms could easily just consider a bigger number of snapshots and adapt instantly, or smooth with a moving average to avoid spikes.

If you want to use them as a time reference, I agree that it's too easy to attack.

I think solving this problem post-Coo is interesting, but let's try to stay within scope (it's a very different problem). The currently proposed solutions to the "easy" Coo case are still incomplete.

After internal conversations with @alon-e and @alon.gal:

We want to implement emergency measures that won't affect the db schema.
Later we will implement the real solution that will involve a db schema change.
We will go for solution (1). We will simply add a limit on the number of traversed txs, and cache all the txs that were traversed during the timeout while checking for belowMaxDepth.
The cache will be used to quickly recognize the attack the next time it occurs.

Cache will be implemented as a static field on WalkValidator for now.
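
Roughly along these lines (a sketch only; the actual field name, eviction, and thread-safety details on WalkValidator may end up different):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the interim measure: a static cache that remembers transactions
// already seen to fail the bounded belowMaxDepth check, so the next tip
// selection can reject them without re-traversing the side tangle. The real
// field on WalkValidator (and its eviction policy) may look different.
class WalkValidatorCacheSketch {

    // Shared across walks; in practice this would need bounding or clearing,
    // e.g. whenever a new milestone arrives.
    private static final Set<String> FAILED_BELOW_MAX_DEPTH = ConcurrentHashMap.newKeySet();

    static boolean knownToFail(String txHash) {
        return FAILED_BELOW_MAX_DEPTH.contains(txHash);
    }

    static void markFailed(String txHash) {
        FAILED_BELOW_MAX_DEPTH.add(txHash);
    }
}
```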

Couldn't this timeout also be caused by a high number of valid signatures that move funds (and thus high time spent doing signature validation)?