Tip pool cleaning

1. Some theory

Definitions

We define the following quantities:

  • Arrival rate λ: mean number of messages generated per unit of time (e.g., second).
  • Service rate μ: scheduling rate at each node.
  • Propagation time h: time needed for a message to be delivered to all nodes.
  • Number of tips L: mean number of unreferenced messages.

Number of tips

In case the system is stable, that is the mean arrival rate λ is smaller than the service rate μ, we can infer the mean number of tips according to the Tangle white paper: specifically, the mean number of tips L=2λh is directly proportional to the arrival rate λ and to the propagation delay h.

Generalized formula

It must be noted that the above formula can be generalized when a message references k other messages instead of 2, and becomes L=kλh/(k-1). Additionally, it is possible to extend the previous formula when λ>μ. Since our scheduler is work-conservative, the output rate from the scheduler is trivially given by min{λ,μ}: in other words, if λ>μ (congested situation), then the arrival is capped by the scheduling rate and the rest of the spamming traffic is just backlogged in the outbox buffer. The above formula, then, becomes L=k*min{λ,μ}*h/(k-1).

Finally, more sophisticated analysis with respect to the propagation delay h for different classes of traffic (e.g., honest vs spam) can be found in [cullen2019] and in [penzkofer2021].

2. Tip pool size in IOTA DevNet

Currently in the DevNet

In the current implementation, the number of tips in the tip pool is often very large (or exploding) in spamming periods. This is somehow expected as the new data flow moves the removal of malicious traffic from the outbox buffer (previously referred to as blacklisting) to being orphaned in the tip set. However, using Uniform Random Tip Selection, the tip set has no protection against spam as each node approves tips without taking into account how the Tangle is growing. Two mechanisms are important to mitigate this behavior:

  • The AIMD rate setter, because it allows to react to random messages from low-mana nodes which waste valuable bandwidth.
  • The Time Since Confirmation (TSC) condition as it allows to temporarily “forget” parts of the Tangle which seem to be orphaned.

The latter, however, may not be reactive enough, especially if the threshold is set to 2/5 minutes.

Problem

A mechanism to “clean” the tip pool is require to remove messages which will be approved with very low probability or that should not be approved at all. The policy used to choose the “best” messages is matter of debate but it should obey the following three requirements:

  • Be easy to compute.
  • Exclude tips from 0 or low mana nodes in times of congestion.
  • Lead to very low orphanage rate (for “honest” traffic).

In the current version of the protocol, messages are removed from the tip pool after 29 minutes (to match the BelowMaxDepth criterion).

3. Proposal

Non-Uniform Random Tip Selection

Move to a non-uniform random tip selection, which chooses newer tips with higher probability.

However, non-uniform random tip selection is a “dangerous” option because it risks to affect the equilibrium properties (from a game theoretical perspective) of the tip selection algorithm.

Time-based Outbox and Tip Pool Cleaning

Theoretically, an attacker’s queue in the outbox could be full forever if, after the attack, the malicious node behaves honestly. In practice, this delays the traffic of the attacker, increasing the h value and consequently the tip pool size. An option is to drop old (according to the timestamp inside the message) spam messages that are left in the outbox. A spammer is identified by the fact that its reputation-scaled queue length is large compared to the other nodes which should be close to 0 if they behave honestly (except during resynchronization).

Similarly, we can have some reasonably cutoff where we remove a message from the tip pool if it is too old.

Size-based Tip Pool Cleaning

In this proposal, there is a fixed number of messages in the tip pool. Scheduled messages are added to the eligible set in the same way we do it right now. However, only the “best” x out of them can be chosen by the tip selection algorithm. A good value for x can be estimated according to the theory (see Section 1).

We propose to sort messages by timestamp: when the tip pool gets full, the oldest message according to the timestamp are removed. Delay in getting scheduled is probably the best way to incorporate Sybil protection in the tip pool, so for this size-based tip pool cleaning approach, message age is an optimal metric to evaluate how good a tip is. This basically results in a dynamic TSC condition where, depending on how congested the network, the cutoff time to get into the best x tips will change.

Alternatively, it could be proposed to sort tips by TSC but this would probably be too computationally expensive.

Time-based Outbox and Tip Pool Cleaning
I would propose one improvement to this approach:

Dropping messages with large delay should be invoked only when a certain part of the total buffer size is occupied. Maximum delay we can expect for a given total buffer size can be estimated and based on this estimation we could set the delay threshold. This would guarantee that buffer will be emptied and that messages with acceptable delay are scheduled.

I think that simply dropping delayed messages when the buffer is not full (or partly occupied) could block the scheduler for a while as messages in the future cone of the delayed and dropped message wouldn’t be marked as ready until the message is confirmed. In this case the delay could be due to network or other lags and it could be a single message.

When dropping messages from nodes that are actually filling up their queues, we know that we can expect a stream of delayed messages and that is dangerous for the tip pool size.