Old (and dust) tx merkle tree bitmasked snapshot compression

This idea is inspired by https://github.com/Wollac/protocol-rfcs/blob/milestone-merkle-validation/text/0012-milestone-merkle-validation/0012-milestone-merkle-validation.md as a potential solution for dust attacks and ‘old transaction’ pruning.

The idea is to generate, at a fixed interval I, an ordered list of all unspent transactions. This list contains all unspent transactions between interval I-1 and I. Next we generate a merkle tree from this list that includes the root of the previous interval.

Each transaction/leaf corresponds to a single bit in a bitmask. This means nodes only need to store a merkle root and a bitmask with one bit per leaf for each interval period (similar to snapshotting).

Transactions that fall into this category are required to provide an extra merkle proof when spent, but can otherwise be processed like regular transactions. The extra information is the merkle proof that corresponds to an unspent bit in the bitmask; nodes can help provide proofs between intervals, but the final root->leaf path must be presented. Upon spend, each node flips the bit so that any future transaction spending the same output is considered invalid.
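A minimal sketch of this mechanism in Python (the names, hash function, and padding/ordering rules here are illustrative assumptions, not a spec): a node stores only the interval’s merkle root plus a one-bit-per-leaf bitmask, and a spend must carry a valid root->leaf proof, after which the bit is flipped.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise into a root (an odd level carries its last hash up)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def snapshot(unspent_txs: list[bytes], prev_root: bytes):
    """Interval snapshot: a root chained to the previous interval's root, plus a bitmask."""
    root = merkle_root([prev_root] + unspent_txs)  # leaf 0 is the previous root
    bitmask = [1] * len(unspent_txs)               # 1 = still unspent
    return root, bitmask

def spend(leaf: bytes, index: int, proof: list[bytes], root: bytes,
          bitmask: list[int]) -> bool:
    """Verify the root->leaf path, then flip the bit so a second spend fails."""
    if not bitmask[index]:
        return False                               # already spent
    acc, i = leaf, index + 1                       # +1 because leaf 0 is the previous root
    for sibling in proof:
        acc = h(sibling + acc) if i % 2 else h(acc + sibling)
        i //= 2
    if acc != root:
        return False
    bitmask[index] = 0                             # flip the bit: mark as spent
    return True
```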

Permanodes could help in recovering the required ‘proof’. Because the entire tangle does not need to be stored, just the list of hashed transaction contents, this data can stay quite manageable. Each colored coin could even have its own merkle snapshot bitmask.

I believe such a mechanism could allow for single-IOTA payments while keeping the ledger state small by moving ‘non-active funds’ into a merkle-tree-compressed bitmask.

An obvious downside is that spending from old transactions requires larger proofs, and that the complete merkle tree path between snapshot intervals needs to be part of synchronisation.


This is a very interesting idea. I’ll keep it in mind!

Hmm

So this means that

  1. The wallet should keep merkle trees so it can spend old UTXOs
    OR
  2. The wallet should rely on some permanode

This sounds a bit hard though
Once an old UTXO is spent, a new merkle root will have to be computed…
Not sure how to go about it…

This is actually not true. The node just removes the flag from the bitmask.


Thanks,
Didn’t read carefully enough apparently.

I still don’t understand something. I suppose that each node will build a different merkle tree, so a proof that satisfies one node won’t satisfy the others.

So when you gossip the tx around the network, you will have to supply different proofs to different nodes…

This sounds complicated

It happens to the best of us :slight_smile:

You would have to have a canonical way to build the proof. You can do this by totally ordering the transactions by timestamp, and dividing the timestamps up into epochs (which is something that I think has to be done anyways, but that’s a different conversation)
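A small sketch of what such a canonical ordering could look like (the epoch length and the tie-breaking rule here are assumptions for illustration):

```python
EPOCH_SECONDS = 86_400  # hypothetical epoch length: one day

def epoch_of(timestamp: int) -> int:
    """Bucket a unix timestamp into a fixed-length epoch."""
    return timestamp // EPOCH_SECONDS

def canonical_order(txs: list[tuple[int, bytes]]) -> list[bytes]:
    """txs: (timestamp, tx_hash) pairs. Sorting by timestamp with the hash
    as tie-breaker gives every node the exact same leaf order."""
    return [tx_hash for _, tx_hash in sorted(txs)]
```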

This is similar to utreexo

Good find, seems to be similar indeed.

Using the ideas discussed above, I propose a fairly specific implementation that basically sweeps dust. I’ll write in the language of Chrysalis, but it’s easy to generalize to coordicide.

Merkle tree
First, I will summarize merkle trees, which should make all this more clear.

Suppose we have an ordered list L=\{x_1,\dots,x_8\} where each x_i is a hash. Note that 8 is a power of 2. We can now create a merkle tree.
[Diagram: a merkle tree over the leaves x_1,\dots,x_8]
Each vertex in the tree has an associated hash. Let h be the hash function. The entries are computed recursively.

  • y_1=h(x_1x_2)\quad y_2=h(x_3x_4)\quad y_3=h(x_5x_6)\quad y_4=h(x_7x_8)
  • z_1=h(y_1y_2)\quad z_2=h(y_3y_4)
  • Merkle root =h(z_1z_2)

If I know the merkle root, anyone can prove to me the inclusion of any x_i. For example, to prove x_3 is in the set, the prover states its location (leaf 3) and provides the proof x_4, y_1,z_2. I verify the proof by observing that h(h(y_1h(x_3x_4))z_2) is the merkle root.
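A hedged sketch of this verification in Python (the hash function and byte encoding are assumptions): the bits of the leaf’s index tell the verifier on which side each proof element sits.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify(leaf: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    """Check an inclusion proof; `index` is the zero-based leaf position."""
    acc = leaf
    for sibling in proof:
        if index % 2:                  # we are a right child: sibling on the left
            acc = h(sibling + acc)
        else:                          # we are a left child: sibling on the right
            acc = h(acc + sibling)
        index //= 2
    return acc == root

# For x_3 (leaf 3, zero-based index 2): verify(x3, 2, [x4, y1, z2], root)
# recomputes exactly h(h(y_1 h(x_3 x_4)) z_2).
```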

Subsets and Merkle trees
Next we want to compress subsets of L into a merkle tree. The picture is just the same, but any x_i not included in the subset is replaced with the empty set.

Consider the subset L'=L\setminus\{x_3,x_4\} i.e. the set L with x_3,x_4 removed. We now have the new merkle tree
[Diagram: the merkle tree for L' with x_3 and x_4 replaced by \emptyset]
In this tree, most of the values are the same as before. The exceptions are

  • y'_2=\emptyset
  • z'_1=y_1
  • New merkle Root =h(z_1'z_2)

Essentially, as we combine branches, \emptyset and \emptyset combine to be \emptyset, a hash x and \emptyset combine to be x, and x and y combine to h(xy) as usual. (I think there are other variants, but this seems best for our context.)

Using the merkle root, we can still compute proofs of inclusion in the same way as above.

Note that through this method, we can create a merkle tree from a set of any size by padding the end with empty sets until we get a power of 2.
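A sketch of this combining rule in Python, with \emptyset modeled as None (the encoding is an assumption; it just has to be consistent across nodes):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def combine(left, right):
    """The rule above: ∅+∅ -> ∅, x+∅ -> x, x+y -> h(xy)."""
    if left is None:
        return right
    if right is None:
        return left
    return h(left + right)

def subset_root(leaves: list):
    """`leaves` is the full-length list with excluded elements set to None;
    it is padded with None (empty sets) up to the next power of 2."""
    n = 1
    while n < len(leaves):
        n *= 2
    level = leaves + [None] * (n - len(leaves))
    while len(level) > 1:
        level = [combine(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Example from the text: subset_root([x1, x2, None, None, x5, x6, x7, x8])
# gives h(z_1' z_2) with z_1' = y_1, matching the new merkle root above.
```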

Deleting Elements
Consider the first merkle tree for the set L. The proofs of inclusion for x_3 and x_4 are respectively x_4, y_1,z_2 and x_3, y_1,z_2. Notice that from these proofs, we have all the information to deduce the new merkle root for the set L', since y_2'=\emptyset, z_1'=y_1 and the new merkle root is h(z_1'z_2).

This is always the case! For any subset L'\subset L, and for any subset X\subset L', if we have the proof of inclusion for every element of X, we can deduce the new merkle root for L'\setminus X, without knowing the contents of L' or L.

This is the critical fact which allows us to use merkle proofs in a database.
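For a single deletion, this replay is short enough to sketch (deleting several elements additionally requires merging the shared parts of their paths, which is omitted here); `combine` is the rule from the previous sketch:

```python
def root_after_delete(index: int, proof: list, combine):
    """Recompute the root with the leaf at `index` replaced by ∅, using only
    the sibling hashes from its old inclusion proof (entries may be None
    where a sibling subtree was already empty)."""
    acc = None                    # the deleted leaf becomes the empty set
    for sibling in proof:
        acc = combine(sibling, acc) if index % 2 else combine(acc, sibling)
        index //= 2
    return acc                    # the new merkle root

# Deleting x_3 alone: root_after_delete(2, [x4, y1, z2], combine)
# returns h(h(y_1 x_4) z_2), since y_2' = x_4 once x_3 becomes ∅.
```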

The actual proposal
Set a minimum threshold \theta, and any output with balance below \theta (either colored or uncolored) is called dust. In short, if dust is older than a certain time period, it is swept from the database into a merkle root. In order to access these funds, a user must present a proof, from which the merkle root can be updated, as outlined above. Transactions without the proof will be considered invalid.

However, since we deal with validity, everything must be objective, and so we lay this out carefully in detail.

Every 10,000 milestones defines an epoch (this is roughly 1 day). An output is said to be in epoch i if it mutated the ledger in a milestone in epoch i. At the end of each epoch j, a node does the following (a sketch of these steps follows the list).

  1. Canonically order all the unspent dust outputs in epoch j-1. (There are many orderings. It doesn’t matter which is chosen, however all nodes must agree.)
  2. Create a merkle tree for those dust outputs, and then compute the merkle root \mu_{j-1}. This is the merkle root for unspent dust outputs for epoch j-1.
  3. The node now deletes all these epoch j-1 dust outputs (unless it’s a permanode).
  4. Let X_i be the dust outputs in epoch i with i<j-1 consumed by a transaction in epoch j-1. Compute the new merkle root \mu_i that removes X_i from the epoch i unspent dust outputs, using the steps outlined in the previous section.
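A minimal, hypothetical sketch of this end-of-epoch procedure; `db`, `canonical_order`, `merkle_root` and `delete_from_root` are placeholder names for whatever a real node would implement, not an actual API:

```python
def end_of_epoch(j: int, db, roots: dict):
    # Step 1: canonically order epoch j-1's unspent dust outputs.
    dust = canonical_order(db.unspent_dust_outputs(epoch=j - 1))
    # Step 2: compute and keep the merkle root mu_{j-1}.
    roots[j - 1] = merkle_root([out.hash for out in dust])
    # Step 3: drop the swept outputs from the database (a permanode keeps them).
    db.delete_outputs(dust)
    # Step 4: for each older epoch i < j-1, remove the dust X_i consumed
    # during epoch j-1 from mu_i, using the proofs carried by the spending txs.
    for i, consumed_proofs in db.consumed_old_dust(epoch=j - 1).items():
        roots[i] = delete_from_root(roots[i], consumed_proofs)
```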

Suppose the next milestone will be issued in epoch j, and suppose an unconfirmed transaction consumed a dust output of epoch i with i<j-1. This transaction will be considered valid if and only if it contains a proof that the output is contained in the merkle root \mu_i. This means that all dust has at least one epoch to be spent before it is swept away.

For example, suppose we are in epoch 500. All the dust in epoch 500 and 499 is free to be spent. However, if I want to spend dust in epoch 490, I must present a proof with respect to the merkle root \mu_{490} computed at the end of epoch 499.

To spend old dust, a wallet must obtain the proof from a permanode (or a dusty node), or continuously store the proof and ask a node to be updated about the changes to the proof while the node is doing Step 4.

All the old dust is compressed to 1 hash per epoch (about 1 hash a day). This means that after 10 years, dust storage has size
(32 \mbox{ bytes})(365)(10)\approx 117\mbox{ KB}

This proposal can also be done completely with timestamps.

A complicated optimization
In the above proposal, the memory taken by the merkle hashes grows linearly. Alternatively, we can store all the merkle roots in a merkle tree. This merkle tree can be updated in Step 4 above. However, we would need to save some extra hashes to be able to compute the next merkle tree. I have some other things to do, so I won’t lay out the exact algorithm, but this approach would only need to store at most \log_2 j of the hashes, which is at most 12 hashes over 10 years.

Another variant that only works for chrysalis and not coordicide is to build the merkle root for each epoch out of the mutated milestone merkle roots.

Downsides

  • When the epoch changes, some transactions that were valid become invalid because they now need proofs. This could be an attack vector on the CTPS (confirmed TPS), but only while the epoch changes. The tip selection could be modified to mitigate this problem.
  • Short term dust attacks are possible. If an attacker can add 1 GB of dust a day, then this approach limits the dust to 2 GB. Local rate control measures can prevent this.
  • If an attacker dusts at 100 tps for 1 day, then each proof from that day requires about 25 hash operations to verify, I think (100 tps for a day is roughly 8.6 million outputs, giving a tree of depth \lceil\log_2 8.6\mbox{M}\rceil=24). If my math is wrong, and it’s many more times this, an attacker can spam long proofs for everyone to verify.
  • How does this approach work with people who need dust?

Just wondering here William, do you still intend to use the bitmask itself? Or are you proposing to keep changing the merkle tree, such that the spender’s proof is an update function on said merkle tree? How would this work if you want to include the previous root in the new epoch?

Happy this is being picked up!

I would dispense with the bitmask since it is not necessary. The hashes of the UTXOs stored in each epoch can be ordered in several different ways. Either the hash itself can dictate its own location (using the cantor walk from the root with 0 = go left, 1 = go right), or the order in which they mutated the ledger state (in chrysalis).
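A tiny, hypothetical sketch of the first ordering, where the hash dictates its own leaf position in a tree of depth d:

```python
def leaf_index(tx_hash: bytes, depth: int) -> int:
    """Walk from the root using the hash's top `depth` bits: 0 = go left,
    1 = go right. The resulting bit string is exactly the leaf index."""
    bits = int.from_bytes(tx_hash, "big")
    return bits >> (len(tx_hash) * 8 - depth)
```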

The proof of inclusion will change with each spend because you need to record that the output was consumed. I think the method outlined above is the easiest way to do that.

I wouldn’t do that, and there are two reasons why. First, it’s not too expensive to just store the merkle root for each epoch. It’s about 1 hash a day, which I think is ok.

Second, if you don’t want to store the merkle root for each epoch, attaching the old merkle root as a leaf in the next tree is not a good idea, because then the proofs grow linearly. Suppose I want to prove a UTXO from epoch 1 when it is epoch 100 now. Then I need the proof of inclusion in the first epoch, then the proof of inclusion of the 1st epoch in the 2nd, then the 2nd in the 3rd, etc. This means that my proof would be as long as the sum of the depths of the merkle trees of all the epochs. Each epoch I think will have depth between 10 and 30, so this is between 1000-3000 hashes! This is way too long.

Instead, it makes sense to me to build all the merkle roots into a new merkle tree. If the number of epochs is not a power of 2, you will need to save more than 1 hash in order to construct the merkle tree including the epoch, but that’s ok.

Oh also, it’s not official that we’re adopting this, but I am advocating it :slight_smile:

Including the previous epoch doesn’t need to be linear and can easily be log(n-epoch) by also including epoch mod 10, 100, 1000, 10000, 100000, etc. (or some other factor). This keeps the proofs from growing endlessly, but you maintain a hard hashchain (this is essentially what I do with AION’s timewarping). You can prove (hash-based) existence of any (non-value) data this way, by just providing a path of transactions to 1 TX that ever contained dust, and then the historical proof becomes log(n-epoch).
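A sketch of one reading of these log-spaced links (assuming epoch e commits to the roots of epochs e-1, e-10, e-100, ..., with 10 as the factor): any older epoch then becomes reachable in a logarithmic number of hops rather than one hop per epoch.

```python
def back_link_epochs(e: int, factor: int = 10) -> list[int]:
    """Epochs whose roots epoch `e` additionally commits to."""
    links, step = [], 1
    while step <= e:
        links.append(e - step)
        step *= factor
    return links

# back_link_epochs(12345) -> [12344, 12335, 12245, 11345, 2345]
```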

If all nodes keep the epoch chain, you will only need to generate your proof just once. Selective permanodes might not even be needed then.