Using the ideas discussed above, I propose a fairly specific implementation for sweeping dust. I'll write in the language of Chrysalis, but it's easy to generalize to Coordicide.
Merkle tree
First, I will summarize merkle trees, which should make all this clearer.
Suppose we have an ordered list L=\{x_1,\dots,x_8\} where each x_i is a hash. Note that 8 is a power of 2. We can now create a merkle tree.
Each vertex in the tree has an associated hash. Let h be the hash function. The entries are computed recursively.
- y_1=h(x_1x_2)\quad y_2=h(x_3x_4)\quad y_3=h(x_5x_6)\quad y_4=h(x_7x_8)
- z_1=h(y_1y_2)\quad z_2=h(y_3y_4)
- Merkle root =h(z_1z_2)
If I know the merkle root, anyone can prove to me the inclusion of any x_i. For example, to prove x_3 is in the set, the prover states its location (leaf 3) and then provides the proof x_4, y_1, z_2. I can verify the proof by checking that h(h(y_1h(x_3x_4))z_2) equals the merkle root.
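To make this concrete, here is a minimal sketch in Python, assuming SHA-256 plays the role of the hash function h; the function names are illustrative only, not any particular node API.

```python
# A minimal sketch of the tree above, assuming SHA-256 as the hash h.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    # leaves has power-of-2 length, e.g. [x_1, ..., x_8]
    level = leaves
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def prove(leaves, index):
    # Collect the sibling at each level; for leaf 3 (index 2) this
    # returns [x_4, y_1, z_2].
    proof, level = [], leaves
    while len(level) > 1:
        proof.append(level[index ^ 1])   # the other child of our parent
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    acc = leaf
    for sibling in proof:
        acc = h(acc + sibling) if index % 2 == 0 else h(sibling + acc)
        index //= 2
    return acc == root
```

For the running example, verify(merkle_root(L), L[2], 2, prove(L, 2)) checks the proof for x_3.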
Subsets and Merkle trees
Next we want to compress subsets of L into a merkle tree. The picture is just the same, but any x_i not included in the subset is replaced with the empty set.
Consider the subset L'=L\setminus\{x_3,x_4\}, i.e. the set L with x_3 and x_4 removed. We now have a new merkle tree.
In this tree, most of the values are the same as before. The exceptions are
- y'_2=\emptyset
- z'_1=y_1
- New merkle root =h(z'_1z_2)
Essentially, as we combine branches, \emptyset and \emptyset combine to be \emptyset, a hash x and \emptyset combine to be x, and x and y combine to h(xy) as usual. (I think there are other variants, but this seems best for our context.)
Using the merkle root, we can still compute proofs of inclusion in the same way as above.
Note that, through this method, we can create a merkle tree from a set of any size by padding the end with empty sets until we reach a power of 2.
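Continuing the Python sketch, the combining rule can be written like this, representing \emptyset as None:

```python
# Sketch of the subset variant, continuing the example above.
EMPTY = None

def combine(a, b):
    # emptyset + emptyset -> emptyset; x + emptyset -> x; x + y -> h(xy)
    if a is None:
        return b
    if b is None:
        return a
    return h(a + b)

def subset_merkle_root(leaves):
    # leaves are padded with EMPTY to a power-of-2 length
    level = leaves
    while len(level) > 1:
        level = [combine(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

For the example above, running subset_merkle_root with the x_3 and x_4 leaves set to EMPTY reproduces h(z'_1z_2).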
Deleting Elements
Consider the first merkle tree for the set L. The proofs of inclusion for x_3 and x_4 are respectively x_4, y_1, z_2 and x_3, y_1, z_2. Notice that, from these proofs, we have all the information needed to deduce the new merkle root for the set L', since y'_2=\emptyset, z'_1=y_1 and the new merkle root is h(z'_1z_2).
This is always the case! For any subset L'\subset L, and for any subset X\subset L', if we have the proof of inclusion for every element of X, we can deduce the new merkle root for L'\setminus X, without knowing the contents of L' or L.
This is the critical fact which allows us to use merkle proofs in a database.
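Here is a sketch of this deletion step, again continuing the Python example: record every node value learned from the proofs, blank the deleted leaves, and recompute only the affected ancestors. The bookkeeping structure is my own choice, not prescribed by the proposal.

```python
def root_after_deletions(depth, deletions):
    # deletions maps each deleted leaf index to its inclusion proof
    # (the sibling hashes from leaf level up to the root).
    known = {}                        # (level, position) -> node value
    for index, proof in deletions.items():
        pos = index
        for level, sibling in enumerate(proof):
            known[(level, pos ^ 1)] = sibling
            pos //= 2
    for index in deletions:
        known[(0, index)] = EMPTY     # blank out the deleted leaves
    # Recompute the ancestors of deleted leaves, level by level.
    # The proofs guarantee every sibling we need is already in `known`.
    dirty = set(deletions)
    for level in range(1, depth + 1):
        dirty = {pos // 2 for pos in dirty}
        for pos in dirty:
            known[(level, pos)] = combine(known.get((level - 1, 2 * pos)),
                                          known.get((level - 1, 2 * pos + 1)))
    return known[(depth, 0)]
```

Deleting x_3 and x_4 (depth 3) with their two proofs reproduces the new root h(z'_1z_2)=h(y_1z_2) from the running example.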
The actual proposal
Set a minimum threshold \theta; any output with balance below \theta (either colored or uncolored) is called dust. In short, if dust is older than a certain time period, it is swept from the database into a merkle root. In order to access these funds, a user must present a proof, from which the merkle root can be updated, as outlined above. Transactions without the proof will be considered invalid.
However, since this concerns validity, everything must be objective, and so we lay this out carefully and in detail.
Every 10,000 milestones defines an epoch (this is roughly 1 day). An output is said to be in epoch i if it mutated the ledger in a milestone in epoch i. At the end of each epoch j, a node does the following.
1. Canonically order all the unspent dust outputs in epoch j-1. (There are many possible orderings. It doesn't matter which is chosen, but all nodes must agree.)
2. Create a merkle tree for those dust outputs, and then compute the merkle root \mu_{j-1}. This is the merkle root for unspent dust outputs for epoch j-1.
3. Delete all these epoch j-1 dust outputs (unless the node is a permanode).
4. Let X_i be the dust outputs in epoch i with i<j-1 consumed by a transaction in epoch j-1. Compute the new merkle root \mu_i that removes X_i from the epoch i unspent dust outputs, using the steps outlined in the previous section.
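Putting the pieces together, a hedged sketch of this end-of-epoch routine, reusing the helpers above; the container names (unspent_dust, consumed, mu, tree_depth) are hypothetical stand-ins for a node's ledger state:

```python
def end_of_epoch(j, unspent_dust, consumed, mu, tree_depth):
    # Steps 1 and 2: canonically order epoch j-1's unspent dust outputs
    # (sorting by output id is one valid canonical order), pad to a
    # power of 2, and build mu_{j-1}.
    outputs = sorted(unspent_dust[j - 1])
    size = 1 << max(0, len(outputs) - 1).bit_length()  # next power of 2
    padded = outputs + [EMPTY] * (size - len(outputs))
    mu[j - 1] = subset_merkle_root(padded)
    tree_depth[j - 1] = (size - 1).bit_length()
    # Step 3: delete the raw outputs (a permanode would keep them).
    del unspent_dust[j - 1]
    # Step 4: remove the dust X_i consumed during epoch j-1 from each
    # older epoch i; consumed[i] maps leaf index -> inclusion proof.
    for i, X_i in consumed.items():
        mu[i] = root_after_deletions(tree_depth[i], X_i)
```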
Suppose the next milestone will be issued in epoch j, and suppose an unconfirmed transaction consumes a dust output of epoch i with i<j-1. This transaction will be considered valid if and only if it contains a proof that the output is contained in the merkle root \mu_i. This means that all dust has at least one epoch to be spent before it is swept away.
For example, suppose we are in epoch 500. All the dust in epochs 500 and 499 is free to be spent. However, if I want to spend dust from epoch 490, I must present a proof with respect to the merkle root \mu_{490}, as last updated at the end of epoch 499.
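As a sketch, the validity rule for a single dust input might look like this, reusing verify from the first code block; the record fields (epoch, output_hash, index, proof) are hypothetical, just enough to carry the proof:

```python
def valid_dust_input(current_epoch, inp, mu):
    # Dust from the current or previous epoch is free to spend.
    if inp.epoch >= current_epoch - 1:
        return True
    # Older dust must prove inclusion in its epoch's merkle root.
    if inp.proof is None:
        return False
    return verify(mu[inp.epoch], inp.output_hash, inp.index, inp.proof)
```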
To spend old dust, a wallet must obtain the proof from a permanode (or a dusty node), or else continuously store the proof and ask a node for updates to the proof while the node performs Step 4.
All the old dust is compressed to 1 hash per epoch (about 1 hash a day). This means that after 10 years, the dust storage has size
(32 \mbox{ bytes})(365)(10)=116{,}800 \mbox{ bytes}\approx 117\mbox{ KB}
This proposal can also be done completely with timestamps.
A complicated optimization
In the above proposal, the memory taken by the merkle roots grows linearly. Alternatively, we can store all the merkle roots in a merkle tree. This merkle tree can be updated in Step 4 above. However, we would need to save some extra hashes to be able to compute the next merkle tree. I have some other things to do, so I won't lay out the exact algorithm, but this approach would only need to store at most \log_2 j of the hashes, which is at most 12 hashes over 10 years.
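For concreteness, one plausible shape for this (a guess at the unspecified algorithm, not a worked-out design): keep only the "peaks" of a binary merkle accumulator over \mu_1,\mu_2,\dots, merging peaks like carries in a binary counter, which stores at most \log_2 j + 1 hashes. This sketch ignores the extra hashes mentioned above that would be needed to keep old \mu_i updatable in Step 4.

```python
def append_epoch_root(peaks, new_root):
    # peaks[k] is the root of a full subtree of 2^k epoch roots, or None.
    # Appending works like incrementing a binary counter, so after j
    # epochs at most log2(j) + 1 peaks are stored.
    carry = new_root
    for k in range(len(peaks)):
        if peaks[k] is None:
            peaks[k] = carry
            return
        carry = h(peaks[k] + carry)   # merge two full 2^k subtrees
        peaks[k] = None
    peaks.append(carry)
```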
Another variant, which only works for Chrysalis and not Coordicide, is to build the merkle root for each epoch out of the mutated milestone merkle roots.
Downsides
- When the epoch changes, some transactions that were valid become invalid because they now need proofs. This could be a vector to attack the CTPS, but only while the epoch changes. The tip selection could be modified to mitigate this problem.
- Short-term dust attacks are possible. If an attacker can add 1 GB of dust a day, then this approach only limits the stored dust to 2 GB, since dust survives the current and previous epochs before being swept. Local rate control measures can prevent this.
- If an attacker dusts at 100 TPS for 1 day, then each proof from that day requires about 25 hash operations to verify, I think: 100 TPS over a day is roughly 8.6 million outputs, and \log_2 of that is about 23. If my math is wrong, and it's many times more than this, an attacker could spam long proofs for everyone to verify.
- How does this approach work with people who need dust?