I’m fully aware that you are already oversubscribed with your other projects, so I very much appreciate you taking the time to reply. Your answer definitely helped and provided some interesting insight. As you probably expected, though, I do have a few comments.
Let me get a few obvious things out of the way first: the Coordicide WP does not (to the best of my knowledge) really mention any of the other ‘fundamental changes’ you mentioned (e.g. UTXO, signature schemes, reusable addresses). Similarly, the inner workings and performance characteristics of the coordinator are not well published (again, to the best of my knowledge). It is therefore a bit hard to reason about the effects of individual changes. The best I can go with is a “big O” intuition, which I still believe generally holds independent of the optimizations.
Let me state a few assumptions/observations.
(1.) IoT devices in scope have an approximate capacity of a Raspberry Pi 3 / 4.
The power usage of an RPi 4 under load is about 5-10 Watts, which is a reasonable upper bound both on what an end-user might be willing to spend and on what could still be considered “green”. As more compute power requires more electrical power, capping compute at Raspberry-Pi levels seems like a sound assumption. Note: in the rest of this text, I’m using the RPi (4) as an example of ‘what 5-10 Watts of compute buys you’ - independent of whether this is 100% CPU on an RPi 4, or 10% on a Core i7.
(2.) The majority of IOTA full-nodes will run on IoT devices.
This seems pretty intuitive given IOTA’s focus on IoT and the marketing material (“smart city mesh networks”). Mesh networks would not be reasonable if light nodes were the majority of the network (and there would be no advantage in using a DLT anymore - esp. for data transactions, AWS IoT would be cheaper and faster). I found few explicit statements on this (“Low-power devices are able to issue transactions and take part in the consensus.” from IOTA 2.0 DEVNET - Nectar Release being the closest), so it might make sense to make this an assumption.
(3.) The maximum throughput of the network is limited by the speed of some absolute majority (or even supermajority) of nodes.
This is due to the need to build consensus, and the visibility requirements described in my initial post.
So, assuming that the assumptions above are mostly accurate (please correct me if/where I’m wrong), let me attempt to explain why I seriously doubt the projected post-Coordicide TPS numbers.
(a.) Using Avalanche as the baseline
I know Avalanche is not IOTA, but there are significant structural similarities, and Avalanche has somewhat detailed benchmarks, making it a decent starting point.
According to their benchmarks, Avalanche achieves about 3500 TPS in their geo-replicated scenario (without attackers). A Raspberry Pi is significantly slower than the c5.large AWS instance they use. Assuming a factor of 10 (see the math below; it’s likely worse), we’re already at 350 TPS. Now, Avalanche is conceptually significantly simpler than IOTA - maintaining mana, autopeering incl. hash chains, and the DRNG for FPC all need to be accounted for. I think it is safe to assume that this will cost at least another 50%, leaving ~175 TPS. This still doesn’t account for misbehaving nodes (on purpose or accidentally), high-mana node overloads, timeouts and crashes, the fact that real-world applications need a synchronous persistence layer, etc. - and now you’re likely at 100 TPS or below.
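For what it’s worth, the estimate above is just a chain of multiplications; here it is as a quick Python sanity check (the slowdown and overhead factors are my assumptions, as explained above):

```python
# Back-of-envelope: scale Avalanche's published throughput down to
# Raspberry-Pi-class hardware. All scaling factors are assumptions.
avalanche_tps = 3500      # geo-replicated benchmark, no attackers
rpi_slowdown = 10         # c5.large vs RPi 4 (see the math at the end)
protocol_factor = 0.5     # assumed cost of mana, autopeering, DRNG/FPC

estimated_tps = avalanche_tps / rpi_slowdown * protocol_factor
print(estimated_tps)  # 175.0
```

With any additional real-world losses (crashes, overloads, persistence), this drops further, toward the ~100 TPS mentioned above.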
(b.) Using existing software & benchmarks on IoT devices as baseline.
This is a weaker/fuzzier point, but:
The Tangle will need some kind of persistence layer for UTXO / transactions. Postgres / PGbench R/W will get you something like 400 read/write TPS on a Raspberry Pi: Pi4 ARM 64 Vs 32 Benchmarks - OpenBenchmarking.org.
On a Raspberry Pi 4, I get about 40 TLS (HTTPS) connection setups a second. The official nginx blog reports about 450 per CPU on a server CPU, which mirrors the ~10x observation above (www dot nginx dot com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ - sorry, link limit for new users).
I think it is a safe assumption that the persistence layer performance will look similar to the Postgres TPS, that the load introduced by network/CPU/crypto and maintaining the data structures will be at least in the same order of magnitude as a TLS connection setup, and that the upper bound(!) for Tangle TPS is therefore somewhere between 40 and 400.
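To make the “somewhere between 40 and 400” claim a bit more concrete, here is a tiny cost model - the model itself is my assumption, the component rates are the RPi 4 benchmark figures quoted above. Each transaction is charged one persistence write plus some fraction f of a TLS-handshake-equivalent of crypto/network work:

```python
# Component rates from the RPi 4 benchmarks quoted above;
# the serial cost model itself is an assumption.
PERSIST_TPS = 400   # Postgres/PGbench read-write TPS on an RPi 4
TLS_SETUPS = 40     # TLS handshake setups per second on an RPi 4

def tps_bound(f):
    """Upper bound on TPS if each TX costs one DB write plus
    a fraction f of a TLS-handshake-equivalent of work."""
    return 1.0 / (1.0 / PERSIST_TPS + f / TLS_SETUPS)

print(round(tps_bound(0.0)))  # 400 - crypto/network cost negligible
print(round(tps_bound(1.0)))  # 36  - a full handshake-equivalent per TX
```

So depending on how heavy the per-transaction crypto/network work really is, the bound slides between the two benchmark figures.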
(c.) Bandwidth constraints
As others have pointed out, a 100 MBit line can carry about 7500 transactions per second at current transaction size. But that’s maxing out the 100 MBit, and doesn’t account for gossiping - for fast transaction propagation, you need to send each TX to plenty of neighbors (asking “do you have this?” first creates significant overhead too - it is likely more efficient to over-broadcast).
So let’s assume you gossip each transaction to 8 neighbors; then you get ~1k TPS on a 100 MBit uplink. If we assume 10 MBit upload, we get 100 TPS; if we assume that we want to keep half of that bandwidth for Netflix etc., we get 50 TPS. Add the autopeering (list of nodes, reputation, salt/hash chains) and mana overhead, and you’ll end up at significantly less than that.
You can argue “not every node needs to broadcast that much”, but you still need at least the same order of magnitude in fan-out for the ‘average’ node. Also, a 10 MBit uplink is not even common - my “high-speed” hotel-room internet in downtown Sydney gives me 7 MBit down / 1 MBit up, my apartment here 2 years ago gave me < 12 MBit / 500 kBit, and I lived in Manhattan last year with 12/2 MBit.
(Also, some connections are metered - if your contract is for 200 GB/month, at 1000 TPS you’d fill that up in about a day.)
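The bandwidth arithmetic above, spelled out in Python (the per-transaction size is inferred from the “7500 TPS on 100 MBit” figure; the fan-out of 8 is my assumption):

```python
# Gossip bandwidth back-of-envelope. Transaction size is derived from
# the quoted "7500 TPS saturate 100 MBit"; fan-out of 8 is assumed.
link_bps = 100e6                 # 100 MBit/s uplink
tx_bits = link_bps / 7500        # ~13.3 kbit ~ 1.7 KB per transaction
fanout = 8                       # copies sent per transaction (assumed)

tps_100mbit = link_bps / (tx_bits * fanout)   # ~940, i.e. ~1k TPS
tps_10mbit = tps_100mbit / 10                 # ~94,  i.e. ~100 TPS

# Metered connections: GB per day just to *receive* 1000 TPS once each.
gb_per_day = 1000 * (tx_bits / 8) * 86400 / 1e9   # ~144 GB/day
print(round(tps_100mbit), round(tps_10mbit), round(gb_per_day))
```

Note that the metered-connection figure is receive-only; with gossip fan-out on top, the monthly cap is blown even faster.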
(d.) Using the current system performance as a comparison
Now, I’ve stated before that I can’t know the performance characteristics of the coordinator, so this is more of a question to you than an assessment on my part. You mention improvements in the signature scheme and reusable addresses, but neither of these sounds like it could give (orders-of-magnitude) performance boosts. I am not aware of how the account-based model works in detail, so I can’t say much about that.
(i) I would, however, be surprised if the savings through UTXO and the algorithmic improvements you mentioned were not completely eaten up by the complexity overhead of FPC/autopeering/mana/etc.
(ii) I am actually also surprised by the “250” number given for IRI, as the network currently does not seem to exceed 15 TPS, even under pressure/spam. Is there another bottleneck somewhere?
Note that even if assumptions (1.) and (2.) were not true, points (c.) and (d.) still hold. Also, if you got this far, thanks for reading!
I hope this helps explain the doubts that I have. Or maybe I missed something? In any case, if you have the time (and motivation ;)) to reply, I am very much looking forward to it - but I am aware you are quite busy at the moment, so please don’t feel pressured.
In any case, thanks again!
Math for Raspberry Pi vs c5.large
Math: Avalanche uses c5.large, which is a dual-core Xeon Skylake at 3.6 GHz; a (hopefully conservative) equivalent on Phoronix is the Xeon E3 1220 v5 with results divided by 2 (because quad → dual core).
Highlight: x264 encoding is ~20x faster on the Xeon (200/2 vs 5) and ~17.5x faster on the i5 (175/2 vs 5).
vs the i5: LAME MP3: 125 secs (RPi 4) vs 10 secs (Pi 3: 600 seconds). FLAC: 100 vs 5.
It should be noted that I’ve assumed a Raspberry Pi 4 as the “standard IoT device”, but an RPi 4 is already on the high end of ARM SBCs - an RPi 3, for example, is already 3-5 times slower.
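For transparency, here are the speed ratios computed from the Phoronix figures quoted above (server/desktop throughput halved to approximate the dual-core c5.large):

```python
# Speed ratios, RPi 4 vs server/desktop CPUs, from the quoted benchmarks.
# x264 numbers are fps (higher is better); LAME/FLAC are seconds (lower
# is better). Server fps is halved: quad-core benchmark -> dual-core c5.
x264_vs_xeon = (200 / 2) / 5   # 20x faster than the RPi 4
x264_vs_i5 = (175 / 2) / 5     # 17.5x
lame_vs_i5 = 125 / 10          # 12.5x (RPi 4: 125 s, i5: 10 s)
flac_vs_i5 = 100 / 5           # 20x
print(x264_vs_xeon, x264_vs_i5, lame_vs_i5, flac_vs_i5)
# 20.0 17.5 12.5 20.0
```

So the factor of 10 used in (a.) is on the friendly side, especially if the “standard IoT device” is closer to an RPi 3.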
Raspberry Pi benchmarks: www dot phoronix dot com/scan.php?page=article&item=raspberry-pi4-benchmarks (the link limit strikes again)
Xeon Skylake benchmarks: www dot phoronix dot com/scan.php?page=article&item=intel-skylake-xeon9
Skylake i5: www dot phoronix dot com/scan.php?page=article&item=intel-6600k-linux&num=4