You are absolutely right, the coordicide white paper does not cover any of these additional changes but focuses solely on the consensus parts. It is also already pretty outdated and we will release an updated version soon.
A lot of the mentioned “additional changes” are not directly related to coordicide but are things that would introduce breaking changes to the network. Breaking changes are always a bit tricky because everybody needs to upgrade (exchanges, libraries, users, wallets …), so we try to minimize the impact by doing a single big update that contains all of these changes at once. This way we do not have to go through the hassle of forcing everybody to update multiple times.
Now, let’s talk about the “big O notation” to compare some of the algorithmic optimizations. Currently, to “validate” a transaction we have to analyze its whole past cone. Since every transaction directly or indirectly references a huge number of other transactions (essentially its entire past cone back to the last checkpoint), this is an extremely heavy operation. We are essentially doing a search through the past cone of a transaction to check whether the used funds exist in that cone and have not been spent before. This is a huge bottleneck (even with checkpoints, which “cache” previous ledger state calculations via a memory-time trade-off). It translates to O(n), where n is the number of directly or indirectly referenced transactions.
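To make the O(n) part tangible, here is a very simplified Go sketch of such a past cone walk. The types, field names and the balance-map logic are made up for this example (IRI itself is written in Java and does considerably more); the point is only that the amount of work grows with the number of referenced transactions.

```go
// Simplified illustration of the old approach: to validate a transaction we
// walk its entire past cone (every directly or indirectly referenced
// transaction) and accumulate the balance changes on the way.
// All types and fields here are invented for the example.
package sketch

type Transaction struct {
	Hash     string
	Trunk    string           // reference to one approved transaction
	Branch   string           // reference to another approved transaction
	Mutation map[string]int64 // address -> balance change of this transaction
}

// tangle is a stand-in for the node's transaction storage.
var tangle map[string]*Transaction

// pastConeBalances walks the whole past cone of the given transaction
// (O(n) in the number of referenced transactions) and sums up the
// balance changes it finds.
func pastConeBalances(start string) map[string]int64 {
	balances := make(map[string]int64)
	visited := make(map[string]bool)
	stack := []string{start}

	for len(stack) > 0 {
		hash := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if visited[hash] {
			continue
		}
		visited[hash] = true

		tx, ok := tangle[hash]
		if !ok {
			continue // genesis / checkpoint boundary
		}
		for addr, diff := range tx.Mutation {
			balances[addr] += diff
		}
		stack = append(stack, tx.Trunk, tx.Branch)
	}
	return balances
}

// isConsistent checks that no address in the past cone ends up with a
// negative balance, i.e. the spent funds actually exist in that cone.
func isConsistent(start string) bool {
	for _, balance := range pastConeBalances(start) {
		if balance < 0 {
			return false
		}
	}
	return true
}
```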
The new ledger state is able to validate transactions in O(1), so it is a MASSIVE optimization. Benchmarks show that IRI can do around 350 ledger operations per second in a high-throughput scenario (the walks to the next checkpoint get longer and longer), while the new ledger state can do around half a million ledger operations per second on a normal desktop CPU + SSD (independently of the TPS).
Note: Postgres / pgbench are not good benchmarks for comparison, as they use much more complex logic than a K/V store based on LSM trees like badger.
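The reason validation becomes O(1) per consumed input is that unspent outputs are kept in a store keyed by their ID, so checking and spending them is a plain lookup instead of a graph walk. Here is a minimal sketch of that idea, with an in-memory map standing in for the on-disk K/V store and made-up type names; it is not the actual goshimmer ledger state.

```go
// Minimal sketch of an O(1) ledger state: every unspent output is stored
// under its ID, so validating and booking a transfer is a handful of
// lookups instead of a past cone traversal. A real node would back this
// with an on-disk K/V store (e.g. an LSM-tree store like badger).
package sketch

import "errors"

type OutputID string

type Output struct {
	Address string
	Balance uint64
}

type LedgerState struct {
	unspent map[OutputID]Output
}

func NewLedgerState() *LedgerState {
	return &LedgerState{unspent: make(map[OutputID]Output)}
}

// BookTransfer validates and applies a transfer in O(1) per input/output:
// each consumed output is a single map lookup, no graph walk involved.
func (l *LedgerState) BookTransfer(inputs []OutputID, outputs map[OutputID]Output) error {
	var in, out uint64

	for _, id := range inputs {
		o, ok := l.unspent[id]
		if !ok {
			return errors.New("input does not exist or was already spent")
		}
		in += o.Balance
	}
	for _, o := range outputs {
		out += o.Balance
	}
	if in != out {
		return errors.New("inputs and outputs do not balance")
	}

	// apply: mark inputs as spent, register the new outputs
	for _, id := range inputs {
		delete(l.unspent, id)
	}
	for id, o := range outputs {
		l.unspent[id] = o
	}
	return nil
}
```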
A similarly drastic optimization comes from the new signature scheme. Currently most transactions in IOTA use “level 2 security”, which translates to a signature size of around 3 kB. The new signature scheme reduces the signature to 64 bytes while still maintaining quantum security. This is a nearly 98% reduction in signature size.
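To make the ratio concrete: taking 3 kB ≈ 3000 bytes for the old signature, 64 / 3000 ≈ 2%, i.e. the new signature is roughly a fiftieth of the old one, which is where the ~98% figure comes from.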
I cannot really say anything about the “quality” of the algorithms and data structures used in Avalanche, as their code is not open source yet. But even though we are able to optimize not just the size of messages but also the computational and I/O overhead of the algorithms, your observations about the constraints of IoT hardware are still valid.
If you want IoT devices to be able to take part in the network, then the performance numbers gathered on traditional desktop or even server hardware obviously do not translate 1:1 to numbers on these devices.
This is the reason why “sharding / slicing” plays such an important role. I am not sure if the sharding article is publicly accessible yet, as sharding is still considered to be somewhat “confidential” (https://govern.iota.org/t/slicing-sharding-the-iota-tangle/245), but sharding is absolutely essential for IOTA to reach its goal of allowing IoT devices to take part in the network.
I can maybe summarize it in a few words without giving away too much about “how it works”. The tangle will use a form of state sharding where the whole network is pre-sharded and nodes can individually decide how much of the “bigger picture” they want to “see and process”, while still preventing double spends and without requiring some complicated inter-shard communication process. It is totally fine for a hardware-constrained node to only be able to process a “relatively small amount” of TPS, as this only limits the radius of perception of this particular node.
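Just to illustrate the “radius of perception” point in the most generic way possible (the shard IDs, field names and filtering rule below are invented for this example and say nothing about how the actual scheme assigns or selects shards): a node that only follows part of the network simply ignores everything outside of that part, so its workload depends on its own configuration rather than on the total throughput of the network.

```go
// Toy illustration only: a node processes just the messages that belong to
// the part of the network it has decided to watch. Everything about shard
// IDs and how they are assigned is made up here.
package sketch

type ShardID uint32

type Message struct {
	Shard   ShardID
	Payload []byte
}

type Node struct {
	watched map[ShardID]bool // the shards this node has chosen to follow
}

// Process handles messages from watched shards and drops everything else,
// so a constrained device's load depends on its own configuration, not on
// the total TPS of the network.
func (n *Node) Process(msg Message) bool {
	if !n.watched[msg.Shard] {
		return false // outside this node's radius of perception
	}
	// ... validate and book the message ...
	return true
}
```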
I am pretty sure that some of the “building blocks” of sharding are public already (e.g. Merkle “Proofs of Inclusion”), and we have even started implementing it in goshimmer, so maybe you are already able to derive how it works by looking at the code.
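Merkle proofs of inclusion themselves are a well-documented building block. A generic sketch of how such a proof is verified (the textbook construction with SHA-256, not the goshimmer code) looks roughly like this:

```go
// Generic sketch of verifying a Merkle "proof of inclusion": given a leaf,
// the sibling hashes along the path to the root and their positions, we
// recompute the root and compare it to a root we already trust.
package sketch

import (
	"bytes"
	"crypto/sha256"
)

type ProofStep struct {
	Sibling []byte // hash of the sibling node on this level
	Left    bool   // true if the sibling sits on the left side
}

func hashPair(left, right []byte) []byte {
	h := sha256.New()
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// VerifyInclusion recomputes the root from the leaf and the proof steps and
// checks that it matches the trusted root.
func VerifyInclusion(root, leaf []byte, proof []ProofStep) bool {
	leafHash := sha256.Sum256(leaf)
	node := leafHash[:]
	for _, step := range proof {
		if step.Left {
			node = hashPair(step.Sibling, node)
		} else {
			node = hashPair(node, step.Sibling)
		}
	}
	return bytes.Equal(node, root)
}
```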