Ava is no ETH-killer

If you're building a Dapp today, there can be good reasons to look at launching on a platform other than Ethereum. ETH has high fees, its time to finality is long, and there's MEV to worry about. But if you're going to build a production Dapp on another platform, please, please don't use Avalanche.

Avalanche has 3 significant shortcomings, any one of which could be enough to doom a platform.

Even in theory, it can't maintain liveness without centralization.
In practice, it may not even guarantee safety.
It doesn't solve the state-growth problem.

Alive, in theory

O(darn)

According to its whitepaper, Avalanche provides the following guarantee about liveness:

P3. Liveness (Strong Form). If f ≤ O(√n), then the Snow protocols terminate with high probability (≥ 1−ε) in O(log n) rounds. (Blogger's Note: the "f" in this statement refers to the number of adversarial nodes.)

In vanilla distributed systems research, this is a perfectly reasonable guarantee to provide. Unfortunately, it's not a good guarantee for a cryptosystem that wants to be meaningfully decentralized. Why not? I'm so glad you asked!

Imagine an Ava network with 25 nodes. According to the whitepaper, that network can tolerate √25 = 5 malicious nodes without experiencing a liveness failure. For the mathematically inclined, that's 20% malicious nodes (or stake, or whatever). That guarantee is significantly worse than the one provided by Bitcoin (which stays live until an attacker controls about half of the hash power), but it's sort of in the ballpark of BFT protocols, which tolerate up to a third of nodes being byzantine before experiencing a safety or liveness failure. Unfortunately, the story doesn't end here.

As Ava scales up, its liveness guarantee keeps getting worse, because the tolerable fraction is √n/n = 1/√n, which shrinks as n grows. At 100 nodes, an Ava network can only tolerate 10% of its nodes acting maliciously. At its current scale (about 1000 nodes), only about 3% can be malicious without jeopardizing liveness. At the "tens of thousands of nodes" scale that its inventors promise on Twitter, an adversary controlling even 1% of stake can bring the network to a halt. And, unlike most proof-of-stake protocols, Ava doesn't have a mechanism to identify and slash the adversaries. Game over.
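The percentages above are easy to check. Here is a minimal sketch, assuming we read the whitepaper's \(f \le O(\sqrt{n})\) bound literally as \(f \le \sqrt{n}\), ignoring whatever constant the big-O hides (as the figures above do). The function name is purely illustrative:

```python
import math


def tolerable_fraction(n: int) -> float:
    """Fraction of n nodes an adversary may control if f <= sqrt(n) is read literally.

    This drops the constant hidden inside O(sqrt(n)), matching the figures in the text.
    """
    return math.sqrt(n) / n  # equivalently 1 / sqrt(n)


if __name__ == "__main__":
    for n in (25, 100, 1_000, 10_000):
        print(f"n = {n:>6}: tolerates {math.sqrt(n):6.1f} malicious nodes "
              f"({tolerable_fraction(n):.1%})")
```

Running it prints 20.0%, 10.0%, 3.2%, and 1.0% for the four network sizes: every tenfold increase in nodes cuts the tolerable fraction by a factor of about 3.2 (√10).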

Non-zero Probability of Success

But, as I can already hear the Ava defenders protesting, the whitepaper provides another liveness guarantee:

P2. Liveness (Upper Bound). Snow protocols terminate with a strictly positive probability within \(t_{max}\) rounds.

Unfortunately, the statement that an algorithm "terminates with strictly positive probability" is meaningless as a guarantee about a system's behavior. After all, the probability that the computer you're reading this on will spontaneously re-assemble itself into a sculpture of Satoshi Nakamoto is also "strictly positive". Non-zero probability of your Dapp working is not a guarantee you should be comfortable with. I know it seems like I'm just being pedantic here. I'm really not trying to be. This is deeply important.
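To see why "strictly positive" is so weak, consider a deliberately simplified toy model. It's an assumption for illustration, not Avalanche's actual dynamics: suppose each round independently finishes the decision with some fixed probability \(p > 0\). Then the chance of finishing within \(t\) rounds is \(1 - (1 - p)^t\), and the expected number of rounds is \(1/p\). Every \(p > 0\) counts as "strictly positive", but a tiny \(p\) still means waiting essentially forever:

```python
import math


def prob_done_within(p: float, t: int) -> float:
    """Toy model: probability that a decision completes within t rounds,
    assuming each round independently succeeds with probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1.0
    # 1 - (1 - p)**t, computed with log1p/expm1 so a tiny p doesn't round to zero.
    return -math.expm1(t * math.log1p(-p))


def expected_rounds(p: float) -> float:
    """Mean of a geometric distribution with per-round success probability p."""
    return 1.0 / p


if __name__ == "__main__":
    for p in (0.5, 1e-3, 1e-9):
        print(f"p = {p:g}: P(done within 1,000 rounds) = {prob_done_within(p, 1_000):.3g}, "
              f"expected rounds = {expected_rounds(p):,.0f}")
```

All three values of p satisfy a "strictly positive probability" guarantee, yet the last one gives roughly a one-in-a-million chance of finishing within a thousand rounds.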

As the number of nodes (or stake, or whatever) controlled by the attacker exceeds that √n upper bound, the performance of Avalanche gets poorer and poorer. Exactly how quickly performance degrades is not clear from the whitepaper - it says that the slowdown is polynomial when f exceeds √n but becomes exponential as f approaches \(\frac{n}{2}\) - but even a polynomial increase in communication is really bad.

Remember, we're talking about thousands of nodes here - if each node has to send and receive a small polynomial number of messages (say, \(n^2\)), that translates to millions of messages per node per decision. That won't work. Even if the slowdown is polynomial, a decentralized Ava network is dead in the water.
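Here is that back-of-the-envelope estimate spelled out. It's a minimal sketch that assumes the hypothetical per-node cost of \(n^2\) messages per decision used above; the whitepaper doesn't commit to this exact exponent, and the function names are illustrative:

```python
def messages_per_node(n: int, exponent: int = 2) -> int:
    """Hypothetical per-node message cost per decision, growing as n**exponent."""
    return n ** exponent


def messages_network_wide(n: int, exponent: int = 2) -> int:
    """Total across all n nodes: n * n**exponent."""
    return n * messages_per_node(n, exponent)


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(f"n = {n:>6}: {messages_per_node(n):,} messages per node, "
              f"{messages_network_wide(n):,} network-wide, per decision")
```

At 1,000 nodes this gives a million messages per node and a billion across the network for every single decision.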

Danger Close!

As it turns out, there's an even more worrying development on the horizon. Sarah Jamie Lewis of Open Privacy recently published (and formally verified) an attack on Snow-family consensus protocols which purports to break the safety and liveness guarantees of Ava's underlying consensus algorithm.

Through probabilistic modelling (sic) we formally verify an adversarial strategy that forces correct nodes to choose between safety and liveness even when f < O(√n).

If you develop on Avalanche, this should make you really nervous. Avalanche claims to provide strong guarantees of safety and liveness as long as the attacker controls fewer than √n nodes. If those guarantees can really be broken, even in a contrived setup, then there's something wrong with the whitepaper. Remember, a single error contaminates the entire security proof. If the Ava whitepaper is 99% correct, it's 100% wrong.

I want to be clear: this doesn't necessarily mean that Ava is not secure. It means that we don't know whether Ava is secure or not. That's ok - research takes time. But it's not ok to build a production app on a system that might be fundamentally broken. Not if there's another option.

Tradeoffs: The Good, the Bad, and the Downright Weird

This brings me to my final point. When you're designing a complex system, you're often forced to accept a tradeoff between two desirable properties. Like all blockchain projects, Ava chose a set of tradeoffs that its designers found compelling - it traded away some security guarantees to secure faster consensus. But here's the rub: consensus was never the bottleneck to begin with! The real bottleneck in modern blockchains is state growth.

In Ethereum, each read or write into the database incurs seven or eight random disk accesses (because the database itself uses a multi-level tree structure internally), and each state access incurs seven or eight random database accesses in the course of traversing the state trie.

For those of you who haven't spent a lot of time optimizing computer programs, that's what we in the business call "really bad". Random disk accesses are SLOW. In the time it takes to do a single random read, a CPU might be able to perform 10,000 computations. This is why your full node can take a long time to sync even though the CPU is mostly idle. Sync times are typically dominated by disk accesses.

But remember how I said that each database access takes seven or eight random disk accesses, and each state access takes seven or eight database accesses? It turns out that both of those numbers grow with the size of a blockchain's state. Specifically, they're each about \(\log S\), where \(S\) is the state size, so the time it takes to process a transaction grows with \(\log^2 S\). If the next doubling of Ethereum's state pushes each of those counts from seven to eight, the time it takes to process a given transaction will increase by roughly 30% (\((8/7)^2 \approx 1.31\)). This is why Ethereum keeps the block gas limit low - it needs to limit both block processing times and state growth.
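The multiplication behind that estimate, as a minimal sketch. It assumes, following the paragraph, that one state access costs (database levels) × (trie depth) random disk reads, and, to reproduce the ~30% figure, that doubling the state pushes each count from seven to eight. Both are simplifications, and the function names are illustrative, not Geth internals:

```python
def disk_reads_per_state_access(db_levels: int, trie_depth: int) -> int:
    """Random disk reads per state access: each of the trie_depth database
    lookups costs roughly db_levels random disk reads."""
    return db_levels * trie_depth


def slowdown_after_doubling(depth: int) -> float:
    """Relative increase in per-access cost if both counts grow by one level."""
    before = disk_reads_per_state_access(depth, depth)
    after = disk_reads_per_state_access(depth + 1, depth + 1)
    return after / before - 1.0


if __name__ == "__main__":
    print(disk_reads_per_state_access(7, 7), disk_reads_per_state_access(8, 8))  # 49 64
    print(f"{slowdown_after_doubling(7):.0%}")  # 31%
```

Because the two counts multiply, a one-level increase in each compounds: 49 random reads become 64, which is where the "roughly 30%" comes from.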

As long as everyone can process every candidate transaction, coming to consensus is relatively easy. There are already dozens of consensus protocols which offer fast finality. But if state size grows too quickly, processing transactions on commodity hardware becomes impossible. It's processing blocks that's the bottleneck, not coming to consensus. (And in case you were wondering, the Avalanche-go client relies on Ethereum's Geth - so all of these limitations really do apply to Ava.)

So, to recap, Ava trades safety and/or liveness guarantees for speed of consensus, but consensus was never the bottleneck. That's like taking the engine out of a motorcycle to give it more cargo space. It's not illegal, just... not a good tradeoff.

Conclusion: Building on Ava Considered Harmful

Not everyone wants to build on Ethereum. That's good and healthy. We live in a multi-chain world. But please, for the love of all that is holy, stop treating Avalanche like a production system. Ava is a cool distributed systems research project, but it's not a good place to build mission-critical applications. Its safety and liveness guarantees are much weaker than those of other projects, and it might have a fundamental consensus flaw. Besides, even if it works, it doesn't solve the scaling problem.

Author's Note: Did I get some things wrong? Almost certainly. If you find one, please reach out on Twitter. I'm happy to issue corrections or retractions as necessary.

P.S. - To the Folks at Ava Labs

Like all blockchains, Ava has its share of evangelists - and that's ok. But please be careful making claims about Ava's liveness on Twitter. Ava is not live with 49% honest nodes unless you define "live" as "having non-zero - but arbitrarily small - probability of advancing". At best, you're just going to confuse a lot of newcomers. Ava isn't another IOTA, but this kind of rhetoric is how you would turn it into one.