Solana released its report on this week's outage. Full details are in the linked tweet, but the tl;dr is:
- The bug had been discovered in a devnet deployment, and the fix was waiting to be deployed to mainnet
- The bug was related to changing the JIT cache from “ExecutorsCache” to a new implementation, “LoadedPrograms”
- The deploy-evict-request cycle of a legacy loader program triggered an infinite recompile loop in the JIT cache.
tldr; there was a bug in v1.17 that lead to an infinite loop when processing a block
a leader on v1.16 (which didnt have the bug) processed a block with the v1.17 bug and propagated it to the rest of the cluster (where 95% was running v1.17) which lead to no voting and a stall https://t.co/lDUEAoMR7l
— x19 (@0xNineteen) February 9, 2024
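
To make the deploy-evict-request cycle from the summary a bit more concrete, here is a toy Python sketch of how a cache miss can turn into an endless recompile. It is not Solana's code: `CacheEntry`, `compile_program`, `request_program`, and the "visible only from the next slot" rule are all hypothetical, chosen only to show the general shape of such a loop. The cap on compilations exists so the example terminates; a validator hitting the real bug had no such cap.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    # First slot at which the compiled program may be used (hypothetical field).
    effective_slot: int


def compile_program(current_slot: int, buggy: bool) -> CacheEntry:
    # Assumption for illustration: the buggy path stamps a reloaded program
    # as if it were freshly deployed, so it only becomes usable next slot.
    if buggy:
        return CacheEntry(effective_slot=current_slot + 1)
    return CacheEntry(effective_slot=current_slot)


def request_program(cache, program_id, current_slot, buggy, max_compilations=5):
    """Return how many compilations were needed, or None if the cap was hit."""
    compilations = 0
    while True:
        entry = cache.get(program_id)
        if entry is not None and entry.effective_slot <= current_slot:
            return compilations  # cache hit: the program is usable in this slot
        if compilations >= max_compilations:
            return None  # without this cap the loop would never exit
        # Cache miss (or entry not yet visible): recompile and try again.
        cache[program_id] = compile_program(current_slot, buggy)
        compilations += 1
```

Starting from an empty cache (the program has been evicted), the non-buggy path compiles once and then hits the cache, while the buggy path keeps producing an entry that is never visible in the current slot and gives up only because of the cap.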