A monolithic chain where everyone executes every transaction is inherently not scalable. Indeed, this is why almost every major ecosystem is building for a multi-chain world.
As we laid out in our previous post, ecosystems differ in how they envision a multi-chain world. The two approaches that attract the most activity today are those of Ethereum and Cosmos.
In a nutshell, Ethereum envisions a rollup-centric future. Rollups tend to be more expensive and less flexible than L1s, but they can share security with each other. In contrast, Cosmos is an ecosystem of interoperable sovereign L1s known as Zones. While Zones can be cheaper and more flexible than rollups, they can’t share full security with each other.
Celestia combines the best of these two worlds. As a wise anon once said, “Celestia’s vision is a marriage of Cosmos’ sovereign interoperable zones and a rollup-centric Ethereum with shared security.”
If the above chart doesn’t fully make sense to you, don’t worry. We will unpack all of it in this post as we dive deep into Celestia’s paradigm-shifting modular blockchain design. We will devote the first half of the post to answering the “hows” and the second half to addressing the “whys” of Celestia. If you are familiar with how Celestia works, we recommend that you skip to the second half of this post, where we list its 8 unique properties. You may be surprised to find out that Celestia has deeper, stronger implications than it seems to have on the surface.
To understand the “hows” of Celestia we must first define its problem statement. Celestia was born in search of an answer to the following question: “What’s the least a blockchain can do to offer shared security for other blockchains (i.e. rollups)?”
Typically, consensus and validity are referred to as one and the same. However, it is quite possible to regard these notions as separate; validity rules determine which transactions are considered to be valid whereas consensus allows nodes to agree on the order of transactions that are valid.
Just like any L1 blockchain, Celestia implements a consensus protocol (Tendermint) to order transactions. However, unlike typical blockchains, Celestia doesn’t reason about the validity of these transactions nor is it responsible for executing them. Celestia treats all transactions equally; if a transaction is paying necessary fees, it accepts, orders, and replicates it.
All of the transaction validity rules are enforced on the client-side by rollup nodes. Rollup nodes monitor Celestia to identify and download transactions that belong to them. They then execute them to compute their state (for example to determine everyone’s account balances). If there are any transactions that rollup nodes consider to be invalid they simply ignore them.
As you can see, as long as Celestia’s history remains unchanged, rollup nodes that run software with the same validity rules can compute the same state.
This brings us to an important outcome. Rollups don’t need another chain to perform any execution to share security. Instead, all they need is to agree on a shared history of ordered transactions.
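The replay logic described above can be sketched in a few lines. This is a minimal illustration, not Celestia's or any rollup's actual code: the `Tx` fields, the `namespace` label, and the balance-transfer validity rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    namespace: str   # hypothetical label saying which rollup the data belongs to
    sender: str
    recipient: str
    amount: int


def compute_state(ordered_txs, namespace, genesis):
    """Replay the shared ordered history, applying only this rollup's valid txs."""
    balances = dict(genesis)
    for tx in ordered_txs:
        if tx.namespace != namespace:
            continue  # data for another rollup: not executed by this node
        if tx.amount <= 0 or balances.get(tx.sender, 0) < tx.amount:
            continue  # invalid under this rollup's rules: simply ignored
        balances[tx.sender] -= tx.amount
        balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
    return balances


# The base layer only orders this data; it never checks validity.
history = [
    Tx("game", "alice", "bob", 5),
    Tx("defi", "carol", "dave", 7),   # belongs to a different rollup
    Tx("game", "bob", "alice", 50),   # overspend: invalid, ignored
    Tx("game", "bob", "carol", 2),
]
genesis = {"alice": 10, "bob": 0}
state = compute_state(history, "game", genesis)
print(state)  # {'alice': 5, 'bob': 3, 'carol': 2}
```

Any node that replays the same ordered history with the same rules arrives at the same balances, which is why agreement on ordering alone is enough.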
While decoupling execution from consensus sets the foundation for Celestia’s unique abilities, the scalability level that Celestia achieves can’t be explained by decoupled execution alone.
The obvious advantage of decoupling execution is that instead of everyone executing all transactions by default, nodes have the freedom to execute transactions related to their application(s) of interest. For example, nodes of a gaming app (app-specific rollup) don’t have to take interest in executing txs of a DeFi app.
That said, the scalability benefits of decoupling execution are nonetheless limited because they come at the expense of composability.
Let’s imagine a case where two apps wanted to exchange some tokens with each other. In that case, the state of each app would be dependent on each other; to compute the state of one app, a node would have to execute txs related to both.
Indeed, the number of transactions to be executed would have to increase with each new app joining these interactions. In the extreme, if all apps wanted to interact with each other we’d be back to square one of a monolithic chain where everyone downloads and executes every tx.
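A rough way to picture this growth is to treat interactions as a dependency graph and ask which apps a node must execute to compute one app's state. The sketch below assumes dependencies are transitive; the function name `apps_to_execute` and the `interacts_with` mapping are hypothetical.

```python
def apps_to_execute(app, interacts_with):
    """Return every app whose txs a node must execute to compute `app`'s state."""
    needed, stack = set(), [app]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        # State that depends on another app pulls in that app's txs too.
        stack.extend(interacts_with.get(current, ()))
    return needed


graph = {"game": ["dex"], "dex": ["lending"], "lending": [], "nft": []}
print(sorted(apps_to_execute("game", graph)))  # ['dex', 'game', 'lending']
print(sorted(apps_to_execute("nft", graph)))   # ['nft']
```

An isolated app stays cheap to follow, while every new interaction enlarges the set of transactions its nodes must execute.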
So how does Celestia achieve unmatched scalability and what does decoupling execution from consensus have to do with it?
Scalability is often described as increasing the number of transactions without increasing the cost to verify the chain. To see where the scalability bottleneck lies we briefly revisit how blockchains get verified.
In a typical blockchain, consensus nodes (validators, miners, etc.) produce blocks which are then distributed to the rest of the network consisting of full and light nodes.
Full nodes, with ample resources, fully verify the contents of received blocks by downloading and executing all of the transactions in them. In contrast, light nodes (99% of users), given their limited resources, can’t verify the contents of these blocks and only keep track of the block headers (a summary of the block data). Light nodes therefore operate under a much weaker security guarantee than full nodes: they always assume that the consensus is honest.
Notice that full nodes don’t make this assumption. Contrary to popular belief, a malicious consensus can never trick full nodes into accepting an invalid block because they will notice the invalid tx (e.g. a tx for double-spend or an invalid mint) and stop following the chain.
The most notorious scalability bottleneck in the blockchain space is known as state bloat. As more transactions occur, the state of the blockchain (the information necessary to execute txs) grows and it becomes more costly to run a full node. This results in an undesired situation where the number of full nodes starts to decrease and the number of light nodes starts to increase, centralizing the network around consensus nodes.
Since most chains value decentralization, they want their full nodes to be run on consumer hardware. This is why they limit the rate at which their state grows by enforcing a block/gas size limit.
The invention of fraud/validity proofs effectively removes this bottleneck. These are succinct proofs that light nodes can efficiently execute to verify that contents of the blocks are valid w/o having to execute transactions in them. The solution derives its strength from the fact that any single node with the full state of the chain can generate these proofs. This is extremely powerful because it means that light nodes can operate under nearly the same security guarantees as full nodes while consuming orders of magnitude fewer resources.
Below is a simplified fraud-proof example. In a fraud-proof, full nodes provide light nodes with just enough data for them to autonomously identify an invalid tx. The first step of this proof requires full nodes to show light nodes that a particular piece of data (for example, the tx claimed to be invalid) belongs to the block body.
This is rather straightforward because Merkle Trees can be used to do just that. By using a Merkle Tree, full nodes can efficiently prove to light nodes that a particular transaction is included in the block w/o requiring them to download the whole block.
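As an illustration of such an inclusion proof, here is a minimal Merkle tree sketch using SHA-256. It is a simplified teaching version: the odd-level duplication rule and the lack of leaf/node domain separation are simplifying assumptions, and it is not the tree construction Celestia itself uses.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Hash adjacent pairs to build the parent level.
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root of a non-empty list of byte strings."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the path to the root using only the leaf and its siblings."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(txs)          # what a light node sees in the header
proof = merkle_proof(txs, 4)     # what a full node sends alongside tx4
print(verify(b"tx4", proof, root))     # True
print(verify(b"forged", proof, root))  # False
```

The proof contains one hash per tree level, so its size grows logarithmically with the number of transactions in the block.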
However, while proving the inclusion of a tx is trivial, proving the absence of a tx is not. This is problematic because, as we will see in the next section, proving the absence of a tx is as important as proving the inclusion of a tx for fraud/validity proofs to work effectively.
For full nodes to generate fraud/validity proofs in the first place, they must be able to compute the state — account balances, contract codes, etc.
This requires full nodes to download and execute *all* transactions. But what if a malicious consensus releases block headers yet withholds some tx(s) in the block body?
Under such an attack scenario, full nodes will easily notice that data in the body is missing and therefore reject following the chain. However, light nodes who only download headers will continue to follow it as they won’t notice any difference.
Note this problem applies to both fraud and validity proof-based solutions because, without access to full data, honest full nodes can’t generate fraud/validity proofs. In case of a data withholding attack
In either case, light nodes won’t notice a problem and will unintentionally fork away from full nodes.
The data availability problem is a very subtle problem by nature because the only way to prove the absence of a tx is to download all txs, which is exactly what light nodes want to avoid doing due to their resource restrictions.
Now that we have identified the problem, let’s see how Celestia addresses it. Earlier when we distinguished between validity and consensus we mentioned that Celestia doesn’t care about the validity of transactions. What Celestia does care about, however, is whether block producers have fully published the data behind the header or not.
What makes Celestia extremely scalable is that this availability rule can be autonomously enforced by resource-limited light nodes. This is done through a novel process known as data availability sampling.
DAS relies on a long-existing data protection technique known as erasure coding. While the way Celestia implements erasure coding is beyond the scope of this report, it’s important to know the fundamentals in play.
Applying erasure coding to a piece of data extends it in a way where the original data can be recovered from a fixed fraction of the extended data. For example, a piece of data can be erasure coded to double in size and can be fully recovered from *any* 50% of the extended data. By erasure coding blocks in a specific way, Celestia enables a resource-limited light node to randomly sample a number of fixed small-sized chunks of data from the block and have high probabilistic guarantees that all the other chunks have been made available to the network. This probabilistic guarantee owes its assurances to the number of nodes participating in the sampling process.
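To make the "recover from any 50%" property concrete, here is a toy one-dimensional Reed-Solomon-style code built on polynomial interpolation over a prime field. It is only a sketch of the general idea under assumptions chosen for brevity (the modulus `P`, integer symbols, Lagrange interpolation) and does not reflect how Celestia actually encodes blocks. It needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 2**31 - 1  # prime modulus; data symbols are assumed to be integers below P


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def extend(data):
    """Double the data: the original k symbols plus k extra evaluations."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(k, 2 * k)]


def recover(shares, k):
    """Rebuild the k original symbols from any k (index, value) pairs."""
    if len(shares) < k:
        raise ValueError("not enough shares to recover the data")
    return [lagrange_eval(shares[:k], x) for x in range(k)]


data = [72, 105, 33, 7]
extended = extend(data)                               # 8 symbols in total
surviving = [(i, extended[i]) for i in (1, 4, 6, 7)]  # any 4 of the 8
print(recover(surviving, len(data)) == data)          # True
```

Because any half of the extended symbols is enough, a producer who wants to make even one original symbol unrecoverable has to withhold more than half of the extended data, which is what makes random sampling effective.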
One way to think about DAS is as a game where a malicious block producer tries to hide data in the block w/o getting noticed by light nodes. Block producers publish the header. Based on the data root committed in the header, each light node starts to request random chunks from the block (along with their corresponding Merkle proofs attesting to the inclusion of data in the block).
The game results in two outcomes:
1. Data has been made available -> The malicious block producer releases chunks from the block as light nodes request. The released chunks propagate through the network. While each sampling light node only samples a small number of chunks, given that they collectively sample chunks exceeding 25% of the erasure-coded block, any honest full node in the network will be able to recover the original block from the broadcast chunks. With the full block now available to the network, all light nodes will eventually see that their sampling test succeeds and be convinced that the full data behind the header has indeed been made available to full nodes.
By autonomously verifying that data has been made available, light nodes can now fully rely on fraud/validity proofs because they know that any single honest full node can generate these proofs for them.
2. Data has been withheld -> The malicious block producer doesn’t release the requested chunks. Light nodes notice that their sampling test fails.
Notice that this is no longer a serious threat to security because malicious consensus can no longer trick light nodes into accepting a chain that full nodes have rejected. Thus a block with missing data will appear as a liveness failure for full and data sampling light nodes. In such a case, the chain can be safely recovered through the ultimate security mechanism of all blockchains — social consensus.
To sum up, in either case, full & data sampling light nodes will end up following the same chain and therefore be operating under practically the same security guarantees.
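The strength of the sampling game can be estimated with a short calculation. The sketch below assumes each sample is drawn independently and uniformly, and it reuses the earlier illustrative case where the producer must withhold more than 50% of the extended data; the real threshold depends on the encoding Celestia uses, so the numbers are only indicative.

```python
import math


def miss_probability(withheld_fraction, samples):
    """Chance that every one of `samples` random chunks happens to be available."""
    return (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction, target_miss):
    """Smallest number of samples that pushes the miss probability to target_miss or below."""
    return math.ceil(math.log(target_miss) / math.log(1.0 - withheld_fraction))


print(miss_probability(0.5, 10))   # 0.0009765625
print(samples_needed(0.5, 0.001))  # 10
```

Under these assumptions a single light node is fooled with probability below 0.1% after only ten samples, and the chance that many independent light nodes are all fooled shrinks further still.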
A key property of DAS is that as more data is collectively sampled, the same probabilistic availability guarantees can be provided for larger amounts of data. In the context of Celestia, this means blocks can be safely made bigger (i.e. support a higher tps) with more nodes participating in the sampling process.
There is, however, a tradeoff inherent in DAS. For technical reasons which we will not cover here, the block headers of data sampling light nodes grow in proportion to the square root of the block size. As such, light nodes looking to have almost the same security as a full node will experience O(√n) bandwidth costs where n is the block size.
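With regards to scalability, there are two dominant factors in play: (i) the amount of data that can be collectively sampled, which depends on the number of sampling nodes, and (ii) the size of the light node block header, either of which can pose a limit on Celestia’s DA throughput.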
Below we share the current estimates as studied by the Celestia team considering the first factor in play.
Importantly, the block size can go much higher than what’s shown here because DAS can be performed by a large audience with limited resources. Even smartphones can participate in the sampling process and contribute to the security and throughput of Celestia. In fact, here is an example of a smartphone contributing to the security of Celestia!
Realistically we expect the number of sampling nodes to be fairly correlated with user demand. This is very exciting because it defines Celestia’s blockspace supply as a function of demand. This means, unlike monolithic chains, Celestia can offer low stable fees as the user demand grows.
Now let’s zoom in on the second factor: the size of the light node block header, which grows in proportion to the square root of the block size. While this may appear to be a limiting factor, the increased resource requirements are likely to be offset by improvements in network bandwidth over time.
Notice also that DAS offers a multiplicative effect on bandwidth improvements. If the bandwidth capacity of the average light node grows by X, Celestia’s DA throughput can safely grow by X^2!
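The quadratic relationship follows directly from the square-root cost. As a sketch, assume a light node's bandwidth cost for a block of size n is b(n) = c√n for some constant c (an assumed proportionality constant) and that the node can afford a bandwidth budget B:

```latex
\begin{align*}
b(n) &= c\sqrt{n} \\
b(n) \le B \;&\Longrightarrow\; n \le n_{\max}(B) = \left(\frac{B}{c}\right)^{2} \\
n_{\max}(XB) &= \left(\frac{XB}{c}\right)^{2} = X^{2}\, n_{\max}(B)
\end{align*}
```

So multiplying the budget B by X multiplies the largest safely supportable block size by X^2.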
Finally, unlike Moore’s Law of Computation which is estimated to end sometime in the 2020s, Nielsen’s Law of Internet Bandwidth seems likely to continue to be true for the next several decades. Thus by keeping computation fully off-chain, Celestia can make full use of the exponential improvements in network bandwidth.
When all is taken into account, Celestia can be expected to practically support any potential user demand for the foreseeable future while keeping the dollar cost of verification fairly constant. By letting go of execution and introducing DAS, Celestia can mimic the scalability properties of the most scalable decentralized protocol ever known to the Internet: BitTorrent.
Now that we’ve covered how Celestia works, let’s examine the benefits of modular blockchains. Reimagining blockchains as a modular stack has implications beyond pure DA scalability. Below we present 8 unique design properties of the modular Celestia stack that may not be immediately obvious.
Rollups today operate as baby chains to Ethereum. This is because they post their headers on Ethereum and their fraud/validity proofs get executed on-chain. Their canonical state is therefore dictated by a series of smart contracts on Ethereum.
This is important to realize because it means rollups must by default have an on-chain governance mechanism. However, on-chain governance includes risks such as lower voter participation, vote-buying, centralization, etc. Due to these complexities, on-chain governance hasn’t been adopted as a preferred governance method for most blockchains.
Rollups on Celestia operate quite differently. As we saw earlier, Celestia doesn’t make any sense of the data it stores and leaves all the interpretation to rollup nodes. The canonical state of a rollup on Celestia is therefore independently determined by nodes choosing to run a particular client software. Indeed this is exactly how L1 blockchains are commonly operated today.
Therefore, rollups on Celestia are essentially self-sovereign blockchains. Nodes are free to hard/soft fork by upgrading their software and choosing to make sense of the underlying data in a different way. For example, if a rollup community is contentiously debating over a change of block size or token supply, opposing camps can update their software to follow different validity rules. You’ll notice that this feature is even more exciting than it seems, as we reflect on its deeper implications.
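A small sketch shows what such a fork looks like in practice: two client versions read the same ordered data but apply different validity rules. The `max_transfer` rule, the account names, and the starting balances are hypothetical and chosen purely for illustration.

```python
def replay(history, max_transfer):
    """Compute balances from the shared history under one client's validity rules."""
    balances = {"alice": 100, "bob": 0}
    for sender, recipient, amount in history:
        if amount > max_transfer or balances.get(sender, 0) < amount:
            continue  # invalid under this client's rules: ignored
        balances[sender] -= amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances


# One ordered history on the shared data layer, read by both camps.
history = [("alice", "bob", 10), ("alice", "bob", 60), ("bob", "alice", 5)]

conservative = replay(history, max_transfer=50)   # camp keeping the old limit
permissive = replay(history, max_transfer=100)    # camp that raised the limit
print(conservative)  # {'alice': 95, 'bob': 5}
print(permissive)    # {'alice': 35, 'bob': 65}
```

Both camps keep reading the same underlying data; only their interpretation of it, and therefore their canonical state, diverges.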
In the L1 blockchain space, contentious hard forks are often regarded to be very risky because forking chains end up diluting their security. Hence, forks are often avoided at all costs, stifling experimentation.
For the first time in blockchain history, Celestia brings the ability for chains to fork w/o any meaningful worries on security dilution. This is because all forks will end up using the same DA layer without relinquishing the security benefits of Celestia’s consensus layer. Imagine how much smoother the Bitcoin block size debates, or the Ethereum DAO fork could have been resolved had blockchains operated like this from the very beginning.
We expect this to accelerate experimentation and innovation in the blockchain space to a level beyond what can be imagined on today’s infrastructure. The below visualization is taken from a thread that perfectly illustrates this point.
Another force that will particularly accelerate the pace of innovation in VM space is Celestia’s execution-agnostic nature.
Unlike Ethereum rollups, rollups on Celestia don’t necessarily have to be designed for fraud/validity proofs interpretable by the EVM. This opens up the VM design space on Celestia to a far greater developer community and exposes it to high competition.
Today, with the likes of Starkware, LLVM, MoveVM, CosmWasm, FuelVM we already witness an emergence of alternative VMs gaining traction. Custom VMs can innovate across all aspects of execution; supported operations, database structures, transaction formats, software languages, etc. to achieve optimal performances while addressing specific use cases.
While Celestia doesn’t directly scale execution per se, we expect its execution-agnostic nature to lay down the foundation for a highly competitive VM market in search of highly functional, scalable execution.
If there is one trend that hasn’t changed in crypto over years it’s how much easier blockchain deployment is getting.
In the early days, a decentralized network couldn’t be bootstrapped without PoW hardware, a bottleneck that was eventually removed by the introduction of PoS. Along with PoS, maturing developer tools like Cosmos SDK have made it much easier to ship new blockchains. Despite these advancements, however, the overhead of bootstrapping a PoS consensus remains far from ideal. Developers have to source a new validator set, make sure they have a widely distributed token, and deal with the complexities of consensus.
While Polkadot parachains and Ethereum rollups remove this bottleneck, the former remains expensive to deploy while the latter remains expensive to operate.
Celestia appears as the next evolution of this trend. The Celestia team is implementing Optimint, a specification for ORUs that uses the Cosmos SDK. This tooling, along with others, addresses a future need where any chain can be deployed w/o devs having to worry about the overhead of consensus or expensive deployment/operation fees. New chains can be deployed in a matter of seconds and have users securely interacting with them from day one.
Ethereum plans to roll out its sharding plan in stages over the next few years. Under this plan, it will have data-only shards that rollups can use solely to post data. This will naturally result in cheaper rollup fees as the data capacity of the base layer will be increased. However, this doesn’t mean Ethereum lets go of its stateful execution environment on L1.
Ethereum has an enshrined execution. To run a fully verifying rollup node on Ethereum one also has to take interest in executing Ethereum’s L1 state. However, Ethereum already has a gigantic state, and execution over this state is by no means a cheap task. This gigantic state imposes an ever-growing technical debt for rollups.
To make matters worse, the same unit (i.e. L1 gas) used to throttle L1’s state size is also used to meter historical data of rollups. As such, anytime there is a spike of activity on L1, all rollups’ fees rise with it.
In Celestia’s modular blockchain stack, active state growth and historical data are treated completely separately as they should be. Celestia’s blockspace only stores historical rollup data which is measured and paid in bytes, and all state execution is metered by rollups in their own independent units. Because activities are subject to different fee markets, a spike of activity in one execution environment can’t deteriorate user experience in another.
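The separation can be pictured as two independent prices. In the hypothetical sketch below, data is paid per byte on the base layer while each rollup meters execution in its own gas unit; the function names, prices, and units are all assumptions for illustration, not Celestia's actual fee mechanism.

```python
def da_fee(blob_bytes, price_per_byte):
    """Base-layer fee: depends only on how many bytes are posted."""
    return blob_bytes * price_per_byte


def execution_fee(gas_used, rollup_gas_price):
    """Rollup-local fee: metered in that rollup's own unit."""
    return gas_used * rollup_gas_price


def total_fee(blob_bytes, price_per_byte, gas_used, rollup_gas_price):
    return da_fee(blob_bytes, price_per_byte) + execution_fee(gas_used, rollup_gas_price)


# Same transaction shape on two rollups that share one data price.
fee_b_before = total_fee(200, 2, 50_000, 1)
fee_a_spike = total_fee(200, 2, 50_000, 40)   # demand spike raises rollup A's gas price
fee_b_after = total_fee(200, 2, 50_000, 1)    # rollup B's gas price is untouched
print(fee_b_before, fee_a_spike, fee_b_after)  # 50400 2000400 50400
```

In this simplified model the spike on rollup A changes only A's execution price, so a user on rollup B pays the same fee before and after.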
One way to make sense of the whole L1 vs L2 debate is to view them all as some chains & bridges.
Broadly speaking, bridges come in two forms; trusted and trust minimized. Trusted bridges rely on the consensus of the counterparty chain whereas trust minimized bridges can be secured by any single full node.
In order for chains to form trust minimized bridges they need two things: (i) the same DA guarantee and (ii) a way to interpret each other’s fraud/validity proofs.
Because L1s don’t satisfy the former condition of shared DA, they can’t form trust minimized bridges with each other. The best they can do is to rely on each other’s consensus to communicate which necessarily means reduced security.
On the other hand, rollups communicate with Ethereum in a trust minimized way. Ethereum has access to rollup’s data and executes their fraud/validity proofs on-chain. This is why rollups can have trust-minimized bridges to Ethereum that can be secured by any single rollup node.
Chains with trust-minimized bridges can be regarded as clusters. Celestia lays the foundation for chains to form clusters with each other. However, it doesn’t force them to. Chains on top of Celestia are free to be standalone or have trusted and trust minimized bridges with each other in a broad bridging design space.
Contrary to common belief, fraud and validity proofs don’t have to be executed on-chain to take effect. They can also be distributed over the p2p layer (as shown above under the Cosmos Cluster) and be executed on the client-side.
Blockchain governance is slow. Improvement proposals often take years of social coordination before getting implemented. While this is desired for security, it significantly slows down the pace of active development in the blockchain space.
Modular blockchains offer a superior way for blockchain governance where execution layers can independently act fast and break things while the consensus layer can remain resilient and robust.
If you look at the history of EIPs, you’ll notice that a significant portion of proposals is related to execution functionality and performance. They often involve things like pricing of operations, adding new opcodes, defining token standards, etc.
In a modular blockchain stack, these discussions will only involve participants of the respective execution layer(s) and will not trickle down to the consensus layer. This in turn implies that there will be a lot fewer problems to be solved at the bottom of the stack where progress is necessarily slow due to the high bar for social coordination.
It’s quite common for decentralization to have different meanings for different teams.
Many projects value highly decentralized block production and mimic PoW’s ability to have a decentralized block production in a PoS setting. Algorand’s random leader election, Avalanche’s sub-sampled voting, and Ethereum’s consensus sharding are among notable examples in this regard. These design choices assume a low resource requirement on block producers to achieve a highly decentralized block production.
While these are valuable technologies, it’s debatable whether, in reality, they bring any more meaningful decentralization than what’s otherwise possible.
This is because block production has a tendency to become centralized due to economies of scale external to the protocol, with factors like resource pooling and cross-chain MEV being important catalysts. Empirically, despite the tech, stake/hash ends up following a Pareto distribution.
Aside from these, there is a more important point that often gets missed on this topic. The most important factor for decentralization is block verification and not production.
As long as actions of a smaller group of consensus nodes can be audited by a very large number of participants, blockchains will continue to operate as the trust machines that we love them for.
This has been the core thesis of Vitalik’s recent endgame article where Vitalik states “So what’s the result? Block production is centralized, block validation is trustless and highly decentralized, and censorship is still prevented.”
Similarly, while Celestia assumes high resource requirements for block producers it assumes low resource requirements for verifiers and thereby achieves a highly decentralized, censorship-resistant network.
Clearly identifying scalability bottlenecks of blockchains has helped the Celestia team to make the simplest design choices.
While Ethereum has DAS implementation at the end of its sharding roadmap, Celestia prioritizes it and explicitly chooses not to go down the overly complicated consensus sharding route.
Similarly, instead of implementing a new fancy consensus protocol, Celestia chose to go with plain old Tendermint with mature tooling and a wide developer/validator support.
We think these design choices will make Celestia stand out over time and be more appreciated when Celestia hits the market at a time when rollups are increasingly on the lookout for cheap data availability solutions.
Celestia is pioneering a completely new blockchain design. While we believe this is a superior model to existing solutions, there remain some unexplored challenges.
The first challenge we foresee has to do with determining appropriate block sizes. As we explored throughout this post, Celestia’s block sizes can be safely increased with the number of data sampling nodes in the network. However, data sampling is not a Sybil-resistant process. Hence, there is no verifiable way to determine the number of nodes in the network. Furthermore, since nodes that participate in sampling can’t be explicitly rewarded by the protocol, assumptions with regards to sampling have to rely on implicit incentives. The process to determine and update target block size will be governed by social consensus which is a new challenge in consensus governance.
Another challenge ahead has to do with bootstrapping network effects on Celestia. Obviously, a specialized DA layer without execution doesn’t serve much purpose. Unlike other blockchains, Celestia will therefore be relying on other execution chains to kickstart user activity. To this end, one of the initial use cases of Celestia will be to serve as an off-chain DA solution for validiums on Ethereum (i.e. Celestiums). Celestiums are the lowest-hanging fruit for Celestia to kickstart activity on its blockspace.
Another project in the works is Cevmos, a Cosmos SDK chain with a built-in EVM specialized for rollup settlements only. Rollups on top of Cevmos will be posting their data to Cevmos, which will then post it to Celestia. Just like Ethereum today, Cevmos will execute proofs of rollups to serve as a settlement layer. The goal of Cevmos is to allow Ethereum rollups to natively launch on Celestia without changing their codebase.
Finally, there is a limitation we foresee that has to do with Celestia’s native token utility. Just like any other chain, Celestia will have a fee market, where its native token will accrue value from demand for Celestia’s blockspace. However, because Celestia doesn’t perform state execution (except for a very tiny state execution for PoS related activities), unlike most chains, its token’s utility as a liquidity source in DeFi and other verticals will be somewhat limited. For example, unlike ether which can freely move between rollups and Ethereum in a trust minimized way, Celestia’s native token will have to rely on trusted bridges to be ported over to other chains.
We believe modular blockchains are a paradigm shift in blockchain design and expect their network effects to become increasingly apparent over the next few years, particularly with the expected launch of Celestia’s mainnet in 2023.
By decoupling execution from consensus, Celestia not only achieves BitTorrent-style scaling and decentralization, but also offers unique advantages including trust minimized bridges, sovereign chains, efficient resource pricing, simpler governance, effortless chain deployment, and flexible VMs.
As the first specialized DA layer, Celestia does less. And by doing less, it achieves more.