1. Introduction
Since its launch in 2015, the Ethereum blockchain has lived through five winters and summers. Those five years have not only turned the Ethereum protocol [1] from a concept into reality and made it more mature and concrete, but have also exposed the characteristics and trade-offs of its design. These trade-offs, as design challenges, have attracted and continue to attract countless ingenious attempts to improve Ethereum’s usability; among them, the most effort and imagination have gone into the family of proposals for improving “scalability”.
What this article wants to point out is that, just as the Ethereum paradigm faces design trade-offs, all of these scalability solutions face trade-offs too. To judge whether those trade-offs are worthwhile, we need to return to Ethereum itself: to the real problems and real needs of Ethereum nodes and Ethereum users. The “state” perspective, as a way of understanding Ethereum itself, can help us clarify the design of these solutions and reveal what they gain and what they lose.
This article starts by explaining the meaning of “state”, which reveals both Ethereum’s sharpest spear and its Achilles’ heel, and then discusses various directions for improvement. “Statefulness” is the source of the “composability” of Ethereum smart contracts, but it is also the biggest weakness of the Ethereum network. From there, we can see which of the frequently mentioned proposals affect “composability”, and which of them offer more tangible “scalability”.
2. Ethereum as a paradigm
(1) Statefulness and composability
What is “state”? State is the specific condition of a system at a given moment. Take a blockchain that implements a cryptocurrency as an example: its state at a given moment is the distribution of assets across all addresses on the chain (address A holds 10 coins, address B holds 100 coins, and so on). (Strictly speaking, all we have is state; it is our minds and social consensus that interpret this state as “assets”.)
From this perspective, every blockchain protocol can be roughly divided into two parts: the consensus mechanism and the state transition rules (sometimes called “consensus rules”). The former defines the block production rules; it tells every node participating in the blockchain when to update its local copy of the blockchain’s state (for example, an Ethereum node updates the latest state in its local database whenever it receives a block carrying a proof of work that meets the difficulty requirement). The state transition rules define which transactions are valid (“an account cannot spend more than its own balance”) and how a node should update the state when processing a transaction (“this transaction says that account A transfers 5 coins to account B, so A’s balance decreases by 5 and B’s balance increases by 5”).
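The two halves of a state transition rule, the validity check and the state update, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Transfer`, `is_valid`, and `apply_transfer` are hypothetical, and real Ethereum transactions also involve nonces, signatures, and gas, none of which are modelled here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Transfer:
    """A hypothetical, minimal transaction: move `amount` coins."""
    sender: str
    recipient: str
    amount: int


def is_valid(state: Dict[str, int], tx: Transfer) -> bool:
    """Validity rule: an account cannot spend more than its own balance."""
    return tx.amount > 0 and state.get(tx.sender, 0) >= tx.amount


def apply_transfer(state: Dict[str, int], tx: Transfer) -> Dict[str, int]:
    """Update rule: debit the sender, credit the recipient, return a new state."""
    if not is_valid(state, tx):
        raise ValueError(f"invalid transaction: {tx}")
    new_state = dict(state)  # leave the old state untouched
    new_state[tx.sender] -= tx.amount
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state


# The example from the text: A holds 10 coins, B holds 100, A sends 5 to B.
state = {"A": 10, "B": 100}
state_after = apply_transfer(state, Transfer("A", "B", 5))
```

The consensus mechanism is deliberately absent from this sketch: it only decides when a node runs such a function, not what the function does.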
For Bitcoin, the consensus mechanism is “PoW + Nakamoto consensus”, and the state transition rules are based on UTXOs. For Ethereum, the block production mechanism is “PoW + GHOST rule”, and the state transition rules are based on accounts.
So, what is it that makes Ethereum a breakthrough innovation?
We often hear that what makes Ethereum special is that it “introduces a Turing-complete programming language and supports programmability”, and so on. In fact, this statement is not accurate [2]. Allowing a more complex programming language does not by itself mean much: Bitcoin can also be programmed, and letting Bitcoin use the Solidity programming language would not give you an Ethereum. What makes Ethereum really special is its “rich statefulness” [3]: it allows one contract to call another contract and, apart from the block capacity itself, imposes no restriction on how many levels deep such calls can go.
Contract B can call contract A and change A’s state according to the code that A exposes; contract C, which calls contract B, can thereby call contract A indirectly and change A’s state as well. Thus, although a piece of state is stored in contract A, the logic controlling it can be layered and accumulated without end. If we think of the state as an asset, this means that the right to use the asset (the conditions of use) can be controlled ever more strictly and intricately; in theory, the update logic of a piece of state can come arbitrarily close to the financial contracts of real life (because every financial contract can be built by stacking and combining simple logic). This property of allowing arbitrarily many layers of control is what matters most; how that control is programmed is of secondary importance.
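The layering described above can be mimicked with three plain Python classes. This is a toy model, not EVM code: the class names, the string "addresses", and the explicit `sender` argument (standing in for the caller's identity) are all assumptions made for illustration.

```python
from typing import Dict, Set


class Token:
    """Contract A: owns the balances, i.e. the state everyone else builds on."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self.balances = dict(balances)

    def transfer(self, sender: str, to: str, amount: int) -> None:
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


class CappedVault:
    """Contract B: spends its own tokens in A, at most `cap` per call."""

    def __init__(self, address: str, token: Token, owner: str, cap: int) -> None:
        self.address, self.token, self.owner, self.cap = address, token, owner, cap

    def pay(self, sender: str, to: str, amount: int) -> None:
        if sender != self.owner:
            raise PermissionError("only the owner may spend")
        if amount > self.cap:
            raise ValueError("amount exceeds cap")
        self.token.transfer(self.address, to, amount)  # B calls A


class AllowListPayer:
    """Contract C: adds a second rule on top of B: approved recipients only."""

    def __init__(self, address: str, vault: CappedVault, allowed: Set[str]) -> None:
        self.address, self.vault, self.allowed = address, vault, set(allowed)

    def pay(self, to: str, amount: int) -> None:
        if to not in self.allowed:
            raise PermissionError("recipient not allowed")
        self.vault.pay(self.address, to, amount)  # C calls B, which calls A


token = Token({"B": 100})
vault = CappedVault("B", token, owner="C", cap=10)
payer = AllowListPayer("C", vault, {"alice"})
payer.pay("alice", 10)  # passes C's rule, then B's rule, then A's rule
```

The balance never leaves contract A, yet each wrapper adds one more condition on how it may move. Nothing stops a fourth or fifth layer from being stacked on top in the same way.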
In addition, Ethereum allows users to write state to the blockchain, making it part of the global state, and requires nodes to update that state according to the contract’s logic. As a result, a contract can expose its state to every other account on Ethereum, and the rich statefulness described above becomes genuinely useful.
That’s right: like someone gathering Dragon Balls, we have now collected three attributes:
(1) On-chain computation: a contract can require nodes to perform computation according to the contract’s own logic;
(2) Global state: a contract’s state can become part of the global state and is open to all other accounts;
(3) Rich statefulness: contracts can call one another with no limit on the call depth, so control logic can be stacked layer by layer.
Now we can summon Ethereum’s strongest spear: “composability” (money Legos)!
On-chain computation lets us have all kinds of contracts; global state lets these contracts access each other’s state; rich statefulness makes the possible combinations of contracts infinitely varied. So we can have not only the stablecoin DAI, but also lending markets, lottery applications, applications that automatically donate lottery proceeds, applications that automatically rebalance savings between different lending markets, and so on.
(2) The state data explosion problem
“Composability” sounds too good to be true, doesn’t it? Indeed, the trinity of the three attributes above is a double-edged sword.
The state transition process of Ethereum can be abstracted as follows: the state transition function takes the old state and a list of transactions as input and outputs the new state. This means that a fully validating Ethereum node must keep the latest state of the Ethereum blockchain locally, so that it can execute the state transition function and use the result to verify the validity of a block (and, in doing so, reach consensus with the other Ethereum nodes).
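A minimal sketch of why full verification requires local state: the verifier re-executes the block's transactions against its own copy of the state and compares a commitment to the result with the one claimed by the block. Everything here is a stand-in chosen for brevity. Transactions are plain `(sender, recipient, amount)` tuples, and the "state root" is a SHA-256 hash of sorted JSON, whereas Ethereum commits to state with a Merkle Patricia trie and Keccak-256.

```python
import hashlib
import json
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # (sender, recipient, amount)


def state_root(state: State) -> str:
    """Toy commitment to the whole state (assumption: hash of sorted JSON)."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def transition(old_state: State, txs: List[Tx]) -> State:
    """State transition function: (old state, transaction list) -> new state."""
    state = dict(old_state)
    for sender, recipient, amount in txs:
        if amount <= 0 or state.get(sender, 0) < amount:
            raise ValueError(f"invalid transaction: {(sender, recipient, amount)}")
        state[sender] -= amount
        state[recipient] = state.get(recipient, 0) + amount
    return state


def verify_block(local_state: State, txs: List[Tx], claimed_root: str) -> bool:
    """A block is valid only if re-execution reproduces the claimed root."""
    try:
        new_state = transition(local_state, txs)
    except ValueError:
        return False
    return state_root(new_state) == claimed_root
```

Note that `verify_block` cannot even start without `local_state`. That dependency is exactly what forces every fully validating node to store the complete, ever-growing state.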
The tension is this. For contracts and developers, a contract’s state is stored on Ethereum nodes as part of Ethereum’s global state, and the contract’s state updates are computed by Ethereum nodes (and paid for by the user who initiates the computation). This “serverless” architecture is very comfortable. However, once a single payment has been made, that state is stored permanently on every fully validating Ethereum node. Although each update must be paid for, nodes cannot escape the fact that the state data they store locally keeps accumulating and growing.
The growth of state data is a problem because it places an ever heavier (random) disk read/write burden on fully validating nodes. State data is unlike block data. Block data is static and does not need frequent reads and writes once it has been persisted; state data, by contrast, has to be read and written many times with every new block, and the more state data there is, the heavier that burden becomes. Over the past few years we have often heard that Ethereum full nodes are hard to run, and this is one major reason. Some time ago, Infura’s free Ethereum node service went down [4], taking many services that depend on Infura down with it. This was a wake-up call for everyone: maintaining an Ethereum node turns out to be so hard that people would rather trust someone else.
This problem is not easy to solve. Over the past few years, several Ethereum hard forks have raised the gas cost of opcodes that access state, precisely in order to curb the creation of new contract state through economic cost. But this is clearly only a stopgap, because the underlying logic has not changed: state data must be stored on Ethereum nodes indefinitely, while the user who creates it pays only once. Some have suggested changing this by introducing a “state rent” mechanism, which would require contracts that store state to keep paying rent or else lose their availability. However, this mechanism is extraordinarily complex: it is hard to determine a reasonable way to charge the rent, and hard to determine who should reasonably receive it. As a result, research on state rent stalled in 2019 [5]. There are also projects (such as Nervos) that tie the size of the usable state space to the amount of currency held, so the size of the state data always has an upper bound. This avoids state bloat, but it also changes the economic properties of the native asset.
So far, apart from “statelessness”, I have not seen a satisfactory and fundamental solution to this problem, and statelessness itself faces many practical challenges. We will come back to it later.
All in all, global state, on-chain computation, and rich statefulness give contracts on Ethereum their composability, but they also expose the Ethereum network to the danger of centralization. Like the One Ring in “The Lord of the Rings”, they can summon great power but can also devour the bearer. I worry that Ethereum will carry this heavy burden for a long time.
Next, we will examine the design and trade-offs of various scalability solutions from the “state” perspective.
3. The development direction of Ethereum
In this chapter, we analyze four development directions for Ethereum: Layer-2 schemes, sharding, statelessness, and Rollup schemes. Strictly speaking, this classification is not sound: rollups are a subset of Layer-2 schemes, and statelessness is a prerequisite technology for sharding. Even placing them side by side is questionable, because Layer-2 schemes hardly need to change Ethereum’s base layer, whereas sharding and statelessness do. The grouping is used here only for convenience of narration and understanding.
(1) Layer-2 schemes
The idea behind Layer-2 solutions comes from a simple but very precise intuition. Ethereum faces a throughput bottleneck because the bandwidth, computing power, and state-keeping capacity of the nodes that make up the network are limited and hard to improve. Simply demanding that every node process more transactions per unit of time would inevitably raise the requirements for running a node, which undermines decentralization. From a usage perspective, however, not all state needs to live on Ethereum, and not all state computation needs to happen on Ethereum. We can store a contract’s intermediate state (or all of its state) elsewhere and let user interaction (that is, state updates) happen off the Ethereum blockchain; only when users think a given state needs to be settled is it sent to Ethereum for confirmation.
In a word: if we cannot make the network reach consensus on more transactions per unit of time, then let each transaction carry more meaning.
The classic Layer-2 solution, the “state channel”, embodies this idea most thoroughly. Once the two users of a channel have locked their funds into the contract, subsequent transactions between them are not sent to the chain. Instead, they exchange signed messages over other communication tools and thereby agree on the state inside the channel (so however many messages they exchange and however much state data they produce, none of it becomes a burden on Ethereum). When the two decide that they no longer need to interact (or that an interim settlement is needed), they send the jointly approved state together with both signatures to Ethereum; Ethereum then updates the contract’s state and settles the funds for the two parties accordingly.
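The channel life cycle above can be sketched as follows. This is a simplified model under loud assumptions: `ChannelContract`, `sign`, and the `nonce` field are hypothetical names, and HMAC with keys known to the contract stands in for real public-key signatures purely to keep the example within the standard library. A real channel contract would verify signatures against public keys and would add a dispute period.

```python
import hashlib
import hmac
import json
from typing import Dict


def sign(key: bytes, state: dict) -> bytes:
    """Stand-in for a digital signature over a channel state (HMAC-SHA256)."""
    message = json.dumps(state, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class ChannelContract:
    """On-chain side: holds the deposits and only sees states sent to it."""

    def __init__(self, keys: Dict[str, bytes], deposits: Dict[str, int]) -> None:
        self.keys = dict(keys)
        self.total = sum(deposits.values())
        self.latest = {"nonce": 0, "balances": dict(deposits)}

    def settle(self, state: dict, signatures: Dict[str, bytes]) -> None:
        # Every participant must have signed exactly this state.
        for party, key in self.keys.items():
            expected = sign(key, state)
            if not hmac.compare_digest(signatures.get(party, b""), expected):
                raise ValueError(f"missing or bad signature from {party}")
        # Funds can be redistributed but not created.
        if sum(state["balances"].values()) != self.total:
            raise ValueError("balances do not match the locked funds")
        # Only a strictly newer state may replace the recorded one.
        if state["nonce"] <= self.latest["nonce"]:
            raise ValueError("stale state")
        self.latest = state


keys = {"alice": b"alice-key", "bob": b"bob-key"}
contract = ChannelContract(keys, {"alice": 50, "bob": 50})

# Off-chain: any number of updates, none of which touches the chain.
state = {"nonce": 2, "balances": {"alice": 30, "bob": 70}}
signatures = {name: sign(key, state) for name, key in keys.items()}

# On-chain: a single settlement for the whole history.
contract.settle(state, signatures)
```

However many intermediate states the two parties sign, the contract processes one message. That is the sense in which a single on-chain transaction comes to stand for many off-chain interactions.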
If you view a Layer-2 scheme as a contract design pattern, things become clearer: a Layer-2 scheme chooses not to use global state. Another contract cannot know in real time what the state of a Layer-2 contract is (how much money users A, B, and C each hold), because that state is not on-chain; therefore, a Layer-2 contract cannot be composed with other contracts.
Even so, Layer-2 schemes gain extremely valuable things in exchange: faster transactions (although an intermediate state confirmed off-chain is not as secure as main-chain state), lower fees, and a lighter burden on main-chain nodes.
But why have Layer-2 schemes borne so little fruit over the past few years? Because before the Rollup scheme appeared, other schemes, including state channels and Plasma, could not show that funds locked in their contracts were as safe as funds locked in stateful contracts (that is, could be neither frozen nor stolen). In a state channel, if you do not monitor the blockchain at all times, your counterparty can “steal” your money by submitting an old state to the main chain; in Plasma, you often have to rely on the operator to provide you with proof of your own state, so it is hard for the operator to prove its own innocence.
This is completely different from the experience of using a stateful contract on the Ethereum mainnet. Setting aside code risk in the contract, the money you deposit cannot be stolen unless someone launches a 51% attack to revert your transaction; and unless someone keeps censoring your transactions through a 51% attack, you can always withdraw your own funds. (Later in this article, we will see how the Rollup scheme solves this problem.)
(2) Sharding
Another intuition for increasing Ethereum’s throughput is this: throughput is limited because every full node must process every on-chain transaction. If each node processed only part of the transactions, with different groups of nodes processing different transactions separately (in parallel), then the throughput of the whole system per unit of time would equal the sum of the processing capacities of the individual groups. The burden on any single node would not increase, yet the throughput of the system as a whole would (and the smaller the groups, the larger the throughput multiplier). This is called “sharding”.
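The arithmetic behind this intuition is short enough to write down. The sketch below assumes, for illustration only, that every group of nodes has the same capacity and that no cross-group coordination cost exists; the function name and all numbers are hypothetical.

```python
def sharded_throughput(total_nodes: int, nodes_per_group: int, tps_per_group: int) -> int:
    """Idealized system throughput when groups process disjoint transactions.

    Assumes equal groups and zero coordination overhead, so the total is just
    (number of groups) x (throughput of one group).
    """
    if nodes_per_group <= 0 or total_nodes < nodes_per_group:
        raise ValueError("need at least one full group of nodes")
    groups = total_nodes // nodes_per_group
    return groups * tps_per_group


# With 1,000 nodes and a per-group capacity of 15 transactions per second:
unsharded = sharded_throughput(1000, 1000, 15)  # one group containing every node
ten_groups = sharded_throughput(1000, 100, 15)  # ten groups of 100 nodes
hundred_groups = sharded_throughput(1000, 10, 15)  # one hundred groups of 10 nodes
```

The multiplier grows as the groups shrink, which is the point made in the text. What the formula leaves out is the cost of transactions that span more than one group.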
Regarding the sharded architecture, one interesting question is its exact definition. In the classic, unsharded blockchain architecture, a full node must (1) repeat all computation; (2) maintain all state; (3) propagate all blocks/transactions. Some hold that breaking any one of these already counts as sharding; from a more rigorous perspective, however, all three must be broken to achieve the theoretically derived throughput improvement [6]. Different definitions have very different implications. Here, the definition I adopt is that a node, at the very least, does not need to maintain the entire state of the system.
But a more interesting question is this: even if the prerequisite technologies that sharding requires can be realized, how much does the architecture actually achieve?
In the sharding architecture envisioned by the Ethereum Foundation research team, the state of the whole system is split into several parts, and each part is updated in parallel. In other words, one shard cannot know another shard’s internal state in real time. When contract A on shard A tries to call contract B on shard B, the node processing the call cannot be assumed to hold shard B’s state, so the result cannot be returned until shard B has completed its state update. Cross-shard transactions therefore bear the most intolerable cost in transaction processing: latency. The point is that this latency cannot be bought away with money. Because shard B does not know that a transaction calling it has occurred on shard A, it can only wait for a trusted communication layer to relay that transaction to it.
The more value that can be created per unit of time, the less tolerable the delay. This means that if DeFi applications concentrate on one shard, they will not draw on the processing capacity of other shards through cross-shard transactions, because doing so is pointless and cannot meet the latency requirements of DeFi applications. It also means that the processing capacity of that single shard is the upper limit for this cluster of DeFi applications (no different from a single blockchain). Building houses and roads deep in remote mountains and forests does not solve the shortage of living space in big cities.
(Some argue that this “idle” shard throughput could be used by niche applications. I think Layer-2 solutions can achieve the same effect with fewer changes to the base layer and with more security.)
(3) Statelessness
Statelessness is the only upgrade direction that confronts the state data growth problem head-on [7].
In the current Ethereum protocol, a transaction does not carry information about the state it accesses. For exactly this reason, a node processing transactions must keep the state data as a precondition for executing the state transition function, and this is why the growth of state data is a problem (it increases the burden on nodes).
The key to statelessness is to let the transaction/block itself carry the information about the state it accesses. A block then becomes verifiable on its own, and a node that processes transactions no longer needs to hold the state.
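The idea of state that travels with the transaction can be sketched with a Merkle proof. In this illustration the verifier holds only a state root; the witness is the list of sibling hashes needed to recompute that root from one account's entry. The binary tree, the SHA-256 hash, the leaf encoding, and all function names are assumptions made for the sketch. Ethereum's actual state commitment is a Merkle Patricia trie over Keccak-256, and witness formats for stateless clients differ from this toy.

```python
import hashlib
from typing import List, Tuple

Witness = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(account: str, balance: int) -> bytes:
    return h(f"{account}:{balance}".encode())


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaves first; an odd level duplicates its last node."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_witness(levels: List[List[bytes]], index: int) -> Witness:
    """Built by a node that holds the full state."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root: bytes, account: str, balance: int, witness: Witness) -> bool:
    """Run by a stateless node: needs only the root and the witness."""
    node = leaf(account, balance)
    for sibling, sibling_is_left in witness:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


accounts = [("A", 10), ("B", 100), ("C", 7)]
levels = build_levels([leaf(name, bal) for name, bal in accounts])
root = levels[-1][0]
witness_for_c = make_witness(levels, 2)
```

Two things are visible even in this toy: `verify` touches no database at all, and `make_witness` can only be run by someone who holds every level of the tree.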
In effect, statelessness changes how Ethereum blocks are verified by changing the structure of Ethereum blocks. Nodes no longer need to maintain state data, which removes the disk reads and writes and can make block verification faster. Moreover, each node may keep no state data at all, or keep only the state of certain contracts according to its own needs.
To be honest, however, statelessness currently faces many design challenges. Specifically: (1) Statelessness requires blocks/transactions to attach a witness for the state they access, and this data may be very large. Ethereum blocks are currently about 20-40 KB, whereas witness data may reach the MB level (depending on how much state is accessed). (2) Only nodes that maintain the full state can assemble witnesses, so who will provide state for ordinary users? (In fact, this can be seen as a change in Ethereum’s operating assumptions: originally every full node could take part in mining, but once statelessness is implemented that will no longer be so.) (3) How should the gas consumption of transactions be priced? In particular, because witnesses are time-sensitive, the computation spent assembling a witness cannot be determined from the opcodes.
It is precisely because of these difficulties that Ethereum full nodes may have to keep operating in the current mode, maintaining all state, for a long time. Even so, statelessness is definitely the most exciting direction among the current improvements to the Ethereum protocol, because it faces Ethereum’s core problem directly and tries to solve it at the root. In addition, research on how the Ethereum protocol uses state data also nourishes other research directions, such as synchronization methods [8].
My own biased guess is that the future of Ethereum, even if it is not stateless, will be a scheme inspired by statelessness.
(4) Rollup schemes
The Rollup scheme is a Layer-2 scheme. What makes it special is that it publishes the transactions used in every state update on the Ethereum blockchain.
Like other Layer-2 solutions, the Rollup scheme stores state off-chain and does not require Ethereum nodes to compute the contract’s new state. However, every transaction that changes the contract’s state is published as data, which means that any third party can use this public data and the public rules to compute the contract’s state (although contracts on Ethereum cannot use that state).
As mentioned earlier, when a Layer-2 contract moves state computation off the Ethereum chain and hides its own contract state, it introduces risks for users. Users cannot be sure that the operator of the Layer-2 contract will not send an invalid state to the chain to be finalized by the blockchain; if the operator can do so, it amounts to stealing users’ funds outright. Nor can users be sure that the operator will not censor their transactions and thereby freeze their funds.
There are two ways to solve the problem of stolen funds. One is to ensure that every state transition is valid: whenever the contract’s state root is to be updated, Ethereum runs a verification procedure for the integrity of the computation, and the contract is allowed to update the state root only if verification passes. This is the idea behind zk-rollup. The other is to require a deposit from whoever requests an update to the contract’s state root; if the submitted state root is invalid, whoever reports it receives the submitter’s deposit. This is optimistic-rollup. The latter approach has a prerequisite, though: the whistleblower must be able to obtain the state before the transition, otherwise they cannot generate a fraud proof for that transition.
There is only one solution to the problem of frozen funds: weaken the notion of “operator” as much as possible, so that anyone can submit a transaction to Ethereum to update the Layer-2 contract’s state. But this brings back the earlier question: if the submitter does not hold the contract’s state, how can they prove that their state access is valid and get the contract to release the funds?
In the end, the Rollup scheme solves this problem by “publishing the corresponding transaction data with every state transition”. So although a Rollup contract does not expose its state on-chain, anyone, including the contract’s own users, can reconstruct its internal state from the published transaction batches. This means that, if properly designed, funds in a rollup contract can be as safe as funds in a state-storing contract on the Ethereum blockchain (that is, the contracts we use today)!
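How published transaction data makes a challenge possible can be sketched as follows. The model is deliberately crude and every name in it is hypothetical: the contract object stores only batches, claimed roots, and bonds; a "state root" is a SHA-256 hash of sorted JSON; invalid transfers inside a batch are simply skipped; and the fraud check runs as ordinary off-chain code, whereas a real optimistic rollup would have the chain verify a compact fraud proof. The zk-rollup path, where a validity proof is checked before any root is accepted, is not modelled.

```python
import hashlib
import json
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # (sender, recipient, amount)


def state_root(state: State) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_batch(state: State, batch: List[Tx]) -> State:
    """Public rule: apply transfers in order, skipping any that are invalid."""
    new_state = dict(state)
    for sender, recipient, amount in batch:
        if amount > 0 and new_state.get(sender, 0) >= amount:
            new_state[sender] -= amount
            new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


class RollupContract:
    """On-chain side: never computes state, only records what is published."""

    def __init__(self, genesis: State) -> None:
        self.genesis = dict(genesis)
        self.batches: List[List[Tx]] = []
        self.roots = [state_root(genesis)]
        self.bonds: List[Tuple[str, int]] = []

    def submit(self, submitter: str, batch: List[Tx], claimed_root: str, bond: int) -> None:
        self.batches.append(list(batch))  # transaction data is published
        self.roots.append(claimed_root)
        self.bonds.append((submitter, bond))


def reconstruct(contract: RollupContract, upto: int) -> State:
    """Anyone can rebuild the state before batch `upto` from public data."""
    state = dict(contract.genesis)
    for batch in contract.batches[:upto]:
        state = apply_batch(state, batch)
    return state


def challenge(contract: RollupContract, index: int) -> int:
    """Return the bond a challenger would win for batch `index` (0 if honest)."""
    pre_state = reconstruct(contract, index)
    expected = state_root(apply_batch(pre_state, contract.batches[index]))
    if expected == contract.roots[index + 1]:
        return 0
    return contract.bonds[index][1]
```

The prerequisite named in the text shows up directly in the code: `challenge` begins by calling `reconstruct`, and `reconstruct` works only because every batch was published.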
Take zk-rollup as an example. The on-chain verification procedure ensures that a rollup contract cannot be updated to an invalid state root, just as MakerDAO cannot confiscate your DAI without your permission. At the same time, assuming that the method for constructing the proof of computational integrity is also public, you can initiate a state transition directly against the on-chain contract at any time to withdraw your own money. This is exactly the same as with an ordinary, stateful contract: without a 51% attack, the state of the rollup contract cannot be rolled back; without a sustained 51% attack, nothing can stop you from getting your money back.
Optimistic-rollup relies on some cryptoeconomic assumptions, so it is slightly weaker. Besides rolling back state with a 51% attack, an attacker can also inject an invalid state root into the contract through a 51% censorship attack sustained over a period of time, or can simply gamble that nobody who computes the contract’s latest state will notice the error (the odds of which are extremely small). Even so, optimistic-rollup can offer strong non-custodial guarantees, and you can get your money back at any time.
In other words, if a user is willing to deposit money into a stateful contract (such as Compound, MakerDAO, or Uniswap), there is no reason not to deposit it into a rollup contract (if a qualifier must be added: a zk-rollup contract). With zk-rollup, Layer-2 solutions can already give users the highest degree of fund security that contracts on the Ethereum blockchain can offer. Clearly, only then does mass adoption become possible.
All Layer-2 schemes can be seen as a compromise in the direction of statelessness. A Layer-2 scheme is itself stateless (to other contracts, the internal state of a Layer-2 contract is inaccessible), and however complex its internal state becomes, it adds no burden to Ethereum nodes. At the same time, when a Layer-2 contract (its state root) is updated, Ethereum’s role is mostly verification: it checks that the state root update is valid rather than computing the state root itself. Only in the rollup era, however, have Layer-2 solutions shown that they can be as secure as stateful contracts, so that their promise can become reality.
4. Conclusion
To sum up, in this article I have explained the source of Ethereum’s “composability”, whether the various hotly debated scalability solutions sacrifice this composability, and what they get in exchange. Readers may notice that in my reasoning and evaluation I put a lot of weight on “what we have already achieved/what we already have” and on “what people’s behavior shows that they need”. Yes, it is from this perspective that I explain the appeal of the rollup scheme (in fact, this may be one of my core motivations for writing this article). In my opinion, this way of thinking gives our reasoning a more reliable starting point, keeps us from guessing at users’ needs, and keeps us from investing in castles in the air.
Looking at how history unfolds, I see people choosing certain things from among their options, which forces me to conclude that those things matter. And if a technology does not add to people’s options but instead comes at the expense of what people actually choose, there is no reason to have confidence in it.