Scaling Polkadot: Does it come at a price?

The second in a three-part contextual blog series examining scalability efforts for Polkadot. Here, we explain how Polkadot maintains a strong emphasis on security and resilience while striving for efficiency.

Andrei Sandu
Core Developer @ Parity Technologies
April 25, 2025
5 Min Read

We've already explored the rollup scalability problem and how Polkadot has delivered one of the most innovative ways to solve it. In this post, we continue by investigating the trade-offs made by Polkadot and a few other Web3 projects. Let's start with a simple question.

What is the Cost of Scalability?

Polkadot has positioned itself as a leader in scalability, but does this come at a cost? Are there trade-offs that Polkadot makes to enable rollups to achieve higher throughput and lower latency?

Scalability, resilience, and security are interwoven into the Polkadot architecture.

While high throughput and fast transaction speeds are desirable, and are key to achieving scalability, they should not come at the expense of two other critical properties: resilience and security. Nor should the interoperability of the network be unduly affected.

Adding more validators increases the resources available to rollups and improves both the resilience and the economic security of the network. The question is whether scalability affects the other fundamental Web3 properties of Polkadot.

For wider context, it is also worth understanding the properties described in this post by Gavin Wood as existential for true Web3 systems.

Resilience

Does Polkadot's architecture, which relies on a network of validators and its central Relay Chain, potentially centralize certain aspects of the network? Does it introduce single points of failure or control that could compromise its decentralized nature?

Rollups are run by sequencers that connect to the Relay Chain using the "collator" protocol. This protocol is fully permissionless and trustless: it allows anyone with internet access to connect to a small group of Relay Chain nodes and provide a rollup state transition to be verified using the resources of a single core. Only one requirement needs to be fulfilled: what is provided must be a valid state transition. If it is not, the rollup state does not advance.

Vertical scalability trade-offs

Rollups can scale vertically by leveraging Polkadot's multi-core architecture, a capability introduced by the Elastic Scaling feature. During its design, we discovered that rollup resilience could be affected, because rollup block validation is not enforced on a particular core.

The protocol for submitting the blocks to the Relay Chain is trustless and permissionless, so anyone can submit blocks to be validated on any of the cores assigned to a rollup. Attackers can simply take an earlier valid block and use it to spam the validators assigned to the other cores of the rollup, wasting the resources of the rollup and reducing its throughput.

We want to maintain the resilience of the rollup and its effective use of Relay Chain resources without compromising the trustless and permissionless nature of the protocol.

Can sequencers be trusted?

It would be very simple to solve the problem by making the protocol permissioned: using whitelists, or simply trusting sequencers to behave in a way that doesn't harm the rollup's liveness.

But, in Polkadot, we cannot make any trust assumption about sequencers, because we want to retain the trustless and permissionless nature of the system. Anyone must be able to use the collator protocol to submit the state transitions of a rollup.

No compromises

The implemented solution pushes the problem up to be solved entirely by the rollup state transition function. The rollup logic (aka the Runtime) is the source of truth for anything that needs consensus, so it is only natural that the state transition function must commit (via its output) to the exact Polkadot core to be used for validation.

In doing so, we make no trade-off on the resilience or security of the rollup. The core assignment is guaranteed to be correct at the Polkadot level by re-executing the rollup state transition as part of the availability process and through the ELVES crypto-economic security protocol.

Before any rollup block is written to the Polkadot DA layer, a small group of validators (typically 5) verifies its validity after receiving its "candidate receipt" and proof of validity (PoV) from a sequencer. The PoV contains the rollup block and the corresponding storage proof that will be passed to the parachain validation function during re-execution on the Relay Chain validators.

The output of the execution is a set of commitments which include the core selector that determines the core index.

Validators check whether the committed core index matches the core they are assigned to. If it does not, the rollup block is dropped.
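
As a rough illustration of this check, here is a minimal sketch in Rust. The names below (CandidateCommitments, core_selector, accept_block) are simplified stand-ins for the structures and resolution rules defined in RFC103, not the actual Polkadot implementation.

```rust
// Illustrative sketch only: the names below are simplified stand-ins for
// the commitments described in RFC103, not the real Polkadot types.

/// Commitments emitted by the rollup's state transition function. The core
/// selector (together with other data, such as a claim queue offset) is what
/// determines the core index the block must be validated on.
struct CandidateCommitments {
    core_selector: u8,
}

/// Resolve the committed core index from the cores currently assigned to the
/// rollup (hypothetical helper; the real resolution is defined in RFC103).
fn committed_core(commitments: &CandidateCommitments, assigned_cores: &[u32]) -> Option<u32> {
    if assigned_cores.is_empty() {
        return None;
    }
    let idx = commitments.core_selector as usize % assigned_cores.len();
    Some(assigned_cores[idx])
}

/// Validator-side check: after re-executing the state transition and obtaining
/// the commitments, drop the block if it does not commit to the core this
/// validator group is backing.
fn accept_block(commitments: &CandidateCommitments, assigned_cores: &[u32], my_core: u32) -> bool {
    committed_core(commitments, assigned_cores) == Some(my_core)
}

fn main() {
    // A rollup using three cores, e.g. cores 7, 12 and 21.
    let cores = [7, 12, 21];
    let commitments = CandidateCommitments { core_selector: 1 };

    // Validators backing core 12 accept the block; validators on core 7 drop
    // it, so a replayed block cannot waste the other cores' resources.
    assert!(accept_block(&commitments, &cores, 12));
    assert!(!accept_block(&commitments, &cores, 7));
}
```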

This ensures that the system remains fully trustless and permissionless while malicious actors (including more privileged actors like rollup sequencers) cannot control the core where the blocks are validated. Rollups retain their resilience when they use multiple cores.

For the more technically minded reader, a full description is available in RFC103.

Security

Does Polkadot's focus on rollup scalability lead to any compromises in security? Are there potential vulnerabilities that could be exploited by malicious actors?

The security of Polkadot rollups is the responsibility of the Relay Chain. A rollup is only responsible for its own liveness, which can be achieved with just a single honest sequencer.

Polkadot extends its full security over rollups through the ELVES crypto-economic security protocol. It verifies all of the computations happening on all cores without making any assumptions or introducing constraints for rollups that make use of an arbitrary number of cores. 

This means Polkadot rollups scale with zero compromises on security.

Generality

Does rollup scalability constrain the programmability of rollups or the applications that can be built on top of them?

Polkadot's rollup model allows Turing-complete computations to be executed in a WebAssembly environment, as long as they complete within two seconds. Using Elastic Scaling doesn't add any restriction on the range of computations; it only increases the quantity of computation that rollups can perform over a period of six seconds.
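
As a back-of-the-envelope illustration (a simplification that assumes each core contributes one two-second execution window per six-second relay-chain period), the execution budget grows linearly with the number of cores:

```rust
// Back-of-the-envelope sketch: assumes each core contributes one 2-second
// execution window per 6-second relay-chain period. Real budgets depend on
// the concrete runtime and scheduler configuration.

const EXECUTION_WINDOW_SECS: u32 = 2; // per rollup block validation
const RELAY_PERIOD_SECS: u32 = 6;     // relay-chain scheduling period

fn execution_budget_per_period(cores: u32) -> u32 {
    cores * EXECUTION_WINDOW_SECS
}

fn main() {
    for cores in 1..=3 {
        println!(
            "{} core(s): up to {}s of Wasm execution every {}s",
            cores,
            execution_budget_per_period(cores),
            RELAY_PERIOD_SECS
        );
    }
    // The *range* of computations is unchanged; only the quantity grows:
    // 1 core -> 2s/6s, 3 cores -> 6s/6s of execution per period.
}
```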

Complexity

What about additional complexity that could make it more difficult to develop and maintain applications on the network? Could this complexity lead to unforeseen issues or bugs?

This trade-off is largely unavoidable. Higher throughput and lower latency don't come for free; in fact, all things considered, a higher degree of complexity is the only acceptable trade-off to make. Rollups can seamlessly scale up and down by using the interfaces of Agile Coretime. To preserve the same level of resilience in all scenarios, they also need to implement a few RFC103 requirements.

In practice, the additional complexity depends on the use case, as rollups need to define their own strategy for acquiring and assigning more resources, which can depend on on-chain and/or off-chain variables and triggers.

The best-case scenario is a rollup that runs with a constant, fixed scaling factor, for example three cores, or one whose core count rarely needs to change and can be adjusted manually from an off-chain context.

Simpler methods also exist, such as monitoring the transaction load in the nodes' "mempool".

More automation, for example using historical data to provision resources ahead of time via the coretime service through XCM, increases the complexity of the implementation and, even more significantly, the complexity of the testing that needs to be done.
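
To make this concrete, here is a hedged sketch of what a very simple off-chain strategy could look like: pick a target core count from recent mempool load. The thresholds and the order_cores hook are hypothetical; a real rollup would acquire coretime through the Agile Coretime interfaces (on-chain or via XCM) and would need far more careful testing.

```rust
// Hypothetical sketch of a simple off-chain scaling strategy: choose a target
// number of cores from observed mempool load. Thresholds and the `order_cores`
// hook are illustrative placeholders, not real Agile Coretime APIs.

/// Observed load, e.g. pending transactions in the node's mempool.
struct LoadSample {
    pending_txs: usize,
}

/// Pick a target core count from the most recent samples.
fn target_cores(samples: &[LoadSample], min_cores: u32, max_cores: u32) -> u32 {
    let avg_pending: usize = if samples.is_empty() {
        0
    } else {
        samples.iter().map(|s| s.pending_txs).sum::<usize>() / samples.len()
    };

    // Illustrative threshold: one extra core per ~5,000 pending transactions.
    let wanted = 1 + (avg_pending / 5_000) as u32;
    wanted.clamp(min_cores, max_cores)
}

/// Placeholder for the actual coretime acquisition logic.
fn order_cores(n: u32) {
    println!("would acquire {n} core(s) for the next period");
}

fn main() {
    let recent = [
        LoadSample { pending_txs: 8_000 },
        LoadSample { pending_txs: 12_000 },
    ];
    // An average load of 10,000 pending transactions maps to 3 cores here.
    order_cores(target_cores(&recent, 1, 4));
}
```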

Interoperability

While Polkadot enables interoperability between different rollups, does this reduce their scalability? Are there limitations or trade-offs that developers need to be aware of when building scalable rollups that interact with other Web3 applications?

While Elastic Scaling can drastically reduce block time and increase throughput, it does not affect in any way the throughput of messaging between different rollups. Cross-rollup messaging is a feature of shared security, with its transport layer implemented in Polkadot. The amount of blockspace dedicated to sending and receiving messages remains constant for each rollup, regardless of the number of cores assigned to any of them.

One future upgrade on Polkadot, off-chain messaging, aims to improve this by using the Relay Chain as a control plane rather than a data plane for messaging. Once this is implemented, the vertical scalability enabled by Elastic Scaling will drive an increase in the maximum messaging throughput that can be achieved between two rollups.

What trade-offs do other protocols make?

It is widely known that it is possible to trade decentralization and security for a boost in all performance metrics. But, judging by Nakamoto coefficients, it is surprising to see that Polkadot's competitors achieve inferior performance even with a much lower degree of decentralization.

Solana does not use sharding in the way that Polkadot and other blockchains do, and Ethereum plans to. Instead, it achieves scalability through a single-layer, high-throughput design that relies on mechanisms like Proof of History (PoH), CPU-level parallelism, and a leader-based consensus model. Solana's theoretical maximum is claimed to be 65K TPS.

What is very interesting about this approach is the leader schedule, which is computed ahead of time and is publicly verifiable:

  • At the start of each epoch (~2 days, or 432,000 slots), the network generates a schedule assigning slots to validators based on their staked tokens.
  • Validators with more stake get more slots, proportional to their share of the total active stake (e.g., a validator with 1% of stake gets ~1% of slots).

Knowing all the block producers in advance enables focused, planned DoS attacks against Solana, exposing the network to the risk of frequent outages.
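
As an illustration with made-up numbers (not real stake data), a validator's expected slot count in this scheme is simply its share of stake multiplied by the 432,000 slots in an epoch:

```rust
// Rough illustration of Solana-style stake-weighted slot allocation: expected
// slots per epoch are proportional to a validator's share of the total active
// stake. The stake figures below are illustrative, not real data.

const SLOTS_PER_EPOCH: u64 = 432_000; // ~2 days of slots

fn expected_slots(stake: u64, total_stake: u64) -> u64 {
    // Proportional share of the epoch's slots.
    SLOTS_PER_EPOCH * stake / total_stake
}

fn main() {
    let total_stake = 400_000_000u64; // illustrative total active stake
    let whale = 4_000_000u64;         // 1% of total stake
    let small = 40_000u64;            // 0.01% of total stake

    // ~1% of stake -> ~4,320 slots; 0.01% -> ~43 slots per epoch.
    println!("whale: {} slots", expected_slots(whale, total_stake));
    println!("small: {} slots", expected_slots(small, total_stake));

    // Because this schedule is computed and published ahead of time,
    // the leader of every upcoming slot is public knowledge.
}
```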

PoH and parallel processing place very high hardware demands on validators, which has led to centralization. Furthermore, the amount of staked tokens is correlated with block-authoring opportunities, leaving small validators with minimal stake with zero or just a handful of slots. This causes even more centralization and exacerbates potential outages due to DDoS attacks.

Solana maximizes its TPS by compromising decentralization and resilience, evident in its Nakamoto coefficient of 20. In contrast, Polkadot, with a Nakamoto coefficient of 172, is significantly more decentralized.

While centralization inherently introduces security risks, there are even worse trade-offs that can be made by sacrificing security up front.

For example, let's look at TON.

The claim is 104,715 TPS, achieved using 256 validators in a private testnet with high-quality hardware and perfect networking. By comparison, Polkadot achieved 128K TPS on a public and decentralized network.

What is interesting is that the consensus protocol is not secure by design. It simply allows whole classes of attacks by revealing the identity of shard (subnet) validators ahead of time. This design choice is documented in the TON whitepaper:

[Excerpt from the TON whitepaper describing this design choice]

TON trades off security to achieve its high TPS numbers. Knowing the validators of a shard in advance enables DA bandwidth optimizations, but it can also be exploited. Without any gambler's-ruin property, malicious actors can simply wait until a "task group" is entirely controlled by the attacker. Even without controlling the entire "task group", it is still possible to DoS the honest validators, allowing the rest to approve an invalid shard state transition.

By comparison, in Polkadot each rollup block is validated by randomly assigned validators whose identity is revealed only at the very last moment, when re-execution starts. This minimizes the chances of honest validators being censored. If a validator is censored, the protocol escalates by randomly selecting new validators to re-execute.

It is important to note that attackers can know whether they control all the validators assigned to a rollup block only after starting the attack. Just a single honest validator can raise a dispute, leading to the attacker losing a significant stake.
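
To make the difference concrete, here is a rough back-of-the-envelope sketch that models neither protocol exactly: assume an attacker controls some fraction of the validator set and a checking group of five validators is drawn at random.

```rust
// Back-of-the-envelope sketch (not a model of either protocol's exact
// parameters): probability that a randomly drawn checking group of size
// `group_size` is made up entirely of malicious validators, assuming a
// fraction `malicious_share` of the validator set is malicious and the set
// is large enough to approximate sampling with replacement.

fn all_malicious_probability(malicious_share: f64, group_size: u32) -> f64 {
    malicious_share.powi(group_size as i32)
}

fn main() {
    let malicious_share = 0.30; // attacker controls 30% of validators
    let group_size = 5;         // typical initial checking group

    let p = all_malicious_probability(malicious_share, group_size);
    // 0.3^5 ≈ 0.0024: even a large attacker rarely captures a whole group.
    println!("P(all {} checkers malicious) ≈ {:.4}", group_size, p);

    // The decisive difference is *when* the assignment becomes known: if
    // assignments are published ahead of time, the attacker can wait for a
    // favourable draw; if they are revealed only at execution time and any
    // honest checker can trigger escalation and a dispute, a failed attempt
    // costs the attacker their stake.
}
```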

Avalanche scales through a multi-network architecture, splitting its ecosystem into the primary network and customizable subnets. The primary network is composed of three interoperable chains secured by one validator subset.

  • X-Chain: DAG-based, used for asset transfers (~4,500 TPS)
  • C-Chain: EVM-compatible, for smart contracts (~100-200 TPS, Ethereum-like)
  • P-Chain: Manages validators and subnets

Theoretically, each subnet can achieve up to 5,000 TPS. Avalanche unlocks higher scalability by reducing the total amount of work performed by the system for a single shard. This is a good idea, and similar to Polkadot. The system cannot scale if all validators check everything.

However, a closer look at the Avalanche design tells us that it actually permits and motivates validators to focus only on the subnets they want to be a part of. On top of that, subnets can introduce additional requirements for validators, which can include being located in a given country or passing a mandatory KYC check. This is a compromise on security and decentralization.

In Polkadot, all rollups benefit from the same level of security. In Avalanche, there is not even a default minimum level of security or decentralization. Some subnets can be very centralized and permissioned, allowing them to theoretically hit ~5,000 TPS. Others can trade off higher TPS for decentralization and resilience, but even then there is no guarantee.

Ethereum's approach was a huge bet on scalability at the rollup level, rather than handling it directly. It is obvious that this doesn't actually solve the problem, but only pushes it one level higher up the stack.

Optimistic rollups

Many of these rollups are implemented as optimistic rollups, which are currently seen as either centralized (even to the extreme of running just a single sequencer), insecure, isolated, or a combination of these. They introduce high latency because they need to take an overly pessimistic view of the world, delaying enactment for days to give enough time for any fraud proofs to be submitted.

ZK rollups

This type of rollup is largely constrained by the volume of data that can be handled in a single transaction. The computational demands of generating proofs, which are fundamental to security and integrity, are extremely high and, together with the "winner takes all" nature of proof generation, increase the tendency towards centralization of these systems. Also, to maintain a reasonable level of transaction throughput, ZK rollups often have to compromise and restrict the number of transactions included in each batch.

This constraint can lead to network congestion and heightened competition for blockspace, particularly during periods of high demand. This can result in a surge in gas fees, making transactions on the ZK rollup network more expensive. By comparison, Turing-complete ZK rollups are around 2x10^6 times more expensive than the crypto-economic security protocol powering Polkadot cores (described in the ELVES paper).

The data availability problem associated with ZK rollups can exacerbate this, as they still need to ensure that the full transaction data is available for anyone to verify. This often requires integrating separate data availability solutions, which incur additional costs and contribute to even higher gas fees for users.

Conclusion

Polkadot distinguishes itself by maintaining a strong emphasis on security and resilience while striving for efficiency. The fundamental Web3 principles are non-negotiable. In the long run, the only projects that will stand the test of time are the ones that adhere most closely to those principles.