Bundler Integration Testing: UserOp Across ERC4337
Unlike traditional software where bugs often result in software crashes or system downtime, in the world of blockchain, such bugs can mean direct financial losses, unauthorized access to funds, or unintended distribution of tokens. With the absence of a central authority and the principle of decentralization, there is no "undo" button or a direct way to intervene should something go awry.
Smart contract testing ensures that the code behaves exactly as intended before it's cemented on the blockchain. It checks for logical flaws and security vulnerabilities, and verifies that the contract meets its specifications. In essence, smart contract testing isn't just a best practice; it's a crucial measure to protect stakeholders, uphold the project's integrity, and ensure trust in the decentralized ecosystem.
Introduction
With the advent of ERC4337 and modular smart contracts, tests must take into account the nuances specific to smart contract wallets and account abstraction. In particular, any module executing in the validation phase of a handleOps transaction must satisfy the constraints that bundlers enforce for the public UserOp mempool, so there is a need for a testing environment that can verify compliance with these rules.
Restrictions in the Validation Phase
At a high level, these rules restrict which opcodes can be invoked in the validation phase of the transaction, and which storage can be accessed. A full list of these rules can be found in the ERC4337 specification.
The rationale behind these restrictions is to minimize (or in some cases eliminate) the dependence of the validity of a User Operation on non-account storage, thus preventing User Operations in a mempool from being invalidated en masse with a constant cost to the invalidator.
Importance of Compliance Testing
If a smart contract executing the validation phase (a validation module, paymaster or smart contract factory) does not respect these rules, bundlers servicing the public mempool may drop any UserOperations interacting with these contracts. In this situation, nobody would be able to use your smart contract unless these issues are fixed, or another mempool with relaxed restrictions is created.
While the latter is possible, it would require convincing bundlers to “trust” that your smart contract is safe and violates the restrictions with a good reason, or alternatively to run your own bundler. Neither option is ideal, therefore it is best to design your smart contracts to comply with these restrictions.
Some of these rules are not obvious, and they may change as the ERC is updated. There is therefore a need for a testing setup that validates smart contract execution against these rules and can be updated to newer versions of the ERC with minimal effort.
The rest of this article describes such a testing setup, built by integrating the Infinitism reference bundler into Hardhat integration tests.
Integrating the Bundler in the Testing Environment
Testing Smart Contracts that interact with 4337 Smart Contract Wallets generally involves creating UserOperations that call the contract being tested. While writing such tests in hardhat, the general way to execute the User Operation is to directly call Entrypoint’s handleOps from the test.
Such a test captures all on-chain details, including User Operation reverts, and works well for testing the business logic of the contract under test. However, as noted earlier, these tests fail to account for the restrictions placed by bundlers on the validation phase of the transaction.
Therefore, to test against these restrictions we can launch an instance of the bundler during test initialization, and submit all UserOperations to the entrypoint through the bundler. Any violations of the bundler restrictions would be returned as an error from the eth_sendUserOperation RPC call.
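As a sketch, the submission path reduces to a single JSON-RPC call. The helper and the minimal client interface below are ours, not part of the reference bundler; the interface stands in for an ethers provider's `send()` method.

```typescript
// Minimal sketch of submitting a UserOperation through the bundler's
// JSON-RPC endpoint instead of calling EntryPoint.handleOps() directly.
export interface JsonRpcClient {
  send(method: string, params: unknown[]): Promise<any>;
}

// eth_sendUserOperation returns the userOpHash on success; a violation of
// the validation rules surfaces as a JSON-RPC error (e.g. code -32502).
export async function sendUserOpToBundler(
  bundler: JsonRpcClient,
  userOp: Record<string, unknown>,
  entryPointAddress: string
): Promise<string> {
  return bundler.send("eth_sendUserOperation", [userOp, entryPointAddress]);
}
```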
Which bundler should we use? The reference bundler maintained by the authors of the ERC4337 is ideal because it’s a minimal implementation without any external dependencies and can also be expected to keep up with any changes to the rules in the ERC.
Running Hardhat tests on an external network is nothing new; it can be done simply by including `--network local` in the command that starts the tests, where `local` is configured in the Hardhat config file to point to the Geth node. Seems pretty simple, right?
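For illustration, the network entry in `hardhat.config.ts` might look like this (the URL and chain ID are assumptions; match them to how you launch your node):

```typescript
// hardhat.config.ts (excerpt): a "local" network pointing at the external node.
import { HardhatUserConfig } from "hardhat/config";

const config: HardhatUserConfig = {
  networks: {
    local: {
      url: "http://localhost:8545", // default HTTP-RPC port
      chainId: 1337,                // geth --dev defaults to chain ID 1337
    },
  },
};

export default config;
```

Tests are then started with `npx hardhat test --network local`.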
There is but one problem: to perform these checks and enforce these restrictions, the bundler uses the debug_traceCall RPC with a JavaScript tracer to determine which opcodes have been invoked and which storage has been accessed in the validation phase of the transaction. It turns out that Hardhat's node does not support this! So even if we submitted user operations through the bundler, it would simply skip these checks.
So how do we solve this? The bundler's README.md notes that Geth supports this RPC, so we should be able to run a single Geth instance and use it as a substitute for Hardhat's node.
Building the Test Environment
Challenges with Geth
Replacing Hardhat's node with Geth solves the compatibility issue with bundlers; however, it introduces its own set of issues. Remember that Hardhat's node is optimized for testing: it implements a variety of features and quality-of-life enhancements that make testing easy. Some of these features are:
- A default set of pre-funded addresses, each with 10,000 ETH.
- console.log() support
- RPCs for account impersonation, and for rewinding and fast-forwarding block.number and block.timestamp.
- Chain Snapshots
And more. A full list of the custom behaviours implemented to enhance testing can be found in the Hardhat Network documentation.
An ideal testing environment built on Geth would replicate much of the same functionality with a similar API, so that the testing experience stays as close to vanilla Hardhat testing as possible.
For this article, we focus on replicating the following minimal set of functionalities:
- Identify the default addresses used by hardhat and ensure they are funded before the tests are executed.
- Chain Snapshots
The first is a non-negotiable requirement for testing - funds are needed to execute transactions. I’d argue that the second is also quite important, as snapshots can be utilized to ensure that every test in a suite starts from a known clean blockchain state. This is important for tests to be independent and deterministic. With these two available in the testing environment, most existing hardhat tests should work with this setup with minimal to no changes.
Based on all the information above, we can identify the following steps to create the bundler-geth testing environment:
- Setup Geth and obtain a local RPC endpoint to which transactions can be sent.
- Fund the default addresses used by Hardhat.
- Deploy the entrypoint on the Geth node.
- Launch the bundler and wait for it to start successfully.
- Execute the tests on the local Geth node.
The third step is needed because the Infinitism bundler expects a valid RPC endpoint and a pre-deployed ERC4337 Entrypoint to be available during initialization. This also means that the Entrypoint address must remain the same across all Hardhat tests; the alternative, restarting the bundler with a new Entrypoint address for each test, would make test execution extremely slow.
Setting Up Geth
We use Docker to set up the Geth node, with a Dockerfile based on the one found in the bundler repository.
This is straightforward, but a few points are worth noting:
- We recommend using the latest release of Geth. We recently found a bug in the implementation of the debug_setHead RPC that caused Geth to crash after the RPC was called; later sections of this article describe how this RPC is used to implement snapshots for the test environment. The bug has since been fixed by the Geth team; more details can be found in the issue "Calling debug_setHead crashes geth with SIGSEGV in dev mode. #27990".
- Normally, Geth must be paired with a consensus client. To keep things simple, however, we run Geth in developer mode, which launches it as a single-node Ethereum test network with no connection to external peers, making it ideal for local testnets. More details on this mode can be found in the "Go Ethereum Developer Mode" documentation.
Funding the Default Hardhat Accounts
We use the default account managed by the Geth node as the funding address. This can be done as follows:
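A minimal sketch, assuming geth --dev (whose first account returned by eth_accounts is unlocked and pre-funded). The helper names and the RPC-client interface are ours; the two addresses shown are the first of Hardhat's well-known default accounts, derived from its published test mnemonic.

```typescript
// Fund Hardhat's default accounts from geth's unlocked dev account.
export interface JsonRpcClient {
  send(method: string, params: unknown[]): Promise<any>;
}

// First two of Hardhat's default addresses (extend as needed).
export const HARDHAT_DEFAULT_ACCOUNTS = [
  "0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266",
  "0x70997970C51812dc3A010C7d01b50e0d17dc79C8",
];

export async function fundDefaultAccounts(geth: JsonRpcClient): Promise<void> {
  // geth --dev exposes its pre-funded account as eth_accounts[0].
  const [funder] = await geth.send("eth_accounts", []);
  for (const to of HARDHAT_DEFAULT_ACCOUNTS) {
    await geth.send("eth_sendTransaction", [
      { from: funder, to, value: "0x21e19e0c9bab2400000" }, // 10,000 ETH in wei
    ]);
  }
}
```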
Chain Snapshots
We define a snapshot to be simply the state of the blockchain at a particular block; a snapshot can therefore be represented by a block number b. We can use the debug_setHead RPC provided by Geth to roll the chain back from any block B > b to b.
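A sketch of snapshot helpers built on this RPC (the function names are ours, and the JsonRpcClient interface stands in for an ethers provider's `send()`):

```typescript
export interface JsonRpcClient {
  send(method: string, params: unknown[]): Promise<any>;
}

// Taking a snapshot is just recording the current block number.
export async function takeSnapshot(geth: JsonRpcClient): Promise<number> {
  const hex: string = await geth.send("eth_blockNumber", []);
  return parseInt(hex, 16);
}

// Roll the chain head back to the snapshot block via debug_setHead.
export async function revertToSnapshot(
  geth: JsonRpcClient,
  blockNumber: number
): Promise<void> {
  await geth.send("debug_setHead", ["0x" + blockNumber.toString(16)]);
}
```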
Resetting the Bundler
It is a good idea to reset the bundler between tests to get rid of any leftover state, such as ops in the mempool, counters keyed by SCW address, etc. Conveniently, the bundler provides the debug_bundler_clearState RPC for this exact purpose.
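A one-line wrapper suffices (the helper name is ours; the client interface stands in for an ethers provider's `send()`):

```typescript
// Clears the bundler's in-memory state (mempool, reputation counters, etc.)
// between tests via its debug RPC.
export interface BundlerClient {
  send(method: string, params: unknown[]): Promise<any>;
}

export async function resetBundler(bundler: BundlerClient): Promise<void> {
  await bundler.send("debug_bundler_clearState", []);
}
```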
Launching the Bundler
Official Docker images for the Infinitism bundler are published on Docker Hub under "Account Abstraction Bundler". We use a simple docker-compose file to manage the bundler and Geth instances.
We recently contributed a fix (#134, "fix: debug_bundler_clearState clears MempoolManager.entryCou…", now merged) to the implementation of the debug_bundler_clearState RPC, which is crucial for the snapshot functionality. At the time of writing, the fix has been merged to the main branch but has not yet been published to the Docker registry; it is therefore advisable to either build from the main branch manually or use the image referenced in the docker-compose file, which is a fork of the official image.
Writing tests with Bundler Integration
Once the environment has been set up, a few things need to be kept in mind while writing tests that are compatible with it:
- All the normal tests and assertions from Chai still work as usual, so the core patterns for writing tests remain the same.
- To submit a user operation to the entrypoint, call the eth_sendUserOperation RPC of the bundler.
- To reset the chain to a specific snapshot (block) after each test, include the revert() call in the afterEach hook.
- To reset the bundler after each test, include the resetBundler() call in the afterEach hook.
- Ensure that the address of the Entrypoint contract does not change between tests and is consistent with the address provided when launching the bundler. This can be done by deploying the Entrypoint between launching Geth and the bundler, and then instantiating the Entrypoint contract at that same address in the tests.
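Put together, the per-test lifecycle can be sketched as a plain helper; in a real Mocha suite, the revert and reset calls live in an afterEach hook. The Env interface is an assumption matching the helpers described above.

```typescript
export interface Env {
  snapshot(): Promise<number>;
  revert(blockNumber: number): Promise<void>;
  resetBundler(): Promise<void>;
}

// Runs a test body against a clean chain, then restores the snapshot
// and clears the bundler state even if the test body throws.
export async function withCleanChain<T>(
  env: Env,
  testBody: () => Promise<T>
): Promise<T> {
  const snap = await env.snapshot();
  try {
    return await testBody();
  } finally {
    await env.revert(snap);     // debug_setHead back to the snapshot block
    await env.resetBundler();   // debug_bundler_clearState
  }
}
```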
Example
We created a class called BundlerEnvironment responsible for exposing all the functions related to funding, snapshots, bundler reset and user operations submission. An implementation of this class can be found here: bcnmy/scw-contracts.
The following is an excerpt from one of our tests, in which we verify that a rule-violating validation module has its UserOperation rejected by the bundler. The validation module is programmed to call the TIMESTAMP opcode in its validation logic, which is forbidden.
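A sketch of the shape of such an assertion (the helper name is ours, not from the codebase excerpt): it succeeds only if the given submission is rejected with the bundler's banned-opcode error.

```typescript
// Asserts that a UserOperation submission is rejected by the bundler
// with the banned-opcode error for TIMESTAMP.
export async function expectBannedOpcodeRejection(
  sendUserOp: () => Promise<unknown>
): Promise<void> {
  try {
    await sendUserOp();
  } catch (err: any) {
    if (err.code === -32502 && /banned opcode: TIMESTAMP/.test(err.message)) {
      return; // the bundler rejected the op as expected
    }
    throw err; // some other failure: re-raise
  }
  throw new Error("expected the bundler to reject the UserOperation");
}
```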
Notice that we expect the bundler to complain with the error {"message": "account uses banned opcode: TIMESTAMP", "code": -32502}, which satisfies our goal: this is an error that would not have been caught if we had submitted the user operation directly to the entrypoint.
The implementation of the Validation Module for those curious:
Challenges and Testing Strategy
Since these tests depend on external Geth and bundler instances, and perform expensive operations like debug_setHead and debug_bundler_clearState after each test, they run slower than normal Hardhat unit tests. It also remains to be seen how tests that depend on complex timing logic can be written in this environment.
Therefore, we take the following approach to testing our smart contracts:
- Write all happy flow tests in the bundler environment.
- Write all negative flow tests (where UserOperations are expected to revert) in the normal hardhat testing environment.
The rationale is that it does not make much sense to test the validation rules on User Operations that are expected to fail because they violate business-logic conditions, such as onlyOwner restrictions.
Considering also that most tests typically cover negative cases, it makes sense to keep them in the faster, stable Hardhat testing environment and reserve the happy-path tests for the slower but more rigorous bundler environment.
For our codebase, we orchestrate the whole process of setting up Geth and the bundler, deploying the entrypoint, and running the tests via a simple bash script. The full suite of our bundler tests can be found in the bcnmy/scw-contracts repository.
Conclusion
This article explored the challenges introduced by standards like ERC4337 and explained the steps needed to set up a robust testing environment. In particular, it stressed the importance of testing against the ERC4337 restrictions that bundlers enforce in the validation phase, and detailed the steps for setting up an environment in which to write such tests. As the technology evolves, the difficulties we face will evolve too.
Wait what about Foundry?
While Anvil supports the debug_traceCall RPC, it currently does not support the JavaScript tracer. Also, based on our research so far, there is no way to run Forge tests on an external node the way we do here with Hardhat.
Therefore, this doesn't seem possible in Foundry at the moment (please prove me wrong). An alternative would be to set up a hybrid Foundry-Hardhat repository and rewrite the happy-path tests in Hardhat; steps for setting up such a hybrid repository can be found in the Hardhat documentation.
_________________________________________
This piece is authored by Ankur Dubey. Follow him on Twitter.