Certora vs Echidna: a case study on invariant testing in eBTC

Introduction

In this post, we'll be comparing the proving of properties in a smart contract system using fuzzing and formal verification tools. We'll be using the eBTC protocol by Badger DAO as a real-world case study in performing this analysis.

The eBTC project recently underwent a community audit on code4rena after having previously been reviewed by several auditing firms and solo auditors. As part of one of these reviews, @agfviggiano outlined properties of the system for a fuzzing campaign (you can learn more about his experience doing so here) which we'll be using to compare the tests created using Trail of Bits' Echidna fuzzer with the new tests using the Certora Prover. We'll be comparing runtime and number of SLOC for a test as rough measures of each tool's efficiency. The repo that contains the tests we use is available here.

It's important to note that these tools seek to accomplish similar goals through different means, namely Echidna uses advanced fuzzing to try a range of possible values input through function calls to test a property (for more on fuzzing with Echidna see here), whereas Certora uses formal verification to abstract the program into mathematical functions which are tested against a specification for all possible values that can be passed to the functions (for more on formal verification with Certora see here). But fundamentally both tools seek to show that a property defined in them holds for the system they're being tested on.

eBTC

eBTC is a collateralized asset soft pegged to the price of BTC in the form of an ERC20 token on Ethereum and is the core asset of the protocol we'll be analyzing. The collateral for eBTC is provided by liquid staking tokens (LSTs) representing Staked Ether (stETH). eBTC allows greater liquidity for holders of LSTs where they can lock up their stETH in the protocol contract, creating a collateralized debt position (CDP) and allowing them to mint eBTC. CDPs are required to be collateralized at a fixed minimum ratio determined by the protocol.

Properties

Properties Under Scrutiny

We'll be testing the following properties from the list of properties outlined in the fuzzing campaign which are available in this README:

AP-04 - the total collateral in the active pool should be equal to the sum of all individual CDP collateral
BO-07 - eBTC tokens are burned upon repayment of a CDP's debt
CDPM-06 - redemptions do not increase a CDP debt

These properties were chosen primarily for the broad overview they give of the general protocol behavior and for illustrating the differences between the two tools, in any true formal verification or fuzzing campaign you would want to ensure much greater coverage of the properties defined in the README.

The Contracts

The contracts we'll be looking at for the above properties are ActivePool.sol, BorrowerOperations.sol, and SortedCdps.sol. Here's a brief overview of what each contract does better to understand it in the context of our tests:

ActivePool (AP)- holds the collateral and EBTC debt (only accounting but not EBTC tokens) for all active CDPs
BorrowerOperations (BO)- handles all end-user interactions like opening, adjusting and closing CDPs; it also allows ERC3156 compatible flash minting of eBTC token
CdpManager (CDPM)- handles all CDP-related core processing like collateral & debt accounting, split fee calculation, redemption, etc. along with liquidations through delegatecalls to LiquidationLibrary

Testing Properties

We'll now look at testing the above properties starting with AP-04: the total collateral in the active pool should be equal to the sum of all individual CDP collateral.

Total Collateral Property (AP-04)

We first look at the fuzz test created using Echidna:

  function invariant_AP_04(
        CdpManager cdpManager,
        ActivePool activePool,
        uint256 diff_tolerance
    ) internal view returns (bool) {
        uint256 _cdpCount = cdpManager.getActiveCdpsCount();
        uint256 _sum;
        for (uint256 i = 0; i < _cdpCount; ++i) {
            (, uint256 _coll) = cdpManager.getSyncedDebtAndCollShares(cdpManager.CdpIds(i));
            _sum += _coll;
        }
        uint256 _activeColl = activePool.getSystemCollShares();
        uint256 _diff = _sum > _activeColl ? (_sum - _activeColl) : (_activeColl - _sum);
        return (_diff * 1e18 <= diff_tolerance * _activeColl);
    }

Looking at the output of this test we get:

We see that in 23h 54m and for 100068 calls the fuzzer failed to find any inputs that violated the property outlined above. This gives us a high degree of certainty that our code is correct but doesn't guarantee that for all input values, our system will behave as expected (as the 100069th call or any others may cause a violation), so for a more rigorous test we can use formal verification to ensure that for all possible values, there is no unexpected behavior.

We can then write this same test using CVL which tests the property for all possible values to ensure there are no unexpected states.

methods {
    function getSystemCollShares() external returns (uint256) envfree;
    function increaseSystemCollShares(uint256 _value) external;
    function transferSystemCollShares(address _account, uint256 _shares) external;
}

// we can assume that the sum of the cdps is a constant since the CDPM and BO only call AP after having changed the CDPs, 
// so in the context of the ActivePool contract, the sum of cdps doesn't change
function sumCdps(bool adding, uint256 systemCollateral, uint256 deltaCollateral) returns mathint {
    if(adding) {
        return systemCollateral + deltaCollateral;
    } else {
        return systemCollateral - deltaCollateral;
    }
}

// when any function is called in ActivePool that modifies systemCollShares the collateral in the CDPs have already been updated by BO or CDPM so it should be a constant
// so the operation in the pool needs to increase/decrease the systemCollShares by the difference between the systemCollShares and the total CDP collateral

/** Property 1a: 	
    The total collateral in active pool should be equal to the sum of all individual CDP collateral when increasing collateral
**/
rule increaseTotalCollateralIntegrity() {
    env e;
    mathint sum;
    address recipient;
    uint256 deltaCollateral;
    uint256 system_collateral_after;
    uint256 system_collateral_before = getSystemCollShares();

    require deltaCollateral <= system_collateral_before;

    // initially only calling increaseSystemCollShares for testing 
    sum = sumCdps(true, system_collateral_before, deltaCollateral);

    increaseSystemCollShares(e, deltaCollateral);

    system_collateral_after = getSystemCollShares();
    // this is same as system_collateral_after == system_collateral_before + deltaCollateral
    assert to_mathint(system_collateral_after) == sum;
}

/** Property 1b: 	
    The total collateral in active pool should be equal to the sum of all individual CDP collateral when decreasing collateral
**/
rule decreaseTotalCollateralIntegrity() {
    env e;
    mathint sum;

    address recipient;
    uint256 deltaCollateral;
    uint256 liquidatorRewardShares;
    uint256 system_collateral_before = getSystemCollShares();
    storage init = lastStorage;

    require deltaCollateral <= system_collateral_before;
    sum = sumCdps(false, system_collateral_before, deltaCollateral);

    // calling the first function that transfers collateral shares
    transferSystemCollShares(e, recipient, deltaCollateral);
    uint256 system_collateral_after_first_transfer = getSystemCollShares();

    // calling the second function that transfers collateral shares at the initial state
    transferSystemCollSharesAndLiquidatorReward(e, recipient, deltaCollateral, liquidatorRewardShares) at init;
    uint256 system_collateral_after_second_transfer = getSystemCollShares();

    // this is same as system_collateral_after == system_collateral_before - deltaCollateral
    assert to_mathint(system_collateral_after_first_transfer) == sum;
    assert to_mathint(system_collateral_after_second_transfer) == sum;
}

We split our property into two sub-properties for easier testing since we need to test if the system collateral increases as well as if it decreases when collateral is transferred into (property 1a) or out of (property 1b) the system, respectively. Since the effect of these operations on the total system collateral are opposites, we separate them as shown above to perform proper comparisons with the initial state in our assertion. We can further note that the only contracts that can increase or decrease system collateral are the CdpManager and BorrowerOperations due to the checks in the increaseSystemCollShares, transferSystemCollShares and transferSystemCollSharesAndLiquidatorReward functions that increase/decrease system collateral using the following function that acts as an access modifier:

function _requireCallerIsBOorCdpM() internal view {
        require(
            msg.sender == borrowerOperationsAddress || msg.sender == cdpManagerAddress,
            "ActivePool: Caller is neither BorrowerOperations nor CdpManager"
        );
}

Using this fact and the observation that CdpManager and BorrowerOperations only make calls to the functions in ActivePool after they've modified the SortedCdps linked-list we can make the assumption that the SortedCdps list will not change during these operations (since there's no call to functions that modify it) and so we can assume the total debt as a function of the sum of CDP debt to be a constant which we summarize in our spec using the following CVL function:

function sumCdps(bool adding, uint256 systemCollateral, uint256 deltaCollateral) returns mathint {
    if(adding) {
        return systemCollateral + deltaCollateral;
    } else {
        return systemCollateral - deltaCollateral;
    }
}

This greatly reduces the possible number of paths that the Certora Prover has to explore and so reduces the runtime and possible timeouts so that we can properly evaluate just the function of interest, using the assumption that other functions that modify the SortedCdps linked-list, do so correctly (ideally this would be defined and proven by a different property and test during an invariant testing campaign). With this assumption made we can run the prover using the following command from the root directory of our project:

certoraRun 2023-10-badger/certora/confs/AP04.conf

which gives us the following output from the prover and we can see that our property holds for all cases:

Comparing the two tests we can see that the Echidna test required 15 SLOC and the CVL test required 42 SLOC (35% more than the Echidna test). If we look at runtimes we see that the Echidna test required 23h 54m 1s whereas the CVL test only required 18s to verify due to the summary we used (.02% the time of the Echidna test).

Debt Repayment Property (BO-07)

Next, we look at the second property of interest in our list BO-07: eBTC tokens are burned upon repayment of a CDP's debt

The test we pass into Echidna looks like this:

function repayDebt(uint _amount, uint256 _i) public setup {
        bool success;
        bytes memory returnData;

        uint256 numberOfCdps = sortedCdps.cdpCountOf(address(actor));
        require(numberOfCdps > 0, "Actor must have at least one CDP open");

        _i = between(_i, 0, numberOfCdps - 1);
        bytes32 _cdpId = sortedCdps.cdpOfOwnerByIndex(address(actor), _i);
        t(_cdpId != bytes32(0), "CDP ID must not be null if the index is valid");

        (uint256 entireDebt, ) = cdpManager.getSyncedDebtAndCollShares(_cdpId);
        _amount = between(_amount, 0, entireDebt);

        _before(_cdpId);

        (success, returnData) = actor.proxy(
            address(borrowerOperations),
            abi.encodeWithSelector(
                BorrowerOperations.repayDebt.selector,
                _cdpId,
                _amount,
                _cdpId,
                _cdpId
            )
        );
        require(success);

        _after(_cdpId);

        eq(vars.newTcrAfter, vars.tcrAfter, GENERAL_11);

        // https://github.com/Badger-Finance/ebtc-fuzz-review/issues/3
        gte(vars.newTcrAfter, vars.newTcrBefore, BO_08);

        eq(vars.ebtcTotalSupplyBefore - _amount, vars.ebtcTotalSupplyAfter, BO_07);
        eq(vars.actorEbtcBefore - _amount, vars.actorEbtcAfter, BO_07);
        // https://github.com/Badger-Finance/ebtc-fuzz-review/issues/3
        t(invariant_GENERAL_09(cdpManager, vars), GENERAL_09);
        t(invariant_GENERAL_01(vars), GENERAL_01);
        // https://github.com/Badger-Finance/ebtc-fuzz-review/issues/4
        gte(
            collateral.getPooledEthByShares(cdpManager.getCdpCollShares(_cdpId)),
            borrowerOperations.MIN_NET_STETH_BALANCE(),
            GENERAL_10
        );

        if (
            vars.lastGracePeriodStartTimestampIsSetBefore &&
            vars.isRecoveryModeBefore &&
            vars.isRecoveryModeAfter
        ) {
            eq(
                vars.lastGracePeriodStartTimestampBefore,
                vars.lastGracePeriodStartTimestampAfter,
                L_14
            );
        }

        if (!vars.isRecoveryModeBefore && vars.isRecoveryModeAfter) {
            t(
                !vars.lastGracePeriodStartTimestampIsSetBefore &&
                    vars.lastGracePeriodStartTimestampIsSetAfter,
                L_15
            );
        }

        if (vars.isRecoveryModeBefore && !vars.isRecoveryModeAfter) {
            t(!vars.lastGracePeriodStartTimestampIsSetAfter, L_16);
        }
    }

The above checks use the repayDebt function from the TargetFunctions contract to perform various assertions about the system properties. For the property we're interested in the assertion we care about checks that the balance of the user's eBTC tokens is reduced appropriately:

eq(vars.ebtcTotalSupplyBefore - _amount, vars.ebtcTotalSupplyAfter, BO_07);

After running this test with Echidna, we get the following result:

This shows that for 100066 calls made by the fuzzer to the function with varying inputs none caused the assertions to fail.

We then similarly define our property in CVL to prove that there are no edge cases unaccounted for by the fuzzer and every time a user burns eBTC their balance is in fact reduced:

using EBTCToken as token;
using SortedCdps as sortedCdps;
using CollSurplusPool as collSurplusPool;

methods {
    function token.balanceOf(address) external returns (uint256) envfree;
    function token.totalSupply() external returns (uint256) envfree;

    function sortedCdps._ external => NONDET;
    function collSurplusPool._ external => NONDET;

    function _.locked() external => CONSTANT;
    function _.getOwnerAddress(address) external => PER_CALLEE_CONSTANT;
    function _.increaseTotalSurplusCollShares(uint256) external => NONDET;

    function repayDebt(bytes32, uint256, bytes32, bytes32) external;
}

/** Property 2: 
    eBTC tokens are burned upon repayment of a CDP's debt
**/
rule burnAfterRepayIntegrity(bytes32 _cdpId, uint256 _debt, bytes32 _upperHint, bytes32 _lowerHint) {
    env e;
    address caller = e.msg.sender;
    // get caller's eBTC balance before
    uint256 token_balance_before = token.balanceOf(caller);
    uint256 total_supply = token.totalSupply();

    // preconditions: ensure that the debt of a cdp is <= user's eBTC balance and balance is less than totalSupply
    // filters cases where the caller is the 0 address or the current contract
    require token_balance_before >= _debt && token_balance_before <= total_supply;
    require caller != 0 && caller != currentContract;

    // repay the debt
    repayDebt(e, _cdpId, _debt, _upperHint, _lowerHint);

    uint256 token_balance_after = token.balanceOf(caller);
    // assert that user's eBTC balance after should be changed by the amount of debt paid
    assert to_mathint(token_balance_after) == token_balance_before - _debt,
        "user's eBTC balance should decrease after burning";
}

In the above, we can see that we summarized calls to the SortedCdps and CollSurplusPool contracts all as NONDET , this once again decreases the number of paths the Prover needs to reason about and in this case is a fair assumption to make because we don't care about modifications to state variables made by calls to these contract's functions, we only care that a user's eBTC balance is decreased after they burn their eBTC tokens, so the use of the NONDET summary tells the Prover that calls to functions from these contracts can return any possible value. We also further summarized the locked, getOwnerAddress and increaseTotalSurplusCollShares functions to reduce the number of contracts that need to be compiled by the Prover and once again because they only modify state variables which we're uninterested in in our current test. We run our spec from the root directory using:

certoraRun eBTC-certora-vs-echidna/certora/specs/BO07.spec

After running the above rule we get the following output from the Prover which tells us that for every time the user repays debt for their CDP, the associated eBTC token with that CDP gets burned and removed from their eBTC balance, as expected:

Comparing the two testing methods we see that the Echidna test consisted of 56 SLOC whereas the CVL rule and its setup were 25 SLOC, 45% less than the Echidna test. It's worth pointing out however that the Echidna test includes tests for other properties that weren't of interest in this case so if reduced to only the property of interest would be shorter. Next, looking at run time we see that the Echidna test took 2m 51s and the CVL test took 3m 25s, meaning the Echidna test was 58% faster, but of course, this is only marginal in terms of time we have to wait for a test to complete.

Redemptions Don't Increase Debt Property (CDPM-06)

Below we have the Echidna test for our final property CDPM-06: redemptions do not increase a CDPs debt:

function redeemCollateral(
        uint _EBTCAmount,
        uint _partialRedemptionHintNICR,
        uint _maxFeePercentage,
        uint _maxIterations
    ) public setup {
        bool success;
        bytes memory returnData;

        _EBTCAmount = between(_EBTCAmount, 0, eBTCToken.balanceOf(address(actor)));
        _maxIterations = between(_maxIterations, 0, 1);

        _maxFeePercentage = between(
            _maxFeePercentage,
            cdpManager.redemptionFeeFloor(),
            cdpManager.DECIMAL_PRECISION()
        );

        bytes32 _cdpId = _getFirstCdpWithIcrGteMcr();

        _before(_cdpId);

        (success, returnData) = actor.proxy(
            address(cdpManager),
            abi.encodeWithSelector(
                CdpManager.redeemCollateral.selector,
                _EBTCAmount,
                bytes32(0),
                bytes32(0),
                bytes32(0),
                _partialRedemptionHintNICR,
                _maxIterations,
                _maxFeePercentage
            )
        );

        require(success);

        _after(_cdpId);

        gt(vars.tcrBefore, cdpManager.MCR(), EBTC_02);
        if (_maxIterations == 1) {
            gte(vars.activePoolDebtBefore, vars.activePoolDebtAfter, CDPM_05);
            gte(vars.cdpDebtBefore, vars.cdpDebtAfter, CDPM_06);
            // TODO: CHECK THIS
            // https://github.com/Badger-Finance/ebtc-fuzz-review/issues/10#issuecomment-1702685732
            if (vars.sortedCdpsSizeBefore == vars.sortedCdpsSizeAfter) {
                // Redemptions do not reduce TCR
                // If redemptions do not close any CDP that was healthy (low debt, high coll)
                gt(vars.newTcrAfter, vars.newTcrBefore, R_07);
            }
            t(invariant_CDPM_04(vars), CDPM_04);
        }
        gt(vars.actorEbtcBefore, vars.actorEbtcAfter, R_08);

        // Verify Fee Recipient Received the Fee
        gte(vars.feeRecipientCollSharesAfter, vars.feeRecipientCollSharesBefore, F_02);

        if (
            vars.lastGracePeriodStartTimestampIsSetBefore &&
            vars.isRecoveryModeBefore &&
            vars.isRecoveryModeAfter
        ) {
            eq(
                vars.lastGracePeriodStartTimestampBefore,
                vars.lastGracePeriodStartTimestampAfter,
                L_14
            );
        }

        if (!vars.isRecoveryModeBefore && vars.isRecoveryModeAfter) {
            t(
                !vars.lastGracePeriodStartTimestampIsSetBefore &&
                    vars.lastGracePeriodStartTimestampIsSetAfter,
                L_15
            );
        }

        if (vars.isRecoveryModeBefore && !vars.isRecoveryModeAfter) {
            t(!vars.lastGracePeriodStartTimestampIsSetAfter, L_16);
        }
    }

Where the primary line of interest to us which tests the property is:

gte(vars.cdpDebtBefore, vars.cdpDebtAfter, CDPM_06);

After running the test we get the following output:

We can see from this that after 100142 runs the fuzzer failed to find any cases where the assertion failed for the inputs passed in.

We can use the following rule to perform the same test in Certora on all possible inputs:

using SortedCdps as sortedCdps;
using PriceFeed as priceFeed;
using ActivePool as pool;

methods {
    function _.locked() external => CONSTANT;
    function _.getOwnerAddress(address) external => PER_CALLEE_CONSTANT;
    function _.increaseTotalSurplusCollShares(uint256) external => NONDET;
    
    function sortedCdps._ external => NONDET;
    function priceFeed._ external => NONDET;
    function pool.getSystemDebt() external returns (uint256) envfree;
}

/** Property 3: 
    Redemptions do not increase a CDPs debt
**/
rule redemptionsDontIncreaseDept() {
    env e;
    uint256 debt_to_redeem;
    bytes32 first_redemption_hint;
    bytes32 upper_partial_redemption_hint;
    bytes32 lower_partial_redemption_hint;
    uint256 partial_redemption_hint_NICR;
    uint256 max_Iterations;
    uint256 max_fee_percentage;
    uint256 system_debt_before = pool.getSystemDebt();

    // redeem collateral
    redeemCollateral(
        e,
        debt_to_redeem,
        first_redemption_hint,
        upper_partial_redemption_hint,
        lower_partial_redemption_hint,
        partial_redemption_hint_NICR,
        max_Iterations,
        max_fee_percentage
    );

    uint256 system_debt_after = pool.getSystemDebt();

    // total debt of CDPs after should be <= total debt before or else this would imply one of the CDPs has had its debt increased
    assert system_debt_after <= system_debt_before, "system debt after should decrease";
}

Because we've proven earlier that the total collateral in the active pool should be equal to the sum of all individual CDP collateral our rule uses the systemDebt variable to track whether a redemption has increased the total debt in the system, which if so would imply that the debt for any given CDP has also increased.

Looking at the results we see that our test passes, indicating that no redemption operation could end up increasing the systemDebt and by extension the debt of an individual CDP.

Comparing the sloc for the Echidna and CVL tests we see that the Echidna test is 65 SLOC, and the CVL test is 35 SLOC (54% less than the Echidna test). Looking at the runtimes we see a larger disparity in runtimes this time with the Echidna test lasting 3h 53m and the CVL test lasting only 2m 34s (1% the time of the Echidna test).

Conclusion

In conclusion, we can see that for the properties tested the CVL tests contain less SLOC for 2/3 of them, and similarly for 2/3 of the properties tested the CVL tests have shorter run times. We could take this as an indication that in this case (only for the tested properties) CVL is more efficient and effective at testing the system's properties as it tests specific inputs like the Echidna fuzzer and all possible cases using symbolic execution.

Nevertheless, we must be cautious in making such conclusions as before we can state that our system has been formally verified, we must be sure that all statements we are making in our tests are in fact true and aren't vacuous (don't actually tell us anything about the system behavior).

In other words, true formal verification has a much higher barrier to pass in order to guarantee complete correctness whereas fuzzing (while individual tests may take longer to run) can give us a quicker but less sure confirmation that our code behaves as expected as the creation of the test suite is often faster than a formal verification test suite with less need to provide summaries and other assumptions to avoid path explosion.