Contracts in Bitcoin

What is a contract? A meeting of minds where two or more parties come together with shared intentions to form a binding contract. The general concept is to limit my freedom in the future to expand my freedom in the present and is a fundamental building block of the economy.

As the internet developed, people were thinking about how to implement contracts; implementing contract law digitally does not seem to be viable. Nick Szabo articulated his solution:

New institutions, and new ways to formalize the relationships that make up these institutions, are now made possible by the digital revolution. I call these new contracts "smart", because they are far more functional than their inanimate paper-based ancestors. No use of artificial intelligence is implied. A smart contract is a set of promises, specified in digital form, including protocols within which the parties perform on these promises.

— Nick Szabo
Smart Contracts: Building Blocks for Digital Markets (1996)

Smart contracts predate Bitcoin by over a decade.

Objectives of contract design

Observability

Participants should be able to observe each other’s performance in that contract.

Verifiability

Participants can prove to an arbitrator that a contract has been executed faithfully or been breached.

Privacy

Knowledge and control of the contents of the contract should be distributed between parties only as much is necessary to execute and enfore the contract

Enforcability

We’d like to minimise the need to appeal to the legislature for enforcement

Script in Bitcoin

There’s no mention of the word "script" or "contract" in the Bitcoin Whitepaper, but we know we can script on Bitcoin…​

We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership.

— Satoshi Nakamoto
Bitcoin Whitepaper (2008) Chapter 2. Transactions

Taking a look at the first version of Bitcoin, 0.1, we see the function EvalScript:

//
// Script is a stack machine (like Forth) that evaluates a predicate
// returning a bool indicating valid or not.  There are no loops.
//
#define stacktop(i)  (stack.at(stack.size()+(i)))
#define altstacktop(i)  (altstack.at(altstack.size()+(i)))

bool EvalScript(const CScript& script, const CTransaction& txTo, unsigned int nIn, int nHashType,
                vector<vector<unsigned char> >* pvStackRet)
{

As Script was not mentioned in the Whitepaper, it seems possible that the scripting language was added on later on in the development process. Here’s what Satoshi said about script on the bitcointalk.org forum in 2010:

The nature of Bitcoin is such that once version 0.1 was released, the core design was set in stone for the rest of its lifetime. Because of that, I wanted to design it to support every possible transaction type I could think of. The problem was, each thing required special support code and data fields whether it was used or not, and only covered one special case at a time. It would have been an explosion of special cases. The solution was script, which generalizes the problem so transacting parties can describe their transaction as a predicate that the node network evaluates. The nodes only need to understand the transaction to the extent of evaluating whether the sender’s conditions are met.

The script is actually a predicate. It’s just an equation that evaluates to true or false. Predicate is a long and unfamiliar word so I called it script.

— Satoshi Nakamoto
bitcointalk.org

So script was the elegant solution to adding many special cases of transactions like, escrow, bonds, 3rd party arbitration, multi-signature etc. It makes Bitcoin incredibly flexible with only a small number of primitives.

Contracts on a Public Blockchain

Are contracts on public blockchains a good fit for the Objectives of contract design as Nick Szabo envisioned them?

The Good News

Observability

Good — participants can see all parts in a contract and see how the other party has acted.

Verifiability

Good — on a public blockchain ane everyone can see it.

Enforcability

Pretty good — contract is executed and rules are enforced by all nodes.

The Bad News

Enforcability

OK — Blocks only get confirmed every 10 minutes and finality is fuzzy.

Privacy

Terrible (pre-Taproot) — contracts might include trade secrets, medical data etc…​ We don’t want the public to see this embedded in the Blockchain forever.

Expressiveness

Difficult — adding new primitives, e.g. Timelocks or Sequencelocks (both now added) is hard and very slow.

Fungibility

Potentially troubling — if everyone can see all the details of that contract, the outputs of that contract are no longer fungible.

Scalability

Poor — putting everything on the blockchain is not scalable.

What parts of a contract should be on a blockchain

Contracts executed by explicity published code are really only using the blockchain for one thing — to get an immutable ordering of what order the transactions happen in. All that they really care about is that their transaction is not reversed and not double spent.

All of the extra details of the contract execution can be done by things that are not blockchains.

— Andrew Poelstra
Using chains for what they're good for (2017)

Post’s Theorem

  • Turing-complete languages define Computably Enumerable predicates

  • In the 1930’s Emil Post defined an Arithmetic Hierachy

    • \$Delta0\$ predicates have no unbound quantifiers (e.g. \$AAx < z EEy < s.t. x + y = z\$) + The unbound quantifiers are the "For all \$x\$" or "Each \$y\$" and these Delta 0 predicates have none.

    • \$Sigma1\$ predicates are those which have an unbound quantifier (e.g. \$EEx in N s.t. AAy > z, x < y\$)

      All of the quantifiers are unbound.

      The unbound quantifiers have an infinite search space (Sigma 1). This is why turing-complete languages can express non-halting programs.

  • Post’s Theorem states that computably-enumerable predicates are identical to \$Sigma1\$ predicates.

  • Validating a \$Sigma1\$ predicate can always be reduced to validating a \$Delta0\$ predicate with a witness.

  • Evaluating a \$Delta0\$ predicate always terminates.

With \$Sigma1\$ thinking I’m going to write a program and everyone on the blockchain is going to execute this program. Everyone will do the same thing because they are running it in the same environment.

With \$Delta0\$ thinking I’m going to run the program myself on my computer and I am going to generate some witness data. I am going to have everyone only validate that witness data instead of running the entire program.

…​It’s a change of attitude and I think it will influence how you design blockchain languages when you think of the problem this way.

— Russel O'Connor
Post’s Theorem and Blockchain Languages: A Short Course in the Theory of Computation (2017)

By using a turing-complete language you are effectively saying "Here’s my proposed state transtion, please everyone do this unbounded search", which could take forever! This is why Ethereum has a gas limit. Whereas in Bitcoin we don’t have a turing-complete language — we use \$Delta0\$ thinking — and those programs are always bound; we provide a witness to the bound program instead.

Therefore we can see that there is a tradeoff between computation and verification:

Privacy

If everyone has to compute everything, the contract is less private. If we just give people a witness, the contract is more private. The limit of this is a zero-knowledge proof where you know nothing about the contract.

Scalability

Having everyone compute everything is less scalable than simply providing a witness and having nodes verify it.

Fungibility

It’s less fungible if everyone has to compute.

Observability

Publically observable if everyone has to compute vs privately observable is we only have them verify it.

Verifiability

As above with observability.

Is this mental model similar to conventional programming? No. But smart contracts aren’t conventional programming, and blockchain isn’t a conventional computing environment (how often does history change out from under most of your programs?).

These design elements make for a clean, simple system with predictable interactions. To the extent that they make some things "harder," they do so mostly by exposing their latent complexity that might otherwise be ignored."

— Greg Maxwell
reddit.com

The History of Script in Bitcoin

How has script changed over the last 11 years?

Originally you’d take the scriptSig (SS) from the input and the scriptPubKey (SPK) from the output being spent, concatenate them and execute them as they were all written in the same script language:

script original
Figure 1. How bitcoin transactions were linked in v0.1

This was bugged in a way which anyone could spend anyone else’s money. This was fixed in 2010 by separating the execution of the SS and SPK:

script fix 2010
Figure 2. Script execution bug fixed

The scriptSig although called a 'script", is essentially just pushing some data onto the stack to be used by the SPK evaluation later.

How scriptSig and scriptPubKeys are processed
vector<vector<unsigned char> > stack;
if (!EvalScript(stack, scriptSig, txTo, nIn, nHashType))
    return false;
if (!EvalScript(stack, scriptPubkey, txTo, nIn, nHashType))
    return false;
if (stack.empty())
    return false;
return CastToBool(stack.back());

Pay to Script Hash (P2SH)

In 2012 another change was made which changed the form of the SPK, so that the SPK now contains only a hash commitment to the spending conditions (script), rather than the full set of conditions themselves.

script p2sh 2012
Figure 3. 2012 - Pay To Script Hash

This new format means that executing and validating the script is now a two-stage process:

  1. Check that the spending script provided by the spender matches the SPK commitment

  2. Check that the input data satisfies the spending script

There are a lot of advantages to this model:

  • Outputs are now a fixed size: 20 bytes for P2SH and 32 bytes for P2WSH.

    • It allows us to have a fixed address length.

    • It removes the requirement that the sender needs to know what the future spending conditions are. This means that you won’t know if you’re sending to someone else’s address which is encumbered by a 2-of-3 multi-sig — that’s none of your business!

The logic for processing P2SH addresses however gets slightly more complex:

{
    vector<vector<unsigned char> > stack, stackCopy;
    if (!EvalScript(stack, scriptSig, txTo, nIn, nHashType))
        return false;
    if (fValidatePayToScriptHash)
        stackCopy = stack;
    if (!EvalScript(stack, scriptPubKey, txTo, nIn, nHashType))
        return false;
    if (stack.empty())
        return false;

    if (CastToBool(stack.back()) == false)
        return false;

    // Additional validation for spend-to-script-hash transactions:
    if (fValidatePayToScriptHash && scriptPubKey.IsPayToScriptHash())
    {
        if (!scriptSig.IsPushOnly()) // scriptSig must be literals-only
            return false;            // or validation fails

        const valtype& pubKeySerialized = stackCopy.back();
        CScript pubKey2(pubKeySerialized.begin(), pubKeySerialized.end());
        popstack(stackCopy);

        if (!EvalScript(stackCopy, pubKey2, txTo, nIn, nHashType))
            return false;
        if (stackCopy.empty())
            return false;
        return CastToBool(stackCopy.back());
    }

    return true;
}

The top half is similar to what we had before: checking that the commitment is a valid commitment for the script that is being provided by the spender. The if statement comprising the bottom half is doing the script evaluation with the data on the stack, and making sure the data satisfies the script.

Segregated Witness

script segwit
Figure 4. 2016 - Segregated Witness

Like P2SH we have a commitment to the spending conditions in the output, but now we move the spending script and satisfactions to a seperate field called the "witness". This means that the witness does not need to be part of the txid which fixes some forms of txid malleation.

The Future of Contracts in Bitcoin

How are things going to change in Bitcoin?

Schnorr Signatures / Taproot

We hope that we will see Schnorr signatures soon introduced via a soft fork. These signatures have the property that we can add up public keys and add up signatures, and so with a Schnorr signature we can have a spending condition like 2-of-2, and by the time this reaches the blockchain it simply looks like a single public key and a single signature — no-one else knows that there were ever any special spending conditions associated with this output.

This is very beneficial for privacy and scalability; nobody knows about spending conditions and there’s only a single signature evaluation.

Since this presentation was given PR 19953 was merged into Bitcoin Core, which implemented schnorr signature verification (BIP340), taproot verification (BIP341), and tapscript verification (BIP342). These rules are currently in Speedy Trial signalling period as of writing.

MAST

MAST is also mentioned, but parts of this have been rolled into the finalised Taproot spec (the script trees), and parts left on the backburner for a while longer. Specifically, Cross Input Signature Aggregation (CISA) has been put on hold, whilst focus on bolstering privacy and efficiency of more complex transactions[1] with script trees took priority.

Adaptor Signatures

Adaptor signatures are enabled by Schnorr signatures, which allow complex contracts, e.g. atomic swaps, where complex conditionals can be encoded within the signature itself, which means that the script dissapears from the blockchain, only leaving behind a single public key and a single signature.