This post contains two parts:
- First we’ll start with a recap of a few key concepts about Bitcoin.
- Then we’ll delve into Bitcoin Script and some of its quirks.
Part 1: Key Concepts
UTXO
Bitcoin’s accounting system is based on UTXOs (Unspent Transaction Outputs). This means that a Bitcoin transaction doesn’t technically move coins from one account to another (as account-based cryptocurrencies like Ethereum do). Instead, a Bitcoin transaction spends existing UTXOs (i.e. the transaction inputs) and generates new UTXOs (i.e. the transaction outputs).
A UTXO contains a locking script and can only be spent with a corresponding unlocking script. Think of these two scripts as the digital equivalent of a padlock and its key. A UTXO can only be spent once, and is discarded afterwards.
The shared state of all the available UTXOs is maintained by the Bitcoin blockchain and is called the UTXO set.
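To make the UTXO model concrete, here is a minimal Python sketch of a toy UTXO set, assuming a dictionary keyed by (txid, output index). The names `OutPoint`, `TxOut` and `apply_transaction` are hypothetical, and script validation is deliberately left out: the sketch only shows that spending removes entries from the set and that creating outputs adds new ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Hypothetical reference to a transaction output: (txid, output index)."""
    txid: str
    index: int


@dataclass(frozen=True)
class TxOut:
    """Hypothetical output: an amount and its locking script (a readable string here)."""
    value: int          # amount in satoshis
    script_pubkey: str  # locking script, not validated in this sketch


def apply_transaction(utxo_set: dict, txid: str, inputs: list, outputs: list) -> None:
    """Spend `inputs` (OutPoints) and add `outputs` (TxOuts) to a toy UTXO set.

    Script validation is skipped; any value not carried over to the
    outputs is implicitly the fee.
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("the same output is spent twice in one transaction")
    for outpoint in inputs:
        if outpoint not in utxo_set:
            raise ValueError(f"{outpoint} is unknown or already spent")
    total_in = sum(utxo_set[o].value for o in inputs)
    total_out = sum(out.value for out in outputs)
    if total_out > total_in:
        raise ValueError("outputs are worth more than inputs")
    for outpoint in inputs:
        del utxo_set[outpoint]  # a UTXO can only be spent once
    for index, txout in enumerate(outputs):
        utxo_set[OutPoint(txid, index)] = txout  # new UTXOs, indexed by position
```

For example, spending a 50,000-satoshi UTXO into a 30,000-satoshi payment and a 19,000-satoshi change output removes `OutPoint("a1", 0)` from the set and adds `OutPoint("b2", 0)` and `OutPoint("b2", 1)`; trying to spend `OutPoint("a1", 0)` a second time raises `ValueError`.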
Transaction
Each transaction output TxOut contains a scriptPubKey field (the locking script) which gives encoded instructions on how to spend the coins associated with that output.
Each transaction input (TxIn) references a previous transaction output and contains a scriptSig field (the unlocking script) holding the data necessary to spend that output (generally a signature, but it doesn’t have to be).
To verify a transaction input, the script engine first executes the scriptSig and then executes the scriptPubKey on the resulting stack (conceptually, the scriptPubKey is appended to the scriptSig), and the result is either True or False. Anyone who can provide scriptSig data such that the script evaluates to True is able to spend the coins associated with that UTXO.
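The evaluation order can be illustrated with a toy interpreter. This is a minimal sketch, not Bitcoin’s real engine: scripts are hypothetical Python lists of small integers (data pushes) and op code names, only OP_ADD and OP_EQUAL are supported, and the function names `run` and `verify` are made up. As described above, the unlocking script runs first, the locking script then runs on the resulting stack, and the spend is valid if the top of the stack is non-zero.

```python
def run(script: list, stack: list) -> list:
    """Execute a toy script (ints are data pushes, strings are op codes) on `stack`."""
    for token in script:
        if isinstance(token, int):
            stack.append(token)  # data push
        elif token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:
            raise ValueError(f"unsupported op code {token}")
    return stack


def verify(script_sig: list, script_pubkey: list) -> bool:
    """Run the unlocking script, then the locking script on the same stack."""
    try:
        stack = run(script_sig, [])
        stack = run(script_pubkey, stack)
    except (IndexError, ValueError):
        return False  # popping an empty stack or an unknown op code fails the script
    return bool(stack) and stack[-1] != 0


lock = [2, "OP_ADD", 5, "OP_EQUAL"]  # "give me a number that, plus 2, equals 5"
print(verify([3], lock))  # anyone who supplies 3 can spend
```

With this locking script, `[3]` is a valid unlocking script for anyone who knows it, while `[4]` or an empty unlocking script makes the evaluation fail. No signature is involved, which is exactly why real locking scripts usually require one.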
Bitcoin script
Bitcoin Script is the programming language that powers that script engine. It is a simple stack-based language that was purposefully designed not to be Turing-complete (for example, it has no loops).
It consists of a small set of op codes (fewer than 256, since each op code is encoded as a single byte), including:
- Arithmetic operations: OP_ADD, OP_SUB, OP_MUL, …
- Cryptographic operations: OP_SHA256, OP_CHECKSIG, …
- Bitwise operations: OP_AND, OP_OR, …
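- And more
Note that some of these, such as OP_MUL, OP_AND and OP_OR, were disabled in 2010, and any script containing them now fails.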
Some scripts are so widely used that they have a dedicated name. For example, the most common script is called Pay To Public Key Hash (P2PKH) and looks like this:
ScriptPubKey=OP_DUP OP_HASH160 <Public Key Hash> OP_EQUALVERIFY OP_CHECKSIG
ScriptSig=<Signature> <Public Key>
This script evaluates to True if the hash of <Public Key> in the unlocking script matches the <Public Key Hash> in the ScriptPubKey locking script, and if <Signature> is a valid signature for that <Public Key>.
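Because each op code is a single byte, a P2PKH locking script serializes to just 25 bytes. The sketch below builds those bytes from a 20-byte public key hash. The function name is hypothetical, and the all-zero hash in the example is a placeholder rather than the hash of a real key. Note that 0x14 is not a named op code but a direct push meaning "push the next 20 bytes".

```python
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC


def p2pkh_script_pubkey(pubkey_hash: bytes) -> bytes:
    """Serialize OP_DUP OP_HASH160 <Public Key Hash> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(pubkey_hash) != 20:
        raise ValueError("a P2PKH public key hash must be 20 bytes")
    return (
        bytes([OP_DUP, OP_HASH160, len(pubkey_hash)])  # 0x14: push the next 20 bytes
        + pubkey_hash
        + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
    )


print(p2pkh_script_pubkey(bytes(20)).hex())  # 76a914 + 40 zeros + 88ac
```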
Part 2: Bitcoin Script Quirks
Bitcoin Script was designed to be simple to understand and execute; however, as the years passed, various soft forks changed the meaning of certain op codes.
In the rest of this post we will review 5 of these changes. A basic understanding of the common op codes will help.
1. OP_CHECKMULTISIG
As you may have guessed, this op code is used for multisig! For example, the ScriptPubKey locking script for a 2-of-3 multisignature spending condition is:
ScriptPubKey=OP_2 <pk1> <pk2> <pk3> OP_3 OP_CHECKMULTISIG
The OP_CHECKMULTISIG op code needs 2 valid signatures from any 2 of the 3 specified public keys in order to return True. Therefore we might expect this ScriptSig unlocking script to work…
ScriptSig=<sig1> <sig2>
… but it actually fails because of a bug in the OP_CHECKMULTISIG implementation that pops one extra value off the stack during evaluation. Therefore you need to add an extra OP_0 in front of your signatures, like so:
ScriptSig=OP_0 <sig1> <sig2>
Without the OP_0, the evaluation fails. This bug has been known for a long time but was never fixed, because fixing it would require a non-backward-compatible change to the consensus code (hence a hard fork). Moreover, the impact is small since only 1 extra byte is wasted, so this quirk is probably something developers will have to deal with forever.
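The off-by-one is easier to see in a toy model of the stack. In this sketch, signature checking is replaced by a hypothetical `toy_verify` rule (a "signature" is valid if it equals `b"sig:"` followed by the public key), numbers are plain Python integers, and OP_0 is modeled as the empty byte string it pushes. The pop order (n, the keys, m, the signatures, then one extra element) and the requirement that signatures appear in the same order as their keys follow the real op code; everything else is simplified.

```python
def toy_verify(sig: bytes, pubkey: bytes) -> bool:
    """Hypothetical stand-in for ECDSA: valid iff sig == b"sig:" + pubkey."""
    return sig == b"sig:" + pubkey


def op_checkmultisig(stack: list) -> bool:
    """Toy OP_CHECKMULTISIG including the extra pop; raises IndexError on an empty stack."""
    n = stack.pop()
    pubkeys = [stack.pop() for _ in range(n)][::-1]  # restore push order
    m = stack.pop()
    sigs = [stack.pop() for _ in range(m)][::-1]
    stack.pop()  # the bug: one more element is consumed and ignored
    # Each signature must match a key that comes after the previously matched key.
    key_index = 0
    for sig in sigs:
        while key_index < len(pubkeys) and not toy_verify(sig, pubkeys[key_index]):
            key_index += 1
        if key_index == len(pubkeys):
            return False
        key_index += 1
    return True


def spend_2_of_3(script_sig: list) -> bool:
    """Push the unlocking data, then OP_2 <pk1> <pk2> <pk3> OP_3, then run the op code."""
    stack = list(script_sig) + [2, b"pk1", b"pk2", b"pk3", 3]
    try:
        return op_checkmultisig(stack)
    except IndexError:
        return False  # popped from an empty stack: the script fails


sig1, sig3 = b"sig:pk1", b"sig:pk3"
print(spend_2_of_3([sig1, sig3]))       # False: the extra pop hits an empty stack
print(spend_2_of_3([b"", sig1, sig3]))  # True: OP_0 feeds the extra pop
```

Swapping the signatures (`[b"", sig3, sig1]`) also fails, because the keys are scanned in order and `pk1` has already been passed by the time `sig1` is checked.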
2. OP_RETURN
This op code immediately stops the execution of the script and makes it fail (return False), no matter what data is on the stack, making the output unspendable forever:
ScriptPubKey=OP_RETURN <arbitrary data>
However, historically this was not always the case. In the first implementation of Bitcoin, OP_RETURN ended execution early and the result depended only on the top stack item. Because the unlocking and locking scripts were evaluated together, an unlocking script such as OP_1 OP_RETURN made any output spendable by anyone, without any signature. That behavior was quickly patched by Satoshi Nakamoto.
Nowadays, OP_RETURN is the op code of choice for writing arbitrary data on the blockchain. For example, ION (a Bitcoin implementation of the Sidetree protocol) uses it to anchor IPFS hashes of JSON data.
OP_RETURN is also used in Proof of Burn protocols.
Since Segwit, OP_RETURN is also used in an output of the coinbase transaction to commit to the witness data (via the Merkle root of the witness transaction IDs) for any block containing Segwit transactions.
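As a rough illustration, an OP_RETURN locking script is just the 0x6A op code followed by a data push. The hypothetical helper below handles payloads of up to 255 bytes, using a direct push (the length byte is itself the op code) for up to 75 bytes and OP_PUSHDATA1 above that; how much data nodes are willing to relay is a separate policy question that it does not check. Because any script starting with OP_RETURN is provably unspendable, nodes can leave such outputs out of the UTXO set.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C


def op_return_script(data: bytes) -> bytes:
    """Serialize OP_RETURN <data> for payloads of at most 255 bytes."""
    if len(data) <= 75:
        push = bytes([len(data)])  # direct push: the length is the op code
    elif len(data) <= 255:
        push = bytes([OP_PUSHDATA1, len(data)])  # OP_PUSHDATA1 <1-byte length>
    else:
        raise ValueError("this sketch only handles payloads up to 255 bytes")
    return bytes([OP_RETURN]) + push + data


print(op_return_script(b"hello").hex())  # 6a0568656c6c6f
```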
3. Pay to Script Hash (P2SH)
Earlier we mentioned P2PKH as the most common script used in Bitcoin. P2SH is another popular script type that looks like this:
ScriptPubKey=OP_HASH160 <Script Hash> OP_EQUAL
ScriptSig=<Redeem Script Inputs> <Redeem Script>
Pretty simple: it looks like the script engine will hash the <Redeem Script> data and return True if it equals <Script Hash>, right?
Actually, there is a special hidden condition that was added via soft fork (BIP 16) in 2012 (block 173805): after the script engine has checked that the hashes match, it interprets the <Redeem Script> data as a script and executes it against whatever the ScriptSig pushed before it. Therefore, even if the hashes match, the <Redeem Script> itself must be a valid script that evaluates to True.
This special condition only triggers if the ScriptPubKey locking script matches this exact pattern: OP_HASH160, a 20-byte push, then OP_EQUAL. So if you want to craft a locking script that just requires a pre-image without executing it as a script, you can break the pattern, for example by adding an OP_NOP at the beginning of the script. Note that OP_0 would not work here: it is not a no-op but pushes an empty byte array, which OP_HASH160 would then hash instead of your pre-image.
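The template check can be sketched as a byte comparison. Assuming the locking script is available as raw bytes, the hypothetical `is_p2sh` below accepts only the exact 23-byte pattern OP_HASH160 <20-byte push> OP_EQUAL, so prefixing OP_NOP (0x61) turns the script into an ordinary hash lock that is no longer treated as P2SH.

```python
OP_NOP = 0x61
OP_HASH160 = 0xA9
OP_EQUAL = 0x87


def is_p2sh(script_pubkey: bytes) -> bool:
    """True only for the exact pattern OP_HASH160 <20-byte push> OP_EQUAL."""
    return (
        len(script_pubkey) == 23
        and script_pubkey[0] == OP_HASH160
        and script_pubkey[1] == 0x14  # direct push of 20 bytes
        and script_pubkey[22] == OP_EQUAL
    )


script_hash = bytes(20)  # placeholder, not the hash of a real redeem script
p2sh = bytes([OP_HASH160, 0x14]) + script_hash + bytes([OP_EQUAL])
print(is_p2sh(p2sh))                    # True: the redeem script will also be executed
print(is_p2sh(bytes([OP_NOP]) + p2sh))  # False: only the hash is checked
```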
4. Segwit
Segwit was a controversial soft fork that activated in 2017 (block 481824). To understand the problem it solves and how it works, check out this excellent MIT OpenCourseWare lecture. In this post we’ll just focus on how it changed Bitcoin Script. This is what a Segwit script looks like:
ScriptPubKey=OP_0 <hash>
ScriptSig=EMPTY
There are two weird things about this script: 1) it only pushes data and contains no op code that checks anything, so it seems like it would always return True (the non-zero <hash> is left on top of the stack); 2) the ScriptSig is empty, so it seems like no data is required to spend that output.
But actually, similarly to P2SH, there is a special hidden condition that was introduced via soft fork:
- If ScriptPubKey is of the format “OP_0 followed by 20 bytes of data”, treat it like a P2PKH script (Pay to Witness Public Key Hash, P2WPKH) and look for the unlocking data (signature and public key) in another field of the transaction called the Witness (hence Segregated Witness)
- If ScriptPubKey is of the format “OP_0 followed by 32 bytes of data”, treat it like a P2SH script (Pay to Witness Script Hash, P2WSH), where the 32 bytes are the SHA256 hash of the script, and look for the unlocking data in the Witness field
This special condition only triggers if the ScriptPubKey consists of exactly OP_0 followed by a single data push. Such scripts are called “Version 0 Segwit scripts”.
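A sketch of how these scripts can be recognized, assuming raw script bytes. The byte-level pattern follows BIP 141 (a version op code followed by a single push of 2 to 40 bytes); the function names and return strings are hypothetical. Note that the pattern also accepts OP_1 to OP_16 as version op codes.

```python
OP_0 = 0x00
OP_1 = 0x51
OP_16 = 0x60


def witness_program(script_pubkey: bytes):
    """Return (version, program) if the script matches the Segwit pattern, else None.

    Pattern (BIP 141): a version op code (OP_0, or OP_1 to OP_16) followed by a
    single direct push of 2 to 40 bytes, with nothing after it.
    """
    if not 4 <= len(script_pubkey) <= 42:
        return None
    version_op, push_len = script_pubkey[0], script_pubkey[1]
    if version_op != OP_0 and not OP_1 <= version_op <= OP_16:
        return None
    if push_len + 2 != len(script_pubkey):
        return None
    version = 0 if version_op == OP_0 else version_op - OP_1 + 1
    return version, script_pubkey[2:]


def classify_v0(script_pubkey: bytes) -> str:
    """Name the version 0 special case a locking script would trigger."""
    result = witness_program(script_pubkey)
    if result is None or result[0] != 0:
        return "not a version 0 Segwit script"
    program = result[1]
    if len(program) == 20:
        return "P2WPKH"
    if len(program) == 32:
        return "P2WSH"
    return "invalid: version 0 programs must be 20 or 32 bytes"


print(classify_v0(bytes([OP_0, 20]) + bytes(20)))  # P2WPKH
```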
5. Taproot
The Segwit change also reserved the OP_1, OP_2, …, OP_16 op codes and gave them similar special conditions for future upgrades. Taproot was the first soft fork upgrade since Segwit. It activated in November 2021 (block 709632).
A Pay-to-Taproot script is a “Version 1 Segwit script”, meaning that it looks like:
ScriptPubKey=OP_1 <32-byte public key>
ScriptSig=EMPTY
Similarly to Segwit, it redefines the meaning of the ScriptPubKey locking script and tells the script engine to look in the Witness field for the unlocking data. More details about Taproot can be found in this MIT OpenCourseWare lecture.
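For comparison with version 0, here is a sketch that serializes a Pay-to-Taproot locking script, assuming you already have the 32-byte x-only output key (deriving and tweaking that key per BIP 341 is out of scope). The helper names are hypothetical; the layout, OP_1 followed by a 32-byte push, is what makes it a version 1 Segwit script.

```python
OP_1 = 0x51


def p2tr_script_pubkey(output_key: bytes) -> bytes:
    """Serialize OP_1 <32-byte x-only output key> (a version 1 Segwit script)."""
    if len(output_key) != 32:
        raise ValueError("a Taproot output key must be 32 bytes")
    return bytes([OP_1, len(output_key)]) + output_key  # 0x20: push the next 32 bytes


def segwit_version(script_pubkey: bytes) -> int:
    """Read the version of a script already known to be Segwit: OP_0 is 0, OP_1..OP_16 are 1..16."""
    first = script_pubkey[0]
    return 0 if first == 0x00 else first - OP_1 + 1


script = p2tr_script_pubkey(bytes(32))  # placeholder key, not a real tweaked key
print(script[:2].hex(), segwit_version(script))  # 5120 1
```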
Final words
Because of a preference for backward-compatible changes in Bitcoin, special conditions keep being added to the execution of certain scripts.
On the one hand, it means that we have a way to upgrade Bitcoin and make it more secure, scalable and usable over time without needing a hard fork, which is great! On the other hand, it also means that Bitcoin Script is getting harder to understand and more confusing for new developers. Not being aware of all these special conditions can cause a custom locking script to be unlocked in unwanted situations, resulting in potential hacks and loss of funds.
P.S.: This list is not exhaustive. If I missed something, please let me know in the comments and I will add it.