
Hash functions

Definition
A hash function is any function that can be used to map data (of arbitrary size) to data of fixed size.
For example, if we take the SHA256 hash function and apply it to the word Bitcoin, we get this result:
SHA256("Bitcoin") = b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4
The values returned by a hash function are called hash values, hash codes, digests or simply hashes.
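A cryptographic hash is like a fingerprint of a piece of data, i.e., a representation that is unique for all practical purposes (collisions exist in theory, but finding one should be infeasible).
An important feature of a hash function is that for a given input p it will always return the same digest d.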
If we add the character ! at the end of p and hash it we'll get a different digest:
SHA256("Bitcoin is the future") = fffb99461cb08e7d6734737a003f38ea0663bf4536a33fdd533dd223a91c7b17
SHA256("Bitcoin is the future!") = 824f1d92a415d4c2bb1ea8442653fd73446cd8565e364eca0f883d26dccbdcae
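The comparison above can be reproduced with a minimal sketch like the following. It assumes Python's standard hashlib module and UTF-8 encoding of the input strings; the helper name sha256_hex is made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("Bitcoin is the future")
again = sha256_hex("Bitcoin is the future")
changed = sha256_hex("Bitcoin is the future!")

print(first)
print(changed)
print(first == again)    # same input, same digest
print(first == changed)  # one extra character, a different digest
```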
			
(Some) Features
Determinism
A hash function must always generate the same digest d for a given input p.
Fixed size
It is often desirable that the hash function generates a fixed size output.
So it doesn't matter if the input is the word Bitcoin or Bitcoin's entire source code: the digest must have the same fixed size.
For large inputs, the digest is therefore a much smaller representation of the input.
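As a rough illustration of the fixed size property, the sketch below hashes a 7-byte input and a much larger one with SHA-256 (via Python's hashlib) and compares the digest lengths. The inputs are arbitrary examples chosen for this sketch.

```python
import hashlib

short_input = b"Bitcoin"
long_input = b"Bitcoin" * 100_000  # 700,000 bytes of data

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

# SHA-256 always yields 32 bytes, i.e. 64 hexadecimal characters,
# regardless of how large the input is.
print(len(short_input), len(short_digest))
print(len(long_input), len(long_digest))
```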
Quick computation of h(p)
We should be able to obtain d by hashing p quickly.
Non-invertible
It is not possible to simply take the digest d and obtain the input p from it.
The only way is to guess a candidate p, hash it, and check whether the result is the expected d.
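The guess-and-check approach can be sketched as follows. This is only an illustration: the function name guess_preimage and the small candidate list are assumptions made for the example, and a real search space would be far too large to enumerate this way.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_preimage(target_digest: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one whose digest matches."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None  # none of the guesses produced the expected digest


target = sha256_hex("Bitcoin")
words = ["Ethereum", "Litecoin", "Bitcoin", "Monero"]
found = guess_preimage(target, words)
print(found)  # Bitcoin
```

The digest itself gives no hint about which candidate is right; each guess has to be hashed and compared.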
			
Ok, but why hash functions?
A very common use of hash functions is integrity verification.
Since any alteration of the data results in a completely different hash, we can verify the data's integrity with its digest.
Hash-based verification ensures that the data was not changed by comparing its hash value to a previously calculated value.
If these values match, the data is presumed to be unmodified.
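A minimal sketch of such a check, assuming SHA-256 and a digest that was stored earlier through some trusted channel. The function name is_unmodified and the sample messages are invented for this example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hexadecimal."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare it to the previously stored one."""
    # hmac.compare_digest does a constant-time comparison; a plain ==
    # would give the same answer for this illustration.
    return hmac.compare_digest(sha256_hex(data), expected_digest)


original = b"Pay Alice 1 BTC"
stored_digest = sha256_hex(original)  # calculated when the data was known to be good

print(is_unmodified(b"Pay Alice 1 BTC", stored_digest))   # True
print(is_unmodified(b"Pay Alice 10 BTC", stored_digest))  # False
```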

Blockchain technology uses hash functions for many purposes. Here are some examples:

1) BLOCKCHAIN CONSENSUS
An important one is to verify (with the digest) that some input is actually the one we are looking for.
As explained above, hash functions are used for integrity verification.
The network nodes each have their own copy of the blockchain and they all agree on which is the correct version (i.e. the one present on the majority of the nodes).
Suppose someone creates a different version of the blockchain, for example by crediting themselves with 10 Bitcoin.
Because each block includes the hash of the previous block, the hash of the altered block and of every block after it will differ from the hashes in the copies of the blockchain held by the other nodes.
So, by consensus, the impostor's blockchain version won't be accepted.
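The effect of tampering on later blocks can be illustrated with a toy chain. This is a simplified sketch, not Bitcoin's actual block format: the block contents are plain strings, the all-zero starting hash and the function names are assumptions made for the example.

```python
import hashlib
from typing import List


def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's data together with the hash of the previous block."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()


def chain_hashes(block_data: List[str]) -> List[str]:
    """Return the hash of every block in a toy chain, in order."""
    hashes = []
    prev = "0" * 64  # hypothetical placeholder for "no previous block"
    for data in block_data:
        prev = block_hash(prev, data)
        hashes.append(prev)
    return hashes


honest = chain_hashes(["Alice pays Bob 1", "Bob pays Carol 2", "Carol pays Dave 3"])
forged = chain_hashes(["Alice pays Bob 1", "Bob pays Mallory 10", "Carol pays Dave 3"])

# Block 0 is identical; block 1 was altered, so its hash and the hash of
# every later block no longer match the honest copy.
for index, (h, f) in enumerate(zip(honest, forged)):
    print(index, h == f)
```

Note that the third block's data is the same in both chains, yet its hash still differs because it depends on the hash of the altered block before it.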

2) PROOF-OF-WORK
As explained in more detail here, PoW is the work demanded to win the miners' race.
It is something hard to produce but easy to verify.
For a new block to be added to the blockchain, a certain computational problem must be solved.
It consists of finding p in the expression h(p) = d, where the requirement on d is known beforehand (in Bitcoin, the digest must not exceed a target value).
Since it's not possible to obtain p from d, a lot of trial and error is required to find a suitable p.
Therefore, PoW is defined as the proof that a miner indeed made an effort to satisfy a certain condition.
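A toy version of this trial-and-error search might look like the sketch below. It is a simplification under stated assumptions: the condition used here is that the hexadecimal digest starts with a given number of zeros, the input is the block data followed by a counter (called nonce here), and a single SHA-256 is applied. The names mine and verify are made up for the example.

```python
import hashlib
from typing import Tuple


def digest_for(block_data: str, nonce: int) -> str:
    """Hash the block data concatenated with a candidate nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = digest_for(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes a single hash computation."""
    return digest_for(block_data, nonce).startswith("0" * difficulty)


data = "block with some transactions"
nonce, digest = mine(data, 4)  # on average about 16**4 = 65,536 attempts
print(nonce, digest)
print(verify(data, nonce, 4))
```

Each extra leading zero multiplies the expected number of attempts by 16, while verification still costs one hash. That asymmetry is what makes the result hard to produce but easy to verify.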

3) HASHING TRANSACTIONS OF A BLOCK: THE MERKLE TREE ROOT
The Merkle Tree is explained in more detail here.
This is another use of hash functions for integrity verification.
In a nutshell, the hash function is applied successively to the block's transactions, pair by pair, until a single final digest remains.
This final digest is like a signature for the block's transactions and assures their integrity.
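The pairwise hashing can be sketched roughly as follows. This is a simplified illustration rather than Bitcoin's exact procedure: it applies a single SHA-256 to hexadecimal strings (Bitcoin applies SHA-256 twice to binary data), the transactions are placeholder strings, and duplicating the last hash on odd-sized levels is an assumption borrowed from Bitcoin's convention.

```python
import hashlib
from typing import List


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def merkle_root(transactions: List[str]) -> str:
    """Hash the transactions, then hash pairs of digests until one remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256_hex(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last digest with itself
        level = [
            sha256_hex(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


txs = ["Alice->Bob:1", "Bob->Carol:2", "Carol->Dave:3"]
root = merkle_root(txs)
tampered = merkle_root(["Alice->Bob:1", "Bob->Carol:20", "Carol->Dave:3"])

print(root)
print(root == tampered)  # False: changing one transaction changes the root
```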
			