Encoding methods are something you might have heard thrown around in the cryptography space. They are normally used to turn binary data into text, typically so that the data can be sent over communication channels that do not allow raw binary data.
How does binary turn into Base64?
First, let’s say that we want to encode an image. We start by reading all the bits in the image, which, for this example, results in the binary stream 010010101101001000111111. Real images are much larger than this and contain hundreds of thousands of bits or more.
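In practice a file is read as bytes, not individual bits. The sketch below assumes the image has already been read into a bytes object (for example with open(path, "rb").read()). It turns each byte into eight binary digits. The helper name bytes_to_bits is hypothetical, and the three example bytes are simply the ones that produce the stream above.

```python
# Hypothetical helper: turn raw bytes into a string of '0' and '1' characters.
def bytes_to_bits(data: bytes) -> str:
    # format(byte, "08b") writes each byte as exactly eight binary digits,
    # keeping any leading zeroes.
    return "".join(format(byte, "08b") for byte in data)


# Three bytes that make up the example stream from the text.
example = bytes([0b01001010, 0b11010010, 0b00111111])
print(bytes_to_bits(example))  # 010010101101001000111111
```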
The encoder first breaks the binary stream into chunks of six bits each. Doing this to our example gives 010010 101101 001000 111111.
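The chunking step can be sketched as slicing the bit string every six characters. The function name split_into_chunks is hypothetical. This version keeps a shorter final chunk as-is so that a later padding step can handle it.

```python
def split_into_chunks(bits: str, size: int = 6) -> list[str]:
    # Slice the bit string every `size` characters; a final chunk shorter
    # than `size` is kept unchanged so padding can deal with it later.
    return [bits[i:i + size] for i in range(0, len(bits), size)]


print(split_into_chunks("010010101101001000111111"))
# ['010010', '101101', '001000', '111111']
```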
Then, we turn each binary chunk into a decimal number.
To do this, working from right to left, we multiply each bit by 2 to the power of p, where p starts at zero and increases by one for each bit in the chunk, and then add the results together.
For example, we can calculate 010010 with the expression $(0 \times 2^0) + (1 \times 2^1) + (0 \times 2^2) + (0 \times 2^3) + (1 \times 2^4) + (0 \times 2^5)$. We can strip away any terms that multiply by zero, since those will always be zero. The simplified expression is $(1 \times 2^1) + (1 \times 2^4) = 2 + 16 = 18$.
Therefore, 010010 is equivalent to the decimal 18.
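The right-to-left sum of powers of two described above can be written as a short loop. The function name chunk_to_decimal is hypothetical. Applying it to all four chunks of the example gives the decimals used in the next step.

```python
def chunk_to_decimal(chunk: str) -> int:
    total = 0
    # Walk the bits from right to left; p starts at 0 and grows by one per bit.
    for p, bit in enumerate(reversed(chunk)):
        total += int(bit) * 2 ** p
    return total


print(chunk_to_decimal("010010"))  # 18
print([chunk_to_decimal(c) for c in ["010010", "101101", "001000", "111111"]])  # [18, 45, 8, 63]
```

Python's built-in int(chunk, 2) performs the same conversion. For example, int("010010", 2) returns 18. The loop version just makes the powers of two visible.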
Turning every six-bit chunk into a decimal using the process shown above gives us the numbers (in left-to-right order) 18, 45, 8, and 63.
After that, we look up each number in whatever alphabet we are using. For example, the six-bit chunk 010010 turns into the decimal 18, which corresponds to S in the standard Base64 alphabet. Doing the same for 45, 8, and 63 gives t, I, and /, so 010010 101101 001000 111111 turns into StI/.
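Below is a minimal sketch of the lookup step, assuming the input bit string is already a multiple of six bits long (padding comes next). The function name encode_full_chunks is hypothetical. The last line cross-checks the result with Python's standard base64 module on the same three bytes.

```python
import base64

# Standard Base64 alphabet: A-Z are 0-25, a-z are 26-51, 0-9 are 52-61, then + and /.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_full_chunks(bits: str) -> str:
    # Assumes len(bits) is a multiple of 6; leftover bits need padding instead.
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # int(chunk, 2) gives the decimal value, which is the index into ALPHABET.
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)


print(encode_full_chunks("010010101101001000111111"))  # StI/
# Cross-check against the standard library on the same three bytes.
print(base64.b64encode(bytes([0b01001010, 0b11010010, 0b00111111])).decode())  # StI/
```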
How does binary turn into Base64?: fixing the padding
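Base64 works on three bytes (24 bits) at a time, because 24 bits split evenly into four six-bit chunks. If the input’s length in bytes is a multiple of 3, everything fits perfectly. However, if it is not, then we need to fill the last chunk with zeroes and add padding characters to the encoded string until its length becomes a multiple of 4 characters.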
Let’s take 01001010 11010010 00111111 01000000 as an example. It is our original 24 bits plus one extra byte, 32 bits in total, and split into six-bit chunks it becomes 010010 101101 001000 111111 010000 00. The first four chunks encode to StI/ as before, and the fifth chunk, 010000, is the decimal 16, which corresponds to Q. However, this leaves two bits at the end.
To pad this, we keep adding zeroes to the end of the string until the leftover bits form a full chunk. Thus, the final chunk, 00, becomes 000000, which corresponds to A in the Base64 alphabet. Then, we add one padding character (typically an =) for every two bits we had to add. We add two padding characters because four (the number of bits we had to add) divided by two is two. Since the input is always made of whole bytes, there are only ever 0, 2, or 4 leftover bits, so a Base64 string never ends in more than two padding characters.
So, the final result is StI/QA==.
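Putting the steps together, here is a sketch of a complete encoder with padding. It assumes the input is whole bytes, as real Base64 requires. The name b64_encode is hypothetical. The example bytes 01001010 11010010 00111111 01000000 encode to StI/QA==, and the result is compared against Python's standard base64 module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    bits = "".join(format(byte, "08b") for byte in data)
    # Add zeroes until the bit count is a multiple of 6 (always 0, 2 or 4 zeroes).
    zeros_added = (-len(bits)) % 6
    bits += "0" * zeros_added
    encoded = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # One "=" for every two zero bits added.
    return encoded + "=" * (zeros_added // 2)


sample = bytes([0b01001010, 0b11010010, 0b00111111, 0b01000000])
print(b64_encode(sample))                # StI/QA==
print(base64.b64encode(sample).decode())  # StI/QA==
```

Counting the added zero bits, instead of working out the padding separately, keeps the "one = per two added bits" rule explicit in the code.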
How does Base64 turn into binary?
Base64 decoding is just like Base64 encoding, but in reverse. First, we turn every non-padding character into its index in the alphabet, then we turn every number into its six-bit binary equivalent.
For example, let’s take our string StI/QA==. We first take every non-padding character and turn it into a number that represents its index in the alphabet.
This turns our Base64-encoded string into a string of numbers, specifically 18 45 8 63 16 0 = =, since we don’t turn the padding into a number.
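The two decoding steps described above can be sketched as follows. One further step, regrouping the bits into eight-bit bytes and dropping the zeroes that encoding added, is included so that the round trip can be checked. The name b64_decode is hypothetical, and it does no input validation: ALPHABET.index raises ValueError on characters outside the alphabet.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_decode(text: str) -> bytes:
    # Step 1: turn every non-padding character into its index in the alphabet.
    indices = [ALPHABET.index(ch) for ch in text.rstrip("=")]
    # Step 2: turn every index into its six-bit binary equivalent.
    bits = "".join(format(i, "06b") for i in indices)
    # Step 3: regroup into eight-bit bytes; leftover bits are the zeroes
    # added during encoding, so they are dropped.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print([ALPHABET.index(ch) for ch in "StI/QA"])  # [18, 45, 8, 63, 16, 0]
print(b64_decode("StI/QA==").hex())               # 4ad23f40
print(b64_decode("StI/QA==") == base64.b64decode("StI/QA=="))  # True
```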
The Standard Base64 Alphabet
0: A | 8: I | 16: Q | 24: Y | 32: g | 40: o | 48: w | 56: 4 |
1: B | 9: J | 17: R | 25: Z | 33: h | 41: p | 49: x | 57: 5 |
2: C | 10: K | 18: S | 26: a | 34: i | 42: q | 50: y | 58: 6 |
3: D | 11: L | 19: T | 27: b | 35: j | 43: r | 51: z | 59: 7 |
4: E | 12: M | 20: U | 28: c | 36: k | 44: s | 52: 0 | 60: 8 |
5: F | 13: N | 21: V | 29: d | 37: l | 45: t | 53: 1 | 61: 9 |
6: G | 14: O | 22: W | 30: e | 38: m | 46: u | 54: 2 | 62: + |
7: H | 15: P | 23: X | 31: f | 39: n | 47: v | 55: 3 | 63: / |
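The alphabet table can be rebuilt from its structure: the uppercase letters, then the lowercase letters, then the digits, then + and /. The sketch below builds it from Python's string module constants. It also creates both lookup directions, one for encoding and one for decoding. The names encode_table and decode_table are hypothetical.

```python
import string

# A-Z, then a-z, then 0-9, then "+" and "/": 64 characters in total.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Index -> character, the direction an encoder needs.
encode_table = dict(enumerate(ALPHABET))
# Character -> index, the direction a decoder needs.
decode_table = {ch: i for i, ch in enumerate(ALPHABET)}

print(len(ALPHABET))      # 64
print(encode_table[18])   # S
print(encode_table[52])   # 0
print(decode_table["/"])  # 63
```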