How Does Base64 Decoding Work?

Base64 is an ASCII representation of binary data, often used to pass binary data through channels that only handle text reliably, such as SMTP. I covered Base64 encoding in an earlier post, where I wrote my own C++ implementation of a Base64 encoder; this article explains how Base64 decoding works.

Following the Base64 Decoding Process

Let’s decode SGVsbG8gV29ybGQ=.

We start by splitting it into four-character groups (quartets).

  • SGVs
  • bG8g
  • V29y
  • bGQ=

The Base64 standard alphabet can be used to convert every character back into a number: its position in the alphabet. For example, looking up the first quartet (S, G, V, and s) in that alphabet gives you 18, 6, 21, and 44.
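To make the lookup concrete, here is a minimal sketch that finds a character's position in the standard alphabet by searching a string. The names kAlphabet and base64Value are made up for this illustration, and a linear search is slower than the lookup table built later in this article.

```cpp
#include <string>

// The Base64 standard alphabet; a character's position in this string is its value.
const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: returns the 6-bit value (0-63) of a Base64 character,
// or -1 if the character is not part of the standard alphabet.
int base64Value(char c) {
    const std::string::size_type pos = kAlphabet.find(c);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

Calling base64Value on S, G, V, and s should return 18, 6, 21, and 44, matching the values above.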

Then, you take these numbers and convert each one to six-bit binary, which gives you 010010, 000110, 010101, and 101100. Concatenating these binary numbers (in other words, just joining them end to end) gives us 010010000110010101101100.
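As a rough sketch of this step, std::bitset can render each value as six binary digits so that the four pieces can be joined as text. The helper names toSixBits and joinQuartetBits are hypothetical and exist only for this illustration.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: renders a Base64 value as a six-character binary string.
// Only the lowest six bits of the value are kept.
std::string toSixBits(unsigned value) {
    return std::bitset<6>(value).to_string();
}

// Hypothetical helper: joins the binary form of four 6-bit values into one 24-bit string.
std::string joinQuartetBits(unsigned a, unsigned b, unsigned c, unsigned d) {
    return toSixBits(a) + toSixBits(b) + toSixBits(c) + toSixBits(d);
}
```

With the first quartet, joinQuartetBits(18, 6, 21, 44) should produce the 24-bit string shown above.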

Do the same for every group and concatenate the resulting binary strings, dropping the bits in the last group that come from padding (explained below). This will leave us with 0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100.

We are now left with a long string of binary data. Splitting it into 8-bit bytes and reading each byte as an ASCII character gives us the text Hello World.
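The last conversion can be sketched like this, assuming the bits arrive as a string of '0' and '1' characters. The function name bitsToText is made up for this example, and any trailing bits that do not fill a whole byte are ignored.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper: reads a string of '0'/'1' characters eight at a time
// and turns each group of eight bits into one character.
std::string bitsToText(const std::string& bits) {
    std::string text;
    for (std::size_t i = 0; i + 8 <= bits.size(); i += 8) {
        // std::bitset can be constructed from a string of binary digits
        const std::bitset<8> byte(bits.substr(i, 8));
        text.push_back(static_cast<char>(byte.to_ulong()));
    }
    return text;
}
```

Feeding it the 24 bits from the first quartet should give back Hel, the first three letters of the message.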

If the final quartet ends in padding, the number of padding characters tells you how many bytes of that quartet are actual data rather than filler added to round out the encoded string.

  • One padding character (=): output only two bytes from the final quartet
  • Two padding characters (==): output just one byte
  • None: output all three bytes
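The padding rule above can be written as a small function. This is only a sketch: decodedByteCount is a hypothetical name, and returning -1 for malformed padding is a choice made for this example, not something the decoder later in the article does.

```cpp
#include <string>

// Hypothetical helper: returns how many decoded bytes a four-character quartet
// carries (1, 2, or 3), or -1 if the quartet length or padding is malformed.
int decodedByteCount(const std::string& quartet) {
    if (quartet.size() != 4) return -1;
    // '=' may only appear in the last one or two positions
    if (quartet[0] == '=' || quartet[1] == '=') return -1;
    if (quartet[2] == '=') return quartet[3] == '=' ? 1 : -1;
    return quartet[3] == '=' ? 2 : 3;
}
```

For the example string, the last quartet bGQ= should report two bytes, which are the final "ld" of Hello World.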

Writing a C++ Base64 Decoder

Let’s begin by including iostream for writing to the console.

C++
#include <iostream>

Then for reading and writing files, we can use fstream.

C++
#include <fstream>

After that, we can use cstdint to give us better control over the numbers we store.

C++
#include <cstdint>

For keeping the Base64 input, we can use a string.

C++
#include <string>

Now, let’s define our main function.

C++
int main() {
  return 0;
}

Within our main function, we can hardcode a decoding table for the Base64 standard alphabet.

C++
// constexpr makes this a compile-time constant, so the compiler can embed its value directly instead of computing it at runtime
// static gives the variable static storage duration: it is initialized once and exists for the whole run of the program instead of being recreated each time the function runs
static constexpr unsigned char padChar = '=';

// In C++, a char is just a small number (its ASCII code), so we can use it directly as an array index: for every possible byte value, the decode table holds that character's index in the Base64 alphabet
// Here, 0xFF is used to denote a character that is not part of the Base64 alphabet
static constexpr uint8_t decodeTable[256] = {
  // 0x00–0x0F
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x10–0x1F
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x20–0x2F   ' ' … '/'
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,62  ,0xFF,0xFF,0xFF,63  , // '+'=62 '/'=63
  // 0x30–0x3F   '0' … '?'
  52  ,53  ,54  ,55  ,56  ,57  ,58  ,59  ,
  60  ,61  ,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x40–0x4F   '@' … 'O'
  0xFF,0   ,1   ,2   ,3   ,4   ,5   ,6   ,
  7   ,8   ,9   ,10  ,11  ,12  ,13  ,14  ,
  // 0x50–0x5F   'P' … '_'
  15  ,16  ,17  ,18  ,19  ,20  ,21  ,22  ,
  23  ,24  ,25  ,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x60–0x6F   '`' … 'o'
  0xFF,26  ,27  ,28  ,29  ,30  ,31  ,32  ,
  33  ,34  ,35  ,36  ,37  ,38  ,39  ,40  ,
  // 0x70–0x7F   'p' … DEL
  41  ,42  ,43  ,44  ,45  ,46  ,47  ,48  ,
  49  ,50  ,51  ,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x80–0x8F
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0x90–0x9F
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xA0–0xAF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xB0–0xBF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xC0–0xCF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xD0–0xDF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xE0–0xEF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  // 0xF0–0xFF
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
  0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF
};
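If typing out 256 entries by hand feels error-prone, the same table could be generated from the alphabet instead. The sketch below is an alternative to the hardcoded table, not what the rest of this article uses; buildDecodeTable is a hypothetical name, and this version fills the table at runtime rather than at compile time.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds the 256-entry decode table from the alphabet.
// Entries for characters outside the alphabet stay at 0xFF.
std::array<std::uint8_t, 256> buildDecodeTable() {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::array<std::uint8_t, 256> table;
    table.fill(0xFF); // mark every byte value as invalid first
    for (std::uint8_t i = 0; i < 64; ++i) {
        // the character's ASCII code is the index, its alphabet position is the value
        table[static_cast<unsigned char>(alphabet[i])] = i;
    }
    return table;
}
```

A quick way to check either table is to compare a few known entries: 'A' should map to 0, 'z' to 51, '+' to 62, '/' to 63, and '=' to 0xFF.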

Then, we can prompt the user.

C++
std::string inputPath, outputPath;
std::cout << "Enter input file path (Base64 text)\n>";
std::getline(std::cin, inputPath);
std::cout << "Enter output file path (binary)\n>";
std::getline(std::cin, outputPath);

After that, we can open the input and output files.

C++
std::ifstream in(inputPath, std::ios::in);
if (!in.is_open()) {
  std::cerr << "Failed to open input file\n";
  return 1;
}
std::ofstream out(outputPath, std::ios::binary);
if (!out.is_open()) {
  std::cerr << "Failed to create output file\n";
  return 1;
}

Next, we can make variables to keep track of the current quartet we are processing and the current index.

C++
char quartet[4];
size_t qIndex = 0;

Next, we can enter a loop that will process our quartets.

C++
while (true) {
	
}

Within our while loop, we need to keep track of the current character.

C++
char c;
if (!in.get(c)) break; // in.get(c) reads the next character from the file into the variable c. If the read fails (usually because we have reached the end of the file), the stream evaluates to false and we break out of our while loop

We should then check whether the current character is whitespace (a new line, a carriage return, a space, or a tab) and skip the character’s processing if it is.

C++
// The continue keyword will skip all further instructions in the loop and immediately move to the next iteration (in this case, we are immediately moving to the next character in the file)
if (c == '\n' || c == '\r' || c == ' ' || c == '\t') continue;
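To see the effect of this check in isolation, here is a small sketch that applies the same test to a whole string. Both isSkippable and stripWhitespace are hypothetical helpers written for this example; the decoder in this article skips characters one at a time inside its loop instead.

```cpp
#include <string>

// Hypothetical helper: true for the four characters the decoder ignores.
bool isSkippable(char c) {
    return c == '\n' || c == '\r' || c == ' ' || c == '\t';
}

// Hypothetical helper: returns a copy of the input without skippable characters.
std::string stripWhitespace(const std::string& input) {
    std::string result;
    for (char c : input) {
        if (isSkippable(c)) continue; // same skip as in the decoder loop
        result.push_back(c);
    }
    return result;
}
```

This matters in practice because Base64 text is often wrapped across several lines, so a decoder that rejected line breaks would fail on otherwise valid input.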

We also need to update the quartet variable with the current character.

C++
// qIndex++ is a post-increment: the expression evaluates to the current value of qIndex (which is used as the array index here), and qIndex is then incremented by 1
quartet[qIndex++] = c;

If qIndex is four, we have reached the end of the current quartet and need to process it.

C++
if (qIndex == 4) {

}

Within this if statement, we can declare an array to hold the four six-bit values of the quartet before combining them (each Base64 character encodes six bits).

C++
uint8_t v[4];

We can also keep track of the padding we have encountered so far.

C++
int padCount = 0;

Then we can loop over the four positions, filling v from the matching characters of the current quartet.

C++
for (int i = 0; i < 4; i++) {

}

Within this loop, we can check if the current character is padding.

C++
if (quartet[i] == padChar) {

} else {

}

Within the first branch of this if statement (the code to be executed if the current character is padding), we can set v at the current index to zero because we will not need to use it later and setting it to zero would be better than undefined garbage data.

C++
v[i] = 0;

We can then increment padCount so that, once the for loop has finished, we know how many bytes of the quartet to output.

C++
padCount++;

In the second branch, we can look up the character’s Base64 alphabet index in the decoding table and store it in v.

C++
uint8_t val = decodeTable[(unsigned char)quartet[i]];
if (val == 0xFF) { std::cerr << "Invalid Base64 char\n"; return 1; } // Earlier, we used 0xFF for characters not part of the Base64 alphabet
v[i] = val;

Outside of this if statement and after the for loop, we can take the entire quartet and combine it.

C++
uint32_t triple = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3]; // We concatenate the four 6-bit values into one 24-bit number with the classic combination of left shifts and bitwise OR
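The packing step can be tried on its own with a small function. This is a sketch under the assumption that each input is already a valid 6-bit value (0 to 63); packQuartet is a hypothetical name, and the explicit casts only make the widening to 32 bits visible.

```cpp
#include <cstdint>

// Hypothetical helper: packs four 6-bit values into the low 24 bits of a 32-bit integer.
// v0 ends up in bits 23..18, v1 in bits 17..12, v2 in bits 11..6, and v3 in bits 5..0.
std::uint32_t packQuartet(std::uint8_t v0, std::uint8_t v1,
                          std::uint8_t v2, std::uint8_t v3) {
    return (static_cast<std::uint32_t>(v0) << 18) |
           (static_cast<std::uint32_t>(v1) << 12) |
           (static_cast<std::uint32_t>(v2) << 6)  |
            static_cast<std::uint32_t>(v3);
}
```

For the first quartet of the example, packQuartet(18, 6, 21, 44) should give 0x48656C, which is the three bytes 0x48, 0x65, and 0x6C (the ASCII codes for H, e, and l) side by side.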

The number of padding characters determines how many of the three bytes are actual data rather than filler, so we can extract bytes from the full triple based on padCount.

C++
// Shifting right and masking with bitwise AND is the reverse of what we did before. We essentially "peel apart" the three bytes, writing only as many as the number of padding characters allows.
if (padCount < 3) {
    char b1 = (triple >> 16) & 0xFF; // Shifts the top 8 bits (bits 23-16) down into the lowest byte position. Then, we mask with 0xFF so only those 8 bits remain. This gives you the first decoded byte.
    out.put(b1);
    std::cout.put(b1);
}
if (padCount < 2) {
    char b2 = (triple >> 8) & 0xFF; // Shifts bits 15..8 down. Then masks with 0xFF. This is the second decoded byte.
    out.put(b2);
    std::cout.put(b2);
}
if (padCount < 1) {
    char b3 = triple & 0xFF; // Takes bits 7..0. No shift is needed here because these are already the lowest bits; the mask discards everything above them. This is the third decoded byte.
    out.put(b3);
    std::cout.put(b3);
}
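The same extraction can be sketched as a function that returns the bytes instead of writing them to a file, which makes it easier to experiment with. The name splitTriple is hypothetical, and returning a std::string is just a convenience for this example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: pulls up to three bytes out of a packed 24-bit value.
// padCount is the number of '=' characters seen in the quartet.
std::string splitTriple(std::uint32_t triple, int padCount) {
    std::string bytes;
    if (padCount < 3) bytes.push_back(static_cast<char>((triple >> 16) & 0xFF)); // bits 23..16
    if (padCount < 2) bytes.push_back(static_cast<char>((triple >> 8) & 0xFF));  // bits 15..8
    if (padCount < 1) bytes.push_back(static_cast<char>(triple & 0xFF));         // bits 7..0
    return bytes;
}
```

For example, splitTriple(0x48656C, 0) should return Hel, while splitTriple(0x6C6400, 1) should return only ld, because the third byte comes from padding.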

Then we can reset qIndex.

C++
qIndex = 0;

After the while loop (so we are back in the body of main now), we can close the input and output files and exit with code zero (for “success”).

C++
in.close();
out.close();
return 0;

The Result

This is what your final code should look like.

C++
#include <iostream>
#include <fstream>
#include <cstdint>
#include <string>

int main() {
    static constexpr uint8_t decodeTable[256] = {
      // 0x00–0x0F
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x10–0x1F
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x20–0x2F   ' ' … '/'
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,62  ,0xFF,0xFF,0xFF,63  , // '+'=62 '/'=63
      // 0x30–0x3F   '0' … '?'
      52  ,53  ,54  ,55  ,56  ,57  ,58  ,59  ,
      60  ,61  ,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x40–0x4F   '@' … 'O'
      0xFF,0   ,1   ,2   ,3   ,4   ,5   ,6   ,
      7   ,8   ,9   ,10  ,11  ,12  ,13  ,14  ,
      // 0x50–0x5F   'P' … '_'
      15  ,16  ,17  ,18  ,19  ,20  ,21  ,22  ,
      23  ,24  ,25  ,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x60–0x6F   '`' … 'o'
      0xFF,26  ,27  ,28  ,29  ,30  ,31  ,32  ,
      33  ,34  ,35  ,36  ,37  ,38  ,39  ,40  ,
      // 0x70–0x7F   'p' … DEL
      41  ,42  ,43  ,44  ,45  ,46  ,47  ,48  ,
      49  ,50  ,51  ,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x80–0x8F
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0x90–0x9F
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xA0–0xAF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xB0–0xBF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xC0–0xCF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xD0–0xDF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xE0–0xEF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      // 0xF0–0xFF
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
      0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF
    };

    std::string inputPath, outputPath;
    std::cout << "Enter input file path (Base64 text)\n>";
    std::getline(std::cin, inputPath);
    std::cout << "Enter output file path (binary)\n>";
    std::getline(std::cin, outputPath);

    std::ifstream in(inputPath, std::ios::in);
    if (!in.is_open()) {
        std::cerr << "Failed to open input file\n";
        return 1;
    }
    std::ofstream out(outputPath, std::ios::binary);
    if (!out.is_open()) {
        std::cerr << "Failed to create output file\n";
        return 1;
    }

    char quartet[4];
    size_t qIndex = 0;
    while (true) {
        char c;
        if (!in.get(c)) break;

        if (c == '\n' || c == '\r' || c == ' ' || c == '\t') continue;

        quartet[qIndex++] = c;
        if (qIndex == 4) {
            uint8_t v[4];
            int padCount = 0;
            for (int i = 0; i < 4; i++) {
                if (quartet[i] == '=') {
                    v[i] = 0;
                    padCount++;
                } else {
                    uint8_t val = decodeTable[(unsigned char)quartet[i]];
                    if (val == 0xFF) { std::cerr << "Invalid Base64 char\n"; return 1; }
                    v[i] = val;
                }
            }

            uint32_t triple = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];

            if (padCount < 3) {
                char b1 = (triple >> 16) & 0xFF;
                out.put(b1);
                std::cout.put(b1);
            }
            if (padCount < 2) {
                char b2 = (triple >> 8) & 0xFF;
                out.put(b2);
                std::cout.put(b2);
            }
            if (padCount < 1) {
                char b3 = triple & 0xFF;
                out.put(b3);
                std::cout.put(b3);
            }
            qIndex = 0;
        }
    }

    in.close();
    out.close();
    return 0;
}
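If you want to check the decoding logic without creating files, the same steps can be wrapped in a function that works on strings in memory. The sketch below is a variation on the program above, not a replacement for it: decodeBase64 is a hypothetical name, it searches the alphabet string instead of using the lookup table, and it throws std::invalid_argument on malformed input, which is stricter than the file-based program.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a Base64 string held in memory.
std::string decodeBase64(const std::string& input) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string output;
    std::uint32_t buffer = 0; // collects up to 24 bits
    int count = 0;            // characters collected for the current quartet
    int padCount = 0;         // '=' characters seen so far

    for (char c : input) {
        if (c == '\n' || c == '\r' || c == ' ' || c == '\t') continue;
        if (c == '=') {
            ++padCount;
            buffer <<= 6; // padding contributes six zero bits
        } else {
            // once padding has started, no further data is allowed
            if (padCount > 0) throw std::invalid_argument("data after padding");
            const std::string::size_type pos = alphabet.find(c);
            if (pos == std::string::npos) throw std::invalid_argument("invalid Base64 character");
            buffer = (buffer << 6) | static_cast<std::uint32_t>(pos);
        }
        if (++count == 4) {
            if (padCount > 2) throw std::invalid_argument("too much padding");
            output.push_back(static_cast<char>((buffer >> 16) & 0xFF));
            if (padCount < 2) output.push_back(static_cast<char>((buffer >> 8) & 0xFF));
            if (padCount < 1) output.push_back(static_cast<char>(buffer & 0xFF));
            buffer = 0;
            count = 0;
        }
    }
    if (count != 0) throw std::invalid_argument("truncated Base64 input");
    return output;
}
```

Calling decodeBase64("SGVsbG8gV29ybGQ=") should return Hello World, the same result as the walkthrough at the start of this article.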