This is a follow-up to an earlier post where I explained how Base64 encoding and decoding works. If you have not read that yet, I would recommend you do as it provides helpful background information.
C++ Implementation of Base64 Encoding
Now that we know how Base64 encoding works, we can write our own encoder using C++.
I would like to be able to modify or delete the input file while it is being encoded, so this encoder will load the entire file into a vector and then encode it from there. This is not RAM-efficient, but I won’t be encoding large files.
First, let’s import our standard IO.
#include <iostream>
Since we will be reading from the file to be encoded and writing to the file that needs to store all of the Base64 as text, we need to import an inbuilt library used for file operations.
#include <fstream>
Since we will be working with bits, we should also be using bit sets in our program. The type needs to be imported from an inbuilt library, however, so let’s import it.
#include <bitset>
Then, since we will be storing the data in a vector, we will need to import that type.
#include <vector>
I want to work with strings, not C-style arrays of characters, so let’s import that.
#include <string>
Because we will be using typedefs (type aliases) like uint8_t that will give us more granular control over how much space each number we store takes up in RAM, we will also need to include cstdint.
#include <cstdint>
Now, we can start with the main function.
int main() {
    return 0;
}
Within this main function, we should first ask the user which file to read. We could use std::cin with the >> operator, but that treats every whitespace character as a terminator, meaning that it will not read anything after the first space. So, if there are spaces in our filename, we won’t be able to read it properly.
Instead, we should use std::getline, which by default only treats the newline character as a terminator. That suits us because newlines practically never appear in file paths (Windows forbids them outright).
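To see the difference concretely, here is a minimal sketch that feeds the same text to both approaches. It reads from a std::istringstream instead of std::cin so that it can run without anyone typing; the helper names readWithExtraction and readWithGetline are hypothetical and are not part of the encoder.

```cpp
#include <sstream>
#include <string>

// Reads one token with operator>>, which stops at the first whitespace character.
std::string readWithExtraction(const std::string& typedText) {
    std::istringstream input(typedText);
    std::string result;
    input >> result;
    return result;
}

// Reads up to (but not including) the first newline with std::getline.
std::string readWithGetline(const std::string& typedText) {
    std::istringstream input(typedText);
    std::string result;
    std::getline(input, result);
    return result;
}
```

Given the text "my file.txt" followed by a newline, the first helper returns only "my", while the second returns the whole path.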
Below, we collect the input and output files.
// Standard Base64 alphabet
std::string base64Standard = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; // Values 0-63
const char padChar = '='; // Technically, the padding character is NOT part of the alphabet and so including it above would be misleading
// Variable that holds our input file
std::string filePathInput;
std::cout << "Enter a file path to read from\n>";
// Read the line and place it directly in filePathInput
std::getline(std::cin, filePathInput);
// Variable that holds our output file
std::string filePathOutput;
std::cout << "Enter a file path to write to\n>";
// Read the line and place it directly in filePathOutput
std::getline(std::cin, filePathOutput);
After that, we can try opening the file that we need to read from.
std::cout << "\nAttempting to open input file...\n";
// Opening a read-only file stream to whichever path the user specified, in binary mode (since we are reading ones and zeroes, not text)
// To be more specific: on Windows, text mode translates \r\n into \n while reading, which would alter the bytes we encode and result in incorrect output
std::ifstream inputFileStream(filePathInput, std::ios::binary);
Then, we will check whether our file stream has actually been opened. If it hasn’t, that likely means some error occurred that prevented the file from opening.
if (!inputFileStream.is_open()) {
    std::cerr << "Failed to open file!\n";
    return 1;
}
std::cout << "File open!\n";
To keep things fast, we should pre-allocate the vector that will hold the file data. To do this, we seek to the end of the file and then use the tellg function to tell us how far we are from the beginning of the file. Then, we store this in a variable.
std::cout << "Detecting file size...\n";
inputFileStream.seekg(0, std::ios::end);
// There is a dedicated type for file stream positions in C++
const std::streampos endPos = inputFileStream.tellg();
if (endPos < 0) { std::cerr << "Failed to get file size\n"; return 1; }
const size_t fileSizeBytes = static_cast<size_t>(endPos); // Convert the stream position to size_t, the type the vector uses for sizes
std::cout << "Detected file size: " << fileSizeBytes << " bytes\n";
inputFileStream.seekg(0, std::ios::beg);
Now that we know the file size, we can load the entire file into a vector, and then close the stream as we have no use for it anymore.
// Each element of the vector holds one raw byte of the file
std::vector<uint8_t> fileData(fileSizeBytes);
std::cout << "Reading file...\n";
// read(...) takes char* and std::streamsize.
// In C++, a static cast is a cast (type conversion) between two types that the language already knows how to convert between
// We use one here to convert the result of fileData.size() into std::streamsize
// A reinterpret cast tells the compiler to treat a pointer as if it pointed to a different type (uint8_t* as char* in this case)
// No conversion happens at runtime; the same bytes are simply viewed through a different pointer type
// This is generally unsafe and not usually recommended, but viewing raw bytes through char* is one of the cases the language allows
inputFileStream.read(reinterpret_cast<char*>(fileData.data()),
    static_cast<std::streamsize>(fileData.size()));
inputFileStream.close();
std::cout << "File data read! It is now safe to modify or delete the file!\n";
After that, we can create the output file.
std::cout << "Creating output file...\n";
std::ofstream encodedDataOutputFile(filePathOutput, std::ios::binary);
// Fires if an error occurs during the file creation process
if (!encodedDataOutputFile.is_open()) {
    std::cerr << "Error opening output file\n";
    return 1;
}
Now, we can walk through the data six bits at a time and convert each group into a Base64 character.
size_t totalBits = fileData.size() * 8; // There are 8 bits in a byte
std::cout << "Processing "
<< totalBits
<< " bits...\n";
size_t bitsProcessed = 0; // Let's keep track of the bits processed here
while (bitsProcessed < totalBits) {
}
Within this while loop comes the actual processing of bits.
Let’s start by keeping track of the current position of the byte being processed and the bit being processed, which will be useful for future calculations.
Keeping track of the byte index is simple – you just divide the number of bits processed by 8, since there are 8 bits in a byte.
Keeping track of the bit index required some more thinking for me. The bit index is the offset, within the current byte, of the first bit we have not consumed yet, which also tells us whether the six-bit group spills over into the next byte. Taking the remainder of the number of bits processed divided by 8 gives us exactly that.
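To make the arithmetic concrete, here is a small sketch of the two calculations wrapped in a hypothetical helper. The BitPosition struct and the locate function are names invented for this illustration and do not appear in the encoder itself.

```cpp
#include <cstddef>

// Hypothetical helper type: where the next six-bit group starts.
struct BitPosition {
    std::size_t byteIndex; // which byte the group starts in
    std::size_t bitIndex;  // offset of the group's first bit inside that byte
};

// Splits a running bit count into a byte index and an offset within that byte.
BitPosition locate(std::size_t bitsProcessed) {
    return BitPosition{bitsProcessed / 8, bitsProcessed % 8};
}
```

For a three-byte input, the groups start at bits 0, 6, 12 and 18, which land at (byte 0, bit 0), (byte 0, bit 6), (byte 1, bit 4) and (byte 2, bit 2). The pattern repeats every 24 bits, so the bit index is only ever 0, 2, 4 or 6.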
size_t byteIndex = bitsProcessed / 8;
size_t bitIndex = bitsProcessed % 8; // Offset of the first unprocessed bit within the current byte (always 0, 2, 4, or 6)
Next, we will keep track of the current byte and the next byte, since it is possible for six bits to span across two bytes.
uint8_t currentByte = fileData[byteIndex];
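uint8_t nextByte = (byteIndex + 1 < fileData.size()) ? fileData[byteIndex + 1] : 0; // Six bits may span across two bytes, so we need to keep track of the next byte
Next, we extract six bits so that we can look them up in our alphabet.
To do this, we place the two bytes side by side in a 16-bit variable called combined and extract six bits from it. We do it this way because, like I said before, six bits can span across two bytes. The group starts bitIndex bits from the left of combined, so shifting right by 16 - 6 - bitIndex = 10 - bitIndex moves it down to the lowest six bits, and masking with 0x3F discards everything above them.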
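As a worked example of that extraction, the sketch below wraps the shift and mask in a hypothetical function, extractSixBits (a name made up for this illustration), so it can be tried on the bytes of "Man" (0x4D, 0x61, 0x6E).

```cpp
#include <cstddef>
#include <cstdint>

// Returns the six-bit group that starts bitIndex bits into currentByte.
// bitIndex is assumed to be 0, 2, 4, or 6, as it always is when advancing six bits at a time.
std::uint8_t extractSixBits(std::uint8_t currentByte, std::uint8_t nextByte, std::size_t bitIndex) {
    // Place the two bytes side by side: currentByte in the high 8 bits, nextByte in the low 8 bits.
    std::uint16_t combined = static_cast<std::uint16_t>((currentByte << 8) | nextByte);
    // Shift the wanted group down to the lowest six bits, then mask off the rest.
    return static_cast<std::uint8_t>((combined >> (10 - bitIndex)) & 0x3F);
}
```

The bits of "Man" are 01001101 01100001 01101110, which split into 010011, 010110, 000101 and 101110. Those are the values 19, 22, 5 and 46, which map to T, W, F and u in the standard alphabet.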
uint16_t combined = (static_cast<uint16_t>(currentByte) << 8) | nextByte;
uint8_t sixBits = (combined >> (10 - bitIndex)) & 0x3F; // Extracting six bits
Then, we can finally cross-reference the number created from this six-bit value with the alphabet we will be using.
char resultChar = base64Standard[static_cast<int>(sixBits)];
Last but not least, we output this character to both the console and the output file the user specified at the beginning and increment bitsProcessed by 6.
encodedDataOutputFile << resultChar;
std::cout << resultChar;
bitsProcessed += 6; // Advance the loop
C++ Implementation: Solving the Padding Issues
Now, we just need to add padding to our C++ program.
Keep in mind that we have now moved on from the while loop, and any future snippets of code will take place in the main function.
Let’s start by computing the amount of padding characters we need using the implementation I showed you at the beginning of this tutorial.
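As a quick sanity check of that calculation before wiring it into main, here is a sketch that isolates it in a hypothetical function, paddingFor (a name invented for this illustration), that takes the input size in bytes.

```cpp
#include <cstddef>

// Number of '=' characters needed for an input of inputBytes bytes.
std::size_t paddingFor(std::size_t inputBytes) {
    std::size_t remaining = inputBytes % 3; // bytes left over after the last full 3-byte group
    return (3 - remaining) % 3;             // 0 leftover -> 0, 1 leftover -> 2, 2 leftover -> 1
}
```

An input of 1 byte needs two padding characters, 2 bytes need one, and any multiple of 3 (including an empty input) needs none.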
size_t remaining = fileData.size() /* The size of the input file, in bytes */ % 3;
size_t padCount = (3 - remaining) % 3; // yields 0, 1, or 2
Then, we write however many padding characters we need to the console and the output file.
for (size_t i = 0; i < padCount; ++i) {
    encodedDataOutputFile << padChar;
    std::cout << padChar;
}
Final Code
And we are done! Your code should look something like this:
#include <iostream>
#include <fstream>
#include <cstdint>
#include <bitset>
#include <vector>
#include <string>
int main() {
    // Standard Base64 alphabet
    std::string base64Standard = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; // Values 0-63
    const char padChar = '='; // Technically, the padding character is NOT part of the alphabet and so including it above would be misleading

    // Variable that holds our input file
    std::string filePathInput;
    std::cout << "Enter a file path to read from\n>";
    // Read the line and place it directly in filePathInput
    std::getline(std::cin, filePathInput);

    // Variable that holds our output file
    std::string filePathOutput;
    std::cout << "Enter a file path to write to\n>";
    // Read the line and place it directly in filePathOutput
    std::getline(std::cin, filePathOutput);

    std::cout << "\nAttempting to open input file...\n";
    // Opening a read-only file stream to whichever path the user specified, in binary mode (since we are reading ones and zeroes, not text)
    // To be more specific: on Windows, text mode translates \r\n into \n while reading, which would alter the bytes we encode and result in incorrect output
    std::ifstream inputFileStream(filePathInput, std::ios::binary);
    if (!inputFileStream.is_open()) {
        std::cerr << "Failed to open file!\n";
        return 1;
    }
    std::cout << "File open!\n";

    std::cout << "Detecting file size...\n";
    inputFileStream.seekg(0, std::ios::end);
    // There is a dedicated type for file stream positions in C++
    const std::streampos endPos = inputFileStream.tellg();
    if (endPos < 0) {
        std::cerr << "Failed to get file size\n";
        return 1;
    }
    const size_t fileSizeBytes = static_cast<size_t>(endPos); // Convert the stream position to size_t, the type the vector uses for sizes
    std::cout << "Detected file size: " << fileSizeBytes << " bytes\n";
    inputFileStream.seekg(0, std::ios::beg);

    // Each element of the vector holds one raw byte of the file
    std::vector<uint8_t> fileData(fileSizeBytes);
    std::cout << "Reading file...\n";
    // read(...) takes char* and std::streamsize.
    // In C++, a static cast is a cast (type conversion) between two types that the language already knows how to convert between
    // We use one here to convert the result of fileData.size() into std::streamsize
    // A reinterpret cast tells the compiler to treat a pointer as if it pointed to a different type (uint8_t* as char* in this case)
    // No conversion happens at runtime; the same bytes are simply viewed through a different pointer type
    // This is generally unsafe and not usually recommended, but viewing raw bytes through char* is one of the cases the language allows
    inputFileStream.read(reinterpret_cast<char*>(fileData.data()),
        static_cast<std::streamsize>(fileData.size()));
    inputFileStream.close();
    std::cout << "File data read! It is now safe to modify or delete the file!\n";

    std::cout << "Creating output file...\n";
    std::ofstream encodedDataOutputFile(filePathOutput, std::ios::binary);
    // Fires if an error occurs during the file creation process
    if (!encodedDataOutputFile.is_open()) {
        std::cerr << "Error opening output file\n";
        return 1;
    }

    size_t totalBits = fileData.size() * 8; // There are 8 bits in a byte
    std::cout << "Processing " << totalBits << " bits...\n";
    size_t bitsProcessed = 0; // Let's keep track of the bits processed here
    while (bitsProcessed < totalBits) {
        size_t byteIndex = bitsProcessed / 8;
        size_t bitIndex = bitsProcessed % 8; // Offset of the first unprocessed bit within the current byte (always 0, 2, 4, or 6)
        uint8_t currentByte = fileData[byteIndex];
        uint8_t nextByte = (byteIndex + 1 < fileData.size()) ? fileData[byteIndex + 1] : 0; // Six bits may span across two bytes, so we need to keep track of the next byte
        uint16_t combined = (static_cast<uint16_t>(currentByte) << 8) | nextByte;
        uint8_t sixBits = (combined >> (10 - bitIndex)) & 0x3F; // Extracting six bits
        char resultChar = base64Standard[static_cast<int>(sixBits)];
        encodedDataOutputFile << resultChar;
        std::cout << resultChar;
        bitsProcessed += 6; // Advance the loop
    }

    size_t remaining = fileData.size() /* <-- The size of the input file, in bytes */ % 3;
    size_t padCount = (3 - remaining) % 3; // yields 0, 1, or 2 (for number of padding characters we need to add)
    for (size_t i = 0; i < padCount; ++i) {
        encodedDataOutputFile << padChar;
        std::cout << padChar;
    }
    return 0;
}
Performance Testing
I wanted to compare this implementation to [Convert]::ToBase64String on PowerShell.
$TIME = Measure-Command {[Convert]::ToBase64String([System.IO.File]::ReadAllBytes("./hello.txt"))}
$TIME.TotalMilliseconds
# 7.0512
$DIY_IMPLEMENTATION = Measure-Command {echo "./hello.txt`n./res.txt`n" | ./a.exe}
$DIY_IMPLEMENTATION.TotalMilliseconds
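# 31.5084
Disappointing but not at all surprising. Our implementation is written for readability rather than efficiency, and the measurement also includes starting a new process, printing every character to the console, and writing the result to a file one character at a time (file I/O is a lot slower than you might think).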
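One way to check how much of that gap comes from I/O would be to separate the encoding from the stream writes. The sketch below shows how that could look under that assumption; it is not a measured improvement. It defines a hypothetical encodeBase64 function (not part of the program above) that reuses the same bit-walking logic but appends to a pre-sized std::string, which could then be written to the output file in a single call.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes the whole buffer into a std::string without touching any stream.
std::string encodeBase64(const std::vector<std::uint8_t>& data) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4); // every 3 input bytes become 4 output characters

    const std::size_t totalBits = data.size() * 8;
    for (std::size_t bitsProcessed = 0; bitsProcessed < totalBits; bitsProcessed += 6) {
        const std::size_t byteIndex = bitsProcessed / 8;
        const std::size_t bitIndex = bitsProcessed % 8;
        const std::uint8_t currentByte = data[byteIndex];
        // Past the end of the input, the missing bits are treated as zeroes.
        const std::uint8_t nextByte = (byteIndex + 1 < data.size()) ? data[byteIndex + 1] : 0;
        const std::uint16_t combined = static_cast<std::uint16_t>((currentByte << 8) | nextByte);
        out += alphabet[(combined >> (10 - bitIndex)) & 0x3F];
    }
    // Pad the output to a multiple of four characters.
    out.append((3 - data.size() % 3) % 3, '=');
    return out;
}
```

Keeping the encoder as a pure function also makes it easy to check against known values: the bytes of "Man" encode to "TWFu", and the single byte "f" encodes to "Zg==".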