How to calculate the size of a base64 encoded string?
Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding.
Above is a technical definition of Base64 from Wikipedia. In this post, I’ll explain an optimised way to calculate the size, in bytes, of the data held in a base64 encoded string.
Scenario to calculate the size of a base64 encoded string
Let us take a scenario where a user is allowed to upload an image file in a web application. It’s likely that you will encode that binary data as base64 before sending it to the server. Now, we would like to limit the size of the file that can be uploaded. If you think that validation on the client side (HTML File API) will do the job, you are wrong! That validation can easily be bypassed using the browser’s debugger. So, we need to add server-side validation.
The stack that you use on the server is not important here, as this is a general topic, so focus on the concept and not the syntax of the code in this post. The first and most important task is to ensure that the encoded string received from the client is properly formed, i.e. a valid base64 encoded value. I’m not writing that piece of code here as it’s going to be stack specific. Your stack may have a method out of the box that can validate this.
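For readers who want a starting point anyway, here is a minimal sketch of a shape check in Python. It assumes the standard base64 alphabet with “=” padding (not the URL-safe variant) and that any data-URL prefix has already been stripped; the name looks_like_base64 is made up for this example. It checks only the alphabet, the length and the padding position, so treat it as a first filter rather than a full validator.

```python
import re

# Standard base64 alphabet, length a multiple of 4,
# with at most two '=' padding characters at the very end.
_BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def looks_like_base64(encoded: str) -> bool:
    """Return True if `encoded` is shaped like standard padded base64."""
    return _BASE64_RE.fullmatch(encoded) is not None
```

With this sketch, looks_like_base64("aGVsbG8=") is True, while "aGVsbG8" (length not a multiple of 4) and "aGVs*G8=" (character outside the alphabet) are rejected.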
The next thing you might think of is to convert that string to bytes and take the length of the result to get the size. This doesn’t make sense, as it consumes memory to hold those bytes before we even know how big they are! So, now the question is: how can we calculate the size without converting the string to bytes?
To understand the answer, I recommend reading the Base64 article on Wikipedia, if you have time. Below is the code to get the size in bytes from a base64 encoded string of length ‘n’:
// where n is the length of the base64 encoded string (a multiple of 4)
var result = 3 * (n / 4);
Let me explain the formula above. Each character in the string represents 6 bits, since log2(64) = 6. Therefore, 4 characters represent 4 * 6 = 24 bits = 3 bytes. In the encoding direction, b bytes need 4*(b/3) characters, rounded up to a multiple of 4; going the other way, an encoded string of n characters holds at most 3*(n/4) bytes.
The result from the formula above still counts the padding. To get the actual number of bytes without padding:
- Subtract 2 from the result, if the string ends with “==”
- Subtract 1 from the result, if the string ends with “=”
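As a sketch of the whole calculation in Python: a padded base64 string of length L holds 3 * (L / 4) bytes, less one byte for each trailing “=”. The function name decoded_size is invented for this example, and the code assumes the string has already been validated as padded base64.

```python
def decoded_size(encoded: str) -> int:
    """Bytes represented by a padded base64 string, without decoding it."""
    if len(encoded) % 4 != 0:
        raise ValueError("padded base64 length must be a multiple of 4")
    # Every 4 characters carry 3 bytes...
    size = 3 * (len(encoded) // 4)
    # ...minus one byte for each trailing '=' (there are at most two).
    if encoded.endswith("=="):
        size -= 2
    elif encoded.endswith("="):
        size -= 1
    return size
```

For example, "aGVsbG8=" is the encoding of the 5-byte string "hello": 8 characters give 3 * 2 = 6, and the single “=” brings that down to 5.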
The padding is added so that the encoded output is always a multiple of 4 characters long. Now the result, which is the length in bytes, can be used for validation. Say we would like to keep “512 KB” as the maximum upload size; we can then write the validation like this:
// compare in bytes so nothing is lost to integer division;
// 512 KB = 512 * 1000 bytes (use 512 * 1024 if your limit is in KiB).
if (result > 512 * 1000)
{
    // add validation error and return to the client.
}
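Put together, the server-side check might look like the Python sketch below. The names MAX_UPLOAD_BYTES, decoded_size and validate_upload are made up for illustration, the 512 KB limit is taken as 512 * 1000 bytes, and the input is assumed to have already passed a well-formedness check.

```python
MAX_UPLOAD_BYTES = 512 * 1000  # 512 KB; use 512 * 1024 for KiB


def decoded_size(encoded: str) -> int:
    """Bytes represented by a padded base64 string, without decoding it."""
    padding = 2 if encoded.endswith("==") else 1 if encoded.endswith("=") else 0
    return 3 * (len(encoded) // 4) - padding


def validate_upload(encoded: str, limit: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Return a list of validation errors (empty when the upload is acceptable)."""
    errors = []
    if decoded_size(encoded) > limit:
        errors.append(f"file exceeds the {limit} byte limit")
    return errors
```

Comparing byte counts directly, rather than dividing by 1000 first, means a file that is a few bytes over the limit is not rounded down and let through.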
Just an easy way to calculate the size of a base64 encoded string.