All Your Base64 Are Belong to Us

I recently found the Matasano Crypto Challenges and eagerly set about solving the first one - converting a hex string to base64. My code can be found here.

Hex (short for hexadecimal) is the base 16 number system, so whereas decimal has 10 symbols representing the values from 0 to 9, hexadecimal has 16 representing the values 0 to 15. It uses the numbers 0-9 for 0-9 and then the letters a-f for 10-15. Base64 on the other hand has 64 symbols representing the values 0 to 63 (A-Z representing 0-25, a-z representing 26-51, 0-9 representing 52-61 and + and / representing 62 and 63 respectively)

Any number can be represented in any base as a polynomial. For example we can represent any number, \(N_{16}\) ,in base 16 (hexadecimal) as the following:

\[ N_{16} = \sum_{i} a_{i} 16^i \]

where \(a_i\) is some integer between 0 and 15.

Similarly we may represent a number in Base64 as:

\[ N_{64} = \sum_{j} b_{j} 64^j = \sum_{j} b_{j} 16^j 4^j \]

If we are representing the same number (i.e. we are converting between bases) then we can equate these two:

\[ N = \sum_{j} b_{j} 16^j 4^j = \sum_{i} a_{i} 16^i \]

We wish to convert from base 16 to base 64. The representation of a number in a higher base can only be smaller or at most the same size (in terms of the length of the string required to represent it, i.e. the values of i or j) as the representation in the original lower base.

This means that if we start at the most significant digit of the hex string we can imagine that at most there will be a digit equivalent to it in the base64 string and then we can equate i and j and calculate the value of that digit. Therefore for that digit, we obtain:

\[ b_{i} 16^i 4^i = a_{i} 16^i \] \[ \therefore b_{i} = \frac{a_{i}}{4^i} \]

But the values of a and b must be integers within the range of representation of their respective bases - in this case it is important to us that b is an integer, we know that it will always be less than 64 because we are starting with the most significant digit and converting to a lower base so the problem will be carrying the decimal remainder to the lower digits rather than carrying the modulo 64 remainder to higher digits. This is confusing to discuss in the abstract but will be clearly demonstrated by an example.

Let us consider the hex string '2af3' which is equal to 10995 in decimal. We start with the most significant digit which is 2. It is the 4th digit and therefore using our polynomial representation we have i=3, as i=0 for the first digit. Therefore from the above arguments we calculate:

\[ \frac{2}{4^3} = \frac{1}{32} \]

Therefore the integer part is zero, so the base64 value for this digit is zero, represented by A. To carry the remainder along to the next digit we multiply the decimal part by 64 and then add it to the next digit.

\[ \frac{1}{32} \cdot 64 = 2 \]

Therefore the following digit has the value:

\[ 2 + \frac{10}{4^2} \]

The 2 comes from the carry and the 10 from the fact that the next digit is 'a' which represents a value of 10 in the hex number system.

The integer part of this is 2, represented in base64 as C, so the first digit in our base 64 string is C. Then we repeat the carry process as before: \[\frac{10}{4^2} \cdot 64 = 40 \]

Thus the value of the next digit is:

\[40+\frac{15}{4} = 43 + \frac{3}{4}\]

The representation of 43 in base64 is r. And then we carry over the decimal part as before: \[\frac{3}{4}\cdot 64 = 48\]

Thus we get the value of the final digit to be:

\[ 48+ 3 = 51\]

51 is represented as z in base 64 and thus the representation of the hex string 2af3 is Crz in base64, note that the string we obtained was ACrz but we strip the leading zeroes (which are represented by A).

This result can be confirmed by quickly checking that the decimal representations agree: \[ (2 \times 64^2) + (43 \times 64) + 51 = (2 \times 16^3) + (10 \times 16^2) + (15 \times 16) + 3 = 10995 \]

I had some difficulty with insufficient precision when using the standard Python float types and so I used the Decimal class from the decimal module which allows arbritary precision floating point calculations.

Then I was able to take their example hex string input of:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

and produce the correct base64 output:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

One down, 47 to go. For great justice.

In The Aggregate

Guaranteed to be Probably Approximately Correct!

All Your Base64 Are Belong to Us