The new release of UltraEdit can encode and decode files with base64. I wanted to know more about it, and I found useful information in this file by Chris Melnick (2004).
Base64 is a different way of interpreting bits of data so that the data can be transmitted over a text-only medium, such as the body of an e-mail. An 8-bit extended ASCII character set has 256 characters that are used to format text. However, only a fraction of these characters are actually printable and readable when you look at them on screen or send them in an e-mail.
We need a way to convert unreadable characters into readable ones, do something with them (e.g. send them in an e-mail), and then convert them back to their original form.
So how do you convert unreadable, nonprintable characters into readable, printable characters? There are many ways to do this, but the one we cover here is base64 encoding.
The 256 characters in the (extended) ASCII character set are numbered 0 through 255, so a single byte is enough to represent any of them. As far as a computer is concerned, there is no difference between an ASCII character and a number between 0 and 255 (a string of 8 binary digits); the only difference is how it is interpreted. Because we are now detached from ASCII characters, the same techniques also apply to binary data, such as a picture or an executable file. All you are doing is interpreting data one byte at a time.
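A minimal Python sketch of this idea (Python is our choice here; the original text contains no code): the same bytes can be read as characters or as numbers from 0 to 255, and binary data, such as the 8-byte signature at the start of every PNG file, is just more numbers in that range.

```python
# A text string encoded as ASCII is just a sequence of byte values 0-255.
text = "Hi!"
data = text.encode("ascii")
print(list(data))  # [72, 105, 33]

# The same numbers can be read back as characters.
print(bytes([72, 105, 33]).decode("ascii"))  # Hi!

# Binary data is handled identically: the PNG file signature is
# only a sequence of integers in range(256).
png_signature = bytes([137, 80, 78, 71, 13, 10, 26, 10])
print(all(0 <= b <= 255 for b in png_signature))  # True
```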
The problem with representing data one byte at a time in a readable form is that the ASCII character set does not contain 256 readable characters, so we cannot print a character for each of the 256 values a byte can take. We need a different way of looking at the bits in a byte.
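To put a number on "not 256 readable characters", this Python snippet counts the visible ASCII characters using the constants of the standard `string` module. It counts letters, digits and punctuation only, which is one reasonable definition of "readable".

```python
import string

# Characters that are visible on screen: letters, digits and punctuation.
visible = string.ascii_letters + string.digits + string.punctuation
print(len(visible))  # 94 (95 if you also count the space character)

# A byte can take 256 different values, far more than we can show.
print(256 - len(visible))  # 162 byte values without a visible character
```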
So what if, instead of looking at a whole byte, we looked at half a byte, or 4 bits (also known as a nibble), at a time? This is entirely possible, because there are certainly sixteen readable characters we could use to represent each of the sixteen possible nibble values. This kind of translation is known as hexadecimal, or hex.
Binary  Decimal  Hex    Binary  Decimal  Hex
---------------------------------------------
0000    0        0      1000    8        8
0001    1        1      1001    9        9
0010    2        2      1010    10       A
0011    3        3      1011    11       B
0100    4        4      1100    12       C
0101    5        5      1101    13       D
0110    6        6      1110    14       E
0111    7        7      1111    15       F
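The nibble-to-hex translation in the table can be sketched in Python as follows. `to_hex` is a hypothetical helper written only for illustration; Python's built-in `bytes.hex()` does the same job (in lowercase), and the last line uses it as a sanity check.

```python
HEX_DIGITS = "0123456789ABCDEF"  # one character per nibble value 0-15

def to_hex(data: bytes) -> str:
    """Encode bytes as hex by splitting each byte into two nibbles."""
    out = []
    for byte in data:
        high = byte >> 4   # upper 4 bits
        low = byte & 0x0F  # lower 4 bits
        out.append(HEX_DIGITS[high])
        out.append(HEX_DIGITS[low])
    return "".join(out)

print(to_hex(b"Hi!"))  # 486921

# Cross-check against the built-in conversion for every byte value.
assert to_hex(bytes(range(256))) == bytes(range(256)).hex().upper()
```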
The problem with hex is that you use one ASCII character (which takes one byte of storage) to represent every four bits, so anything you translate into hex becomes exactly twice as big as the original data. This might not seem like a problem for a small message, but imagine you are sending an image or an executable: an original size of a megabyte or more is now doubled, and sending it over e-mail or a slow Internet connection takes twice as long.
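The doubling is easy to confirm. This sketch builds 1 MiB of arbitrary byte values (an assumed stand-in for an image or executable) and hex-encodes it with Python's `bytes.hex()`.

```python
# One mebibyte (1,048,576 bytes) of arbitrary binary data.
payload = bytes(range(256)) * 4096
hex_text = payload.hex()  # two hex characters per input byte

print(len(payload))                  # 1048576
print(len(hex_text))                 # 2097152
print(len(hex_text) / len(payload))  # 2.0
```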
Base64 as an alternative
We now know that using 16 different characters to represent each half byte is a viable option, but not an ideal one, because it is only half as space efficient as raw bytes. So how else can we dice bytes up to reach our goal: readable characters for any value from 0 to 255?
Instead of looking at one byte at a time and trying to chop that byte up, let's take several bytes and see what we can do with them.
Byte 1     Byte 2     Byte 3
0000 0000  0000 0000  0000 0000
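Treating three bytes as one 24-bit value amounts to shifting each byte into place. A minimal sketch, with the hypothetical helper `three_bytes_to_int` and the sample text "Man" chosen only for illustration:

```python
def three_bytes_to_int(b1: int, b2: int, b3: int) -> int:
    """Concatenate three 8-bit values into one 24-bit integer."""
    return (b1 << 16) | (b2 << 8) | b3

# Unpacking b"Man" yields the byte values 77, 97 and 110.
group = three_bytes_to_int(*b"Man")
print(f"{group:024b}")  # 010011010110000101101110
```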
As you can see, three bytes give us a total of 24 bits. How else can we chop 24 bits up? If, instead of 3 bytes of 8 bits each, we use 4 "clumps" of 6 bits each, each clump can take 2^6 = 64 different values. So instead of needing 3 characters that can each represent any of 256 different combinations, we need just 4 characters that can each represent any of 64 different combinations. The same 24 bits from the table above fit into the table below.
Clump 1  Clump 2  Clump 3  Clump 4
000000   000000   000000   000000
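Splitting those 24 bits into four 6-bit clumps works the other way round: shift right and keep the lowest 6 bits each time. This hypothetical `split_into_clumps` helper assumes an input of exactly three bytes:

```python
def split_into_clumps(chunk: bytes) -> list:
    """Split exactly 3 bytes (24 bits) into four 6-bit values (0-63)."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    bits = int.from_bytes(chunk, "big")  # one 24-bit integer
    # Clump 1 is bits 23-18, clump 2 bits 17-12, clump 3 bits 11-6, clump 4 bits 5-0.
    return [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]

print(split_into_clumps(b"Man"))  # [19, 22, 5, 46]
```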
Now we have to ask ourselves: do we have 64 readable characters? The answer is yes. The characters we will use are uppercase A-Z (26 characters), lowercase a-z (26 characters), digits 0-9 (10 characters), '+' (1 character) and '/' (1 character): 26 + 26 + 10 + 1 + 1 = 64, exactly the number we need. As you can surmise, base64 is still less space efficient than using a full byte, but instead of doubling the size as hex does, base64 uses only one and a third times as much space. In other words, for every 3 bytes of data you need 4 base64 characters.
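Putting the pieces together gives a small encoder. This is a sketch, not UltraEdit's or any library's implementation, and the name `b64_encode` is our own. One detail the text above does not cover: when the input length is not a multiple of 3, standard base64 (RFC 4648) fills the last group with zero bits and writes '=' for each output character that would consist only of filler. The final check compares the result with Python's standard `base64` module.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)  # 26 + 26 + 10 + 1 + 1 = 64 characters

def b64_encode(data: bytes) -> str:
    """Minimal base64 encoder: every 3 input bytes become 4 output characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 missing bytes in the last group
        bits = int.from_bytes(chunk + b"\x00" * pad, "big")
        clumps = [(bits >> s) & 0x3F for s in (18, 12, 6, 0)]
        chars = [ALPHABET[c] for c in clumps]
        # Clumps made only of filler bits are written as '='.
        if pad:
            chars[-pad:] = ["="] * pad
        out.append("".join(chars))
    return "".join(out)

print(b64_encode(b"Man"))  # TWFu
print(b64_encode(b"Ma"))   # TWE=

sample = bytes(range(256))
assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")
```

Because 3 input bytes always become 4 output characters, 300 bytes of data encode to exactly 400 characters.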
---
By the way, UltraEdit seems to encode text to base64 and decode it back fine. All my tests with other material, such as pictures and PDF files, failed.
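One way to tell whether a failed picture or PDF test is caused by the file or by the tool is to run the same files through an independent encoder. This sketch uses Python's standard `base64` module to encode each file named on the command line and decode it again; the script name `check_b64.py` in the comment is only an example.

```python
import base64
import sys
from pathlib import Path

def round_trip_ok(data: bytes) -> bool:
    """Encode data to base64 and back; True if the bytes are unchanged."""
    encoded = base64.b64encode(data)  # only A-Z a-z 0-9 + / = characters
    return base64.b64decode(encoded) == data

if __name__ == "__main__":
    # Example: python check_b64.py picture.jpg document.pdf
    for name in sys.argv[1:]:
        data = Path(name).read_bytes()  # read in binary mode, no text conversion
        print(name, "OK" if round_trip_ok(data) else "MISMATCH")
```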