Shortening UUIDs

June 18, 2017 java uuids

Many people are familiar with UUIDs today; they are widely used as identifiers because they are non-sequential and have a very low probability of collision.

A UUID is a sequence of 128 bits (16 bytes). Its usual representation has 36 characters: 32 hexadecimal digits plus 4 separators (for example: 7625c7e9-38b1-4622-aa71-1ad439c1bced). The separators are decorative, so only 32 characters actually carry information (7625c7e938b14622aa711ad439c1bced). Each hexadecimal digit encodes 4 bits, so 32 digits are needed to represent the 128 bits.
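As a small illustration of the two hexadecimal forms, the sketch below uses java.util.UUID to print the 36-character form, strip the separators, and check that 32 digits at 4 bits each add up to 128 bits. The class name UuidHex and the helper compactHex are made up for this example.

```java
import java.util.UUID;

class UuidHex {
    // Drops the four decorative separators, leaving the 32 hexadecimal digits.
    static String compactHex(UUID uuid) {
        return uuid.toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("7625c7e9-38b1-4622-aa71-1ad439c1bced");
        String full = uuid.toString();
        String compact = compactHex(uuid);
        System.out.println(full + " (" + full.length() + " characters)");
        System.out.println(compact + " (" + compact.length() + " characters)");
        // Each hexadecimal digit carries 4 bits.
        System.out.println("bits: " + compact.length() * 4);
    }
}
```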

In other words, each character takes 1 byte, but we use only 4 of its bits to represent the UUID. Why not use all 8? Because some ASCII characters are control characters and cannot be printed. We could of course invent an encoding in which all 256 byte values map to printable characters, but the resulting id would be full of unusual characters.
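To put a number on the control characters, this short sketch counts how many of the 128 ASCII codes are not flagged by Character.isISOControl (the space character counts as printable here). The class and method names are invented for the example.

```java
class PrintableAscii {
    // Counts the 7-bit ASCII codes that are not control characters.
    static int countPrintable() {
        int printable = 0;
        for (int c = 0; c < 128; c++) {
            if (!Character.isISOControl(c)) {
                printable++;
            }
        }
        return printable;
    }

    public static void main(String[] args) {
        int printable = countPrintable();
        System.out.println("printable: " + printable + ", control: " + (128 - printable));
    }
}
```

With fewer than 128 printable symbols available, a single ASCII character cannot carry even 7 full bits, let alone 8.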

Could a more compact representation be used? Yes, and we can also make it URL friendly (these identifiers often end up in URLs, so it is convenient to use only characters that are valid there). If we take all the digits (10 characters) plus all the lowercase and uppercase letters (52 characters, not counting the ñ), we have 62 valid characters. With 6 bits we can represent 64 values, so by adding the characters - and _ we get a set of 64 characters that are all valid in a URL. This encoding is better known as *base 64*, and using it to represent UUIDs was not my idea. You can read more about it in base64 url applications.
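The character set described above can be built in a few lines. This is only a sketch: the class name UrlSafeAlphabet is hypothetical, and the ordering (uppercase, lowercase, digits, then - and _) is the one used by the URL-safe base 64 variant, which the text itself does not spell out.

```java
class UrlSafeAlphabet {
    // 26 uppercase + 26 lowercase + 10 digits + '-' and '_' = 64 symbols.
    static String build() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++) sb.append(c);
        for (char c = 'a'; c <= 'z'; c++) sb.append(c);
        for (char c = '0'; c <= '9'; c++) sb.append(c);
        return sb.append('-').append('_').toString();
    }

    public static void main(String[] args) {
        String alphabet = build();
        System.out.println(alphabet);
        // 64 is 2^6, so each symbol encodes exactly 6 bits.
        System.out.println(alphabet.length() + " symbols, "
                + Integer.numberOfTrailingZeros(alphabet.length()) + " bits each");
    }
}
```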

Moving from hexadecimal (4 bits per character) to *base 64* (6 bits per character), we can represent the UUID with 22 characters (22 × 6 = 132 bits, 4 more than the 128 needed). I wanted to run some tests with this idea and left them in short-ids in case they are useful to someone. With it I could try out the different representations of the same UUID:

7625c7e9-38b1-4622-aa71-1ad439c1bced
7625c7e938b14622aa711ad439c1bced
01110110001001011100011111101001001110001011000101000110001000101010101001110001000110101101010000111001110000011011110011101101
B2JcfpOLFGIqpxGtQ5wbzt
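The last two lines of that list can be reproduced with a short sketch. It assumes the UUID is treated as one 128-bit number and written with 22 base 64 digits, right-aligned, so the four spare bits are zeros at the front. That alignment is inferred from the sample values above, not taken from the short-ids code, and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.UUID;

class UuidRepresentations {
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // The UUID as a non-negative 128-bit number.
    static BigInteger asNumber(UUID uuid) {
        return new BigInteger(uuid.toString().replace("-", ""), 16);
    }

    // 128 binary digits, left-padded with zeros.
    static String binary(UUID uuid) {
        StringBuilder sb = new StringBuilder(asNumber(uuid).toString(2));
        while (sb.length() < 128) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    // 22 base 64 digits hold 132 bits, so the top 4 bits are always zero.
    // Assumption: digits are right-aligned, as the sample values suggest.
    static String base64Digits(UUID uuid) {
        BigInteger value = asNumber(uuid);
        char[] out = new char[22];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt(value.intValue() & 0x3f); // lowest 6 bits
            value = value.shiftRight(6);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("7625c7e9-38b1-4622-aa71-1ad439c1bced");
        System.out.println(binary(uuid));
        System.out.println(base64Digits(uuid));
    }
}
```

One consequence of this alignment is that the first character only ever carries 2 significant bits, so it is always one of A, B, C or D.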

I'm not sure whether there is a standard way to obtain this last representation. I was able to run a test using java.util.Base64.Encoder and the code from UUID.randomUUID():

        import java.security.SecureRandom;
        import java.util.Base64;

        byte[] randomBytes = new byte[16];
        new SecureRandom().nextBytes(randomBytes);
        randomBytes[6]  &= 0x0f;  /* clear version        */
        randomBytes[6]  |= 0x40;  /* set to version 4     */
        randomBytes[8]  &= 0x3f;  /* clear variant        */
        randomBytes[8]  |= 0x80;  /* set to IETF variant  */

        /* URL-safe alphabet (- and _ instead of + and /), no trailing == padding */
        System.out.println(Base64.getUrlEncoder().withoutPadding().encodeToString(randomBytes));

This produces, for example: qcwbFlNkQDWTZlM2NoG7tg
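Going the other way is just as short. The following is a minimal sketch, assuming the 22 characters were produced byte by byte with the URL-safe encoder of java.util.Base64; the class ShortUuidCodec and its two methods are hypothetical names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

class ShortUuidCodec {
    // UUID -> 16 bytes (big-endian) -> 22 URL-safe characters without padding.
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // 22 URL-safe characters -> 16 bytes -> UUID.
    static UUID decode(String text) {
        byte[] bytes = Base64.getUrlDecoder().decode(text);
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID uuid = decode("qcwbFlNkQDWTZlM2NoG7tg");
        System.out.println(uuid + " (version " + uuid.version() + ")");
        System.out.println(encode(uuid));
    }
}
```

Because java.util.Base64 consumes whole bytes from the left, this form differs from the one listed earlier: for 7625c7e9-38b1-4622-aa71-1ad439c1bced it gives diXH6TixRiKqcRrUOcG87Q rather than B2JcfpOLFGIqpxGtQ5wbzt. Both are 22 characters long; they simply place the four spare bits at opposite ends.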

Later I will see whether I can take this further and look into the subject a little more.