
The Complete Guide to Base85 Encoding: Principles, Variants, and Real-World Applications

An in-depth guide to Base85 encoding: understand how it works, explore Ascii85, Z85, and other major variants, and discover its applications in PDF, PostScript, Git, and more. Includes an online Base85 encoder/decoder tool.

Among the many data encoding schemes, Base64 is undoubtedly the most widely recognized. But if you’re looking for better encoding efficiency, Base85 might be the superior choice. It encodes 4 bytes of binary data into 5 ASCII characters, compared to Base64’s 3:4 ratio (3 bytes to 4 characters), which makes its output roughly 6% smaller than Base64’s for the same input. This article provides a comprehensive overview of Base85’s principles, variants, and real-world applications.

Need to encode or decode Base85 quickly? Try our Online Base85 Encoder/Decoder.

1. What is Base85?

Base85 is an encoding method that uses 85 printable ASCII characters to represent binary data. Its core concept is straightforward:

4 bytes of binary data → 5 printable characters

Why 85? Because 85⁵ = 4,437,053,125, which is slightly greater than 2³² = 4,294,967,296 (the number of distinct values 4 bytes can hold), while 84⁵ = 4,182,119,424 falls short. So 85 is the smallest base for which 5 characters can cover every possible 4-byte value, achieving lossless encoding.
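
The bound can be checked directly. The snippet below is a minimal sketch that searches for the smallest base whose fixed-width digit string covers a given number of bytes; the helper name smallest_base is made up for this illustration.

```python
def smallest_base(num_bytes: int, num_chars: int) -> int:
    """Smallest base b for which b**num_chars covers every num_bytes-byte value."""
    needed = 256 ** num_bytes          # number of distinct byte patterns
    base = 2
    while base ** num_chars < needed:
        base += 1
    return base

print(smallest_base(4, 5))             # 85 (4 bytes in 5 characters)
print(smallest_base(3, 4))             # 64 (3 bytes in 4 characters, i.e. Base64)
print(84 ** 5 < 2 ** 32 <= 85 ** 5)    # True
```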

By comparison, Base64 requires 4 characters to encode 3 bytes of data, with an encoding overhead of 33%, while Base85’s overhead is only 25%.

Encoding Efficiency Comparison

Encoding      Raw Data   Encoded   Overhead
Hexadecimal   1 byte     2 chars   100%
Base32        5 bytes    8 chars   60%
Base64        3 bytes    4 chars   33%
Base85        4 bytes    5 chars   25%

2. How Base85 Works

2.1 Encoding Process

  1. Grouping: Split the input data into groups of 4 bytes. If the last group has fewer than 4 bytes, pad it with zero bytes.
  2. Convert to integer: Treat each 4-byte group as a 32-bit big-endian unsigned integer.
  3. Base conversion: Repeatedly divide the integer by 85, yielding 5 remainders (from least significant to most significant).
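  4. Character mapping: Map each remainder to a printable ASCII character. Ascii85 simply adds an offset of 33; variants such as Z85 and RFC 1924 look the value up in their own alphabet.
  5. Tail handling: If the last group was padded with n zero bytes, discard the last n characters of its encoded result, so a final group of k bytes produces k + 1 characters.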

2.2 Encoding Example

Let’s encode the string "Man " (4 bytes):

Step 1: Get byte values

'M' = 77,  'a' = 97,  'n' = 110,  ' ' = 32

Step 2: Convert to 32-bit integer

77 × 256³ + 97 × 256² + 110 × 256 + 32 = 1,298,230,816

Step 3: Repeatedly divide by 85

1298230816 ÷ 85 = 15273303 remainder 61
  15273303 ÷ 85 =   179685 remainder 78
    179685 ÷ 85 =     2113 remainder 80
      2113 ÷ 85 =       24 remainder 73
        24 ÷ 85 =        0 remainder 24

Step 4: Map to characters (Ascii85 variant, offset 33)

Remainder sequence (MSB to LSB): 24, 73,  80,  78,  61
Add offset 33:                   57, 106, 113, 111, 94
ASCII characters:                9,  j,   q,   o,   ^

Final encoded result: 9jqo^
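
The four steps can be replayed in a few lines of Python. This is a minimal sketch for a single full 4-byte group using the Ascii85 offset mapping; the function name encode_group is an illustrative choice, not part of any library.

```python
def encode_group(group: bytes, offset: int = 33) -> str:
    """Encode exactly 4 bytes as 5 characters using an offset mapping."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(group, "big")       # Step 2: big-endian 32-bit integer
    digits = []
    for _ in range(5):                         # Step 3: five divisions by 85
        value, remainder = divmod(value, 85)
        digits.append(remainder)               # collected least significant first
    digits.reverse()                           # most significant digit first
    return "".join(chr(d + offset) for d in digits)  # Step 4: add the offset

print(encode_group(b"Man "))                   # 9jqo^
```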

2.3 Decoding Process

Decoding is the reverse of encoding:

  1. Split the encoded string into groups of 5 characters. If the last group is short, pad it with the highest-value character (u in Ascii85).
  2. Subtract the offset from each character (or look it up in the variant’s alphabet) to get its numeric value.
  3. Treat the 5 values as a base-85 number and convert to a 32-bit integer.
  4. Split the integer into 4 bytes.
  5. If the last group was padded with n characters, discard the last n decoded bytes.
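
The decoding steps above can be sketched as follows. This is a simplified illustration for Ascii85 text without delimiters, whitespace, or the z shortcut; the name ascii85_decode_body is hypothetical, and a short final group is padded with u (value 84) before conversion.

```python
def ascii85_decode_body(text: str) -> bytes:
    """Decode bare Ascii85 text (no delimiters, no 'z', no whitespace)."""
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]
        missing = 5 - len(group)               # characters the encoder dropped
        group += "u" * missing                 # pad with the highest digit (84)
        value = 0
        for ch in group:                       # base-85 digits, most significant first
            value = value * 85 + (ord(ch) - 33)
        out += value.to_bytes(4, "big")[:4 - missing]  # drop the padded bytes
    return bytes(out)

print(ascii85_decode_body("9jqo^"))            # b'Man '
```

Padding with the highest digit rather than the lowest matters: the encoder truncated the low-order characters, so rounding up guarantees the surviving high-order bytes come out unchanged.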

3. Major Base85 Variants

Base85 is not a single standard but a family of encoding schemes. Different variants use different character sets, padding rules, and delimiters.

3.1 Ascii85 (btoa Format)

Ascii85 is the most classic Base85 variant. It was designed by Paul E. Rutter in 1984 for the btoa utility and was later adopted by Adobe for PostScript and PDF.

Character set: Uses 85 consecutive ASCII characters from code 33 (!) through 117 (u):

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

Special rules:

  • All-zero compression: If a 4-byte group is all zeros (0x00000000), it’s represented by a single z character instead of !!!!!.
  • Delimiters: Adobe’s version uses <~ and ~> as start and end markers for the encoded data.
  • Whitespace: Spaces, tabs, and newlines can be inserted in the encoded data and are ignored during decoding.
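
These three rules can be observed with Python’s standard base64 module. The snippet is only a quick demonstration of the behaviour described above, not a full tour of the API.

```python
import base64

# A full group of four zero bytes collapses to a single 'z'.
zero_group = base64.a85encode(b"\x00\x00\x00\x00")
print(zero_group)                                   # b'z'

# adobe=True wraps the output in the <~ and ~> delimiters.
framed = base64.a85encode(b"Man ", adobe=True)
print(framed)                                       # b'<~9jqo^~>'

# Whitespace inside the encoded text is skipped when decoding.
decoded = base64.a85decode(b"<~9jq\n  o^~>", adobe=True)
print(decoded)                                      # b'Man '
```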

Example:

Original text: Man is distinguished
Ascii85:       <~9jqo^BlbD-BleB1DJ+*+F(f,q~>

3.2 Z85 (ZeroMQ Base85)

Z85 is a Base85 variant designed by the ZeroMQ community, specifically optimized for use within string literals in programming languages and in formats like XML/JSON.

Character set (85 carefully selected characters):

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#

Design highlights:

  • Avoids single quotes ', double quotes ", and backslashes \, so it can be directly embedded in most programming language strings
  • Avoids backtick ` and comma , for safety in more contexts
  • No delimiters
  • No all-zero compression
  • Requires input length to be a multiple of 4
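
A minimal Z85 encoder sketch follows, assuming the alphabet listed above. The function name z85_encode is chosen for illustration; the sample bytes are the 8-byte test vector commonly quoted for Z85, which encodes to HelloWorld.

```python
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)

def z85_encode(data: bytes) -> str:
    """Encode bytes whose length is a multiple of 4 into Z85 text."""
    if len(data) % 4 != 0:
        raise ValueError("Z85 input length must be a multiple of 4 bytes")
    chars = []
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], "big")
        group = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            group.append(Z85_ALPHABET[digit])   # table lookup, not an offset
        chars.extend(reversed(group))           # most significant digit first
    return "".join(chars)

sample = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
print(z85_encode(sample))                       # HelloWorld
```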

3.3 RFC 1924 Base85 (IPv6 Encoding)

RFC 1924 proposed a scheme for encoding IPv6 addresses using Base85. While the RFC was originally an April Fools’ proposal, its defined character set has been used in some serious projects.

Character set:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~

Features:

  • Encodes a 128-bit IPv6 address into 20 characters (compared to up to 39 characters in standard notation)
  • Character set starts with digits and letters for a more intuitive arrangement
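
As a rough sketch of the idea, the code below treats the whole address as one 128-bit integer and writes it as 20 base-85 digits using the alphabet above. The helper names ipv6_to_base85 and base85_to_ipv6 are made up for this example; only the standard ipaddress module is used.

```python
import ipaddress

RFC1924_ALPHABET = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)

def ipv6_to_base85(address: str) -> str:
    value = int(ipaddress.IPv6Address(address))   # whole address as one integer
    chars = []
    for _ in range(20):                           # 85**20 > 2**128, so 20 digits suffice
        value, digit = divmod(value, 85)
        chars.append(RFC1924_ALPHABET[digit])
    return "".join(reversed(chars))

def base85_to_ipv6(text: str) -> str:
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)
    return str(ipaddress.IPv6Address(value))

encoded_address = ipv6_to_base85("1080::8:800:200c:417a")
print(encoded_address, len(encoded_address))
print(base85_to_ipv6(encoded_address))            # 1080::8:800:200c:417a
```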

3.4 Variant Comparison

Feature                Ascii85             Z85                      RFC 1924
Year designed          1984                2010                     1996
Starting char code     33 (!)              Mixed                    Mixed
Delimiters             <~ ~>               None                     None
All-zero compression   Yes (z)             No                       No
String-safe            No (contains " \)   Yes                      No
Tail padding           Flexible            Strict (multiple of 4)   N/A
Primary use            PDF / PostScript    ZeroMQ / Networking      IPv6 addresses

4. Real-World Applications

4.1 PDF and PostScript

This is the most important application domain for Base85 (Ascii85). In PDF files, binary data streams (such as embedded images and fonts) can be encoded using Ascii85:

stream
<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
O<DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
>uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>
endstream

Hexadecimal encoding doubles the size of the data, while Ascii85 expands it by only 25%, so an Ascii85-encoded stream is roughly 37% smaller than the same stream in hex.
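
To see the size difference concretely, the snippet below encodes the same 1,024-byte sample both ways with the standard library. The payload is an arbitrary test pattern chosen so that no group is all zeros (which would trigger the z shortcut and skew the count).

```python
import base64
import binascii

payload = bytes(range(256)) * 4            # 1,024 bytes of sample binary data

hex_len = len(binascii.hexlify(payload))   # 2 characters per byte
a85_len = len(base64.a85encode(payload))   # 5 characters per 4 bytes

print(hex_len, a85_len)                    # 2048 1280
print(f"{1 - a85_len / hex_len:.1%}")      # 37.5%
```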

4.2 Git Binary Patches

When generating patches for binary files, Git uses Base85 encoding to represent the binary diff data:

diff --git a/image.png b/image.png
GIT binary patch
literal 1234
zcmV;@1TFkJiwFP!0000...(Base85 encoded data)

Git uses its own Base85 variant, whose character set is the same as the RFC 1924 alphabet:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~
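
The following is a rough sketch of how one body line of such a patch could be assembled. It rests on several assumptions that are not stated above: that each line carries at most 52 bytes, that it starts with a length character (A to Z for 1 to 26 bytes, a to z for 27 to 52), and that the last group is zero-padded so every group yields 5 characters. In a real patch the payload is also compressed before encoding, which this sketch skips. The function name git_patch_line is hypothetical.

```python
import base64

def git_patch_line(chunk: bytes) -> str:
    """Sketch of one 'GIT binary patch' body line for up to 52 bytes."""
    n = len(chunk)
    if not 1 <= n <= 52:
        raise ValueError("a line carries between 1 and 52 bytes")
    # Assumed length prefix: 'A'..'Z' mean 1..26 bytes, 'a'..'z' mean 27..52.
    prefix = chr(ord("A") + n - 1) if n <= 26 else chr(ord("a") + n - 27)
    # pad=True zero-pads to a multiple of 4 so every group yields 5 characters.
    return prefix + base64.b85encode(chunk, pad=True).decode("ascii")

line = git_patch_line(bytes(52))
print(line[0], len(line))                  # z 66
```

Under these assumptions, the leading z in the sample patch line above would be the length marker for a full 52-byte line rather than part of the encoded data.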

4.3 ZeroMQ Messaging

ZeroMQ uses Z85 encoding to transmit binary data (such as CURVE encryption keys), making it safe to embed in configuration files and logs:

# CURVE public key (32 bytes → 40 Z85 characters)
Yne@$w-vo<fVvi]a<NY6T1ed:M$fCG*[IaLV{hID
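
Turning such a key back into bytes is a table lookup followed by the usual base-85 conversion. The decoder below is a minimal sketch with an illustrative name (z85_decode); it assumes well-formed input and does no range checking beyond what Python raises on its own.

```python
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)
Z85_INDEX = {ch: i for i, ch in enumerate(Z85_ALPHABET)}

def z85_decode(text: str) -> bytes:
    """Decode Z85 text whose length is a multiple of 5 into bytes."""
    if len(text) % 5 != 0:
        raise ValueError("Z85 text length must be a multiple of 5 characters")
    out = bytearray()
    for i in range(0, len(text), 5):
        value = 0
        for ch in text[i:i + 5]:
            value = value * 85 + Z85_INDEX[ch]   # KeyError for foreign characters
        out += value.to_bytes(4, "big")          # OverflowError if above 2**32 - 1
    return bytes(out)

key = z85_decode("Yne@$w-vo<fVvi]a<NY6T1ed:M$fCG*[IaLV{hID")
print(len(key))                                  # 32
```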

4.4 Data Serialization

Some data serialization libraries use Base85 for binary payloads because it’s more compact than Base64.

Python’s standard library natively supports Base85:

import base64

data = b"Hello, World!"
encoded = base64.b85encode(data)
print(encoded)  # b'NM&qnZ!92JZ*pv8Ap'

decoded = base64.b85decode(encoded)
print(decoded)  # b'Hello, World!'

5. Programming Examples

5.1 JavaScript (Ascii85)

// Ascii85 encoding
function ascii85Encode(data) {
  const bytes = typeof data === 'string'
    ? new TextEncoder().encode(data)
    : new Uint8Array(data);

  let result = '<~';
  const padding = (4 - (bytes.length % 4)) % 4;
  const padded = new Uint8Array(bytes.length + padding);
  padded.set(bytes);

  for (let i = 0; i < padded.length; i += 4) {
    // 4 bytes → 32-bit integer
    const value = (padded[i] << 24) | (padded[i+1] << 16)
                | (padded[i+2] << 8) | padded[i+3];

    if (value === 0 && i + 4 <= bytes.length) {
      result += 'z';  // All-zero compression
      continue;
    }

    // Integer → 5 Base85 characters
    const chars = [];
    let v = value >>> 0;
    for (let j = 4; j >= 0; j--) {
      chars[j] = String.fromCharCode((v % 85) + 33);
      v = Math.floor(v / 85);
    }

    // Last group may need fewer characters
    const charsNeeded = (i + 4 > bytes.length) ? (bytes.length - i + 1) : 5;
    result += chars.slice(0, charsNeeded).join('');
  }

  result += '~>';
  return result;
}

// Ascii85 decoding
function ascii85Decode(str) {
  // Ignore whitespace first, so a trailing newline cannot hide the end marker
  str = str.replace(/\s/g, '');
  // Remove the optional delimiters
  str = str.replace(/^<~/, '').replace(/~>$/, '');

  const bytes = [];

  for (let i = 0; i < str.length;) {
    if (str[i] === 'z') {
      bytes.push(0, 0, 0, 0);  // All-zero expansion
      i++;
      continue;
    }

    const group = str.slice(i, i + 5);
    const padLen = 5 - group.length;
    const padded = group + 'u'.repeat(padLen); // Pad with 'u' (84)

    // 5 characters → 32-bit integer
    let value = 0;
    for (const ch of padded) {
      value = value * 85 + (ch.charCodeAt(0) - 33);
    }

    // 32-bit integer → 4 bytes
    const groupBytes = [
      (value >>> 24) & 0xFF,
      (value >>> 16) & 0xFF,
      (value >>> 8)  & 0xFF,
      value & 0xFF
    ];

    // Keep only the bytes corresponding to actual characters
    for (let j = 0; j < 4 - padLen; j++) {
      bytes.push(groupBytes[j]);
    }

    i += group.length;
  }

  return new Uint8Array(bytes);
}

// Usage example
const encoded = ascii85Encode('Hello, World!');
console.log(encoded);
// <~87cURD_*#4DfTZ)+T~>

const decoded = new TextDecoder().decode(ascii85Decode(encoded));
console.log(decoded);
// Hello, World!

5.2 Python

Python’s standard base64 module has built-in support for both Base85 and Ascii85:

import base64

# ===== Ascii85 =====
data = b"Hello, World!"

# Encoding
a85_encoded = base64.a85encode(data, adobe=True)
print(a85_encoded)   # b'<~87cURD_*#4DfTZ)+T~>'

# Decoding
a85_decoded = base64.a85decode(a85_encoded, adobe=True)
print(a85_decoded)   # b'Hello, World!'

# ===== RFC 1924 / Base85 =====
b85_encoded = base64.b85encode(data)
print(b85_encoded)   # b'NM&qnZ!92JZ*pv8Ap'

b85_decoded = base64.b85decode(b85_encoded)
print(b85_decoded)   # b'Hello, World!'


# ===== Custom Ascii85 implementation =====
def ascii85_encode(data: bytes) -> str:
    """Encode bytes to an Ascii85 string"""
    result = []
    padding = (4 - len(data) % 4) % 4
    data_padded = data + b'\x00' * padding

    for i in range(0, len(data_padded), 4):
        # 4 bytes → 32-bit integer
        value = int.from_bytes(data_padded[i:i+4], 'big')

        if value == 0 and i + 4 <= len(data):
            result.append('z')
            continue

        # Integer → 5 Base85 characters
        chars = []
        for _ in range(5):
            chars.append(chr(value % 85 + 33))
            value //= 85
        chars.reverse()

        # Keep only needed characters for the last group
        if i + 4 > len(data):
            chars = chars[:len(data) - i + 1]

        result.extend(chars)

    return '<~' + ''.join(result) + '~>'


# Using custom implementation
print(ascii85_encode(b"Hello, World!"))
# <~87cURD_*#4DfTZ)+T~>

6. Choosing Between Base85 and Base64

6.1 When to Choose Base85

Scenario                        Recommendation   Reason
PDF / PostScript streams        Base85           Industry standard, saves file size
Git binary patches              Base85           Natively used by Git
ZeroMQ keys                     Z85              String-safe, no escaping needed
Bandwidth-sensitive scenarios   Base85           Lower encoding overhead (25% vs 33%)

6.2 When to Choose Base64

Scenario          Recommendation   Reason
Web APIs / JSON   Base64           Broad support, best compatibility
Email / MIME      Base64           MIME standard
Data URIs         Base64           Native browser support
General purpose   Base64           Widest tool and library support

6.3 Efficiency Benchmarks

Encoded size comparison for different input sizes:

Raw Data Size   Hexadecimal    Base64         Base85         Savings (vs Base64)
100 bytes       200 chars      136 chars      125 chars      ~8%
1 KB            2,048 chars    1,368 chars    1,280 chars    ~6%
10 KB           20,480 chars   13,656 chars   12,800 chars   ~6%
1 MB            ~2 MB          ~1.33 MB       ~1.25 MB       ~6%
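
These figures follow from simple arithmetic and can be reproduced with a short calculation. The sketch below assumes padded Base64, Base85 without delimiters or line breaks, and no all-zero shortcut; the function name encoded_sizes is illustrative.

```python
def encoded_sizes(n: int) -> dict:
    """Output length in characters for n input bytes."""
    full, rest = divmod(n, 4)
    return {
        "hex": 2 * n,
        "base64": 4 * ((n + 2) // 3),                     # '=' padded to 4-char blocks
        "base85": 5 * full + (rest + 1 if rest else 0),   # k tail bytes give k + 1 chars
    }

for n in (100, 1024, 10240):
    sizes = encoded_sizes(n)
    saving = 1 - sizes["base85"] / sizes["base64"]
    print(n, sizes, f"{saving:.1%}")
```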

7. Limitations of Base85

Despite its superior encoding efficiency, Base85 has some limitations:

7.1 Character Set Issues

Ascii85’s character set includes many special characters (such as ", \, <, >), which can cause issues in certain contexts:

  • JSON: Requires escaping " and \
  • XML: Requires entity encoding for <, >, &
  • URLs: Punctuation such as %, &, +, /, ?, and # needs percent-encoding
  • Shell: Many characters need escaping

Z85 partially addresses this by carefully selecting its character set, but it’s still not as universally safe as Base64url.
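
One way to quantify the difference is to count how many extra characters each alphabet costs once it is escaped for JSON and XML. This is a small sketch using Python’s json module and xml.sax.saxutils.escape; the helper name escaping_overhead is made up, and other serializers may escape a slightly different set of characters.

```python
import json
from xml.sax.saxutils import escape

ASCII85_ALPHABET = "".join(chr(c) for c in range(33, 118))   # '!' through 'u'
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)

def escaping_overhead(alphabet: str) -> tuple:
    """Extra characters added when the alphabet is escaped for (JSON, XML)."""
    json_extra = len(json.dumps(alphabet)) - 2 - len(alphabet)  # minus the two quotes
    xml_extra = len(escape(alphabet)) - len(alphabet)           # &, <, > become entities
    return json_extra, xml_extra

print("Ascii85", escaping_overhead(ASCII85_ALPHABET))   # (2, 10)
print("Z85", escaping_overhead(Z85_ALPHABET))           # (0, 10)
```

Z85 needs no escaping inside a JSON string, but it still contains &, <, and >, so XML text needs the same entity encoding as Ascii85.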

7.2 Limited Library Support

Compared to Base64, Base85 has less native library support:

  • Browsers: No native equivalent of btoa/atob
  • Most languages: Require third-party libraries or manual implementation
  • Python: Standard library support (base64.a85encode/base64.b85encode)
  • Go: Standard library support (encoding/ascii85)

7.3 Incompatibility Between Variants

Ascii85, Z85, RFC 1924, and other variants use different character sets and rules. They are not interchangeable. When using Base85, you must clearly specify which variant you’re working with.
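
The risk is easy to demonstrate with the two variants in Python’s standard library. In this short sketch, the same four bytes produce different text, and decoding Ascii85 output with the RFC 1924 decoder raises no error but returns the wrong bytes.

```python
import base64

data = b"Man "
a85 = base64.a85encode(data)
b85 = base64.b85encode(data)
print(a85, b85)                    # same bytes, different text

# Every character of this Ascii85 text also exists in the RFC 1924
# alphabet, so the wrong decoder accepts it silently.
wrong = base64.b85decode(a85)
print(wrong == data)               # False
```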

8. Frequently Asked Questions

Are Base85 and Ascii85 the same thing?

Strictly speaking, Ascii85 is a specific variant of Base85. “Base85” is a broader term covering all encoding schemes that use 85 characters. However, in everyday usage, the two terms are often used interchangeably.

Why not use Base128 for even higher efficiency?

While a larger character set is theoretically possible, ASCII only has 95 printable characters (codes 32–126). Base85 already uses the vast majority of them. Base128 would require non-printable control characters, leading to serious compatibility issues.

Is Base85 encoding secure?

Base85 is an encoding scheme, not an encryption scheme. It provides no security protection whatsoever. Encoded data can be easily decoded by anyone. If you need to protect data, encrypt it first, then encode it.

How can I tell which Base85 variant was used?

  • If the data starts with <~ and ends with ~>, it’s Ascii85 (Adobe format)
  • If the data contains only characters from the Z85 character set (no quotes, backslash, comma, or semicolon) and its length is a multiple of 5, it might be Z85
  • Other cases require context to determine
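
These heuristics can be sketched as a function that lists which variants a string could belong to. It is only a plausibility filter under the rules above, with a hypothetical name (candidate_variants); several variants often remain possible, which is exactly why context is needed.

```python
ALPHABETS = {
    "Ascii85": "".join(chr(c) for c in range(33, 118)) + "z",   # plus the zero shortcut
    "Z85": "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
           ".-:+=^!/*?&<>()[]{}@%$#",
    "RFC 1924": "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
                "!#$%&()*+-;<=>?@^_`{|}~",
}

def candidate_variants(text: str) -> list:
    """Return the variants whose rules the text does not violate."""
    body = "".join(text.split())                  # whitespace carries no data
    if body.startswith("<~") and body.endswith("~>"):
        return ["Ascii85"]                        # Adobe delimiters are decisive
    chars = set(body)
    candidates = [name for name, alphabet in ALPHABETS.items()
                  if chars <= set(alphabet)]
    if "Z85" in candidates and len(body) % 5 != 0:
        candidates.remove("Z85")                  # Z85 text is always whole groups
    return candidates

print(candidate_variants("<~9jqo^~>"))            # ['Ascii85']
print(candidate_variants("HelloWorld"))           # ['Ascii85', 'Z85', 'RFC 1924']
print(candidate_variants("NM&qnZ!92JZ*pv8Ap"))    # ['RFC 1924']
```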

9. Conclusion

Base85 is a family of encoding schemes that excels in specific domains. With a 25% encoding overhead, it achieves the best space efficiency among all common Base encodings.

Encoding      Overhead   Best For
Hexadecimal   100%       Debugging, low-level data inspection
Base32        60%        Case-insensitive contexts, key exchange
Base64        33%        Web APIs, email, general purpose
Base85        25%        PDF, Git, ZeroMQ, bandwidth-sensitive scenarios

If you’re working with PDF files, Git patches, or ZeroMQ messages, Base85 is a tool worth understanding and mastering.

Want to try Base85 encoding and decoding yourself? Use our Online Base85 Encoder/Decoder for quick conversion supporting Ascii85, Z85, and more variants.