The Complete Guide to Base91 Encoding: Principles, Algorithm, and Real-World Applications

An in-depth guide to Base91 (basE91) encoding: understand how it works, compare it with Base64 and Base85, and discover its applications in data compression, embedded systems, and more. Includes an online Base91 encoder/decoder tool.

In the world of binary-to-text encoding, Base64 is the undisputed standard, and Base85 has carved out its niche with superior efficiency in domains like PDF and Git. However, there is an even more efficient encoding scheme that remains relatively unknown: Base91 (also stylized as basE91). By combining 91 printable ASCII characters with a clever variable-length encoding strategy, Base91 keeps encoding overhead to about 23% for typical data (and as low as roughly 14% for favorable inputs), making it one of the most space-efficient printable ASCII encoding schemes available. This article provides a deep dive into Base91’s principles, algorithm details, and practical applications.

Need to encode or decode Base91 quickly? Try our Online Base91 Encoder/Decoder.

1. What is Base91?

Base91 is a binary-to-text encoding scheme designed by Joachim Henke. It uses 91 printable ASCII characters to represent binary data and is one of the most efficient encoding schemes that relies solely on printable ASCII characters.

Core idea: Use as many printable characters as possible to minimize encoding overhead.

Why 91?

ASCII has 94 visible printable characters in the range 0x21 to 0x7E (or 95 printable characters if you also count the space character at 0x20). basE91 builds its alphabet from those 94 visible printable characters and excludes 3 that are problematic in various contexts:

Excluded Character | Reason
- (hyphen)         | Has special meaning in command-line arguments
\ (backslash)      | Escape character in most programming languages and shells
' (single quote)   | Used for string delimiting in shells and programming languages

The Base91 alphabet is:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~"
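
As a quick sanity check, the following sketch rebuilds the character set from the rule described above (the 94 visible ASCII characters minus the three exclusions) and compares it with the alphabet string. The variable names are illustrative only and are not part of any Base91 library.

```python
# Alphabet string as listed above.
BASE91_ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)

# The 94 visible printable ASCII characters: 0x21 ('!') through 0x7E ('~').
visible_ascii = {chr(code) for code in range(0x21, 0x7F)}

# The three characters basE91 leaves out.
excluded = {"-", "\\", "'"}

assert len(visible_ascii) == 94
assert len(BASE91_ALPHABET) == 91
assert len(set(BASE91_ALPHABET)) == 91  # no duplicate characters
assert set(BASE91_ALPHABET) == visible_ascii - excluded

# Show which visible characters are missing from the alphabet.
print(sorted(visible_ascii - set(BASE91_ALPHABET)))
```

Note that the alphabet is not in ASCII order, so the check compares sets rather than strings.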

Encoding Efficiency Comparison

Encoding    | Overhead (worst case) | Overhead (average) | Encoded Size (1000-byte input)
Hexadecimal | 100%                  | 100%               | 2000 chars
Base32      | 60%                   | 60%                | 1600 chars
Base64      | 33%                   | 33%                | 1336 chars
Base85      | 25%                   | 25%                | 1250 chars
Base91      | 23%                   | ≈23%               | ≈1230 chars
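
The fixed-ratio rows of this table can be reproduced with Python's standard base64 module, as in the sketch below. Python has no built-in Base91 codec, so the Base91 figure is only an estimate: the helper base91_expected_chars is a hypothetical function that assumes uniformly random input and ignores the final partial group.

```python
import base64


def base91_expected_chars(num_bytes: int) -> float:
    """Estimate Base91 output length for uniformly random input.

    Assumption: a pair carries 14 bits with probability 89/8192
    (low 13 bits in 0..88) and 13 bits otherwise; the tail is ignored.
    """
    bits_per_pair = 13 + 89 / 8192
    return 2 * (num_bytes * 8) / bits_per_pair


payload = bytes(1000)  # content does not matter for the fixed-ratio encodings

sizes = {
    "Hexadecimal": len(base64.b16encode(payload)),
    "Base32": len(base64.b32encode(payload)),
    "Base64": len(base64.b64encode(payload)),
    "Base85": len(base64.b85encode(payload)),
}

for name, size in sizes.items():
    overhead = size / len(payload) - 1
    print(f"{name:<12}{size:>6} chars  ({overhead:.1%} overhead)")

estimate = base91_expected_chars(len(payload))
print(f"{'Base91':<12}{estimate:>6.0f} chars  (estimate for random data)")
```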

2. How Base91 Works

Unlike Base64 and Base85, which use fixed-size grouping strategies, Base91 employs a variable-length encoding mechanism — the key to its superior efficiency.

2.1 Encoding Process

The Base91 encoding algorithm operates on a bit stream:

  1. Build a bit stream: Treat the input data as a continuous stream of bits, reading from the least significant bit (LSB) of each byte first.
  2. Extract 13 bits: Take 13 bits from the stream to form a value v (range: 0–8191).
  3. Choose 13 or 14 bits: If v > 88, encode those 13 bits directly. Otherwise, read 1 additional bit and recompute v from the low 14 bits, so this output pair carries 14 bits.
  4. Split into two characters: Divide v by 91 to get quotient q and remainder r. Output alphabet[r] and alphabet[q] as two characters.
  5. Handle the tail: Values are only extracted while more than 13 bits are buffered, so up to 13 bits can be left over when the input ends. Treat them as the final value. If fewer than 8 bits remain and the value is below 91, output 1 character; otherwise output 2 characters (remainder first, then quotient).
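
To make the steps above concrete, here is a minimal sketch that only traces how the bit stream is cut into groups, without mapping anything to characters yet. The generator name base91_chunks is made up for this illustration.

```python
def base91_chunks(data: bytes):
    """Yield (bit_width, value) for each output group, tail included."""
    b = 0  # bit buffer, filled LSB-first
    n = 0  # number of valid bits in the buffer
    for byte in data:
        b |= byte << n
        n += 8
        if n > 13:
            low13 = b & 8191
            width = 13 if low13 > 88 else 14
            value = b & ((1 << width) - 1)  # re-read with the chosen width
            b >>= width
            n -= width
            yield width, value
    if n > 0:
        yield n, b  # leftover bits form the tail value


for width, value in base91_chunks(b"Hello, World!"):
    print(width, value)
```

For this input the fourth group is 14 bits wide, because its low 13 bits equal 88 and therefore fall in the 0..88 range. The widths add up to 104 bits, which matches the 13 input bytes.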

2.2 Why is the Threshold 88?

This is the most elegant aspect of Base91’s design. The key insight is:

  • 13 bits can represent values from 0 to 8191
  • 14 bits can represent values from 0 to 16383
  • 91² = 8281

If the low 13-bit value is between 89 and 8191, Base91 encodes exactly those 13 bits, because every such value already fits in the 0..8280 range addressable by 2 Base91 characters.

If the low 13-bit value is between 0 and 88, Base91 uses that range as a signal that this chunk should consume 14 bits instead. In that case the final 14-bit value is either 0..88 or 8192..8280, which still fits inside the 0..8280 range of 91² states.

This strategy allows the algorithm to pack as many data bits as possible into each pair of output characters without wasting encoding space.
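
The threshold argument can be checked exhaustively. The sketch below walks through every possible 14-bit window of input and confirms two properties: the chosen value always fits in two characters, and a decoder can recover the chunk width from the value alone. The helper chunk_width is a name introduced for this example.

```python
PAIR_STATES = 91 * 91  # 8281 values addressable by two characters


def chunk_width(low13: int) -> int:
    """Width rule described above: 13 bits if the low 13 bits exceed 88."""
    return 13 if low13 > 88 else 14


# Every possible 14-bit window of the input stream.
for window in range(1 << 14):
    width = chunk_width(window & 8191)
    value = window & ((1 << width) - 1)

    # The encoded value always fits in two Base91 characters.
    assert value < PAIR_STATES

    # The decoder sees only the value and must arrive at the same width.
    assert chunk_width(value & 8191) == width

print("largest value produced by a 14-bit chunk:", 8192 + 88)
```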

2.3 Encoding Example

Let’s encode the string "Hi" (2 bytes):

Step 1: Get byte values and build the bit buffer

'H' = 72  = 0x48
'i' = 105 = 0x69

b = 72 + (105 << 8) = 26952
n = 16

Step 2: Extract 13 bits

Take the low 13 bits:
v = 26952 & 8191 = 2376

Step 3: Check threshold

2376 > 88, so encode 13 bits

Step 4: Divide by 91

2376 ÷ 91 = 26 remainder 10
r = 10 → alphabet[10] = 'K'
q = 26 → alphabet[26] = 'a'

Step 5: Handle remaining bits

b = 26952 >> 13 = 3
n = 3
3 < 91, output alphabet[3] = 'D'

Final encoded result: KaD

2.4 Decoding Process

Decoding is the reverse of encoding:

  1. Map each encoded character back to its index in the alphabet via a lookup table.
  2. Read 2 values at a time: r and q, and compute v = q × 91 + r.
  3. If (v & 8191) > 88, the pair represents a 13-bit chunk; otherwise it represents a 14-bit chunk.
  4. If only 1 character remains at the end, shift its index value above any bits still pending in the buffer and emit one final byte.
  5. Convert the bit stream back to bytes, 8 bits at a time.
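
As an illustration of these steps, the sketch below decodes KaD (the result of the "Hi" example in section 2.3) by hand, one step at a time. It is deliberately unrolled for readability rather than written as a general decoder.

```python
ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

out = bytearray()
b = 0  # bit buffer
n = 0  # number of valid bits in the buffer

# First pair: 'K' is the remainder, 'a' is the quotient.
r, q = INDEX["K"], INDEX["a"]
v = q * 91 + r                            # 26 * 91 + 10 = 2376
width = 13 if (v & 8191) > 88 else 14     # 2376 > 88, so 13 bits
b |= v << n
n += width

# Emit every complete byte currently in the buffer.
while n >= 8:
    out.append(b & 0xFF)
    b >>= 8
    n -= 8

# Trailing single character: placed above the pending bits, flushed as one byte.
out.append((b | (INDEX["D"] << n)) & 0xFF)

print(bytes(out))
```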

3. Detailed Comparison with Other Encodings

3.1 Base91 vs Base64

Feature                | Base64          | Base91
Alphabet size          | 64              | 91
Encoding overhead      | 33%             | ≈23%
Encoding strategy      | Fixed (3→4)     | Variable (13/14 bits → 2 chars)
Standardization        | RFC 4648        | No formal RFC
Library support        | Extremely broad | Limited
URL-safe variant       | Base64url       | None
Native browser support | Yes (btoa/atob) | No

Encoded size comparison (1024 bytes of random data):

Raw data:   1024 bytes
Base64:     1368 characters (33.6% overhead)
Base91:     ≈1259 characters (≈22.9% overhead)
Savings:    approximately 8%
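
The Base91 figure can be approximated with a short calculation. This is a back-of-the-envelope estimate that assumes uniformly random input bits and ignores the final partial group, so real outputs will vary by a character or two:

```latex
\begin{align*}
P(\text{14-bit pair}) &= \frac{89}{8192} \approx 0.0109 \\
E[\text{bits per pair}] &= 13 + \frac{89}{8192} \approx 13.011 \\
\text{characters} &\approx 2 \cdot \frac{1024 \cdot 8}{13.011} \approx 1259 \\
\text{overhead} &\approx \frac{1259}{1024} - 1 \approx 22.9\%
\end{align*}
```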

3.2 Base91 vs Base85

Feature              | Base85                   | Base91
Alphabet size        | 85                       | 91
Encoding overhead    | 25%                      | ≈23%
Encoding strategy    | Fixed (4→5)              | Variable length
Multiple variants    | Yes (Ascii85, Z85, etc.) | Essentially one
Primary applications | PDF, Git                 | Data transfer, embedded systems

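The efficiency gain of Base91 over Base85 is modest (about 2 percentage points of overhead for typical data). The gap widens for inputs whose 13-bit chunks often fall in the 0..88 range, because each such pair carries 14 bits and the overhead for those pairs drops to about 14%.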
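
To see how much the input pattern matters, the sketch below encodes two extreme 1000-byte inputs with a compact encoder that follows the steps from section 2.1. The two payloads are artificial extremes chosen for illustration, not realistic data.

```python
ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)


def base91_encode(data: bytes) -> str:
    """Compact encoder following the steps described in section 2.1."""
    out, b, n = [], 0, 0
    for byte in data:
        b |= byte << n
        n += 8
        if n > 13:
            width = 13 if (b & 8191) > 88 else 14
            v = b & ((1 << width) - 1)
            b >>= width
            n -= width
            out.append(ALPHABET[v % 91] + ALPHABET[v // 91])
    if n:
        out.append(ALPHABET[b % 91])
        if n > 7 or b > 90:
            out.append(ALPHABET[b // 91])
    return "".join(out)


# All-zero bytes always hit the 14-bit path; all-0xFF bytes never do.
for label, payload in [("all 0x00", bytes(1000)), ("all 0xFF", b"\xff" * 1000)]:
    size = len(base91_encode(payload))
    print(f"{label}: {size} chars ({size / 1000 - 1:.1%} overhead)")
```

Run as written, this should report 1143 characters for the all-zero input (about 14% overhead) and 1231 characters for the all-0xFF input (about 23% overhead). Base85 without zero-run abbreviation would produce 1250 characters in both cases.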

4. Programming Examples

4.1 JavaScript

// Base91 alphabet
const BASE91_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~"';

// Build decode lookup table
const BASE91_DECODE_TABLE = new Array(256).fill(-1);
for (let i = 0; i < BASE91_ALPHABET.length; i++) {
  BASE91_DECODE_TABLE[BASE91_ALPHABET.charCodeAt(i)] = i;
}

// Base91 encode
function base91Encode(data) {
  const bytes = typeof data === 'string'
    ? new TextEncoder().encode(data)
    : new Uint8Array(data);

  let result = '';
  let n = 0;    // Number of bits in the buffer
  let b = 0;    // Bit buffer

  for (let i = 0; i < bytes.length; i++) {
    // Append byte to bit buffer (LSB first)
    b |= bytes[i] << n;
    n += 8;

    // When buffer has more than 13 bits, extract and encode
    if (n > 13) {
      let v = b & 8191; // Take low 13 bits
      if (v > 88) {
        b >>= 13;
        n -= 13;
      } else {
        // Low values 0..88 signal that this pair should carry 14 bits
        v = b & 16383;
        b >>= 14;
        n -= 14;
      }
      // Output two characters
      result += BASE91_ALPHABET[v % 91];
      result += BASE91_ALPHABET[Math.floor(v / 91)];
    }
  }

  // Handle remaining bits
  if (n > 0) {
    result += BASE91_ALPHABET[b % 91];
    if (n > 7 || b > 90) {
      result += BASE91_ALPHABET[Math.floor(b / 91)];
    }
  }

  return result;
}

// Base91 decode
function base91Decode(str) {
  const bytes = [];
  let b = 0;    // Bit buffer
  let n = 0;    // Number of bits in buffer
  let v = -1;   // Current value being assembled

  for (let i = 0; i < str.length; i++) {
    const d = BASE91_DECODE_TABLE[str.charCodeAt(i)];
    // Skip characters outside the alphabet; char codes above 255 are not in the table and yield undefined
    if (d === undefined || d === -1) continue;

    if (v === -1) {
      v = d;  // Read first character (remainder)
    } else {
      v += d * 91;  // Read second character (quotient), rebuild v
      b |= v << n;

      // Determine how many bits this value carries
      n += (v & 8191) > 88 ? 13 : 14;

      // Extract complete bytes from buffer
      while (n >= 8) {
        bytes.push(b & 0xFF);
        b >>= 8;
        n -= 8;
      }
      v = -1;
    }
  }

  // Handle the last unpaired character
  if (v !== -1) {
    b |= v << n;
    n += 8; // Flush the trailing single character as one final byte
    while (n >= 8) {
      bytes.push(b & 0xFF);
      b >>= 8;
      n -= 8;
    }
  }

  return new Uint8Array(bytes);
}

// Usage example
const encoded = base91Encode('Hello, World!');
console.log(encoded);
// >OwJh>}AQ;r@@Y?F

const decoded = new TextDecoder().decode(base91Decode(encoded));
console.log(decoded);
// Hello, World!

4.2 Python

# Base91 alphabet
BASE91_ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)

# Build decode table
BASE91_DECODE_TABLE = {ch: i for i, ch in enumerate(BASE91_ALPHABET)}


def base91_encode(data: bytes) -> str:
    """Encode bytes to a Base91 string"""
    result = []
    n = 0  # Number of bits in buffer
    b = 0  # Bit buffer

    for byte in data:
        b |= byte << n
        n += 8

        if n > 13:
            v = b & 8191  # Take low 13 bits
            if v > 88:
                b >>= 13
                n -= 13
            else:
                v = b & 16383  # Low values 0..88 mean this chunk uses 14 bits
                b >>= 14
                n -= 14
            result.append(BASE91_ALPHABET[v % 91])
            result.append(BASE91_ALPHABET[v // 91])

    # Handle remaining bits
    if n > 0:
        result.append(BASE91_ALPHABET[b % 91])
        if n > 7 or b > 90:
            result.append(BASE91_ALPHABET[b // 91])

    return ''.join(result)


def base91_decode(encoded: str) -> bytes:
    """Decode a Base91 string to bytes"""
    result = []
    b = 0
    n = 0
    v = -1

    for ch in encoded:
        d = BASE91_DECODE_TABLE.get(ch, -1)
        if d == -1:
            continue

        if v == -1:
            v = d
        else:
            v += d * 91
            b |= v << n
            n += 13 if (v & 8191) > 88 else 14

            while n >= 8:
                result.append(b & 0xFF)
                b >>= 8
                n -= 8
            v = -1

    if v != -1:
        b |= v << n
        n += 8
        while n >= 8:
            result.append(b & 0xFF)
            b >>= 8
            n -= 8

    return bytes(result)


# Usage example
data = b"Hello, World!"
encoded = base91_encode(data)
print(f"Encoded: {encoded}")
# Encoded: >OwJh>}AQ;r@@Y?F

decoded = base91_decode(encoded)
print(f"Decoded: {decoded}")
# Decoded: b'Hello, World!'

5. Real-World Applications

5.1 Data Embedding and Transmission

When binary data must be transmitted through text-only channels, Base91 saves approximately 8% compared to Base64. For large-scale data transfers, this difference adds up to significant savings:

Raw Data Size | Base64 Encoded | Base91 Encoded | Savings
1 KB          | 1.33 KB        | 1.23 KB        | ~8%
100 KB        | 133 KB         | 123 KB         | ~8%
1 MB          | 1.33 MB        | 1.23 MB        | ~8%
10 MB         | 13.3 MB        | 12.3 MB        | ~8%

5.2 Embedded Systems

In storage- and bandwidth-constrained embedded systems, every byte matters. Base91’s high encoding efficiency makes it an ideal choice for firmware updates, configuration data transfers, and other scenarios where minimizing payload size is critical.

5.3 Logging and Debugging

When embedding binary data in log files (such as error dumps or core dump summaries), Base91 can represent more data with shorter text, reducing log file bloat.

5.4 Email and Messaging

In messaging systems that don’t support MIME, Base91 can serve as a more efficient alternative to Base64. However, since Base91 is not part of any email standard, compatibility requires additional consideration.

6. Advantages and Limitations

6.1 Advantages

  • Best-in-class encoding efficiency: Among printable ASCII encoding schemes, Base91 has nearly the lowest overhead
  • Simple algorithm: Despite being variable-length, the implementation is relatively straightforward
  • Stream processing: Supports byte-by-byte input, suitable for streaming encode/decode operations
  • No padding characters: Unlike standard Base64, which pads its output with =, Base91 needs no padding, so the output stays compact

6.2 Limitations

  • No formal standard: No RFC or international standard exists for Base91
  • Limited library support: Very few programming languages natively support Base91; most require third-party libraries
  • Character set concerns: The alphabet contains ", <, >, &, and other special characters that may need escaping in JSON, XML, and similar formats
  • Less universal than Base64: In mainstream areas like web development and email, Base64 remains the de facto standard
  • Variable-length complexity: Compared to Base64’s fixed 3:4 mapping, Base91’s variable-length nature makes manual calculation and debugging more difficult
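
To put the character set concern in perspective, the sketch below asks Python's standard library which alphabet characters change when serialized as a JSON string or as XML text. Other serializers may escape a different set, so treat the result as specific to these two functions.

```python
import json
from xml.sax.saxutils import escape

ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)

# A character needs JSON escaping if json.dumps changes it inside the quotes.
json_escaped = [ch for ch in ALPHABET if json.dumps(ch) != f'"{ch}"']

# A character needs XML escaping if saxutils.escape rewrites it as an entity.
xml_escaped = [ch for ch in ALPHABET if escape(ch) != ch]

print("JSON:", json_escaped)
print("XML: ", xml_escaped)
```

With these functions, only the double quote is affected in JSON, while &, <, and > are rewritten in XML text content.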

7. Frequently Asked Questions

Is there a difference between Base91 and basE91?

No. basE91 is the original name of the encoding scheme (as named by its author, Joachim Henke), where the capital E is a stylistic choice. Base91 is the more commonly used simplified name. Both refer to the same encoding scheme.

Is Base91 suitable for URLs?

Not really. Base91’s character set includes #, %, &, ?, /, and other characters that have special meanings in URLs. For encoding data in URLs, Base64url encoding is recommended instead.
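
A rough way to quantify this is to count how many alphabet characters would be percent-encoded. The sketch below uses urllib.parse.quote with an empty safe set, which is the strictest setting; it assumes Python 3.7 or later, where ~ is treated as unreserved.

```python
from urllib.parse import quote

ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789'
    '!#$%&()*+,./:;<=>?@[]^_`{|}~"'
)

# Characters that quote() rewrites as %XX when nothing is marked safe.
needs_escaping = [ch for ch in ALPHABET if quote(ch, safe="") != ch]

print(len(needs_escaping), "of", len(ALPHABET), "characters are percent-encoded")
print("".join(needs_escaping))
```

Since each escaped character becomes three characters, percent-encoding can easily cancel out the size advantage over Base64url.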

Is there anything more efficient than Base91?

In theory, a Base95 encoding using all 95 printable ASCII characters could further reduce overhead, but at the cost of handling spaces, backslashes, and other highly problematic characters. Base91 represents an excellent balance between efficiency and practicality.

Is Base91 encoding the same as encryption?

No. Like all Base encodings, Base91 is purely an encoding scheme and provides no security whatsoever. Anyone can easily decode Base91 data. If you need to protect data, encrypt it first (e.g., using AES), then apply Base91 encoding.

Why does Base91 use variable-length encoding instead of fixed grouping?

This is the result of efficiency optimization. Since 91² = 8281 and 2¹³ = 8192, 2 Base91 characters can cover all 13-bit values plus 89 extra states. Base91 uses those extra states by treating low 13-bit values 0..88 as a signal to consume 14 bits instead, producing values in 0..88 or 8192..8280. That is how it squeezes slightly more information into many output pairs.

8. Conclusion

Base91 is a highly efficient binary-to-text encoding scheme that achieves near-optimal encoding efficiency within the constraint of 91 printable ASCII characters, thanks to its carefully designed variable-length encoding strategy.
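
The "near-optimal" claim can be checked against the information-theoretic limit of a 91-symbol alphabet. The figures below are rounded, and the comparison uses the 13-bit case only:

```latex
\begin{align*}
\log_2 91 &\approx 6.508 \quad \text{(maximum bits per character with 91 symbols)} \\
\frac{8}{\log_2 91} &\approx 1.2293 \quad \text{(best possible expansion for arbitrary data, about 22.9\% overhead)} \\
\frac{16}{13} &\approx 1.2308 \quad \text{(Base91 with 13-bit pairs, about 23.1\% overhead)}
\end{align*}
```

The 14-bit pairs do not contradict this limit: they occur only when the data happens to fall in the 0..88 range, so they lower the overhead for particular inputs rather than for arbitrary data.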

Encoding    | Overhead | Best For
Hexadecimal | 100%     | Debugging, low-level data inspection
Base32      | 60%      | Case-insensitive contexts
Base64      | 33%      | Web APIs, email, general purpose
Base85      | 25%      | PDF, Git, ZeroMQ
Base91      | ≈23%     | Bandwidth-sensitive, storage-constrained scenarios

While Base91 doesn’t match Base64 in universality and ecosystem support, it’s an excellent choice when minimizing encoding overhead is a priority.

Want to try Base91 encoding and decoding yourself? Use our Online Base91 Encoder/Decoder for quick Base91 conversions.