The Complete Guide to Base62 Encoding: Principles, Character Set, and Practical Applications
An in-depth guide to Base62 encoding: understand its design philosophy, encoding mechanism, and real-world applications in URL shorteners, distributed ID generation, and more.
Among the many Base encoding schemes, Base62 is an incredibly practical yet often underappreciated method. It combines digits, uppercase letters, and lowercase letters into a clean set of 62 characters, maintaining high encoding efficiency while ensuring URL safety and cross-platform compatibility. From URL shortening services to distributed unique ID generation, Base62 plays an important role in modern internet applications. This article provides a comprehensive look at Base62 encoding.
1. What is Base62?
Base62 is an encoding method that uses 62 alphanumeric characters to represent binary data or numeric values. Compared to Base64, Base62 removes the two non-alphanumeric characters + and / (as well as the padding character =), resulting in output that consists purely of letters and digits.
This means Base62 encoded output is inherently URL-safe, filename-safe, and requires no additional escaping in virtually any context.
The Base62 Character Set
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
A total of 62 characters, broken down as follows:
| Type | Included Characters | Count | Index Range |
|---|---|---|---|
| Digits | 0-9 | 10 | 0 - 9 |
| Uppercase | A-Z | 26 | 10 - 35 |
| Lowercase | a-z | 26 | 36 - 61 |
Note: Different Base62 implementations may use different character orderings. The arrangement shown above (digits first, then uppercase, then lowercase) is the most common. Some implementations place lowercase letters before uppercase.
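Because the ordering is not standardized, the same number can encode to different strings under different alphabets. The following Python sketch illustrates this; the names `DIGITS_UPPER_LOWER`, `DIGITS_LOWER_UPPER`, and `encode` are illustrative and not taken from any particular library.

```python
import string

# Two orderings seen in practice; both names are illustrative.
DIGITS_UPPER_LOWER = string.digits + string.ascii_uppercase + string.ascii_lowercase
DIGITS_LOWER_UPPER = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(num: int, alphabet: str) -> str:
    """Encode a non-negative integer using the given alphabet."""
    if num == 0:
        return alphabet[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, len(alphabet))
        chars.append(alphabet[remainder])
    # Remainders are produced least significant first, so reverse them.
    return "".join(reversed(chars))


print(encode(12345678, DIGITS_UPPER_LOWER))  # pnfq
print(encode(12345678, DIGITS_LOWER_UPPER))  # PNFQ
```

Encoder and decoder must agree on the alphabet. If they do not, a string silently decodes to a different number instead of raising an error.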
2. Why Choose Base62?
The core advantage of Base62 is that it’s a purely alphanumeric encoding, which brings several benefits:
2.1 Inherently URL-Safe
Standard Base64 uses + and / characters, which have special meanings in URLs and require percent-encoding (e.g., %2B, %2F). While the Base64url variant addresses this, Base62 avoids the issue entirely — its output contains only [0-9A-Za-z], requiring no additional processing.
2.2 No Padding Characters
Base64 uses = as a padding character to ensure output length is a multiple of 4, while Base62 requires no padding at all. This makes encoded results more concise and convenient for use in URL parameters, database fields, and API requests.
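To make the escaping and padding difference concrete, the sketch below encodes the same two bytes both ways using Python's standard `base64` and `urllib.parse` modules. The helper `base62_encode` is a minimal illustration written for this example, and the input bytes were picked only because they make Base64 emit all three special characters.

```python
import base64
from urllib.parse import quote

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(num: int) -> str:
    """Minimal integer-to-Base62 helper for this example."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


raw = b"\xfb\xff"  # chosen so that Base64 produces '+', '/' and '='

b64 = base64.b64encode(raw).decode("ascii")
print(b64)                  # +/8=
print(quote(b64, safe=""))  # %2B%2F8%3D  (every special character is escaped)

b62 = base62_encode(int.from_bytes(raw, "big"))
print(b62)                  # GmV
print(quote(b62, safe=""))  # GmV  (unchanged, nothing to escape)
```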
2.3 Double-Click Selectable
In most text editors and browsers, double-clicking selects a complete alphanumeric string. Since Base62 encoded output contains only letters and digits, users can easily double-click to select the entire encoded string for copying and sharing.
2.4 High Information Density
Compared to Base16 (hexadecimal), each Base62 character carries significantly more information:
| Encoding | Information per Character | Characters for 128-bit Value |
|---|---|---|
| Base16 (Hex) | 4 bits | 32 |
| Base36 | ~5.17 bits | 25 |
| Base58 | ~5.86 bits | 22 |
| Base62 | ~5.95 bits | 22 |
| Base64 | 6 bits | 22 (24 with = padding) |
Base62’s information density is very close to Base64, but completely avoids special characters.
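The figures in the table can be reproduced with a few lines of Python. This is a sketch; `chars_needed` is a hypothetical helper name, and it uses exact integer arithmetic rather than floating-point logarithms to avoid rounding at the boundary.

```python
import math


def chars_needed(base: int, bits: int) -> int:
    """Smallest n with base**n >= 2**bits, so n characters cover every value."""
    n = 0
    capacity = 1
    while capacity < 2 ** bits:
        capacity *= base
        n += 1
    return n


for name, base in [("Base16", 16), ("Base36", 36), ("Base58", 58),
                   ("Base62", 62), ("Base64", 64)]:
    bits_per_char = math.log2(base)
    print(f"{name}: {bits_per_char:.2f} bits/char, "
          f"{chars_needed(base, 128)} chars for 128 bits")
```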
3. Base62 Compared to Other Encodings
| Feature | Base62 | Base64 | Base58 | Base36 | Hexadecimal |
|---|---|---|---|---|---|
| Character Set Size | 62 | 64 | 58 | 36 | 16 |
| Purely Alphanumeric | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Case Sensitive | Yes | Yes | Yes | No | No |
| URL Safe | ✅ Yes | ❌ No (standard) | ✅ Yes | ✅ Yes | ✅ Yes |
| Padding Character | None | = | None | None | None |
| Encoding Efficiency | ~74.4% | 75% | ~73% | ~64.6% | 50% |
| Double-click Selectable | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Visually Ambiguous Characters | Yes | Yes | No | Yes (0/O, 1/I) | No |
Base62 vs Base64: Base62 sacrifices minimal encoding efficiency in exchange for purely alphanumeric output, eliminating concerns about special character escaping.
Base62 vs Base58: Base58 excludes visually ambiguous characters (0, O, I, l), making it better suited for human transcription scenarios. Base62 retains all 62 alphanumeric characters for higher encoding efficiency, better suited for machine-generated and machine-processed scenarios.
Base62 vs Base36: Base36 is case-insensitive, which is more convenient in some scenarios (like case-insensitive file systems). Base62 is case-sensitive with higher encoding efficiency — the same string length can represent larger values.
4. How Base62 Works
4.1 Numeric Encoding
The most common use of Base62 is converting an integer to base-62 representation. The process is straightforward:
Encoding Process:
- Repeatedly divide the input value by 62.
- Record the remainder from each division.
- Map each remainder to the corresponding character in the Base62 alphabet.
- Reverse the result string (since remainders are produced from least significant to most significant).
Example: Converting 12345678 to Base62
12345678 ÷ 62 = 199123 remainder 52 → q
199123 ÷ 62 = 3211 remainder 41 → f
3211 ÷ 62 = 51 remainder 49 → n
51 ÷ 62 = 0 remainder 51 → p
Reversed: pnfq
So 12345678 in Base62 is pnfq.
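The division steps above can be traced programmatically. The following Python sketch prints each step; `base62_steps` is an illustrative name, and the alphabet is the digits-uppercase-lowercase ordering used throughout this article.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_steps(num: int):
    """Yield (dividend, quotient, remainder, char) for each division by 62."""
    while num > 0:
        quotient, remainder = divmod(num, 62)
        yield num, quotient, remainder, ALPHABET[remainder]
        num = quotient


steps = list(base62_steps(12345678))
for dividend, quotient, remainder, char in steps:
    print(f"{dividend} ÷ 62 = {quotient} remainder {remainder} → {char}")

# Remainders come out least significant first, so read them in reverse.
encoded = "".join(step[3] for step in reversed(steps))
print(encoded)  # pnfq
```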
Decoding Process:
Decoding pnfq back to decimal:
p = 51, n = 49, f = 41, q = 52
51 × 62³ + 49 × 62² + 41 × 62¹ + 52 × 62⁰
= 51 × 238328 + 49 × 3844 + 41 × 62 + 52
= 12154728 + 188356 + 2542 + 52
= 12345678
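The same positional sum can be checked in Python. This is a small sketch; `base62_terms` is a hypothetical helper that returns each character's contribution so the intermediate values are visible.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_terms(s: str) -> list:
    """Return each character's positional contribution, most significant first."""
    top_power = len(s) - 1
    return [ALPHABET.index(ch) * 62 ** (top_power - i) for i, ch in enumerate(s)]


terms = base62_terms("pnfq")
print(terms)       # [12154728, 188356, 2542, 52]
print(sum(terms))  # 12345678
```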
4.2 Byte Data Encoding
Similar to Base58, Base62 can also encode arbitrary byte data. The method treats the byte array as a big-endian big integer and repeatedly divides by 62:
- Interpret the byte array as a big-endian unsigned integer.
- Repeatedly divide by 62, recording remainders.
- Map remainders to characters.
- Reverse the result.
- Handle leading zero bytes (typically mapped to the character 0).
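The steps above, together with the matching decoder, can be sketched in Python as follows. The leading-zero convention (one `0` character per leading zero byte) is the one described in the list; other implementations may handle leading zeros differently, so treat this as one possible scheme rather than a standard.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode_bytes(data: bytes) -> str:
    # Leading zero bytes vanish in the integer, so count them separately.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "0" * leading_zeros + "".join(reversed(chars))


def base62_decode_bytes(text: str) -> bytes:
    # Each leading '0' character stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("0"))
    num = 0
    for ch in text:
        num = num * 62 + ALPHABET.index(ch)  # ValueError on invalid characters
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_zeros + body


encoded = base62_encode_bytes(b"\x00\x00Hi")
print(encoded)                       # 004oz
print(base62_decode_bytes(encoded))  # b'\x00\x00Hi'
```

The round trip is unambiguous because the encoded integer part never starts with `0`: its most significant Base62 digit is always non-zero.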
4.3 Fundamental Difference from Base64 Encoding
| Comparison | Base62 | Base64 |
|---|---|---|
| Encoding Method | Big integer division | Fixed 6-bit group mapping |
| Computational Complexity | O(n²) for byte encoding | O(n) |
| Output Length Predictability | Less precisely predictable | Precisely predictable |
| Padding | Not required | Required (=) |
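Base64 splits every 3 bytes into four 6-bit groups, each directly mapped to a character. This runs in linear time and makes the output length a simple function of the input length, but the final group must be padded with = whenever the input length is not a multiple of 3.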
Base62’s byte encoding is based on mathematical division, which is less efficient computationally, but more flexible for encoding short data like IDs and hash values.
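The difference in length predictability shows up when two inputs of the same size are encoded both ways. In this Python sketch, `base62_encode` is a minimal illustrative helper and the two 16-byte inputs are chosen to sit at opposite ends of the value range.

```python
import base64

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(num: int) -> str:
    """Minimal integer-to-Base62 helper for this example."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


small = bytes([1]) + bytes(15)  # 16 bytes, value 2**120
large = bytes([255]) * 16       # 16 bytes, value 2**128 - 1

lengths = []
for data in (small, large):
    b64_len = len(base64.b64encode(data))
    b62_len = len(base62_encode(int.from_bytes(data, "big")))
    lengths.append((b64_len, b62_len))
    print(f"Base64: {b64_len} chars, Base62: {b62_len} chars")
# Base64 is 24 characters both times; Base62 is 21, then 22.
```

If a fixed width is required, the Base62 output can be left-padded with the zero character of the alphabet.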
5. Common Use Cases
5.1 URL Shorteners / Short URL Services
This is Base62’s most classic application. The core logic behind services like bit.ly and tinyurl.com is:
- Store the original long URL in a database and get an auto-incrementing ID (e.g., 12345678).
- Encode that ID using Base62 (e.g., pnfq).
- Generate the short URL: https://short.url/pnfq.
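The flow above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `UrlShortener` keeps its mapping in an in-memory dictionary instead of a database, `itertools.count` plays the role of the auto-increment column, and the `https://short.url` domain is the placeholder used in this article.

```python
import itertools

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(num: int) -> str:
    """Minimal integer-to-Base62 helper for this example."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class UrlShortener:
    """In-memory stand-in for the database table (hypothetical design)."""

    def __init__(self, first_id: int = 1, domain: str = "https://short.url"):
        self._ids = itertools.count(first_id)  # simulates an auto-increment ID
        self._urls = {}                        # short code -> original URL
        self._domain = domain

    def shorten(self, long_url: str) -> str:
        code = base62_encode(next(self._ids))
        self._urls[code] = long_url
        return f"{self._domain}/{code}"

    def resolve(self, code: str) -> str:
        return self._urls[code]  # KeyError if the code is unknown


shortener = UrlShortener(first_id=12345678)
print(shortener.shorten("https://example.com/some/very/long/path"))
print(shortener.resolve("pnfq"))
```

A production service would also need persistence, validation of the submitted URL, and a strategy for the enumeration problem discussed under security considerations.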
Why Base62 instead of hexadecimal?
ID = 2147483647 (~2.1 billion)
Hexadecimal: 7FFFFFFF → 8 characters
Base62: 2LKcb1 → 6 characters
Base62 can represent larger values with fewer characters, making short URLs shorter.
6 Base62 characters can represent:
62⁶ = 56,800,235,584 (~56.8 billion)
This is more than enough for most URL shortening services.
5.2 Distributed Unique IDs
Generating globally unique IDs is a common requirement in distributed systems. Base62 is frequently used to format these IDs:
Twitter’s Snowflake IDs:
A Snowflake ID is a 64-bit integer. In decimal, it can be quite long (e.g., 1234567890123456789, 19 digits). Any 64-bit value fits in at most 11 Base62 characters, because 62¹¹ (about 5.2 × 10¹⁹) exceeds 2⁶⁴ (about 1.8 × 10¹⁹).
Snowflake ID: 1234567890123456789
Decimal length: 19 characters
Base62 encoded: 1TCKi1nFuNh
Base62 length: 11 characters
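For 64-bit IDs it is often convenient to pad the encoded form to a fixed width. The Python sketch below does this with a hypothetical `encode_id` helper; the width of 11 follows from 62¹¹ being larger than 2⁶⁴.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = 11  # 62**11 > 2**64, so 11 characters cover every 64-bit ID


def base62_encode(num: int) -> str:
    """Minimal integer-to-Base62 helper for this example."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def encode_id(snowflake_id: int) -> str:
    """Left-pad with the zero character so every ID has the same length."""
    return base62_encode(snowflake_id).rjust(WIDTH, ALPHABET[0])


print(encode_id(1234567890123456789))  # 1TCKi1nFuNh
print(encode_id(62))                   # 00000000010
```

With the digits-uppercase-lowercase alphabet, the characters are in ascending ASCII order, so fixed-width strings sort in the same order as the underlying numbers under a plain byte-wise comparison. That property does not hold for alphabets that place lowercase before uppercase.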
UUID in Base62:
A UUID is 128-bit data with a standard format of 36 characters (including 4 hyphens). Base62 encoding shortens this to at most 22 characters, because 62²² exceeds 2¹²⁸:
UUID: 550e8400-e29b-41d4-a716-446655440000
Standard format: 36 characters
Hex (no hyphens): 32 characters
Base62 length: 22 characters
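A round trip through Python's standard `uuid` module confirms that the conversion is lossless. This is a sketch with illustrative helper names; the UUID is the example value used above.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(num: int) -> str:
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(text: str) -> int:
    num = 0
    for ch in text:
        num = num * 62 + ALPHABET.index(ch)
    return num


original = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
encoded = base62_encode(original.int)  # UUID.int is the 128-bit integer value
restored = uuid.UUID(int=base62_decode(encoded))

print(encoded, len(encoded))
print(restored == original)  # True
```

UUIDs whose leading bits are zero encode to fewer than 22 characters, so pad on the left if a fixed width is needed.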
5.3 Session Tokens and Temporary Identifiers
Session IDs, password reset tokens, and email verification codes in web applications need to:
- Be long enough to prevent guessing
- Consist of purely alphanumeric characters (URL-safe)
- Be easy to store and transmit
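Base62 encoding meets these requirements well. For example, generating a 128-bit value from a cryptographically secure random source and encoding it in Base62 produces a compact token of at most 22 characters. The security comes from the randomness of the input, not from the encoding.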
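A minimal Python sketch of such a token generator follows. Note the swapped-in technique: instead of encoding a 128-bit number, it draws each character directly from the alphabet with the standard `secrets` module, which gives the same kind of output. The function name `new_token` and the default length are assumptions made for this example.

```python
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def new_token(length: int = 22) -> str:
    """Return a random Base62 token.

    Each character carries about 5.95 bits, so 22 characters give
    roughly 131 bits of entropy.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(new_token())    # different on every call
print(new_token(32))  # longer token for extra margin
```

`secrets` draws from the operating system's cryptographically secure random source; the general-purpose `random` module is not suitable for tokens.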
5.4 File Naming
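When uploading files, unique and safe filenames need to be generated. Base62 encoding ensures filenames contain only letters and digits, so they need no escaping on common operating systems. On case-insensitive file systems, however, names that differ only in letter case collide, so a single-case alphabet such as Base36 is the safer choice there.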
Original name: IMG_2026_04_30_1914.jpg
Base62 name: 3hT9kL12Qm.jpg
5.5 Database Primary Keys
Some systems convert auto-incrementing database IDs to Base62 before exposing them to users, providing several benefits:
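- Obscures actual IDs: The raw number is not shown directly, which discourages casual guessing of record counts or neighboring IDs. Anyone can still decode the value, so this is not a security measure (see Section 7.3)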
- Shorter identifiers: More concise in URLs and APIs
- Good compatibility: Purely alphanumeric, no URL encoding needed
6. Programming Examples
6.1 JavaScript
// Base62 alphabet
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
const BASE = 62n;

// Encode: integer → Base62
function base62Encode(num) {
  if (num === 0 || num === 0n) return '0';
  let n = BigInt(num);
  let result = '';
  while (n > 0n) {
    const remainder = n % BASE;
    n = n / BASE;
    result = ALPHABET[Number(remainder)] + result;
  }
  return result;
}

// Decode: Base62 → integer
function base62Decode(str) {
  let num = 0n;
  for (const char of str) {
    const index = ALPHABET.indexOf(char);
    if (index === -1) throw new Error(`Invalid Base62 character: ${char}`);
    num = num * BASE + BigInt(index);
  }
  return num;
}

// Encode byte data
function base62EncodeBytes(bytes) {
  if (bytes.length === 0) return '';

  // Count leading zeros
  let leadingZeros = 0;
  for (const byte of bytes) {
    if (byte !== 0) break;
    leadingZeros++;
  }

  // Convert to big integer
  let num = 0n;
  for (const byte of bytes) {
    num = num * 256n + BigInt(byte);
  }

  // Repeatedly divide by 62
  let result = '';
  while (num > 0n) {
    const remainder = num % BASE;
    num = num / BASE;
    result = ALPHABET[Number(remainder)] + result;
  }

  // Leading zeros represented by '0'
  return '0'.repeat(leadingZeros) + result;
}

// Usage examples
console.log(base62Encode(12345678)); // pnfq
console.log(base62Decode('pnfq')); // 12345678n

// Byte encoding
const encoder = new TextEncoder();
console.log(base62EncodeBytes(encoder.encode('Hello')));
6.2 Python
# Base62 alphabet
ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
BASE = 62


def base62_encode(num: int) -> str:
    """Encode an integer to a Base62 string"""
    if num == 0:
        return '0'
    result = ''
    while num > 0:
        num, remainder = divmod(num, BASE)
        result = ALPHABET[remainder] + result
    return result


def base62_decode(s: str) -> int:
    """Decode a Base62 string to an integer"""
    num = 0
    for char in s:
        index = ALPHABET.index(char)
        num = num * BASE + index
    return num


def base62_encode_bytes(data: bytes) -> str:
    """Encode bytes to a Base62 string"""
    # Count leading zeros
    leading_zeros = 0
    for byte in data:
        if byte != 0:
            break
        leading_zeros += 1
    # Convert to big integer
    num = int.from_bytes(data, 'big')
    # Repeatedly divide by 62
    result = ''
    while num > 0:
        num, remainder = divmod(num, BASE)
        result = ALPHABET[remainder] + result
    return '0' * leading_zeros + result


# Usage examples
encoded = base62_encode(12345678)
print(encoded)  # pnfq
print(base62_decode(encoded))  # 12345678

# UUID to Base62
import uuid
u = uuid.uuid4()
base62_uuid = base62_encode(u.int)
print(f"UUID: {u}")
print(f"Base62: {base62_uuid}")
print(f"Length: {len(str(u))} → {len(base62_uuid)}")
6.3 Java
import java.math.BigInteger;

public class Base62 {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final BigInteger BASE = BigInteger.valueOf(62);

    public static String encode(BigInteger num) {
        if (num.equals(BigInteger.ZERO)) return "0";
        StringBuilder sb = new StringBuilder();
        while (num.compareTo(BigInteger.ZERO) > 0) {
            BigInteger[] divRem = num.divideAndRemainder(BASE);
            num = divRem[0];
            sb.insert(0, ALPHABET.charAt(divRem[1].intValue()));
        }
        return sb.toString();
    }

    public static BigInteger decode(String s) {
        BigInteger num = BigInteger.ZERO;
        for (char c : s.toCharArray()) {
            int index = ALPHABET.indexOf(c);
            if (index == -1) throw new IllegalArgumentException("Invalid char: " + c);
            num = num.multiply(BASE).add(BigInteger.valueOf(index));
        }
        return num;
    }

    public static void main(String[] args) {
        System.out.println(encode(BigInteger.valueOf(12345678))); // pnfq
        System.out.println(decode("pnfq")); // 12345678
    }
}
7. Best Practices in System Design
7.1 URL Shortener System Design
┌──────────┐      ┌──────────────────────┐      ┌──────────┐
│  User    │ ───→ │  Generate Auto-      │ ───→ │  Base62  │
│  Request │      │  Increment ID        │      │  Encode  │
│  Shorten │      │  (e.g., Snowflake)   │      │  the ID  │
└──────────┘      └──────────────────────┘      └──────────┘
                                                     │
                                                     ▼
                                               ┌────────────┐
                                               │   Store    │
                                               │  Mapping   │
                                               │ (code→URL) │
                                               └────────────┘
7.2 Length Planning
Choose the appropriate Base62 string length based on your business requirements:
| Base62 Length | Maximum Value | Suitable For |
|---|---|---|
| 4 | ~14.78 million | Small internal systems |
| 6 | ~56.8 billion | URL shortening services |
| 7 | ~3.5 trillion | Large-scale URL shorteners |
| 8 | ~218 trillion | Large-scale ID systems |
| 11 | ~52 quintillion | Snowflake IDs |
| 22 | Covers 128 bits | UUID replacement |
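The table can be turned around to ask how many characters a given ID range needs. The Python sketch below does that; `capacity` and `min_length` are hypothetical helper names.

```python
def capacity(length: int) -> int:
    """Number of distinct values representable with `length` Base62 characters."""
    return 62 ** length


def min_length(max_value: int) -> int:
    """Smallest number of Base62 characters that can represent 0..max_value."""
    length = 1
    while capacity(length) <= max_value:
        length += 1
    return length


print(capacity(6))              # 56800235584
print(min_length(2 ** 64 - 1))  # 11
print(min_length(2 ** 128 - 1)) # 22
```

When planning capacity, leave headroom: moving from 6 to 7 characters multiplies the available space by 62 at the cost of one extra character.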
7.3 Security Considerations
- Don’t use Base62 as encryption: Base62 is encoding, not encryption. Anyone can easily decode it.
- Prevent ID enumeration: If using Base62-encoded auto-incrementing IDs as public identifiers, attackers could enumerate all records by iterating consecutive values. Consider using random IDs or introducing an obfuscation step before encoding.
- Mind case sensitivity: Base62 is case-sensitive. Use caution wherever letter case is ignored or normalized, such as DNS hostnames, the default Windows and macOS file systems, and database columns with case-insensitive collations.
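One possible form of the obfuscation step mentioned above is a reversible scramble applied to the ID before encoding. The Python sketch below multiplies by an odd constant modulo 2⁶⁴ and XORs with a second constant. Both constants and all function names are arbitrary choices made for this example, and `pow(x, -1, m)` requires Python 3.8 or later. This only makes consecutive IDs look unrelated; it is not encryption, and anyone who learns the constants can reverse it.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

MASK = 2 ** 64 - 1
MULTIPLIER = 0x9E3779B97F4A7C15         # arbitrary odd 64-bit constant (assumption)
INVERSE = pow(MULTIPLIER, -1, 2 ** 64)  # modular inverse; exists because MULTIPLIER is odd
XOR_KEY = 0x5DEECE66D                   # arbitrary constant (assumption)


def base62_encode(num: int) -> str:
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(text: str) -> int:
    num = 0
    for ch in text:
        num = num * 62 + ALPHABET.index(ch)
    return num


def obfuscate(record_id: int) -> str:
    """Scramble a 64-bit ID with a reversible mapping, then Base62-encode it."""
    scrambled = ((record_id * MULTIPLIER) & MASK) ^ XOR_KEY
    return base62_encode(scrambled)


def deobfuscate(code: str) -> int:
    """Invert obfuscate(): decode, undo the XOR, then undo the multiplication."""
    scrambled = base62_decode(code) ^ XOR_KEY
    return (scrambled * INVERSE) & MASK


for record_id in (1, 2, 3):
    code = obfuscate(record_id)
    print(record_id, code, deobfuscate(code))
```

Where IDs must be genuinely unpredictable, random identifiers from a cryptographically secure source are the more robust option.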
8. Frequently Asked Questions
What’s the difference between Base62 and Base64?
Base62 uses only 0-9, A-Z, and a-z — 62 alphanumeric characters total, excluding +, /, and =. Base64 additionally uses + and /, plus the padding character =. Base62 output is inherently URL-safe, while standard Base64 requires additional processing.
Is Base62 standardized?
Unlike Base64 (defined in RFC 4648), Base62 has no official RFC standard. However, its character set (0-9A-Za-z) is widely understood and used. The main variation across implementations is the character ordering.
Is Base62 an encryption algorithm?
No. Base62 is an encoding scheme that provides no security protection. Anyone can easily decode a Base62 string. If you need to protect data, use proper encryption algorithms (such as AES, RSA, etc.).
When should I choose Base62 over Base58?
If your data is primarily machine-generated and machine-processed (like short URLs, database IDs), Base62’s higher encoding efficiency makes it the better choice. If your data needs to be manually entered and transcribed (like cryptocurrency addresses), Base58 offers better human-friendliness by excluding visually ambiguous characters.
How do I ensure Base62 string uniqueness?
Base62 encoding itself is deterministic — the same input always produces the same output. Uniqueness depends on the input data’s uniqueness, not the encoding method itself. Use UUIDs, Snowflake IDs, or database auto-increment primary keys to ensure input uniqueness.
9. Conclusion
Base62 is a clean, efficient, purely alphanumeric encoding scheme particularly well-suited for applications requiring URL safety and compact representation.
| Scenario | Recommended Encoding |
|---|---|
| Short URLs / URL Shorteners | Base62 |
| Distributed Unique ID Display | Base62 |
| General Data Transfer | Base64 |
| Cryptocurrency Addresses | Base58Check |
| Case-insensitive Scenarios | Base36 |
| Debugging / Log Viewing | Hexadecimal |
While Base62 hasn’t been standardized in an RFC like Base64, its value in engineering practice is undeniable. Whether you’re designing a URL shortening system or building a distributed ID scheme, Base62 is a dependable choice.