The Complete Guide to Regular Expressions: Syntax, Patterns, and Practical Tips

A comprehensive guide to regular expression syntax, metacharacters, quantifiers, grouping, assertions, and common matching patterns. Packed with practical examples to help you master text matching and processing.

Regular expressions (commonly known as Regex or RegExp) are a compact, powerful mini-language for describing text patterns. Whether you’re validating form inputs, analyzing logs, performing search-and-replace operations, or cleaning data, regular expressions are an indispensable tool in every developer’s toolkit.

While the syntax may look intimidating at first, once you understand the core concepts, you’ll be able to apply them to real-world tasks quickly. This guide will take you from zero to proficiency with regular expressions.

If you want to test regular expressions interactively, try our Online Regex Tester, which highlights matches in real time and helps you debug expressions effortlessly.

1. What is a Regular Expression?

A regular expression is a string composed of ordinary characters and special characters (metacharacters) that defines a search pattern. This pattern can be used to:

  • Match: Determine whether a piece of text conforms to a pattern
  • Search: Find all substrings in a text that match a pattern
  • Replace: Substitute matched portions with new content
  • Extract: Pull specific formatted data from text

Regular expressions are supported by virtually every programming language (JavaScript, Python, Java, Go, PHP, etc.) and many command-line tools (grep, sed, awk).
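
As a minimal sketch of the four uses listed above, here is how they might look with Python's built-in re module. The sample text and variable names are made up for illustration.

```python
import re

text = "Order 1042 shipped on 2026-04-19; order 1043 is pending."

# Match: does the whole string conform to a pattern?
is_order_id = re.fullmatch(r"\d{4}", "1042") is not None

# Search: find all substrings that match a pattern
numbers = re.findall(r"\d+", text)

# Replace: substitute matched portions with new content
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)

# Extract: pull a specific piece of formatted data out of the text
found = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year = found.group(1) if found else None

print(is_order_id, numbers, masked, year)
```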

2. Basic Syntax

2.1 Literal Characters

Most letters and digits in a regex match themselves. For example, the regex hello matches the literal text “hello” in a string.

2.2 Metacharacters

Metacharacters are characters with special meanings in regular expressions. They form the backbone of regex functionality.

Metacharacter  Meaning                                                                      Example
.              Matches any single character except the newline \n                           a.c matches “abc”, “a1c”, “a-c”
\              Escape character; turns a metacharacter into a literal                       \. matches a literal “.”
^              Matches the start of a string (or the start of each line in multiline mode)  ^Hello matches lines starting with “Hello”
$              Matches the end of a string (or the end of each line in multiline mode)      world$ matches lines ending with “world”
|              Logical OR; matches either the expression on the left or the right           cat|dog matches “cat” or “dog”
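
The metacharacters in this table can be tried out with a short Python sketch. The sample strings are invented for illustration, and re.MULTILINE is used so that ^ and $ apply to each line.

```python
import re

# . matches any single character except a newline
dot_hits = re.findall(r"a.c", "abc a1c a-c a\nc")  # the last candidate contains a newline

# \. matches a literal dot only
literal_dots = re.findall(r"\.", "v1.2.3")

# ^ and $ anchor to the start and end; with re.MULTILINE they apply per line
lines = "Hello world\nGoodbye world\nHello again"
starts = re.findall(r"^Hello", lines, flags=re.MULTILINE)
ends = re.findall(r"world$", lines, flags=re.MULTILINE)

# | matches either the left or the right alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")

print(dot_hits, literal_dots, starts, ends, pets)
```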

2.3 Character Classes

Character classes are defined with square brackets [] and match any single character within the brackets.

Expression  Meaning                                              Example
[abc]       Matches any one of a, b, or c                        [aeiou] matches any lowercase vowel
[a-z]       Matches any lowercase letter from a to z             [A-Za-z] matches any English letter
[0-9]       Matches any digit                                    [0-9] is usually equivalent to \d
[^abc]      Negation; matches any character not in the brackets  [^0-9] matches any non-digit

Note: Inside a character class, most metacharacters lose their special meaning. For example, [.] matches a literal “.”, not any character. However, \, ], ^ (only at the beginning), and - (in the middle) still retain special significance.

Additional note: In some regex engines with Unicode mode enabled, \d may match a wider range of numeric characters (such as digits from other scripts), while [0-9] only matches ASCII digits (0–9).
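
The difference between \d and [0-9] can be observed in Python, where string patterns are Unicode-aware by default. This is a small sketch; the sample string is made up, and its last character is written as the escape \u0663 (ARABIC-INDIC DIGIT THREE).

```python
import re

sample = "Room 42, floor \u0663"

ascii_only = re.findall(r"[0-9]", sample)                # ASCII digits only
unicode_digits = re.findall(r"\d", sample)               # also matches digits from other scripts
ascii_flag = re.findall(r"\d", sample, flags=re.ASCII)   # re.ASCII restricts \d to 0-9

# Inside brackets, "." is literal and a leading "^" negates the class
dots = re.findall(r"[.]", "a.b.c")
non_digits = re.findall(r"[^0-9]+", "ab12cd")

print(ascii_only, unicode_digits, ascii_flag, dots, non_digits)
```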

2.4 Predefined Character Classes

For convenience, regex provides shorthand notations for commonly used character classes.

Shorthand  Equivalent        Meaning
\d         [0-9]             Matches any digit
\D         [^0-9]            Matches any non-digit character
\w         [A-Za-z0-9_]      Matches any “word character” (letter, digit, underscore)
\W         [^A-Za-z0-9_]     Matches any non-word character
\s         [ \t\n\r\f\v]     Matches any whitespace character (space, tab, newline, etc.)
\S         [^ \t\n\r\f\v]    Matches any non-whitespace character
\b         (none)            Matches a word boundary
\B         (none)            Matches a non-word boundary
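
A brief Python sketch of the shorthand classes in use; the log-like input line is an invented example.

```python
import re

line = "user_1\tlogged in  at 09:30"

words = re.findall(r"\w+", line)      # runs of letters, digits, and underscores
fields = re.split(r"\s+", line)       # split on any run of whitespace
only_digits = re.sub(r"\D", "", line) # remove every non-digit character

print(words, fields, only_digits)
```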

3. Quantifiers

Quantifiers specify how many times the preceding character or group should appear.

Quantifier  Meaning                                    Example
*           Matches 0 or more times                    ab*c matches “ac”, “abc”, “abbc”
+           Matches 1 or more times                    ab+c matches “abc”, “abbc”, but not “ac”
?           Matches 0 or 1 time                        colou?r matches “color” and “colour”
{n}         Matches exactly n times                    \d{4} matches exactly 4 digits
{n,}        Matches at least n times                   \d{2,} matches 2 or more digits
{n,m}       Matches between n and m times (inclusive)  \d{2,4} matches 2 to 4 digits
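
To check the quantifier examples in the table, the following Python sketch filters candidate strings with re.fullmatch. The helper name matches is hypothetical and exists only for this illustration.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = matches(r"ab*c", ["ac", "abc", "abbc", "abd"])
plus = matches(r"ab+c", ["ac", "abc", "abbc"])
optional = matches(r"colou?r", ["color", "colour", "colouur"])
exact = matches(r"\d{4}", ["123", "1234", "12345"])
at_least = matches(r"\d{2,}", ["1", "12", "12345"])
between = matches(r"\d{2,4}", ["1", "12", "1234", "12345"])

print(star, plus, optional, exact, at_least, between)
```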

3.1 Greedy vs. Lazy Matching

By default, quantifiers are greedy—they match as many characters as possible. Adding a ? after the quantifier makes it lazy (non-greedy), matching as few characters as possible.

Greedy:  <.+>   on "<em>hello</em>" matches the entire "<em>hello</em>"
Lazy:    <.+?>  on "<em>hello</em>" matches "<em>" and "</em>"

This distinction is critical when processing HTML tags or similar structured content.
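
The greedy and lazy behavior described above can be reproduced in Python. The third pattern, which uses a negated character class, is an additional variant not shown in the text; it is often used as an alternative to a lazy quantifier.

```python
import re

html = "<em>hello</em>"

greedy = re.findall(r"<.+>", html)      # consumes as much as possible
lazy = re.findall(r"<.+?>", html)       # stops at the first ">" that lets the match succeed
negated = re.findall(r"<[^>]+>", html)  # cannot run past a ">" at all

print(greedy, lazy, negated)
```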

4. Groups and Capturing

4.1 Capturing Groups ()

Parentheses () group multiple characters into a logical unit and capture the matched content for later use.

(abc)+      Matches one or more consecutive occurrences of "abc"
(\d{4})-(\d{2})-(\d{2})   Matches a date format, capturing year, month, and day separately

In replacements, you can reference captured groups using $1, $2 (or \1, \2, depending on the language).
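
A short Python sketch of capturing and back-referencing. Python's re.sub uses the \1, \2 form in replacement strings; the sample sentence is made up.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

m = date_pattern.search("Released on 2026-04-19.")
parts = m.groups()  # year, month, and day as separate strings

# Reorder the captured groups in the replacement
us_style = date_pattern.sub(r"\2/\3/\1", "Released on 2026-04-19.")

# A quantifier after a group repeats the whole group
repeated = re.fullmatch(r"(abc)+", "abcabcabc") is not None

print(parts, us_style, repeated)
```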

4.2 Non-Capturing Groups (?:)

Sometimes you need grouping without capturing. Use (?:...) to group without storing the matched text; this keeps group numbering tidy and avoids a small amount of unnecessary capture overhead.

(?:https?|ftp)://    Groups but does not capture the protocol part
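
One place where the choice between capturing and non-capturing groups visibly changes results is Python's re.findall, which returns the group contents when the pattern contains a capturing group. A small sketch with invented URLs:

```python
import re

text = "see https://example.com and ftp://files.example.org"

# With a capturing group, findall returns only what the group captured
capturing = re.findall(r"(https?|ftp)://\S+", text)

# With a non-capturing group, findall returns the whole match
non_capturing = re.findall(r"(?:https?|ftp)://\S+", text)

print(capturing)
print(non_capturing)
```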

4.3 Named Groups (?<name>)

Some regex engines support named capturing groups, making the results more readable.

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

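In JavaScript replacement strings you can reference a named group as $<year>; other languages expose named groups through their own API. Note that Python writes named groups as (?P<name>...) rather than (?<name>...).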
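
As a sketch of named groups in Python, which spells them (?P<name>...), the match can be read as a dictionary and the names can be used in a replacement via \g<name>. The input string is illustrative.

```python
import re

iso_date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = iso_date.search("Due 2026-04-19")
fields = m.groupdict()  # maps each group name to its matched text

# \g<name> refers to a named group inside a replacement string
reordered = iso_date.sub(r"\g<day>.\g<month>.\g<year>", "Due 2026-04-19")

print(fields, reordered)
```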

5. Assertions

Assertions specify conditions at a position in the string without consuming characters (zero-width matching).

5.1 Anchor Assertions

  • ^: Matches the start of the string
  • $: Matches the end of the string
  • \b: Matches a word boundary

\bcat\b    Matches the whole word "cat" only, not "cat" within "category"
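
The effect of \b can be seen by counting matches with and without it. This is a minimal Python sketch over an invented sentence.

```python
import re

text = "The cat sat near the category list; concat is not a cat."

whole_words = len(re.findall(r"\bcat\b", text))  # only the standalone word
anywhere = len(re.findall(r"cat", text))         # also inside "category" and "concat"

print(whole_words, anywhere)
```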

5.2 Lookahead Assertions

Syntax       Name                Meaning
(?=pattern)  Positive lookahead  Matches a position followed by the pattern
(?!pattern)  Negative lookahead  Matches a position not followed by the pattern

\d+(?=%)      Matches digits followed by "%", e.g. "100" in "100%"
foo(?!bar)    Matches "foo" that is not followed by "bar"
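
Both lookahead examples can be run in Python as follows. Note that the lookahead text itself ("%" or "bar") is never part of the match. The input strings are made up.

```python
import re

# Digits that are directly followed by "%"; the "%" is not included in the match
percentages = re.findall(r"\d+(?=%)", "up 100% from 50 units to 75%")

# Start offsets of every "foo" that is not followed by "bar"
not_foobar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz foo")]

print(percentages, not_foobar)
```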

5.3 Lookbehind Assertions

Syntax        Name                 Meaning
(?<=pattern)  Positive lookbehind  Matches a position preceded by the pattern
(?<!pattern)  Negative lookbehind  Matches a position not preceded by the pattern

(?<=\$)\d+    Matches digits preceded by "$", e.g. "99" in "$99"
(?<!\\)\"     Matches double quotes not preceded by a backslash

Note: Lookbehind assertions are not supported by all regex engines. JavaScript has supported them since ES2018; Python and Java both support them. Some engines restrict what a lookbehind may contain; Python's re module, for example, requires the pattern inside a lookbehind to be of fixed width.
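
The lookbehind examples, together with the fixed-width restriction as it applies in Python's re module, can be sketched like this. The sample strings and variable names are illustrative.

```python
import re

prices = re.findall(r"(?<=\$)\d+", "cost: $99, weight: 12kg")

# Offsets of quotes that do not directly follow a backslash
text = r'say "hi" and \"bye\"'
unescaped = [m.start() for m in re.finditer(r'(?<!\\)"', text)]

# A variable-width lookbehind is rejected by Python's re module at compile time
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(prices, unescaped, variable_width_ok)
```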

6. Flags (Modifiers)

Flags modify the behavior of a regular expression.

Flag  Name                  Meaning
i     Case-Insensitive      Matching is case-insensitive
g     Global                Finds all matches rather than stopping at the first one
m     Multiline             ^ and $ match the start and end of each line, not just the whole string
s     DotAll (Single-line)  Makes . also match the newline character \n
u     Unicode               Enables full Unicode matching

Usage varies slightly across languages:

// JavaScript
/pattern/gi

// Python
re.compile(r'pattern', re.IGNORECASE | re.MULTILINE)
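
A small Python sketch of how the i, m, and s flags change results. Python has no g flag; functions such as re.findall and re.finditer return all matches instead. The log-style text is invented.

```python
import re

text = "Error: disk full\nerror: retrying\nDone"

case_insensitive = re.findall(r"error", text, flags=re.IGNORECASE)
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Without DOTALL, "." refuses to cross the newline between "full" and "error"
without_dotall = re.search(r"full.error", text)
with_dotall = re.search(r"full.error", text, flags=re.DOTALL)

print(case_insensitive, line_starts, without_dotall, with_dotall.group())
```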

7. Common Regex Patterns

Below are some tried-and-tested regular expression patterns. Always adapt them to your specific use case.

7.1 Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Explanation: Matches standard email addresses. The local part allows letters, digits, dots, underscores, percent signs, plus signs, and hyphens; the domain part allows letters, digits, dots, and hyphens; the top-level domain requires at least 2 letters.

Tip: A fully RFC 5322-compliant email regex is extremely complex. The pattern above works for most common scenarios, but consider combining it with server-side validation for strict requirements.
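
To illustrate both what the pattern accepts and where it is permissive, here is a rough Python sketch. The function name looks_like_email and the test addresses are assumptions made for this example.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value):
    """Return True if value has the general shape of an email address."""
    return EMAIL.match(value) is not None

checks = {
    addr: looks_like_email(addr)
    for addr in [
        "alice@example.com",
        "bob.smith+tag@mail.example.org",
        "no-at-sign.example.com",
        "user@localhost",  # no dot-separated top-level domain
        "a@b..com",        # consecutive dots still pass this pattern
    ]
}
print(checks)
```

The last case shows why a shape check like this is usually paired with a stricter step, such as sending a confirmation message.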

7.2 URL Matching

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Explanation: Matches URLs starting with http:// or https://.

7.3 IPv4 Address

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

Explanation: Matches standard IPv4 addresses where each octet ranges from 0 to 255.
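
A quick Python check of the IPv4 pattern against a few hand-picked inputs. One thing this sketch surfaces is that octets with leading zeros are accepted.

```python
import re

IPV4 = re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")

candidates = ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.02.03.004"]
results = {ip: bool(IPV4.match(ip)) for ip in candidates}

print(results)
```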

7.4 Strong Password Validation

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

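Explanation: Requires at least 8 characters, including at least one lowercase letter, one uppercase letter, one digit, and one special character from the set @$!%*?&. Any character outside letters, digits, and that set is rejected, so extend the character class if you want to allow others (such as # or spaces).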
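
The four lookaheads each check one requirement from the start of the string, and the final character class enforces the length and the allowed characters. A small Python sketch with made-up passwords:

```python
import re

STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$")

results = {
    pw: bool(STRONG.match(pw))
    for pw in [
        "Passw0rd!",  # meets every requirement
        "passw0rd!",  # no uppercase letter
        "Passw0rd#",  # "#" is not in the allowed special set
        "Pw0!",       # too short
    ]
}
print(results)
```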

7.5 Date Format (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Explanation: Matches dates in YYYY-MM-DD format. Note that this regex does not validate the logical validity of the date (e.g., February 30 would pass).
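
One way to close the gap noted above is to let the regex check the shape and let the standard library check the calendar. The sketch below assumes an extra capturing group around the year (not present in the pattern above) and a hypothetical helper name is_real_date.

```python
import re
from datetime import date

# Same pattern as above, with the year wrapped in a group so all three parts are captured
ISO_DATE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(value):
    """Check the YYYY-MM-DD shape with a regex, then the calendar with datetime."""
    m = ISO_DATE.match(value)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.groups())
    try:
        date(year, month, day)  # raises ValueError for dates such as February 30
    except ValueError:
        return False
    return True

print(is_real_date("2026-04-19"), is_real_date("2026-02-30"))
```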

7.6 Hexadecimal Color Code

^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

Explanation: Matches 3-digit or 6-digit hexadecimal color codes such as #FFF or #FF5733.
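
Because the pattern captures the hex digits in group 1, it can double as a parser. The following Python sketch converts a matched code to an RGB tuple; the function name to_rgb is an assumption for this example.

```python
import re

HEX_COLOR = re.compile(r"^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")

def to_rgb(code):
    """Return an (r, g, b) tuple for a valid hex color code, or None otherwise."""
    m = HEX_COLOR.match(code)
    if m is None:
        return None
    digits = m.group(1)
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # expand shorthand: "FFF" becomes "FFFFFF"
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(to_rgb("#FFF"), to_rgb("#FF5733"), to_rgb("#FFFF"))
```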

7.7 US Phone Number

^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Explanation: Matches US phone numbers in various formats like (555) 123-4567, 555-123-4567, +1 555 123 4567, etc.
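
A short Python check of the listed formats. The last test input is an invented edge case showing that, because both parentheses are optional independently, an unbalanced parenthesis also passes.

```python
import re

US_PHONE = re.compile(r"^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

candidates = [
    "(555) 123-4567",
    "555-123-4567",
    "+1 555 123 4567",
    "5551234567",
    "555-1234",        # too few digits
    "(555 123-4567",   # unbalanced parenthesis is still accepted
]
results = {number: bool(US_PHONE.match(number)) for number in candidates}

print(results)
```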

7.8 US Social Security Number (SSN)

^\d{3}-\d{2}-\d{4}$

Explanation: Matches the standard SSN format XXX-XX-XXXX.

8. Regex in Different Languages

8.1 JavaScript

// Creating a regex
const regex = /\d+/g;
// Or using the constructor
const regex2 = new RegExp('\\d+', 'g');

// Testing for a match
// Note: on a regex with the g flag, test() advances regex.lastIndex between calls
regex.test('hello 123');          // true

// Finding matches
'hello 123 world 456'.match(/\d+/g);  // ['123', '456']

// Replacing
'hello 123'.replace(/\d+/, '***');     // 'hello ***'

// Capturing groups
const match = '2026-04-19'.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = '2026', match[2] = '04', match[3] = '19'

8.2 Python

import re

# Searching for a match
re.search(r'\d+', 'hello 123')       # Returns a Match object

# Finding all matches
re.findall(r'\d+', 'hello 123 world 456')   # ['123', '456']

# Replacing
re.sub(r'\d+', '***', 'hello 123')         # 'hello ***'

# Capturing groups
match = re.match(r'(\d{4})-(\d{2})-(\d{2})', '2026-04-19')
match.group(1)  # '2026'
match.group(2)  # '04'
match.group(3)  # '19'

# Named groups
match = re.match(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', '2026-04-19')
match.group('year')  # '2026'

8.3 Java

import java.util.regex.*;

// Compiling a regex
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("hello 123 world 456");

// Finding all matches
while (matcher.find()) {
    System.out.println(matcher.group());  // Outputs "123", then "456"
}

// Replacing
String result = "hello 123".replaceAll("\\d+", "***");
// result = "hello ***"

9. Performance Optimization and Best Practices

9.1 Beware of Catastrophic Backtracking

Regex engines can suffer from catastrophic backtracking with certain patterns, causing execution time to grow exponentially. Typical dangerous patterns include:

(a+)+b         Nested quantifiers
(a|aa)+b       Overlapping alternatives
(.*a){n}       Quantifier combinations with backtracking

Prevention strategies:

  • Avoid nested quantifiers (e.g., (a+)+)
  • Use atomic groups or possessive quantifiers (if the engine supports them)
  • Make your expressions as deterministic as possible
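
As a sketch of the first strategy, the nested pattern (a+)+b can be rewritten as a+b, which accepts the same inputs but leaves the engine only one way to consume each run of "a". The inputs here are kept deliberately short; in a backtracking engine the nested version can become very slow on a long run of "a" with no "b" after it.

```python
import re

nested = re.compile(r"(a+)+b")  # nested quantifiers: many ways to split a run of "a"
flat = re.compile(r"a+b")       # equivalent pattern without the nesting

samples = ["ab", "aaab", "b", "aaa", "xaab", "aaac"]

# Both patterns should agree on whether each sample contains a match
agree = all(bool(nested.search(s)) == bool(flat.search(s)) for s in samples)

print(agree)
```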

9.2 Prefer Specific Character Classes

# Not recommended
.+@.+\..+

# Recommended
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}

# Not recommended
.*foo.*

# Recommended
[^\r\n]*foo[^\r\n]*    # for single-line text only

Explanation: When you already know the allowed character range, prefer explicit character classes instead of . or overly broad .*. This is usually clearer and can reduce unnecessary backtracking.
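
The difference in precision between the two email-like patterns above can be demonstrated in Python. The candidate strings are invented for illustration.

```python
import re

loose = re.compile(r".+@.+\..+")
strict = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

candidates = ["alice@example.com", "not an email @ all. really", "a@b@c.d!"]

loose_hits = [c for c in candidates if loose.fullmatch(c)]
strict_hits = [c for c in candidates if strict.fullmatch(c)]

print(loose_hits)
print(strict_hits)
```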

9.3 Use Anchors Wisely

Anchoring a pattern with ^ and $ lets the engine reject many non-matching inputs early instead of retrying the match at every position, which can noticeably improve performance.

9.4 Compile and Reuse Regex Objects

When using the same regex multiple times, compile it once and reuse the compiled object:

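# Python: compile once and reuse (lines is any iterable of strings)
import re

pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
for line in lines:
    if pattern.search(line):
        # Process match
        pass

// JavaScript: create the regex once, outside the loop (lines is an array of strings)
const pattern = /\d{4}-\d{2}-\d{2}/;  // no g flag, so test() keeps no state between calls
for (const line of lines) {
  if (pattern.test(line)) {
    // Process match
  }
}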

9.5 Don’t Use Regex for Everything

Regular expressions are not a silver bullet. Consider specialized tools for these scenarios:

  • Parsing HTML/XML: Use a DOM parser (e.g., Python’s BeautifulSoup, JavaScript’s DOMParser)
  • Parsing JSON: Use native methods like JSON.parse()
  • Complex grammar analysis: Use parser generators (e.g., ANTLR, PEG.js)
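
For comparison, here is a minimal sketch of the first two alternatives using only Python's standard library: html.parser for HTML and json for JSON. The class name LinkCollector and the sample inputs are assumptions made for this example.

```python
import json
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p>See <a class="x" href="/docs">docs</a> and <a href="/faq">FAQ</a>.</p>')
links = collector.links

# JSON has a dedicated parser as well; no regex needed
config = json.loads('{"retries": 3, "hosts": ["a", "b"]}')

print(links, config["retries"])
```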

10. Online Testing Tool

When writing and debugging regular expressions, a visual tool can dramatically boost your productivity. We recommend our Online Regex Tester, which offers:

  • Real-time matching with highlighted results
  • Support for multiple flags (global, case-insensitive, multiline, etc.)
  • Visual display of captured groups

11. Conclusion

Regular expressions are one of the most powerful text processing tools in a programmer’s arsenal. While their syntax is compact and may seem confusing at first, once you master the core concepts—character classes, quantifiers, groups, and assertions—you’ll have the ability to handle virtually any text pattern.

The best way to learn is by doing. Head over to our Online Regex Tester and start experimenting with every example in this article!