Regular expressions
A regular expression (shortened as regex or regexp) is a string that specifies a pattern in text.
Matching
| Expression | Description |
|---|
| . | any character except new line (includes new line with s flag) |
| \d | digit |
| \D | not digit |
| \s | whitespace |
| \S | not whitespace |
| \w | word character |
| \W | not word character |
| ^ | the beginning of text (or start-of-line with multi-line mode) |
| $ | the end of text (or end-of-line with multi-line mode) |
| \A | only the beginning of text (even with multi-line mode enabled) |
| \z | only the end of text (even with multi-line mode enabled) |
| \b | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
| \B | not a Unicode word boundary |
| [xyz] | A character class matching either x, y or z (union) |
| [^xyz] | A character class matching any character except x, y and z |
| [a-z] | A character class matching any character in range a-z |
| [[:alpha:]] | ASCII character class ([A-Za-z]) |
| [[:^alpha:]] | Negated ASCII character class ([^A-Za-z]) |
| [x[^xyz]] | Nested/grouping character class (matching any character except y and z) |
| [a-y&&xyz] | Intersection (matching x or y) |
| [0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4) |
| [0-9--4] | Direct subtraction (matching 0-9 except 4) |
| [a-g~~b-h] | Symmetric difference (matching `a` and `h` only) |
| [\[\]] | Escaping in character classes (matching [ or ]) |
Repetitions
| Expression | Description |
|---|
| x* | zero or more of x (greedy) |
| x+ | one or more of x (greedy) |
| x? | zero or one of x (greedy) |
| x*? | zero or more of x (ungreedy/lazy) |
| x+? | one or more of x (ungreedy/lazy) |
| x?? | zero or one of x (ungreedy/lazy) |
| x{n,m} | at least n x and at most m x (greedy) |
| x{n,} | at least n x (greedy) |
| x{n} | exactly n x |
| x{n,m}? | at least n x and at most m x (ungreedy/lazy) |
| x{n,}? | at least n x (ungreedy/lazy) |
| x{n}? | exactly n x |
Escape sequences
| Expression | Description |
|---|
| \* | literal *, works for any punctuation character: \.+*?()|[]{}^$ |
| \a | bell (\x07) |
| \f | form feed (\x0C) |
| \t | horizontal tab |
| \n | new line |
| \r | carriage return |
| \v | vertical tab (\x0B) |
| \123 | octal character code (up to three digits) (when enabled) |
| \x7F | hex character code (exactly two digits) |
| \x{10FFFF} | any hex character code corresponding to a Unicode code point |
| \u007F | hex character code (exactly four digits) |
| \u{7F} | any hex character code corresponding to a Unicode code point |
| \U0000007F | hex character code (exactly eight digits) |
| \U{7F} | any hex character code corresponding to a Unicode code point |
ASCII character classes
| Expression | Description |
|---|
| [[:alnum:]] | alphanumeric ([0-9A-Za-z]) |
| [[:alpha:]] | alphabetic ([A-Za-z]) |
| [[:ascii:]] | ASCII ([\x00-\x7F]) |
| [[:blank:]] | blank ([\t ]) |
| [[:cntrl:]] | control ([\x00-\x1F\x7F]) |
| [[:digit:]] | digits ([0-9]) |
| [[:graph:]] | graphical ([!-~]) |
| [[:lower:]] | lower case ([a-z]) |
| [[:print:]] | printable ([ -~]) |
| [[:punct:]] | punctuation ([!-/:-@\[-`{-~]) |
| [[:space:]] | whitespace ([\t\n\v\f\r ]) |
| [[:upper:]] | upper case ([A-Z]) |
| [[:word:]] | word characters ([0-9A-Za-z_]) |
| [[:xdigit:]] | hex digit ([0-9A-Fa-f]) |