Production-ready regex patterns (and where they fail)
After a decade of writing regex against real-world data, these are the patterns I reach for first, along with the failure modes I no longer ignore. None of them is “perfect”; when a pattern has to be perfect, a parser was usually the right tool all along.
Email — pick the right pragmatism level
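A regex that implements the full email address grammar is enormous: the well-known Perl pattern for RFC 822 (the predecessor of RFC 5322) runs to over 6,000 characters, because the grammar allows quoted-string local parts, IP-literal domains and embedded comments. Nobody uses it, and it still wouldn’t cover Unicode local parts, which come from the later internationalized-email RFCs (6530–6532) rather than RFC 5322 itself. What I use in production: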
^[^\s@]+@[^\s@]+\.[^\s@]+$
It accepts almost anything that “looks like an email”, which is the point: deliverability is checked by sending the verification email anyway. Don’t try to be a postmaster with regex; a stricter hand-rolled pattern will reject a perfectly valid address like user.name+tag@sub.example.co.uk within a day.
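A minimal Python sketch of the recognition check, assuming Python 3 and the standard re module; looks_like_email is a hypothetical helper name. It drops the ^ and $ anchors in favour of re.fullmatch, because in Python a bare $ also matches just before a trailing newline, so re.match with ^...$ lets "a@b.co\n" through.

```python
import re

# The pragmatic pattern from above without ^ and $: re.fullmatch anchors both
# ends, and unlike $ it does not tolerate a trailing newline.
EMAIL_SHAPE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")


def looks_like_email(candidate: str) -> bool:
    """Recognition only: True if the string has the rough shape of an address."""
    return EMAIL_SHAPE.fullmatch(candidate) is not None


for sample in ["user.name+tag@sub.example.co.uk", "two@@example.com",
               "spaced out@example.com", "user@localhost"]:
    print(f"{sample!r}: {looks_like_email(sample)}")

# The trailing-newline trap with ^...$ and re.match:
print(bool(re.match(r"^[^\s@]+@[^\s@]+\.[^\s@]+$", "a@b.co\n")))  # True
print(looks_like_email("a@b.co\n"))  # False
```

Note that user@localhost is rejected: that is the price of requiring a dot after the @, and often an acceptable one for public sign-up forms.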
URL detection in free text
https?:\/\/[^\s<>"]+
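Pragmatic. The common trap is writing https?:\/\/\S+ and then watching it swallow the trailing punctuation in “Check https://example.com.” The pattern above has the same weakness at the end of a sentence: excluding <, > and " only stops it running into surrounding HTML, which covers most embedding cases. Strip any trailing period, comma, closing parenthesis or closing bracket in post-processing.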
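A sketch of detection plus the post-processing step in Python; find_urls and TRAILING_PUNCT are hypothetical names. Outside a /.../ literal the slashes need no escaping, so the pattern is written without the backslashes.

```python
import re

# Pattern from above: scheme, then a run of characters that are not
# whitespace, <, > or a double quote.
URL_IN_TEXT = re.compile(r'https?://[^\s<>"]+')

# Characters peeled off the end of each match in post-processing.
TRAILING_PUNCT = ".,)]"


def find_urls(text: str) -> list[str]:
    """Return URL-looking substrings with trailing punctuation stripped."""
    return [m.group(0).rstrip(TRAILING_PUNCT) for m in URL_IN_TEXT.finditer(text)]


sample = 'Check https://example.com. Or see <a href="https://example.org/docs">this</a>.'
print(find_urls(sample))  # ['https://example.com', 'https://example.org/docs']
```

Blindly stripping ) also clips a URL that legitimately ends in a closing parenthesis, as some Wikipedia links do; if that matters, only strip a ) that has no matching ( inside the match.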
Phone numbers — country-specific, always
No single international phone regex manages both to accept every valid number and to reject garbage. For the E.164 canonical form:
^\+[1-9]\d{1,14}$
For US numbers (loose):
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
For France:
^(?:\+33[\s.]?|0)[1-9](?:[\s.]?\d{2}){4}$
For anything beyond a UI hint, the right tool is Google’s libphonenumber. Regex catches obviously wrong input; libphonenumber checks the number against its country’s numbering plan (valid lengths and prefixes), which tells you whether it could be a real number in that country.
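To see how loose the “loose” pattern is, here is a quick Python check of the E.164 and US patterns exactly as written above, run with re.fullmatch; the sample numbers are made up. The US pattern accepts an unbalanced parenthesis and even +0123456789, which E.164 rejects.

```python
import re

# E.164: a plus sign, a non-zero leading digit, at most 15 digits in total.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

# Loose US pattern: optional +1, optional parentheses and separators.
US_LOOSE = re.compile(r"^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

candidates = ["+14155552671", "+0123456789", "(415) 555-2671",
              "+1 415 555 2671", "(415 555-2671", "555-2671"]
for number in candidates:
    e164_ok = E164.fullmatch(number) is not None
    us_ok = US_LOOSE.fullmatch(number) is not None
    print(f"{number!r}: E.164={e164_ok}, US loose={us_ok}")
```

Both patterns only check shape; neither knows whether 415 is an assigned area code, which is exactly the gap libphonenumber fills.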
Date validation — don’t use regex
A pattern like ^\d{4}-\d{2}-\d{2}$ happily accepts 2024-13-45. To check whether a date is real, parse it (new Date() in JavaScript, datetime.strptime() in Python, Carbon::parse() in PHP, chrono::NaiveDate::parse_from_str() in Rust) and verify it round-trips back to the same string. Regex tells you a string looks like a date. Parsing tells you it is one.
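A sketch of parse-then-round-trip in Python, using datetime.strptime as suggested above; is_real_iso_date is a hypothetical helper name. The round trip is not decoration: strptime accepts unpadded fields such as 2024-1-5, so comparing the re-serialised date with the input is what rejects them.

```python
import re
from datetime import datetime

# Shape-only check: it accepts 2024-13-45 without complaint.
ISO_DATE_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_real_iso_date(text: str) -> bool:
    """Parse, then confirm the parsed date serialises back to the input."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:  # month 13, 29 February in a non-leap year, junk, ...
        return False
    # isoformat() always yields zero-padded YYYY-MM-DD, so unpadded input
    # like "2024-1-5" parses fine but fails this comparison.
    return parsed.date().isoformat() == text


for sample in ["2024-02-29", "2023-02-29", "2024-13-45", "2024-1-5"]:
    shape_ok = ISO_DATE_SHAPE.fullmatch(sample) is not None
    print(f"{sample}: shape={shape_ok}, real={is_real_iso_date(sample)}")
```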
Password strength — also don’t use regex
A regex that requires 8+ characters with one upper, one lower, one digit and one symbol scores well on outdated checkbox audits and adds little actual security. Password1! passes it; a 20-character random lowercase passphrase (vastly stronger) fails it. The current consensus, NIST SP 800-63B, takes the opposite approach: enforce a minimum length (at least 8 characters), allow long passphrases (up to at least 64 characters), impose no composition rules, and check new passwords against a breach database (for example the HaveIBeenPwned API). Drop the regex policy.
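The contrast is easy to demonstrate. This Python sketch first runs a typical composition regex against both examples, then outlines a length-plus-breach check in the NIST spirit. The breach part assumes the HaveIBeenPwned Pwned Passwords range API (a GET to https://api.pwnedpasswords.com/range/ followed by the first five hex characters of the password’s SHA-1, answered with SUFFIX:COUNT lines); the HTTP request itself is left out, and the helper names and the 8-character minimum are illustrative assumptions.

```python
import hashlib
import re

# The composition rule argued against above: 8+ characters with at least one
# lowercase letter, one uppercase letter, one digit and one other character.
COMPOSITION_RULE = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}")

for pw in ["Password1!", "correct horse battery staple"]:
    print(pw, COMPOSITION_RULE.fullmatch(pw) is not None)  # True, then False


def pwned_range_request(password: str) -> tuple[str, str]:
    """Split the SHA-1 hex digest for a k-anonymity range query (hypothetical helper).

    Only the 5-character prefix would be sent to the range endpoint;
    the 35-character suffix never leaves the machine.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def acceptable(password: str, range_body: str, min_length: int = 8) -> bool:
    """Length floor plus breach check; deliberately no character-class rules.

    min_length=8 is an assumed floor; range_body stands in for the API response.
    """
    _, suffix = pwned_range_request(password)
    return len(password) >= min_length and breach_count(suffix, range_body) == 0
```

In a real integration, range_body would be the text returned for the prefix from pwned_range_request.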
Trim whitespace
^\s+|\s+$ with the global flag — but if your runtime has .trim() or strip(), use that. Regex is for cases where the built-in doesn’t exist.
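For completeness, the regex trim in Python next to the built-in; regex_trim is a hypothetical helper. Python’s re.sub replaces every match by default, so there is no equivalent of the JavaScript global flag to set.

```python
import re

# Leading run of whitespace OR trailing run of whitespace.
LEADING_OR_TRAILING_WS = re.compile(r"^\s+|\s+$")


def regex_trim(text: str) -> str:
    """Remove leading and trailing whitespace with a single substitution."""
    return LEADING_OR_TRAILING_WS.sub("", text)


sample = "\t  padded value \n"
print(repr(regex_trim(sample)))  # 'padded value'
print(repr(sample.strip()))      # 'padded value'
```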
Takeaway: regex shines for recognition (does this look like X?), not for validation (is this really X?). For anything safety-critical — emails delivered, phone numbers dialled, dates stored, passwords secured — pair the regex with a real check downstream. The patterns above are battle-tested first lines of defence, not silver bullets.
Sources: RFC 5322 (email) · Google libphonenumber · NIST SP 800-63B (passwords) · RFC 3986 (URI).