.NET Regex: \d is different from [0-9]
According to the .NET documentation for Regex, \d matches any decimal digit. What counts as a "decimal digit" depends on the regex options:
- Without RegexOptions.ECMAScript (the default): \d means \p{Nd}, i.e. any character from the Unicode category "Decimal digit"
- With RegexOptions.ECMAScript: \d means [0-9]
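To make the two meanings concrete, here is a minimal sketch that tests a single Arabic-Indic digit against each spelling. The class and method names (DigitClassDemo and its members) are made up for this illustration; only Regex.IsMatch and RegexOptions.ECMAScript come from .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DigitClassDemo
{
    // ARABIC-INDIC DIGIT THREE (٣), a member of the Unicode category Nd.
    public const string ArabicIndicThree = "\u0663";

    // Default options: \d behaves like \p{Nd}.
    public static bool DefaultDigit(string s) => Regex.IsMatch(s, @"\d");

    // The Unicode category spelled out explicitly.
    public static bool UnicodeDigit(string s) => Regex.IsMatch(s, @"\p{Nd}");

    // ASCII digits only, regardless of options.
    public static bool AsciiDigit(string s) => Regex.IsMatch(s, "[0-9]");

    // With RegexOptions.ECMAScript, \d behaves like [0-9].
    public static bool EcmaScriptDigit(string s) =>
        Regex.IsMatch(s, @"\d", RegexOptions.ECMAScript);
}
```

For DigitClassDemo.ArabicIndicThree, DefaultDigit and UnicodeDigit return true, while AsciiDigit and EcmaScriptDigit return false.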
The Unicode category "Decimal digit" contains characters such as 0, 1 or 2, but also digits from other scripts such as ٣, ٧, ൩ or ໓. The full list contains 610 characters, including the following ranges:
0x0030-0x0039, // ASCII
0x0660-0x0669, // Arabic-Indic
0x06f0-0x06f9, // Eastern Arabic-Indic
0x0966-0x096f, // Devanagari
0x09e6-0x09ef, // Bengali
0x0a66-0x0a6f, // Gurmukhi
0x0ae6-0x0aef, // Gujarati
0x0b66-0x0b6f, // Oriya
0x0c66-0x0c6f, // Telugu
0x0ce6-0x0cef, // Kannada
0x0d66-0x0d6f, // Malayalam
0x0e50-0x0e59, // Thai
0x0ed0-0x0ed9, // Lao
0x0f20-0x0f29, // Tibetan
0x1040-0x1049, // Myanmar
0x17e0-0x17e9, // Khmer
0x1810-0x1819, // Mongolian
0x1946-0x194f, // Limbu
0xff10-0xff19, // Fullwidth
0x1d7ce-0x1d7d7, // Math Bold
0x1d7d8-0x1d7e1, // Math Double
0x1d7e2-0x1d7eb, // Math SansSerif
0x1d7ec-0x1d7f5, // Math SS Bold
0x1d7f6-0x1d7ff // Math Monosp
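If you want to check what your own runtime considers a decimal digit, the following sketch queries the Unicode data shipped with .NET. The class and method names (DecimalDigitInfo and its members) are hypothetical; CharUnicodeInfo.GetUnicodeCategory, CharUnicodeInfo.GetDecimalDigitValue and char.ConvertFromUtf32 are the .NET APIs it relies on. The count depends on the Unicode version your runtime ships with, so it may not match the figure quoted above.

```csharp
using System.Globalization;

// Hypothetical helper class, for illustration only.
public static class DecimalDigitInfo
{
    // True when the code point belongs to the Unicode category Nd.
    // Note: char.ConvertFromUtf32 throws for surrogate code points (0xD800-0xDFFF).
    public static bool IsDecimalDigit(int codePoint)
    {
        string text = char.ConvertFromUtf32(codePoint);
        return CharUnicodeInfo.GetUnicodeCategory(text, 0) == UnicodeCategory.DecimalDigitNumber;
    }

    // Numeric value (0 to 9) of a decimal digit character, or -1 if it is not one.
    public static int DigitValue(char c) => CharUnicodeInfo.GetDecimalDigitValue(c);

    // Counts the Nd characters in the Basic Multilingual Plane (U+0000 to U+FFFF).
    // The result varies with the Unicode data of the runtime.
    public static int CountBmpDecimalDigits()
    {
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            if (CharUnicodeInfo.GetUnicodeCategory((char)cp) == UnicodeCategory.DecimalDigitNumber)
            {
                count++;
            }
        }
        return count;
    }
}
```

For example, DecimalDigitInfo.DigitValue('\u0663') returns 3 for ٣, and DecimalDigitInfo.IsDecimalDigit(0x1d7ce) returns true for the Math Bold zero.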
Here are some examples that show the differences:
C#
// \u0030 - \u0039
Regex.IsMatch("0123456789", "\\d{10}"); // True
Regex.IsMatch("0123456789", "[0-9]{10}"); // True
// DEVANAGARI DIGIT: \u0966 - \u096F
Regex.IsMatch("०१२३४५६७८९", "\\d{10}"); // True
Regex.IsMatch("०१२३४५६७८९", "[0-9]{10}"); // False
// RegexOptions.ECMAScript
Regex.IsMatch("0123456789", "\\d{10}", RegexOptions.ECMAScript); // True
Regex.IsMatch("०१२३४५६७८९", "\\d{10}", RegexOptions.ECMAScript); // False
The next time you want to match a digit in a regex, make sure you know which kind of digits you want to match: [0-9] or \p{Nd}.
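One case where the choice matters is validating input before parsing it. The sketch below assumes the validated text is later passed to int.TryParse, which only accepts the ASCII digits 0 to 9. The class and method names (DigitValidation and its members) are hypothetical, and the patterns use \A and \z to anchor on the whole string.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DigitValidation
{
    // Accepts any run of Unicode decimal digits (\d means \p{Nd} by default).
    public static bool IsAllUnicodeDigits(string s) => Regex.IsMatch(s, @"\A\d+\z");

    // Accepts ASCII digits only.
    public static bool IsAllAsciiDigits(string s) => Regex.IsMatch(s, @"\A[0-9]+\z");

    // Parses digits only: no sign, whitespace, or separators.
    public static bool TryParseDigits(string s, out int value) =>
        int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
}
```

With the Devanagari string "१२३" ("\u0967\u0968\u0969"), IsAllUnicodeDigits returns true but TryParseDigits returns false, so a \d check does not guarantee that parsing will succeed. IsAllAsciiDigits returns false for that string, which is consistent with the parser.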