I know regular expressions!
Translated from German using DeepL.
Date: October 2020
Reading time: 6 minutes
Source: https://content.codecademy.com/courses/regex/regular_expressions_xkcd_2.png
Regex?
Imagine you are ordering a product online. Before you can pay, you have to register. You will be asked to enter the following information:
- e-mail address
- username
- password
Your entries will be validated. But how is this done?
There is a technology that is present on almost every website: regex. A regex, also known as a "regular expression", is a sequence of characters that defines a search pattern. This pattern can then be used to find the matching strings.
Regex enables the following:
- Validate a user's input in the form
- Check texts
- Evaluate results
- Search for words in emails or on websites
Regex is integrated in many tools and languages:
- Unix
- Java, Python, PHP, ...
- Development environments and text editors
Application
As described above, you can use regex for various things. Here I show this using examples. Each example lists the regex first, followed by sample lines it is tested against.
Literals
You can search for a specific string.
Apple
Apple
Google
Samsung
Alternation
Sometimes you search for several words. You can link several queries with "|" (or). In the example, you are searching for baseball or football.
baseball|football
baseball
football
rugby
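The two patterns above can be tried out in JavaScript roughly like this. It is only a sketch: the array and variable names are made up for illustration, and test() is used because it simply reports whether a pattern occurs in a string.

```javascript
// Literal pattern: matches wherever the exact text "Apple" occurs.
const literal = /Apple/;
// Alternation: matches "baseball" or "football".
const sport = /baseball|football/;

const brands = ['Apple', 'Google', 'Samsung'];
const sports = ['baseball', 'football', 'rugby'];

// RegExp.prototype.test returns true if the pattern is found in the string.
const matchingBrands = brands.filter((brand) => literal.test(brand));
const matchingSports = sports.filter((name) => sport.test(name));

console.log(matchingBrands); // ['Apple']
console.log(matchingSports); // ['baseball', 'football']
```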
Groups of characters
Square brackets match exactly one of the characters listed inside them. This is often used to find words with different spellings. These results can be standardized later.
Ka[iy]
Kai
Kay
Kao
Kau
Note
[kay] does not match the word "kay". It matches a single character that is k, a, or y.
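The difference between Ka[iy] and [kay] can be checked with a short JavaScript sketch. The variable names are illustrative only.

```javascript
const names = ['Kai', 'Kay', 'Kao', 'Kau'];

// Ka[iy]: "Ka" followed by exactly one character that is either i or y.
const spelling = /Ka[iy]/;
const found = names.filter((name) => spelling.test(name));
console.log(found); // ['Kai', 'Kay']

// [kay] on its own matches a single character: k, a, or y.
// With the g flag, match() returns every individual hit.
const singleChars = 'kay'.match(/[kay]/g);
console.log(singleChars); // ['k', 'a', 'y']
```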
Dots
A dot stands for exactly one arbitrary character. This is useful when you know that a number occurs but not which one.
Tim has scored . goals
Tim has scored 3 goals
Tim has scored no goals
Tom has scored 5 goals
But how do you search for a "."?
As I am looking for a sentence here, I would like to add a period at the end. This is what my regex looks like.
Tim has scored . goals.
It worked! "Tim has scored 2 goals." matches.
However, the sentence "Tim has scored 2 goals!" now also matches. This is because the dot is not treated as a punctuation mark. Here, too, it stands for any single character. To match a literal period, you have to put a backslash in front of it.
Tim has scored . goals\.
This applies to all characters that have a special meaning in regex.
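The effect of the backslash can be seen in a small JavaScript sketch. The names loose and strict are chosen only for this illustration.

```javascript
// Unescaped dot at the end: any single character is accepted there.
const loose = /Tim has scored . goals./;
// Escaped dot at the end: only a literal period is accepted.
const strict = /Tim has scored . goals\./;

console.log(loose.test('Tim has scored 2 goals!'));  // true
console.log(strict.test('Tim has scored 2 goals!')); // false
console.log(strict.test('Tim has scored 2 goals.')); // true
```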
Note
Dots can also be used to require a certain number of characters. The following regex matches any ten characters.
..........
Ranges
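To search for ranges of characters, you can use square brackets with a hyphen. A range always describes a single character, so [40-50] would only match one character (4, 0 to 5, or 0). A two-digit range such as 40 to 50 is therefore built digit by digit.
Lebron James scored (4[0-9]|50) points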
Lebron James scored 46 points
Lebron James scored 38 points
Michael Jordan scored 40 points
Note
It is also possible to search for a range of letters, for example [a-f].
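Because a bracket range always stands for one character, a numeric range like 40 to 50 has to be composed from single digits. The following JavaScript sketch shows one way to do that, along with a letter range. The pattern is an illustration, not the only possible solution.

```javascript
// 4 followed by any digit (40 to 49), or exactly 50.
const points = /Lebron James scored (4[0-9]|50) points/;

console.log(points.test('Lebron James scored 46 points'));   // true
console.log(points.test('Lebron James scored 38 points'));   // false
console.log(points.test('Michael Jordan scored 40 points')); // false

// Letter ranges work the same way: exactly one letter from a to f.
const letter = /^[a-f]$/;
console.log(letter.test('c')); // true
console.log(letter.test('x')); // false
```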
Abbreviations
[0-9][ \t\r\n\f\v][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_]
A finished regex could look like this. It requires a digit followed by a whitespace character. This is followed by seven word characters.
Such queries can be long and incomprehensible. That is why there are abbreviations.
Abbreviated | Written out | Name | Meaning |
---|---|---|---|
\w | [A-Za-z0-9_] | word character | capital letter, lowercase letter, digit, underscore |
\d | [0-9] | digit character | digit |
\s | [ \t\r\n\f\v] | whitespace character | space, tab, carriage return, line feed, form feed, vertical tab |
\W | [^A-Za-z0-9_] | non-word character | All characters except word characters |
\D | [^0-9] | non-digit character | All characters except digits |
\S | [^ \t\r\n\f\v] | non-whitespace character | All characters except whitespace characters |
Using these abbreviations makes the query shorter and easier to understand.
\d\s\w\w\w\w\w\w\w
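A quick JavaScript sketch can confirm that the written-out form and the abbreviated form accept the same inputs. The patterns are built from strings so the seven repetitions do not have to be typed by hand, and ^ and $ are added so the whole string has to fit. One caveat: in JavaScript, \s also covers some further Unicode whitespace characters beyond the six listed in the table. The sample strings are made up.

```javascript
// Written-out form: digit, whitespace, seven word characters.
const longForm = new RegExp(
  '^[0-9][ \\t\\r\\n\\f\\v]' + '[A-Za-z0-9_]'.repeat(7) + '$'
);
// Abbreviated form of the same pattern.
const shortForm = new RegExp('^\\d\\s' + '\\w'.repeat(7) + '$');

const samples = ['7 reg_exp', '7 regex', 'x reg_exp'];
// For each sample: [result of long form, result of short form]
const results = samples.map((s) => [longForm.test(s), shortForm.test(s)]);

console.log(results); // [[true, true], [false, false], [false, false]]
```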
Grouping
How you group the statements is often crucial.
In this example, I want to match the following sentences:
- I like rice
- I like noodles
I like rice|noodles
If I run it now, I don't get the desired result. But why?
The OR operator splits the whole expression, so I am searching for "I like rice" or just "noodles".
I like rice|noodles
In order to validate correctly, I have to adjust the expression.
I like (rice|noodles)
You can group parts of an expression with parentheses.
I like rice
I like noodles
I don't like noodles
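The difference between the ungrouped and the grouped expression can be sketched in JavaScript as follows. As a side effect, parentheses also capture what they matched, which match() exposes at index 1. Variable names are illustrative.

```javascript
const ungrouped = /I like rice|noodles/;
const grouped = /I like (rice|noodles)/;

const sentence = "I don't like noodles";

// Ungrouped: "noodles" alone is enough for a match.
console.log(ungrouped.test(sentence)); // true
// Grouped: "I like " must come directly before rice or noodles.
console.log(grouped.test(sentence));   // false
console.log(grouped.test('I like noodles')); // true

// The group also captures the alternative that matched.
const captured = 'I like rice'.match(grouped)[1];
console.log(captured); // 'rice'
```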
Quantities
Fixed
Earlier we had the following query.
\d\s\w\w\w\w\w\w\w
This is already quite readable. But it can be even better.
You can use curly braces to specify the number of repetitions.
\d\s\w{7}
In this example, there must be exactly 7 word characters. However, you can also set a minimum and maximum.
mu{3,7}h
Here, the u must occur at least 3 and at most 7 times.
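The minimum and maximum can be checked with a small JavaScript sketch. The test words are generated with repeat() to avoid miscounting, and ^ and $ make sure the whole word is checked. The helper name moo is hypothetical.

```javascript
// m, then between 3 and 7 times u, then h, and nothing else.
const cow = /^mu{3,7}h$/;

// Builds "m" + n times "u" + "h".
const moo = (n) => 'm' + 'u'.repeat(n) + 'h';

const counts = [2, 3, 7, 8];
const outcome = counts.map((n) => cow.test(moo(n)));

console.log(outcome); // [false, true, true, false]
```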
0 or more
Use * to specify that the previous character may appear 0 or more times.
he*y
hy
heeeeey
hey
1 or more
A + specifies that the previous character must occur at least once, but may occur more often.
he+lp
help
heeeeeeelp
hello
Optional
With ? you specify that the preceding character or group can occur but does not have to.
Xander Bogaerts is a (good )?baseball player
Xander Bogaerts is a baseball player
Xander Bogaerts is a good baseball player
James Harden is a good baseball player
Note
To search for a literal question mark, a backslash must be written in front of it as well.
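The three quantifiers and the escaped question mark can be compared side by side in JavaScript. This is a sketch with made-up variable names. The ^ and $ around each pattern make it cover the whole string.

```javascript
const zeroOrMore = /^he*y$/;   // e may be missing or repeated
const oneOrMore = /^he+lp$/;   // at least one e is required
const optionalWord = /^Xander Bogaerts is a (good )?baseball player$/;
const literalQuestionMark = /^Really\?$/; // \? is a plain question mark

console.log(zeroOrMore.test('hy'));      // true
console.log(zeroOrMore.test('heeeeey')); // true
console.log(oneOrMore.test('help'));     // true
console.log(oneOrMore.test('hlp'));      // false
console.log(optionalWord.test('Xander Bogaerts is a baseball player'));      // true
console.log(optionalWord.test('Xander Bogaerts is a good baseball player')); // true
console.log(literalQuestionMark.test('Really?')); // true
console.log(literalQuestionMark.test('Really'));  // false
```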
Anchors
This regex starts with a ^ and ends with a $. This means that it only matches if the whole string matches from beginning to end. So if another word occurs at the beginning or end, the regex does not match.
^regular expressions are useful$
regular expressions are useful
i think regular expressions are useful
regular expressions are useful, so i think
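A short JavaScript sketch shows the effect of the anchors by running the same text once with and once without them. The variable names are illustrative.

```javascript
const anchored = /^regular expressions are useful$/;
const unanchored = /regular expressions are useful/;

const sentences = [
  'regular expressions are useful',
  'i think regular expressions are useful',
  'regular expressions are useful, so i think',
];

// With anchors, only the exact sentence matches.
const anchoredResults = sentences.map((s) => anchored.test(s));
// Without anchors, the text may occur anywhere in the string.
const unanchoredResults = sentences.map((s) => unanchored.test(s));

console.log(anchoredResults);   // [true, false, false]
console.log(unanchoredResults); // [true, true, true]
```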
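Caret
Inside square brackets, a ^ at the first position negates the set: the expression matches any single character that is not listed. Outside of brackets, ^ is the start anchor described above, so ^baseball matches text that begins with "baseball". It does not mean "everything except baseball".
[^0-9]
Matches every character except a digit.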
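The two roles of the caret can be told apart with a brief JavaScript sketch: inside square brackets it negates the set, outside it anchors the match to the start of the string. The sample strings are invented for illustration.

```javascript
// [^0-9] matches any single character that is NOT a digit.
// Removing all such characters leaves only the digits.
const digitsOnly = 'Room 101'.replace(/[^0-9]/g, '');
console.log(digitsOnly); // '101'

// Outside of brackets, ^ means "start of the string".
const startsWithBaseball = /^baseball/;
console.log(startsWithBaseball.test('baseball is fun')); // true
console.log(startsWithBaseball.test('I like baseball')); // false
```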
Practice
We have found a string above. But what now? What can you do with this string?
Until now, I have only used regex to validate forms. But you do not necessarily need regex for that.
If you create a form, there are various type values for the input element that you can use. These then validate the input.
<input type="email" placeholder="Mail" />
With a title attribute you can even include your own error message. However, this validation has one disadvantage: inaccuracy. It only catches gross errors, and in my opinion important details are not validated.
Regex solves this problem. You can write your own query for it. Mine looks like this:
^[^\.](\.|\w|!|#|\$|%|&|'|\*|\+|-|\/|=|\?|\^|_|`|\{|\||\}|~){0,64}@[a-z-]+\.\w+
Now there are several ways to use this regex.
Variant 1
This solution does not require JS. In HTML, you specify the pattern in the input element.
<input class="tel" pattern="(\+41|0041|0)79\d{3}\d{2}\d{2}" />
Now you can style the input element if the regex does not match.
.tel:invalid {
  border-color: rgb(255, 79, 102);
}
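One detail worth knowing: the pattern attribute has to match the entire value, as if the expression were wrapped in ^(?: and )$. The following JavaScript sketch mimics that behaviour for the phone pattern above so it can be tried outside of a form. The sample numbers are invented.

```javascript
// Same expression as in the pattern attribute, written as a string.
const pattern = '(\\+41|0041|0)79\\d{3}\\d{2}\\d{2}';
// The browser checks the whole value, which corresponds to ^(?:...)$.
const phone = new RegExp(`^(?:${pattern})$`);

console.log(phone.test('+41791234567'));  // true
console.log(phone.test('0041791234567')); // true
console.log(phone.test('0791234567'));    // true
console.log(phone.test('079 123 45 67')); // false, spaces are not allowed
console.log(phone.test('0781234567'));    // false, prefix must be 79
```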
Advantages
No JS is needed, so there is no extra function to write and maintain. In addition, when filling out the form, it is always clear whether the input is correct.
Disadvantage
Allowing spaces is possible but not easy. You could extend the regex so that spaces are allowed. However, this would make the expression harder to read.
Variant 2
This solution requires HTML and JS. With onfocusout you can call a function when the element loses focus.
<input id="zip" onfocusout="zipFocused()" />
In the JS function, I read the value of the input, which is always a string. All spaces are then removed.
function zipFocused() {
  const zip = document.getElementById('zip');
  // input.value is already a string, so only the spaces need to be removed
  const formattedZip = zip.value.replaceAll(' ', '');
  // Exactly four digits, the first one not 0 (assumed format of a Swiss zip code)
  if (formattedZip.match(/^[1-9]\d{3}$/)) {
    correctStyle(zip);
  } else {
    errorStyle(zip);
  }
}
In the if condition, I use my regex to validate the input. I use match for this. This is one of many string methods that accept a regex. Other methods:
matchAll()
replace()
replaceAll()
search()
split()
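To give an impression of what these methods do, here is a hedged JavaScript sketch that applies each of them to one invented sample sentence. Note that matchAll() and replaceAll() require the g flag when they are given a regex.

```javascript
const text = 'Tim scored 3 goals, Tom scored 5 goals';

// matchAll: iterates over all matches including their groups.
const scores = [...text.matchAll(/(\w+) scored (\d)/g)]
  .map((m) => `${m[1]}:${m[2]}`);
console.log(scores); // ['Tim:3', 'Tom:5']

// replace: without the g flag, only the first match is replaced.
const firstReplaced = text.replace(/goals/, 'points');
console.log(firstReplaced); // 'Tim scored 3 points, Tom scored 5 goals'

// replaceAll: replaces every match.
const allReplaced = text.replaceAll(/goals/g, 'points');
console.log(allReplaced); // 'Tim scored 3 points, Tom scored 5 points'

// search: index of the first match, or -1 if there is none.
const firstDigitIndex = text.search(/\d/);
console.log(firstDigitIndex); // 11

// split: cuts the string at every match of the separator pattern.
const parts = text.split(/,\s*/);
console.log(parts); // ['Tim scored 3 goals', 'Tom scored 5 goals']
```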
Advantages
In JS, I can still remove characters before validating. The HTML is also cleaner, as the validation is outsourced.
Disadvantages
You need a little JS knowledge. Values that are not strings, for example numbers, also have to be converted into a string before matching.
Another disadvantage, in this example, is that you cannot directly see whether the zip code is correct when you enter it.
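This last disadvantage could be softened by validating on every keystroke instead of on focus loss. The following is a sketch under assumptions, not the approach used above: it assumes a zip code consists of four digits not starting with 0, reuses the id zip from the example, and toggles a hypothetical CSS class named invalid. The helper isValidZip is made up for this illustration.

```javascript
// Assumption: a zip code is four digits and does not start with 0.
const ZIP_PATTERN = /^[1-9]\d{3}$/;

// Pure helper: strips spaces, then checks the whole value against the pattern.
function isValidZip(rawValue) {
  return ZIP_PATTERN.test(rawValue.replaceAll(' ', ''));
}

// Browser-only wiring; skipped when no DOM is available (for example in Node).
if (typeof document !== 'undefined') {
  const zipInput = document.getElementById('zip');
  if (zipInput) {
    // The "input" event fires on every change, so feedback appears while typing.
    zipInput.addEventListener('input', () => {
      // "invalid" is a hypothetical CSS class name.
      zipInput.classList.toggle('invalid', !isValidZip(zipInput.value));
    });
  }
}

console.log(isValidZip('80 00')); // true
console.log(isValidZip('0800'));  // false
console.log(isValidZip('12345')); // false
```

Keeping the check in a pure function makes it easy to test without a browser, while the event wiring stays a thin layer on top.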
Conclusion
I find regex very interesting and can recommend the following links:
Name | Link | Description |
---|---|---|
I hate regex | https://ihateregex.io/ | You can find a regex for the most important validations here. There is also an area where you can test the expression. Unfortunately, some regular expressions still need to be adapted. |
Regex101 | https://regex101.com/ | Helpful tool for writing expressions |