2020
Regex

I know regular expressions!

Translated from German using DeepL.

Date: October 2020
Reading time: 6 minutes


[Comic about regular expressions]

Source: https://content.codecademy.com/courses/regex/regular_expressions_xkcd_2.png

Regex?

Imagine you are ordering a product online. Before you can pay, you have to register. You will be asked to enter the following information:

  • e-mail address
  • username
  • password

Your entries will be validated. But how is this done?

There is a technology that is present on almost every website: regex. A regex, short for "regular expression", is a sequence of characters that defines a search pattern. This pattern can then be used to find matching strings.

Regex enables the following:

  • Validate user input in forms
  • Check texts
  • Evaluate results
  • Search for words in emails or on websites

Regex is built into many tools and languages:

  • Unix
  • Java, Python, PHP, ...
  • Development environments and text editors

Application

As described above, regex can be used for many things. I show this below with examples: each one gives the regex first, followed by the sample text it is tested against.

Literal search

You can search for a specific string.

Apple

Apple
Google
Samsung
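To try this outside a regex tester, here is a minimal JavaScript sketch using the sample lines from above. It keeps every line for which `RegExp.prototype.test` finds a match.

```javascript
const brands = ["Apple", "Google", "Samsung"];
// A literal regex matches exactly the character sequence "Apple"
const literalMatches = brands.filter((brand) => /Apple/.test(brand));
console.log(literalMatches); // [ 'Apple' ]
```

Without the i flag the search is case-sensitive, so /apple/ would find nothing here.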

Alternation

Sometimes you want to search for several words. You can combine alternatives with "|" (OR). In this example, you search for baseball or football.

baseball|football

baseball
football
rugby
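The same check in JavaScript, a small sketch with the three sample words:

```javascript
const sports = ["baseball", "football", "rugby"];
// | separates the alternatives: either "baseball" or "football" may match
const sportMatches = sports.filter((sport) => /baseball|football/.test(sport));
console.log(sportMatches); // [ 'baseball', 'football' ]
```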

Character classes

Square brackets define a set of characters, exactly one of which must occur at that position. This is often used to find words with different spellings, so that the results can be standardized later.

Ka[iy]

Kai
Kay
Kao
Kau

Note

[kay] does not match "kay" as a whole; it matches a single character that is k, a or y.
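A short JavaScript sketch of both points, using the sample names from above. With the g flag, `match` returns every match, which makes it visible that [kay] matches one character at a time.

```javascript
const names = ["Kai", "Kay", "Kao", "Kau"];
// [iy] stands for exactly one character: "i" or "y"
const nameMatches = names.filter((name) => /Ka[iy]/.test(name));
console.log(nameMatches); // [ 'Kai', 'Kay' ]

// [kay] is a set of single characters, so "kay" yields three separate matches
const allMatches = "kay".match(/[kay]/g);
console.log(allMatches); // [ 'k', 'a', 'y' ]
```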

Dots

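Sometimes you know that some character, such as a digit, occurs at a certain position, but not which one. The dot (.) matches any single character (except a line break).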

Tim has scored . goals

Tim has scored 3 goals
Tim has scored no goals
Tom has scored 5 goals
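A minimal JavaScript sketch with the three sample sentences. Only the first one matches: "no" consists of two characters, and the third sentence is about Tom.

```javascript
const sentences = [
  "Tim has scored 3 goals",
  "Tim has scored no goals",
  "Tom has scored 5 goals",
];
// "." matches exactly one arbitrary character between "scored " and " goals"
const dotMatches = sentences.filter((s) => /Tim has scored . goals/.test(s));
console.log(dotMatches); // [ 'Tim has scored 3 goals' ]
```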

But how do you search for a "."?

Since I am looking for a whole sentence here, I would like to add a period at the end. My regex looks like this:

Tim has scored . goals.

It works! "Tim has scored 2 goals." matches.

However, the sentence "Tim has scored 2 goals!" now also matches. This is because in a regex, the period is not a punctuation mark; here, too, it stands for any single character. To search for an actual period, you have to put a backslash in front of it.

Tim has scored . goals\.

This applies to all characters that have a special meaning in regex, such as . * + ? ^ $ ( ) [ ] { } | and the backslash itself.
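The difference between the unescaped and the escaped period can be checked in JavaScript with the two sentences from above; this is a small sketch, not a complete validation.

```javascript
const endings = ["Tim has scored 2 goals.", "Tim has scored 2 goals!"];
// Unescaped: the final "." accepts any character, including "!"
const loose = endings.filter((s) => /Tim has scored . goals./.test(s));
// Escaped: "\." only accepts a literal period
const strict = endings.filter((s) => /Tim has scored . goals\./.test(s));
console.log(loose.length, strict.length); // 2 1
```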

Note

Dots can also be used to check for a certain number of characters, since each dot stands for exactly one character.

..........

Ranges

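To search for ranges of characters, you write the first and last character with a hyphen in square brackets. Note that a range always stands for a single character: [40-50] does not mean the numbers 40 to 50, but one character from the set 4, 0 to 5 and 0. For a two-digit number, you therefore need one range per digit. This regex matches the numbers 40 to 59:

Lebron James scored [4-5][0-9] points

Lebron James scored 46 points
Lebron James scored 38 points
Michael Jordan scored 40 points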

Note

Ranges also work with letters, for example [a-z] or [A-Z].
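A JavaScript sketch showing that a bracketed range always consumes one character. The score sentences are the samples from above; "Kai" is just an example string.

```javascript
// [40-50] is the set 4, 0-5, 0: one character from 0 to 5
console.log("46".match(/[40-50]/)[0]); // 4

// One range per digit: two-digit numbers from 40 to 59
const scores = [
  "Lebron James scored 46 points",
  "Lebron James scored 38 points",
  "Michael Jordan scored 40 points",
];
const rangeMatches = scores.filter((s) => /Lebron James scored [4-5][0-9] points/.test(s));
console.log(rangeMatches); // [ 'Lebron James scored 46 points' ]

// Ranges also work with letters: every lowercase letter in "Kai"
console.log("Kai".match(/[a-z]/g)); // [ 'a', 'i' ]
```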

Abbreviations

[0-9][ \t\r\n\f\v][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_]

A finished regex could look like this. It requires a digit followed by a whitespace character, followed by seven word characters.
Such queries quickly become long and hard to read. That is why there are abbreviations.

Abbreviated | Written out | Name | Meaning
\w | [A-Za-z0-9_] | word character | uppercase letter, lowercase letter, digit, underscore
\d | [0-9] | digit character | digit
\s | [ \t\r\n\f\v] | whitespace character | space, tab, carriage return, line feed, form feed, vertical tab
\W | [^A-Za-z0-9_] | non-word character | all characters except word characters
\D | [^0-9] | non-digit character | all characters except digits
\S | [^ \t\r\n\f\v] | non-whitespace character | all characters except whitespace characters

Using these abbreviations makes the query shorter and easier to understand.

\d\s\w\w\w\w\w\w\w
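A JavaScript sketch comparing the long form with the abbreviated form (seven \w, matching the seven bracketed classes). The sample inputs are made up. In JavaScript, \s also covers further Unicode whitespace such as U+00A0, so the two forms only agree exactly for ASCII input like this.

```javascript
// Long form: a digit, a whitespace character, then seven word characters
const longForm = new RegExp("[0-9][ \\t\\r\\n\\f\\v]" + "[A-Za-z0-9_]".repeat(7));
// Abbreviated form with \d, \s and seven \w
const shortForm = /\d\s\w\w\w\w\w\w\w/;

const inputs = ["3 goalies", "3 goals", "3\tPlayers", "x abcdefg"];
for (const text of inputs) {
  // Both columns should be identical for these ASCII samples
  console.log(JSON.stringify(text), longForm.test(text), shortForm.test(text));
}
```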

Grouping

How you group parts of an expression is often crucial.
In this example, I want to find the following sentences:

  • I like rice
  • I like noodles

I like rice|noodles

If I run this regex, I don't get the desired result. But why?
The OR operator splits the entire expression, so I am searching for "I like rice" or for "noodles". As a result, "I don't like noodles" also matches.

To match correctly, I have to adjust the expression.

I like (rice|noodles)

Parentheses group part of the expression, so the OR now only applies to "rice" or "noodles".

I like rice
I like noodles
I don't like noodles
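Both versions side by side in a small JavaScript sketch with the sample sentences:

```javascript
const likes = ["I like rice", "I like noodles", "I don't like noodles"];
// Without parentheses, | splits the whole regex: "I like rice" OR "noodles"
const ungrouped = likes.filter((s) => /I like rice|noodles/.test(s));
// With parentheses, | only applies inside the group
const grouped = likes.filter((s) => /I like (rice|noodles)/.test(s));
console.log(ungrouped.length); // 3
console.log(grouped); // [ 'I like rice', 'I like noodles' ]
```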

Quantifiers

Fixed number

Earlier we had the following query.

\d\s\w\w\w\w\w\w\w

This is already quite clear, but it can be even shorter.
Curly braces specify how often the preceding character must occur.

\d\s\w{7}

In this example, there must be exactly 7 word characters. However, you can also set a minimum and a maximum.

mu{3,7}h

Here, the u must occur at least 3 and at most 7 times.
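A JavaScript sketch of both quantifier forms. The test strings are my own examples; the repeated u's are built with `repeat` so their count is explicit.

```javascript
// {7}: exactly seven word characters after the digit and the whitespace
console.log(/\d\s\w{7}/.test("3 goalies")); // true
console.log(/\d\s\w{7}/.test("3 goals")); // false

// {3,7}: "u" at least 3 and at most 7 times between "m" and "h"
const moos = [1, 3, 7, 8].map((n) => "m" + "u".repeat(n) + "h");
const mooMatches = moos.filter((moo) => /mu{3,7}h/.test(moo));
console.log(mooMatches.length); // 2 (3 and 7 u's)
```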

0 or more

Use * to specify that the previous character may appear 0 or more times.

he*y

hy
heeeeey
hey
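In JavaScript, all three samples match, because * also allows zero occurrences:

```javascript
const greetings = ["hy", "heeeeey", "hey"];
// e* allows zero or more "e" between "h" and "y"
const starMatches = greetings.filter((g) => /he*y/.test(g));
console.log(starMatches.length); // 3
```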

1 or more

A + specifies that the previous character must occur at least once.

he+lp

help
heeeeeeelp
hello
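A small JavaScript sketch with the samples above, plus "hlp" (my own addition) to show that + needs at least one e:

```javascript
const cries = ["help", "heeeeeeelp", "hello"];
// e+ requires at least one "e", followed by "lp"
const plusMatches = cries.filter((c) => /he+lp/.test(c));
console.log(plusMatches.length); // 2 ("hello" has no "lp")
console.log(/he+lp/.test("hlp")); // false
```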

Optional

With ? you specify that the preceding character or group may occur once but does not have to.

Xander Bogaerts is a (good )?baseball player

Xander Bogaerts is a baseball player
Xander Bogaerts is a good baseball player
James Harden is a good baseball player

Note

To search for an actual question mark, it must also be preceded by a backslash: \?
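Both uses of the question mark in a short JavaScript sketch; the sentences about Bogaerts are the samples from above, the other two strings are my own.

```javascript
const players = [
  "Xander Bogaerts is a baseball player",
  "Xander Bogaerts is a good baseball player",
  "James Harden is a good baseball player",
];
// (good )? makes the whole group optional
const optionalMatches = players.filter((p) => /Xander Bogaerts is a (good )?baseball player/.test(p));
console.log(optionalMatches.length); // 2

// \? matches a literal question mark
console.log(/good\?/.test("Is he good?")); // true
console.log(/good\?/.test("He is good")); // false
```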

Anchor

This regex starts with a ^ and ends with a $. The ^ anchors the match to the start of the string and the $ to its end, so the regex only matches if the entire string matches. If another word appears at the beginning or end, there is no match.

^regular expressions are useful$

regular expressions are useful
i think regular expressions are useful
regular expressions are useful, so i think
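A JavaScript sketch comparing the anchored regex with the same phrase without anchors, using the three sample strings:

```javascript
const statements = [
  "regular expressions are useful",
  "i think regular expressions are useful",
  "regular expressions are useful, so i think",
];
// ^ anchors to the start of the string, $ to the end
const anchored = statements.filter((s) => /^regular expressions are useful$/.test(s));
// Without anchors, the phrase may appear anywhere
const unanchored = statements.filter((s) => /regular expressions are useful/.test(s));
console.log(anchored.length, unanchored.length); // 1 3
```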

Caret

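Inside square brackets, a ^ at the beginning reverses the character class: it then matches any single character that is not listed.

[^0-9]

Matches any character except a digit. Note that this works on single characters, not words: [^baseball] does not mean "every word except baseball", but one character other than b, a, s, e and l. Outside square brackets, ^ is the start anchor from the previous section.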
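A JavaScript sketch of the two meanings of ^. The input strings are my own examples.

```javascript
// [^0-9]: one character that is not a digit
console.log("abc123".match(/[^0-9]/g)); // [ 'a', 'b', 'c' ]

// [^baseball] excludes the single characters b, a, s, e and l, not the word
console.log("football".match(/[^baseball]/g)); // [ 'f', 'o', 'o', 't' ]

// Outside brackets, ^ anchors the match to the start of the string
console.log(/^baseball/.test("baseball bat")); // true
console.log(/^baseball/.test("I like baseball")); // false
```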

Practice

So far, we have only found strings. But what now? What can you do with them?

Until now, I have only used regex to validate forms. But you don't always need a regex for that:
when you create a form, the input element offers various types that validate the input by themselves.

<input type="email" placeholder="Mail" />


With the title attribute you can even add your own error message. However, this validation has one disadvantage: it is imprecise and only catches gross errors. For example, an address without a top-level domain, such as name@example, is accepted. In my opinion, important things are not validated.

Regex solves this problem. You can write your own query for it. Mine looks like this:

^[^\.][\w.!#$%&'*+\/=?^`{|}~-]{0,64}@[a-z-]+\.\w+$
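A JavaScript sketch for testing the email regex. The allowed special characters of the local part are written as a single character class, and the regex is anchored with $ so that trailing text is rejected; the sample addresses are made up.

```javascript
// Local part: first character not a dot, then up to 64 allowed characters,
// then "@", a domain of lowercase letters or hyphens, a dot and a top-level domain
const emailRegex = /^[^.][\w.!#$%&'*+\/=?^`{|}~-]{0,64}@[a-z-]+\.\w+$/;

const addresses = ["kai.muster@example.ch", ".kai@example.ch", "kai@example", "kai@example.ch trailing"];
for (const address of addresses) {
  console.log(address, emailRegex.test(address));
}
```

Note that the domain part only allows lowercase letters, so kai@Example.ch would be rejected as well.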

There are several ways to use a regex like this in a form. The following two variants use a phone number and a zip code as examples.

Variant 1

This solution does not require JS. In HTML, you specify the regex in the pattern attribute of the input element.

<input class="tel" pattern="(\+41|0041|0)79\d{3}\d{2}\d{2}" />

Now you can style the input element when its value does not match the pattern.

.tel:invalid {
    border-color: rgb(255, 79, 102);
}

Advantages

No JS is needed. In addition, while the form is being filled out, it is always immediately clear whether the input is correct.

Disadvantage

Allowing spaces is possible but not easy: you would have to extend the regex so that spaces are accepted, which makes the expression harder to read.
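Browsers check the pattern attribute against the entire value, as if it were wrapped in ^(?: and )$. The JavaScript sketch below emulates this for the phone pattern; telWithSpaces is my own hypothetical extension with an optional space between the blocks, and the numbers are made up.

```javascript
// The pattern attribute must match the whole value, like ^(?:...)$
const telPattern = "(\\+41|0041|0)79\\d{3}\\d{2}\\d{2}";
const tel = new RegExp(`^(?:${telPattern})$`);

// Hypothetical extension: " ?" allows one optional space between the blocks
const telWithSpaces = /^(?:(\+41|0041|0) ?79 ?\d{3} ?\d{2} ?\d{2})$/;

const numbers = ["0791234567", "+41791234567", "079 123 45 67", "0781234567"];
for (const n of numbers) {
  console.log(n, tel.test(n), telWithSpaces.test(n));
}
```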

Variant 2

This solution requires HTML and JS. With onfocusout you can call a function when the element is no longer in focus.

<input id="zip" onfocusout="zipFocused()" />

In the JS function, I read the value of the input, which is always a string, and remove all spaces.

function zipFocused() {
    const zip = document.getElementById('zip');
    // value is already a string, so no conversion is needed
    const formattedZip = zip.value.replaceAll(' ', '');
    if (formattedZip.match(/^[1-9][0-6][0-5][0-8]$/)) {
        correctStyle(zip); // correctStyle and errorStyle are styling helpers defined elsewhere
    } else {
        errorStyle(zip);
    }
}

In line 5, I validate the input with my regex, using match. This is one of several string methods that accept a regex. Others:

  • matchAll()
  • replace()
  • replaceAll()
  • search()
  • split()
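A short JavaScript sketch of these methods on a made-up string with two Swiss zip codes:

```javascript
const text = "8400 Winterthur, 3000 Bern";

// match with the g flag: all matches as an array
console.log(text.match(/\d{4}/g)); // [ '8400', '3000' ]
// matchAll: an iterator of match objects (needs the g flag); here the captured city names
console.log([...text.matchAll(/\d{4} (\w+)/g)].map((m) => m[1])); // [ 'Winterthur', 'Bern' ]
// replace: without the g flag, only the first match is replaced
console.log(text.replace(/\d{4}/, "####")); // #### Winterthur, 3000 Bern
// replaceAll: a regex argument must have the g flag
console.log(text.replaceAll(/\d{4}/g, "####")); // #### Winterthur, #### Bern
// search: index of the first match, or -1
console.log(text.search(/Bern/)); // 22
// split: break the string at each match
console.log(text.split(/,\s*/)); // [ '8400 Winterthur', '3000 Bern' ]
```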

Advantages

In JS, I can remove characters such as spaces before validating. The HTML also stays cleaner, as the validation is moved into a separate function.

Disadvantages

You need a little JS knowledge.
Another disadvantage in this example is that you cannot see directly while typing whether the zip code is correct, because it is only checked when the field loses focus.

Conclusion

I find Regex very interesting and can recommend the following links:

Name | Link | Description
I hate regex | https://ihateregex.io/ | You can find regexes for the most important validations here. There is also an area where you can test the expressions. Unfortunately, some of the regular expressions still need to be adapted.
Regex101 | https://regex101.com/ | Helpful tool for writing expressions