The regex engine continues the search at the next position: the star quantifier can be applied again and matches the word drastisch. Remember that the pattern ends with the $ anchor. The engine applies it, but it fails again at ansteigt!
The regex engine then goes back and reduces the number of iterations, step by step, until all possible paths have been examined. We expect a regular expression search to run in O(n) time, where n is the length of the input string.
In most cases, this may be true. However, in some cases - like the one we're currently considering - the regex engine may have to take an exponentially increasing number of paths through the input string to find a match.
With an input of about 125 characters, the engine ends up taking an exponential number of paths: about 2^125, or roughly 4.2535296e+37 different combinations, because an invalid word occurs at a certain position. This is known as catastrophic backtracking. Executing such regular expressions consumes a great deal of time and resources.
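To make these numbers concrete, the following sketch counts how many ways a run of identical characters can be cut into consecutive non-empty chunks, which is roughly the set of paths a backtracking engine may have to try for a pattern such as (a+)*. The helper name countSplits is hypothetical, and the model is a simplification rather than an exact trace of any particular engine. The sketch also evaluates 2^125 exactly with BigInt.

```javascript
// Simplified model, not an engine trace: count the ways to cut a run of
// `n` identical characters into consecutive non-empty chunks.
// `countSplits` is a hypothetical helper introduced for illustration.
function countSplits(n, memo = new Map()) {
  if (n === 0) return 1n; // nothing left to cut: exactly one way
  if (memo.has(n)) return memo.get(n);
  let total = 0n;
  // Choose the length of the first chunk, then split the remainder.
  for (let first = 1; first <= n; first++) {
    total += countSplits(n - first, memo);
  }
  memo.set(n, total);
  return total;
}

// The count doubles with every additional character.
for (const n of [5, 10, 20]) {
  console.log(n, countSplits(n).toString());
}

// The figure quoted above, computed exactly and in scientific notation.
const paths125 = 2n ** 125n;
console.log(paths125.toString());
console.log(Number(paths125).toExponential(7));
```

In this model, every extra character doubles the number of splits, which is why an input that is only slightly longer can take so much more time.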
Finally, we will look at different ways to protect our patterns from such problems.
1. Reduce the number of combinations
One approach is to reduce the number of combinations performed by the regex engines. There are several ways to do this:
Avoid nested quantifiers, for example (a+)*
Avoid ORs with overlapping clauses, for example (b|b)*
Depending on the engine, some regular expressions written with nested quantifiers and overlapping clauses may execute quickly, but there is no guarantee. Better safe than sorry.
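As a small illustration of both rules, the sketch below pairs each risky pattern from the list with a flat equivalent and checks that they accept the same inputs. The sample strings and variable names are assumptions chosen for the demo, and the inputs are kept short so that the nested versions finish quickly.

```javascript
// Risky patterns from the list above, anchored so the whole input must match.
const nested = /^(a+)*$/;        // nested quantifiers
const overlapping = /^(b|b)*$/;  // alternation with identical branches

// Flat equivalents that accept the same strings with a single quantifier.
const flatA = /^a*$/;
const flatB = /^b*$/;

// Short, hypothetical sample inputs.
const samples = ["", "a", "aaaa", "aaaa!", "b", "bbbb", "bbbb!", "ab"];

// Compare each risky pattern with its flat counterpart on every sample.
const results = samples.map((s) => ({
  input: s,
  sameForA: nested.test(s) === flatA.test(s),
  sameForB: overlapping.test(s) === flatB.test(s),
}));
console.table(results);
```

The flat versions leave the engine only one way to consume each character, so a failing input is rejected after a single pass.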
2. Control backtracking
Another approach is to control backtracking. Although backtracking allows the construction of complex and powerful regular expressions, the ultimate benefit may be insignificant, especially compared to the poor performance in cases such as those studied above.
Fortunately, we can use certain features to either limit or suppress backtracking and still create powerful regular expressions. Let's look at two of them: atomic groups and lookahead.
a. Atomic groups
An atomic group uses the (?>...) syntax to suppress backtracking inside the group. Once the group has matched, the engine can no longer go back and try other ways of matching it, even if doing so would lead to a successful overall match.
Suppressing backtracking in this way helps improve performance when using nested quantifiers. Unfortunately, this feature is not implemented in all regex engines; in particular, it is not available in JavaScript/Node.js.
Let's look at another feature that allows us to achieve something similar and is available in JavaScript/Node.js.
b. Lookahead
In the example we saw before, we do not want our quantifier to backtrack, because backtracking is what caused the serious performance problem. To enforce this, we can use a feature called lookahead.
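One common way to apply this is to capture inside a lookahead and then consume the capture with a backreference, written (?=(...))\1. Because the engine does not backtrack into a lookahead once it has succeeded, the construct behaves much like an atomic group. The pattern below is a hypothetical stand-in for the example (a sequence of words, each optionally followed by whitespace), and the sample text is likewise an assumption.

```javascript
// Hypothetical pattern: words, each optionally followed by one whitespace
// character. (?=(\w+\s?)) captures the longest word-plus-space at the
// current position, and \1 then consumes exactly that text. The engine
// cannot later give back part of the capture, so it never retries the
// same characters in different groupings.
const guarded = /^(?:(?=(\w+\s?))\1)*$/;

// Assumed sample input: many valid words, then one invalid character.
const valid = "drastisch ansteigt ".repeat(10).trim();
const invalid = valid + "!";

console.log(guarded.test(valid));
console.log(guarded.test(invalid));
```

With this construct, each word can only be consumed in one way, so the invalid input is rejected after the engine has given up each iteration once instead of retrying every combination.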