Validate entries

Forms with HTML5

Entries in <input> form fields must be checked before further processing. I will not discuss here which problems unchecked data can cause.

Checking input on the client side used to require a hand-written JavaScript function, often with great effort. HTML5 brings new input types and attributes that let modern browsers take over these checks.

For the <input> element a type is defined:
<input type="text">

For the types »text, search, url, tel, email, and password« the attribute 'pattern' now exists:
<input type="text" pattern="[0-9]{5}">
With the pattern given here the browser can check the input. It compares the input with the pattern written by the website builder, so wrong entries are caught before any further processing. By default the form is only submitted if the check succeeds; otherwise the form is not sent, and a tooltip from the browser alerts the user to the incorrect input.
Other checks must, of course, still be performed on the client side and on the server side, for example to verify check digits or to prevent the injection of malicious code.

There are other input types that cause touchscreens to display special keyboard layouts or let the browser display special selection windows. The types »color, date, datetime, datetime-local, month, week, time, number, range« do not support the attribute 'pattern' and are therefore not covered in this tutorial.

It is important to set the document type to HTML5
<!DOCTYPE html>
and the character set to UTF-8
<meta charset="UTF-8">
In addition, the title attribute
<input title="Please enter postcode">
should be used so that users immediately know which input is expected of them. It also makes sense to use the placeholder attribute
<input placeholder="12345">
Mandatory input can be marked with the required attribute
<input required>
Users are then automatically informed by the browser via tooltip that the field must be filled in.

<!DOCTYPE html>
<meta charset="utf-8">
 <form action="#.ext" method="post">
  <label for="plz">PLZ: </label>
  <input required type="text"
    id="plz" name="plz"
    pattern="[0-9]{5}" placeholder="12345"
    title="Please enter the 5-digit postcode">
 </form>


Forms with CSS3

When an input does not match the given pattern, browsers usually mark the input field with a red border.

We can now use CSS3 pseudo-classes to highlight incorrect input as well as correct input. This leaves room for many CSS games with background colors or images. Of course we use them here, too.

/* Style for HTML5 form input patterns */

/* An incorrect input: */

/* A correct input: */

/* A mandatory field: */

Pattern

The pattern of an input field is the expression against which the input is compared (not searched). The simplest pattern consists of exactly the characters that are compared with the input. A pattern can also contain texts, metacharacters, groups of characters or character classes. For example, a pattern that we will examine later might look like this:

Round parentheses ( ) combine pattern blocks. The pipe | stands for OR, so the input must match either one alternative or the other. Character classes define groups of characters that are to be allowed (or forbidden); they are written within square brackets [ ]. The hyphen - defines ranges of characters within a character class. If the hyphen itself is to be a member of the character class, mask (escape) it by prefixing it with \. Older browsers also accepted an unmasked hyphen as the first or last character in the list, but current browsers compile the pattern with the RegExp v flag and reject that. The circumflex ^ at the start of a class excludes characters.

Metacharacters are placeholders for a group of other characters. If the character itself is to be part of the pattern, it must be masked. The backslash \ (itself also a metacharacter) is responsible for this. The dot . stands for any character. If parentheses or the pipe character should be part of the pattern, they have to be masked.

Predefined character classes can be used as shorthand. In connection with the backslash, \s (whitespace) stands for the space or a (pasted) tab. \w (word) allows all digits, the letters A to Z and a to z, and the underscore; \d (digit) allows all digits. The respective upper case forms \S \W \D match everything except the corresponding characters. However, using this negation in an input pattern is not easy.

Quantifiers determine how often the expression directly in front of them may occur. In this way, the minimum, maximum or exact number of characters or character groups can be determined. The asterisk * indicates that an expression can occur not at all or any number of times. The plus + indicates that an expression must occur at least once and can occur any number of times. The question mark ? indicates that an expression can occur once, but does not have to occur. Curly braces { } allow the minimum, maximum or exact number of characters or repetitions to be specified.

How you can assemble all this to get a meaningful pattern is shown below.

Test it

In web forms, the entire input in an input field is always compared with the given pattern. You cannot check parts of the input.

The rules of the form input pattern follow the rules of JavaScript regular expressions, but there are differences. The start and end anchors ^ ... $ that are required in programming languages do not need to be used with the HTML5 pattern. You often see ^[0-9]{5}$, and it does not lead to an error, but as you know, the whole input is always compared anyway, so you don't have to worry about anchoring. The correct way is simple: [0-9]{5}.
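
To see what "the whole input is compared" means, here is a minimal JavaScript sketch. It assumes that the browser's check can be approximated by wrapping the pattern in ^(?: )$ and compiling it with the u flag; the helper name matchesPattern is made up for this example.

```javascript
// Hypothetical helper: approximates the browser's pattern check by anchoring
// the expression so that the whole value has to match.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

const postcode = '[0-9]{5}';
console.log(matchesPattern(postcode, '12345'));  // true
console.log(matchesPattern(postcode, '123456')); // false: one digit too many
// A plain, unanchored search behaves differently: it finds 5 digits inside.
console.log(/[0-9]{5}/.test('123456'));          // true
```

The last line shows why a pattern copied from a script may behave differently in a form: a script usually searches, the form compares.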

Since we only have a single-line input field, we cannot check for line breaks, of course. Modifiers (flags) such as the one for case-insensitive matching are not available either.

For the form itself we still need an appropriate input type, otherwise no check takes place. Input types that support a pattern comparison are »text, search, url, tel, email, and password«. For this tutorial I only use the text type: type="text"
The other types may help a tablet or smartphone to display a customized keyboard (e.g. a numeric keypad for type="tel", an additional ".com" key for type="url").

For the website builder there is no error message!

If the pattern itself is incorrect, it is possible that every input is accepted as valid, or that every input is rejected. If you only want to allow numbers and have accidentally written letters into the pattern, how do you notice?
You have to check, check, check.
Syntax errors, i.e. violations of the construction rules such as a missing parenthesis, can of course be caught by running the pattern through a regex tester.
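
"Check, check, check" can be partly automated. The following is only a sketch: it assumes that anchoring the pattern with ^(?: )$ and the u flag is close enough to what the browser does, and the names matchesPattern and checkPattern are invented for this example.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// Runs a pattern against inputs that must be accepted and inputs that must be
// rejected, and returns a list of everything that went wrong.
function checkPattern(pattern, accept, reject) {
  const failures = [];
  for (const value of accept) {
    if (!matchesPattern(pattern, value)) failures.push('rejected valid input: ' + value);
  }
  for (const value of reject) {
    if (matchesPattern(pattern, value)) failures.push('accepted invalid input: ' + value);
  }
  return failures;
}

// Correct pattern: no failures.
console.log(checkPattern('[0-9]{5}', ['12345', '00000'], ['1234', '123456', '12a45']));
// Letters written by mistake: both test inputs are reported.
console.log(checkPattern('[a-z]{5}', ['12345'], ['abcde']));
```

An empty list means that the pattern behaved as expected for these inputs, nothing more. The quality of the check depends on the inputs you think of.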

Patternmaker

Often you have to decide whether you want to force upper case [A-Z] or convert the input later via script.
For the IBAN (DE12345678901234567890) the country code is always in capital letters and the number contains no spaces, but for humans it is easier to enter this long chain in groups, i.e. with spaces (DE12 3456 7890 1234 5678 90). This must then be corrected during further processing. Sometimes there are different rules for machine-to-machine and human-to-machine communication. We have to take this into account when building patterns. Ultimately, it is the decision of the pattern maker whether to make concessions to the user when entering the data and then to comply with the regulations using scripting.
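
The "correct it during further processing" step could look roughly like this. It is a sketch under assumptions: the function name normalizeIban is invented, the sample number is a dummy, and only the format DE plus 20 digits is checked, not the check digits.

```javascript
// Hypothetical clean-up step: remove all whitespace and force upper case.
function normalizeIban(input) {
  return input.replace(/\s+/g, '').toUpperCase();
}

// Format check for a German IBAN: DE followed by 20 digits.
const ibanFormat = /^DE[0-9]{20}$/;

const typed = 'de12 3456 7890 1234 5678 90'; // what a human might enter
const cleaned = normalizeIban(typed);

console.log(cleaned);                  // DE12345678901234567890
console.log(ibanFormat.test(typed));   // false: spaces and lower case
console.log(ibanFormat.test(cleaned)); // true
```

With this division of labour the form pattern can be tolerant (spaces, lower case) while the stored value follows the strict rule.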

Create a pattern

Simple pattern

Simple patterns are built from characters for which you want a direct match: letters, numbers or special characters. Only if the input matches the pattern is the input accepted by the browser. A simple word or sequence of numbers is also a valid pattern. pattern="Hello World" is already a simple pattern that the browser uses as the basis for comparison with the input. Only this exact sequence of letters is accepted.
Special characters such as '§§15 & 16' and numbers such as 0815 are treated the same way. We simply write them into the pattern.

Here only the capital letter 'A' is accepted as correct input.
<input type="text" pattern="A">



Here only 'Hallo Welt' (German for 'Hello World') is accepted as correct input.
<input type="text" pattern="Hallo Welt">

Valid: 'Hallo Welt'. Invalid: 'Hello World'.

Here only '0815' is accepted as correct input.
<input type="text" pattern="0815">

Valid: '0815'. Invalid: '0816'.

Character selection

With square brackets [ ] you can specify a list of characters as a pattern. However, the brackets stand for only one character from the selection. With [ABCD], only A or B or C or D would be valid. Since the letters in this example are consecutive, you can also write the expression with a hyphen: [A-D].
The same works for all upper case letters [A-Z], all lower case letters [a-z] and all digits [0-9].

Country-specific letters, in Germany the umlauts and the eszett (ß), are not included in these ranges. We simply add these characters to the list. So, to match any German letter, the expression [A-Za-zÄäÖöÜüß] is a valid pattern.

Only a capital letter from the selection A-D is accepted as correct input here.
<input type="text" pattern="[ABCD]">


Letters, including umlauts (ÄÖÜäöü) and ß, no special characters
<input type="text" pattern="[A-Za-zÄÖÜäöüß]">


A digit from 0 to 2, the 4, or 7 to 9
<input type="text" pattern="[0-247-9]">


Unicode table

Of course we often need all upper and lower case letters [A-Za-z]. You could be tempted to write them as [A-z], but then the characters [\]^_` are included as well. Why?
Well, the Unicode character table is used. It consists of 17 planes with 65536 code points each.
Besides the expected Latin characters we also find, for example, the Canadian Aboriginal syllabics or the characters of the Cherokee: the characters of almost every language, many symbols and some control characters. The block "Basic Latin" contains, who would have thought it, the Latin letters, digits and some special characters that you can also find on a normal western PC keyboard.

Now comes what we are interested in: the order of the characters we note with [from-to] as a pattern is determined by their position in the Unicode table.
Put simply, first comes the A, then the B, then the C, and so on, so we can write [A-Z] or [0-9] if all the characters in between are meant.
But in the table the upper and lower case letters do not follow each other directly. The order is ..XYZ[\]^_`abc. Thus [A-z] also contains the intermediate characters [\]^_`.
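
If you do not want to look this up in a table, a few lines of JavaScript list the characters that sit between Z and a. This is just an illustration of the ordering described above; the variable name between is arbitrary.

```javascript
// Collect every character whose code point lies between 'Z' and 'a'.
const between = [];
for (let cp = 'Z'.codePointAt(0) + 1; cp < 'a'.codePointAt(0); cp++) {
  between.push(String.fromCodePoint(cp));
}
console.log(between.join(''));       // [\]^_`

// That is why [A-z] accepts the underscore and [A-Za-z] does not.
console.log(/^[A-z]$/.test('_'));    // true
console.log(/^[A-Za-z]$/.test('_')); // false
```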

We find the characters Ä, Ö, Ü in the block Latin-1 Supplement. When we look them up we find several characters in between. So [Ä-Ü] would also include many other characters: ÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜ.

The usable characters in the block Basic Latin: U+0000 to U+007F

The usable characters in the block Latin-1 Supplement: U+0080 to U+00FF

The usable characters in the block Latin Extended-A: U+0100 to U+017F

The usable characters in the block Latin Extended-B: U+0180 to U+024F

The usable characters in the block Greek and Coptic: U+0370 to U+03FF

The usable characters in the block Latin Extended Additional: U+1E00 to U+1EF9

Ranges can also reach beyond the limits of a block. [z-À] also includes the '~'. It is interesting that the Greek block comes before Latin Extended Additional in this order. Keep this in mind if you want to combine Ḁ and Σ.

Writing a range backwards doesn't work, by the way. [Z-A] or [9-0] are invalid; the pattern is ignored, so any input is valid. And that's not what we want.
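
Because the browser stays silent about a broken pattern, it can help to let JavaScript compile it first. A minimal sketch, assuming that compiling with the u flag catches the same kind of syntax errors as the browser does; the function name isValidPattern is made up.

```javascript
// Hypothetical check: returns false if the pattern cannot be compiled at all.
function isValidPattern(pattern) {
  try {
    new RegExp('^(?:' + pattern + ')$', 'u');
    return true;
  } catch (err) {
    return false; // e.g. SyntaxError for a range that is out of order
  }
}

console.log(isValidPattern('[0-9]')); // true
console.log(isValidPattern('[9-0]')); // false
console.log(isValidPattern('[Z-A]')); // false
```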

<input type="text" pattern="[9-0]">


Exclude characters

As the list in square brackets stands for only one character, you can exclude characters by skilful selection. The patterns [0-247-9] or [01247-9] allow only the digits 0 to 2, the 4 and the digits 7 to 9. [A-HK-TZ] allows only the letters A-H, K-T and Z. All others are not accepted.
If you put a space or a comma between the ranges, these are also part of the pattern!

With the circumflex ^ we can exclude the characters we don't want. [^Ad5] therefore allows every conceivable character except those listed in the brackets, i.e. the A, the d or the 5. An expression that stands for all special characters could look like this: [^A-Za-zÄÖÜäöüß0-9]. But then, of course, all Greek, Chinese, and other characters are allowed as well.

This is not quite easy to apply in practice. Listing what is allowed is often simpler than listing what is forbidden, especially if you need it nested, which is likely to be the case quite often.
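
The difference between "select what is allowed" and "exclude what is forbidden" can be tried out quickly. Again only a sketch: matchesPattern is an invented helper that approximates the browser check by anchoring the pattern and using the u flag.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// Exclusion by skilful selection: 3, 5 and 6 are simply not listed.
console.log(matchesPattern('[0-247-9]', '8')); // true
console.log(matchesPattern('[0-247-9]', '3')); // false

// Exclusion with the circumflex: everything except A, d and 5.
console.log(matchesPattern('[^Ad5]', 'x'));    // true
console.log(matchesPattern('[^Ad5]', 'A'));    // false
console.log(matchesPattern('[^Ad5]', 'Σ'));    // true: Greek is not excluded
console.log(matchesPattern('[^Ad5]', 'xy'));   // false: still only one character
```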

Exclude character Format: $ or X or n
<input type="text" pattern="[^Ad5]">


Multiple characters with character selection

Of course you often have to create a pattern for a larger number of characters, because the input will seldom contain only a single character. Several square brackets in a row can form the pattern for this.
The pattern [a-z][a-z][a-z][a-z] stands for 4 letters, each from the list a-z. This pattern uses different lists: [A-Z][1-9][a-z][§&!].
[A-Z][a-z][a-z][a-z][a-z] stands for one uppercase letter followed by 4 lowercase letters. The complete input is always compared with your pattern. If more or fewer letters, or even numbers or special characters, are entered, the browser will prevent further processing of the form.

noun with 5 letters Format: Xxxxx
<input type="text" pattern="[A-Z][a-z][a-z][a-z][a-z]">


Number with 5 digits Format: nnnnn
<input type="text" pattern="[0-9][0-9][0-9][0-9][0-9]">


DE - Vehicle registration number Format: XX-Xnnn
<input type="text" pattern="[A-Z][A-Z]-[A-Z][0-9][0-9][0-9]">


Character selection with repetitions

Now it is of course nonsensical to string an endless number of square brackets together. For repeated character selections we use the quantifier. In curly braces { } you can specify the number of repetitions. You simply write it directly after your character selection: [A-Z]{20}
For a 10-digit order number of a mail order company you could, instead of [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9], write [0-9]{10} as the pattern.

DE- Postcode 5 digits: Format: nnnnn
<input type="text" pattern="[0-9]{5}">


German IBANFormat: DEnnnnnnnnnnnnnnnnnnnn
<input type="text" pattern="DE[0-9]{20}">


Character selection with minimum and maximum length

Curly braces { } are also used for this. The form is {min,max}. For example, if you want to allow at least five, but no more than 10 characters, you would write [A-Za-z0-9]{5,10}

If you only want to specify a minimum number of characters, you can omit the max parameter: {2,}. You must still write the comma, otherwise the number in braces would be the exact number of characters. If you only want to specify a maximum number of characters, the minimum number must be specified anyway: {0,4}

Another possibility is to use the plus sign, the asterisk or the question mark.
The plus sign [0-9]+ stands for at least one occurrence of the expression.
The asterisk [0-9]* stands for none or any number of occurrences.
The question mark [0-9]? stands for once or not at all.

In connection with the attribute required, which marks the form input field as mandatory, the browser will not submit the form while the field is empty, even if the pattern x{0,8}, x* or x? would allow an empty input. After all, it expects something to be entered in the field.

HTML: For input fields the attribute maxlength="n" exists, which determines the maximum number of all characters. The browser does not allow you to enter more characters at all. For example, if you only want to allow a maximum of four characters, you write: <input maxlength="4">.
But if the pattern demands at least 5 characters, as in <input maxlength="4" pattern="x{5,8}">, it won't work.
The same applies to the minlength attribute introduced with HTML5.
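
The quantifiers and the conflict with maxlength can be illustrated in a few lines. This is a sketch, not browser code: matchesPattern is an invented helper that anchors the pattern and uses the u flag, and the maxlength="4" field is simulated by a list of values that would fit into it.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

console.log(matchesPattern('[A-Z][a-z]{2,}', 'Tom')); // true
console.log(matchesPattern('[A-Z][a-z]{2,}', 'To'));  // false: only one lowercase letter
console.log(matchesPattern('[A-Z][a-z]?', 'Tom'));    // false: at most one lowercase letter
console.log(matchesPattern('[0-9]{0,}', ''));         // true: the pattern alone allows empty input

// maxlength="4" combined with [0-9]{5,}: no value that fits the field can match.
const fitsField = ['1', '12', '123', '1234'];
console.log(fitsField.some((v) => matchesPattern('[0-9]{5,}', v))); // false
```

The fourth line shows only what the pattern says. Whether an empty field is acceptable is decided by the required attribute, not by the pattern.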

<input type="text" pattern="[A-z]{1,3}">
<input type="text" pattern="[A-Z][a-z]{0,12}">
<input type="text" pattern="[A-Z][a-z]{2,}">

<input type="text" pattern="[A-Z][a-z]?">
<input type="text" pattern="[A-Z][a-z]+">
<input type="text" pattern="[A-Z][a-z]*">

Problems with input pattern maximum specifications and HTML length specifications or the required attribute:

Mandatory input (required): according to the pattern {0,} an empty field would be valid, but required still demands an input!
<input required type="text" pattern="[0-9]{0,}">

The minimum length must be specified!
<input type="text" pattern="[0-9]{,3}">

The field has a maximum of 4 characters, but the pattern requires at least 5 digits.
<input type="text" pattern="[0-9]{5,}" maxlength="4">

Character selection with alternatives

If you want to offer alternatives, say coffee or tea, the pipe sign | is the appropriate means; it separates the alternatives. We would write coffee or tea or juice as follows: coffee|tea|juice. The examples use the German words.
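
With alternatives it matters how the comparison with the whole input is done. The sketch below assumes that the browser wraps the complete pattern in a group before anchoring it, which the invented helper matchesPattern imitates; writing ^ and $ by hand around the alternatives is not the same thing.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

const drinks = 'Kaffee|Tee|Saft';
console.log(matchesPattern(drinks, 'Tee'));      // true
console.log(matchesPattern(drinks, 'Teekanne')); // false
console.log(matchesPattern(drinks, 'Milch'));    // false

// Hand-written anchors bind only to the first and last alternative,
// so 'Tee' is found anywhere in the input.
console.log(/^Kaffee|Tee|Saft$/.test('Teekanne')); // true
```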

<input type="text" pattern="Kaffee|Tee|Saft">


<input type="text" pattern="Weißwein|Rotwein|Rosé|Sekt">


Characters with special meaning

Metacharacters are characters that have a function in addition to their existence as characters. If you want to use them as characters and not as functions, they have to be masked (escaped). That is, you put a backslash \ in front of them. There are several of these metacharacters, and you've already met a few of them: the parentheses, square brackets and curly braces, the question mark, the plus and the asterisk: ( ) { } [ ] ? + - * ^ $. The dot . stands for any character, but if you want to check for a literal dot, you put the backslash before it: \. The question mark means 'can, but does not have to, occur'. If you want it as part of the pattern, it must be masked: \? Inside square brackets, characters such as the dot, the question mark, the plus and the asterisk lose their special meaning and need no masking. In the example (Wasser)? the "?" is the quantifier 'once or not at all'. With the backslash it stands for a literal question mark.

What would you like to drink?
<input type="text" pattern="(Wasser)?">

Valid: 'Wasser'. Invalid: 'Wasser?'.

<input type="text" pattern="(Wasser)\?">

Valid: 'Wasser?'. Invalid: 'Wasser'.
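
Both variants side by side, as a sketch. The helper matchesPattern is invented and approximates the browser check by anchoring the pattern and using the u flag. String.raw is used so that the backslash reaches the pattern unchanged; in an HTML attribute you write the backslash only once.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// The question mark as quantifier: 'Wasser' once or not at all.
console.log(matchesPattern('(Wasser)?', 'Wasser'));             // true
console.log(matchesPattern('(Wasser)?', 'Wasser?'));            // false

// The masked question mark is a literal character.
console.log(matchesPattern(String.raw`(Wasser)\?`, 'Wasser?')); // true
console.log(matchesPattern(String.raw`(Wasser)\?`, 'Wasser'));  // false

// The same with the dot: unmasked it stands for any character.
console.log(matchesPattern('1.2', '1x2'));                      // true
console.log(matchesPattern(String.raw`1\.2`, '1x2'));           // false
```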

Predefined (meta)characters

To abbreviate letter and digit selections, there are predefined characters. They are also preceded by a backslash to distinguish them from a normal character.
\d a digit, i.e. [0-9]
\D a character that is not a digit, i.e. [^0-9] or [^\d]
\w a letter, a digit or the underscore, i.e. [a-zA-Z_0-9]
\W a character that is not a letter, digit or underscore, i.e. [^a-zA-Z_0-9] = §$%&!:#...
\s whitespace: space, tab, ...
\S a character that is not whitespace, i.e. [^\s]
\t a tab (can only get into an input field by copy and paste)
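
A few of these shorthands in action. The sketch uses an invented helper matchesPattern that anchors the pattern and compiles it with the u flag, which I assume to be close to the browser's behaviour. Note the last but one line: umlauts are not part of \w.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

console.log(matchesPattern(String.raw`\d{5}`, '12345')); // true
console.log(matchesPattern(String.raw`\D+`, 'Die'));     // true: no digit in it
console.log(matchesPattern(String.raw`\D+`, 'Die 3'));   // false: contains a digit
console.log(matchesPattern(String.raw`\w+`, 'Baer_1'));  // true
console.log(matchesPattern(String.raw`\w+`, 'Bär'));     // false: ä is not in [a-zA-Z_0-9]
console.log(matchesPattern(String.raw`\s`, '\t'));       // true: a tab counts as whitespace
```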


<input type="text" pattern="[\d]{5}">


<input type="text" pattern="\D+">


Group characters

Round parentheses ( ) are used to group longer related expressions: (pa){2}(ma){2}. The quantifier here refers to the preceding expression, i.e. the contents of the parentheses. Without the parentheses we would only double the last character.

Round parentheses may also be nested.
(A(BC){2}){2} means A(BC)(BC)A(BC)(BC), i.e. one A, then BC twice, and the whole thing twice: ABCBCABCBC

<input type="text" pattern="(ma){2}">

couscous cous cous

<input type="text" pattern="(Weiß|Rot)wein|Ros(é|e)|Sekt">


Which tea may I pour for you?
<input type="text" pattern="((Roi|Rooi)bos|Roibush|(Rot|Roi)busch|Massai)tee">

Grüner Tee
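
What the parentheses change can be seen directly when the grouped and the ungrouped variant are compared. A small sketch with the invented helper matchesPattern, which anchors the pattern and uses the u flag as an approximation of the browser check:

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// With parentheses the quantifier repeats the whole group.
console.log(matchesPattern('(pa){2}(ma){2}', 'papamama'));  // true
// Without parentheses only the last character is doubled.
console.log(matchesPattern('pa{2}', 'paa'));                // true
console.log(matchesPattern('pa{2}', 'papa'));               // false

// Nested groups.
console.log(matchesPattern('(A(BC){2}){2}', 'ABCBCABCBC')); // true

// Groups combined with alternatives.
const wine = '(Weiß|Rot)wein|Ros(é|e)|Sekt';
console.log(matchesPattern(wine, 'Rotwein'));               // true
console.log(matchesPattern(wine, 'Rose'));                  // true
console.log(matchesPattern(wine, 'Rot'));                   // false
```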

References

Another function of round parentheses is that we can refer back to our groups with masked numbers \1 \2 \3 etc. The number refers to the groups in exactly the order in which you opened them.

The pattern ([A-z])\1 expects 2 characters, one from the list [A-z] and one identical to the first one. So for example 'AA'.

Children first speak a simple language, which usually consists of doubled syllables (mama, papa, popo, pipi, lala). ([a-z])([a-z])\1\2 is the pattern. The masked \1 refers to the input matched by the first group, \2 to the input matched by the second group. So the third letter must be the same as the first letter, and the 4th letter must be the same as the 2nd letter (mama).

One from real life?
Stock management: you have a 9-digit part number of something. The system allows users to enter the number in blocks of 3 with separators (period, comma, space, dash), e.g. 123.456.789, but they do not have to. The separators must not be mixed in any case! So only dots or only commas and so on. A reference to the repeated content can regulate this:
[0-9]{3}([. ,\-]?)[0-9]{3}\1[0-9]{3}
The second separator must therefore be entered identically to the first one.

In older browsers the minus sign did not need to be masked at the beginning or end of the list, because there it is not interpreted as 'from-to'. Current browsers compile the pattern with the RegExp v flag, which requires \- inside square brackets. If the pipe is used instead of the list, you have to mask the dot.

<input type="text" pattern="([A-z])\1">


<input type="text" pattern="([a-z])([a-z])\1\2">


9-digit part number Format: nnnnnnnnn or nnn nnn nnn or nnn.nnn.nnn or nnn-nnn-nnn
<input type="text" pattern="[0-9]{3}([., \-]?)[0-9]{3}\1[0-9]{3}">

Valid: 123 123 123
Invalid: 1231 23123
Invalid: 123 123.123
Invalid: 123-123 123
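
The part number pattern with its back reference can be checked like this. It is a sketch under assumptions: matchesPattern is an invented helper that anchors the pattern and uses the u flag, and the hyphen inside the square brackets is written as \- so that the pattern also compiles in browsers that use the stricter v flag.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// \1 repeats whatever the first group matched, including "nothing".
const partNumber = String.raw`[0-9]{3}([., \-]?)[0-9]{3}\1[0-9]{3}`;

console.log(matchesPattern(partNumber, '123456789'));   // true: no separators
console.log(matchesPattern(partNumber, '123 123 123')); // true
console.log(matchesPattern(partNumber, '123.456.789')); // true
console.log(matchesPattern(partNumber, '123-456-789')); // true
console.log(matchesPattern(partNumber, '123 123.123')); // false: mixed separators
console.log(matchesPattern(partNumber, '1231 23123'));  // false: wrong block size

// The doubled syllables work the same way.
const syllables = String.raw`([a-z])([a-z])\1\2`;
console.log(matchesPattern(syllables, 'mama'));         // true
console.log(matchesPattern(syllables, 'mami'));         // false
```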

Exclude a group

And it gets even better: if you want to exclude a group from these references, you can simply do that with ?:.
Example: (\w)(?:\w)(\w)(\w)\2. 5 characters are expected. The masked \2 would refer to the second pair of parentheses, but since that group is excluded from counting, the next pair of parentheses is the reference target, in this case the third one. This means that the browser will only accept the input if the 3rd and 5th characters are identical.
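
A quick way to convince yourself which character \2 really refers to. The helper matchesPattern is invented for this sketch and approximates the browser check by anchoring the pattern and using the u flag.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// The group with ?: is not counted, so \2 points at the third character.
const pattern = String.raw`(\w)(?:\w)(\w)(\w)\2`;

console.log(matchesPattern(pattern, 'abcdc')); // true: 3rd and 5th character are both c
console.log(matchesPattern(pattern, 'abcdb')); // false: 5th character repeats the 2nd
console.log(matchesPattern(pattern, 'abcd'));  // false: only 4 characters
```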

Exclude the 2nd parenthesis, so the third parenthesis is the second
<input type="text" pattern="(\w)(?:\w)(\w)(\w)\2">


Positive predictions

A positive prediction (?=x), usually called a positive look-ahead, sets a condition for the pattern that follows. It is written in parentheses with a question mark and an equals sign.
The prediction together with the pattern (?=.{5}$)[A-Z]* states that the input must consist of exactly 5 capital letters. The "$" stands for the end of the input, i.e. a total of 5 characters; without it the prediction would be read as at least 5 characters. This example is very simplified; to better understand the purpose, please read Training ISBN.
For example, the pattern (?=.*apple.*).* means that the string "apple" must appear somewhere in a word or sentence.
The expression (?=.*[A-Z]).* specifies that the input must contain at least one uppercase letter. If upper and lower case letters must both be contained and a certain minimum length is required, you build the pattern from a combination of these conditions: (?=.{5,})(?=.*[A-Z])(?=.*[a-z])[A-Za-z]*
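
The three predictions from this section, tried out one after the other. This is a sketch: matchesPattern is an invented helper that anchors the pattern and uses the u flag, which I assume to be close to what the browser does.

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// Exactly 5 capital letters.
console.log(matchesPattern('(?=.{5}$)[A-Z]*', 'ABCDE'));    // true
console.log(matchesPattern('(?=.{5}$)[A-Z]*', 'ABCDEF'));   // false

// The string 'apple' somewhere in the input.
console.log(matchesPattern('(?=.*apple.*).*', 'pineapple pie')); // true
console.log(matchesPattern('(?=.*apple.*).*', 'pear'));          // false

// At least 5 letters, upper and lower case both present.
const mixed = '(?=.{5,})(?=.*[A-Z])(?=.*[a-z])[A-Za-z]*';
console.log(matchesPattern(mixed, 'Hello')); // true
console.log(matchesPattern(mixed, 'hello')); // false: no upper case letter
console.log(matchesPattern(mixed, 'Hi'));    // false: too short
```

Each prediction only looks ahead and consumes nothing, so every condition is tested from the start of the input and the pattern after them does the actual matching.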

A total of 5 characters long, the dot somewhere in the middle Format: n.nnn or nn.nn or nnn.n
<input type="text" pattern="(?=.{5}$)\d+\.\d+">


The string 'tan' must appear in the response.
<input type="text" pattern="(?=.*[Tt]an.*).*">


Password must be at least 8 characters long and must contain upper case letters, lower case letters, numbers and special characters!
<input type="text" pattern="(?=.{8,}$)(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*\W).*">

Negative predictions

The negative prediction (?!x), a negative look-ahead, specifies what must not follow at this position. For example, the pattern (?!02)(0[1-9]) states that the inputs 01 to 09 are possible, but not 02!
An example: in your favorite hotel there is no room number 13 on any floor. You know that. Why? Because the pattern maker took the room out of the form: (?!13)\d{2}. Obvious, really.

Hotel room number 001 - 299, but no x13
<input type="text" pattern="[012](?!13)\d{2}">


The string 'no' must never exist.
<input type="text" pattern="(?!.*no.*).*">


Three in a row.
<input type="text" pattern="(?=.*(.)\1\1.*)\w*">


No three in a row.
<input type="text" pattern="(?!.*(.)\1\1.*)\w*">

Valid: 'zoologisch'. Invalid: 'zooologisch'.
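
The negative predictions from this section can be tested in the same way. A sketch with the invented helper matchesPattern, which anchors the pattern and uses the u flag as an approximation of the browser check:

```javascript
// Hypothetical helper: the whole value has to match the pattern.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// Hotel rooms: first digit 0 to 2, but never followed by 13.
const room = String.raw`[012](?!13)\d{2}`;
console.log(matchesPattern(room, '112')); // true
console.log(matchesPattern(room, '113')); // false: room x13 does not exist
console.log(matchesPattern(room, '313')); // false: no third floor

// The string 'no' must never appear.
console.log(matchesPattern('(?!.*no.*).*', 'yes'));   // true
console.log(matchesPattern('(?!.*no.*).*', 'piano')); // false

// No character three times in a row.
const noTriple = String.raw`(?!.*(.)\1\1.*)\w*`;
console.log(matchesPattern(noTriple, 'zoologisch'));  // true
console.log(matchesPattern(noTriple, 'zooologisch')); // false
```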

Frequent errors

Not tested, only copied

Look-behind statements, which are known from other programming languages, were unavailable for input patterns for a long time. As you know, the patterns are based on the rules of JavaScript RegExp, which only gained look-behind with ES2018, and older browsers do not support it. So do not rely on it without testing.
There is no carriage return or tab either! If you copy and paste one from somewhere into the input field, it is tested like a space. The same is true for newline.

I've found many, many patterns that don't work at all. Either they are totally broken or they don't do what they are supposed to. As webworkers, we have to make sure that the pattern we provide for comparison is correct. Did I mention that you should test your patterns before you feed them to the www world?

There are pattern rules for many programming languages (Regular Expressions, RegEx, RegExp, ...). They differ in one way or another. The regular expressions of the input pattern follow the rules of JavaScript according to the will of the W3C and not those of Perl or any other language! It doesn't make sense to simply copy a pattern you have found on the net for C, C++, Java, Python, Ruby or PHP; it probably won't do what you want it to do. Maybe it looks similar, but it needs to be adapted to our form input pattern. However, there are also some rules that differ from JS, see above.

Unfortunately I also found some popular pattern collections and tips that shine with a lot of bad examples. If you copy something from there (this is of course also true for this page!), you should check it well.

Surely my examples are not free of errors; because of the huge number of patterns it is impossible to check all of them completely for positive or negative input.


Selection (list)
[ ]  one character from the selection. Example: [abc]
[^ ]  none of the characters within the brackets. Example: [^abc]
[A-Z]  one capital letter. Example: [A-Z]
[a-z]  one small letter a-z. Example: [a-z]
[A-z]  one letter, capital or small, plus the special characters [\]^_`. Example: [A-z]
[0-9]  one digit 0-9. Example: [0-9]
Groups and References
( )  group of characters. Example: (abc)
\1  reference (input is identical to the input of group 1). Example: (\D)(\d)\1
\2  reference (input is identical to the input of group 2). Example: (\D)(\d)\2
?:  exclude from reference (first group excluded, so the second is the first)
\  metacharacter: mask a metacharacter. Example: \d
.  (dot) any character. Example: 1.2
\.  (masked dot) the dot itself. Example: 1\.2
/  normal special character
|  metacharacter: or. Example: ab|cd
-  metacharacter: range from-to within [ ]. Example: [0-5]
\d  (digit) one digit [0-9]. Example: \d
\D  anything but a digit. Example: \D
\s  (whitespace) a space or a tab (tab only by copy/paste). Example: \s
\S  anything but whitespace (no space, no tab). Example: \S
\t  (tab) one tab (input only by copy/paste)
[^\t]  no tab (input only by copy/paste)
\w  (word) one letter, one digit or underscore. Example: \w
\W  no letter, no digit, no underscore. Example: \W
$  end marker (actually only used for predictions). Example: (?=\d{3}$).*
?  metacharacter: once or not at all. Example: [a-z]\d?
+  metacharacter: once or several times. Example: [a-z]\d+
*  metacharacter: not at all or several times. Example: [a-z]\d*
[ ] { } ( ) * +  metacharacters
{n}  exact number. Example: x{5}
{n,}  minimum number. Example: x{5,}
{0,m}  maximum number. Example: x{0,5}
{n,m}  minimum and maximum number. Example: x{5,8}
?=  positive prediction (input is one digit). Example: (?=\d).
    input starts with three digits in a row. Example: (?=\d{3}).*
    input contains three digits in a row somewhere. Example: (?=.*\d{3}).*
    input ends with three digits in a row. Example: (?=.*\d{3}$).*
?!  negative prediction (input is one character that is not a digit). Example: (?!\d).
    input does not contain three digits in a row. Example: (?!.*\d{3}).*
