Regular Expression
Rules on YOCTOL.AI come in forms such as NLU, keyword, and regular expression. A regular expression is a pattern that describes text following a certain rule. It is used to inspect, search, or replace text that matches the rule. Common abbreviations for regular expression are Regexp, Regex, and RE.
Example
There are hundreds of ways to express the same meaning in any language, and different structures can convey similar meanings. If we carefully examine and categorize these sentences, many of them can be expressed with a small set of structures.
Take the following as an example: "Large cup of coke", "Medium cup of coke", "Small cup of sprite" and "Medium cup of Fanta". We know that there are two basic features in these phrases: the size of the cup and the type of the drink.
If we put Large, Medium and Small into one group, and Coke, Sprite and Fanta into another, we then have a basic rule for this kind of sentence.
Size | Kind
Large, Medium, Small | Coke, Sprite, Fanta
With regular expressions, we can condense these four phrases into a single rule:
(large|medium|small) cup of (coke|sprite|fanta)
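As a minimal sketch of how this rule behaves, the Python snippet below applies it to the example phrases. The function name `parse_order` is hypothetical, and the case-insensitive flag is an assumption added here so that "Large" and "Fanta" match the lowercase rule.

```python
import re

# The rule from the text. IGNORECASE is an assumption, not part of the rule itself.
DRINK_RULE = re.compile(r"(large|medium|small) cup of (coke|sprite|fanta)", re.IGNORECASE)

def parse_order(text):
    """Return (size, kind) in lowercase if the whole text matches the rule, else None."""
    match = DRINK_RULE.fullmatch(text.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()

phrases = [
    "Large cup of coke",
    "Medium cup of coke",
    "Small cup of sprite",
    "Medium cup of Fanta",
    "Huge cup of tea",  # not covered by the rule
]
for phrase in phrases:
    print(phrase, "->", parse_order(phrase))
```

The two groups in parentheses correspond to the Size and Kind columns, so each match also tells us which size and which drink were mentioned.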
Common circumstances for regular expressions
Regular expressions are often used for emails, phone numbers, birthdays, or any other text with a fixed structure.
Phone number: ^(\+1)?\d{10}$
Social Security Number: ^\d{3}-\d{2}-\d{4}$
Email (gmail for example): ^.+@gmail\.com$
Birthday: ^\d{4}-\d{2}-\d{2}$
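The sketch below tries patterns like these in Python. The function name `classify` and the sample values are made up for illustration. The phone pattern is written with `\d` for digits, and the dot in `gmail.com` is escaped so that it matches only a literal dot.

```python
import re

# Patterns for the four structured formats discussed above.
PATTERNS = {
    "phone": re.compile(r"^(\+1)?\d{10}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "gmail": re.compile(r"^.+@gmail\.com$"),
    "birthday": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def classify(text):
    """Return the names of all patterns that the text matches."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

for sample in ["+14155550123", "123-45-6789", "someone@gmail.com", "1990-07-15", "hello"]:
    print(sample, "->", classify(sample))
```

These patterns only check the shape of the text. For example, the birthday pattern would also accept 9999-99-99, so a real bot would likely need an extra validity check.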
When to use regular expressions
When we build chatbots, we often want certain combinations of words in a customer's message to lead to a specific intent, which in turn triggers a response in the dialogue. These combinations often follow a recognizable pattern.
This is where regular expressions come in handy. In situations like this, sort through the customers' messages and use regular expressions to categorize, exclude, or constrain them. We will demonstrate an actual use case later.
Symbols
Basic rule: quantifier symbols such as *, + and ? control how many times the character before them may occur. We introduce a few common symbols below. For more symbols, there are many cheat sheets available online.
Symbol | Meaning | Example | Matches
. | Any character | . | a, b, c, d
* | 0 or more | a* | Ø, a, aa, aaa
+ | 1 or more | a+ | a, aa, aaa, aaaa
? | 0 or 1 | a? | Ø, a
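To check the table rows by hand, a short Python sketch such as the following can help. The helper name `matches_whole` is hypothetical. It uses `re.fullmatch`, so a pattern counts as a match only when it covers the entire text, and the empty string "" plays the role of Ø.

```python
import re

def matches_whole(pattern, text):
    """True if the pattern matches the entire text, not just a part of it."""
    return re.fullmatch(pattern, text) is not None

# Each pattern is tested against a few candidate strings.
examples = {
    ".": ["a", "b", "ab"],
    "a*": ["", "a", "aaa", "b"],
    "a+": ["", "a", "aaaa"],
    "a?": ["", "a", "aa"],
}
for pattern, texts in examples.items():
    for text in texts:
        print(repr(pattern), repr(text), matches_whole(pattern, text))
```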
^ start of a string:
Regular Expression: /^hello/
Example: "hello!" -> Match; "hey hello" -> No Match
$ end of a string:
Regular Expression: /.+eat$/
Example: "I want to eat" -> Match; "I want to eat food" -> No Match
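The two anchors can be tried out with a small Python sketch like the one below. The function names are hypothetical. The end-of-string pattern is written as `.+eat$` in this sketch, since a quantifier such as `+` needs something before it to repeat.

```python
import re

STARTS_WITH_HELLO = re.compile(r"^hello")
ENDS_WITH_EAT = re.compile(r".+eat$")

def starts_with_hello(text):
    """True if the text begins with 'hello'."""
    return STARTS_WITH_HELLO.search(text) is not None

def ends_with_eat(text):
    """True if the text ends with 'eat' and has at least one character before it."""
    return ENDS_WITH_EAT.search(text) is not None

print(starts_with_hello("hello!"), starts_with_hello("hey hello"))
print(ends_with_eat("I want to eat"), ends_with_eat("I want to eat food"))
```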
| or, e.g., a|b >>> a or b
() groups alternatives and sets the scope of an operator, e.g., (apple|banana) >>> apple, banana
[ ] any one of the characters within it, e.g., [abc] >>> any of a, b or c
{ } number of repetitions, e.g., .{2} >>> ab, bc, cd, dw
\d any digit, e.g., \d{3} >>> 123, 456, 789
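The following Python sketch walks through these symbols one by one. The helper name `full` and the "pie" phrases are made up for illustration; they are there to show why the parentheses matter.

```python
import re

def full(pattern, text):
    """True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None

# | means "or"
print(full(r"a|b", "a"), full(r"a|b", "c"))        # True False

# () limits how far | reaches: with the group, " pie" is required after either fruit
print(full(r"(apple|banana) pie", "banana pie"))   # True
# without the group, the alternatives are "apple" and "banana pie"
print(full(r"apple|banana pie", "apple pie"))      # False

# [ ] matches exactly one of the listed characters
print(full(r"[abc]", "b"), full(r"[abc]", "d"))    # True False

# { } fixes the number of repetitions
print(full(r".{2}", "dw"), full(r".{2}", "d"))     # True False

# \d matches a digit
print(full(r"\d{3}", "456"), full(r"\d{3}", "45a"))  # True False
```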
Example
Suppose that, for an online clothing shop, we want to train an intent that recognizes the different colors and types of clothes in stock.
Colors and clothes types can be combined in many ways:
Red jeans, blue shirt, yellow shorts, green jacket, yellow jacket, green shorts
Different colors and types can combine into many possibilities, so we can use a regular expression to capture them all.
Color | Type
Red, Blue, Yellow, Green | Jeans, Shirt, Shorts, Jacket
Regular expression:
(red|blue|yellow|green) (jeans|shirt|shorts|jacket)
However, users may add in words between or around these keywords, for example:
Red pink shirt, green colored shorts…
In this case, the regular expression would be:
(red|blue|yellow|green) (.* )?(jeans|shirt|shorts|jacket)
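Here is a minimal Python sketch of this rule in use. The function name `find_clothes` is hypothetical, and the case-insensitive flag is an assumption added so that "Red" matches the lowercase rule.

```python
import re

# The rule from the text, with optional extra words between color and type.
CLOTHES_RULE = re.compile(
    r"(red|blue|yellow|green) (.* )?(jeans|shirt|shorts|jacket)",
    re.IGNORECASE,  # assumption: ignore capitalization in user messages
)

def find_clothes(message):
    """Return (color, type) in lowercase for the first match in the message, else None."""
    match = CLOTHES_RULE.search(message)
    if match is None:
        return None
    return match.group(1).lower(), match.group(3).lower()

for message in ["Red pink shirt", "green colored shorts", "blue shirt", "purple shirt"]:
    print(message, "->", find_clothes(message))
```

One thing to keep in mind is that `.*` is greedy. In a message such as "red shirt and blue jeans", the rule pairs the first color with the last clothes type and returns red and jeans, so messages that mention several items may need a stricter rule.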
To go one step further, we may also want to capture a description of the intensity of the color. In that case, the regex would be:
(light|deep) (red|blue|yellow|green) (.* )?(jeans|shirt|shorts|jacket)
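The sketch below shows one possible variant in Python. It is an assumption rather than the rule above: the intensity word is separated from the color by a space and is made optional, so that both "light blue shirt" and "yellow shorts" are recognized. The function name `describe_clothes` and the named groups are hypothetical.

```python
import re

# Variant with an optional intensity word; named groups make the result easier to read.
INTENSITY_RULE = re.compile(
    r"(?:(?P<intensity>light|deep) )?"
    r"(?P<color>red|blue|yellow|green) "
    r"(?:.* )?"
    r"(?P<type>jeans|shirt|shorts|jacket)",
    re.IGNORECASE,  # assumption: ignore capitalization in user messages
)

def describe_clothes(message):
    """Return a dict with intensity, color and type (lowercase), or None if nothing matches."""
    match = INTENSITY_RULE.search(message)
    if match is None:
        return None
    # A missing intensity word stays None; everything else is lowercased.
    return {key: value.lower() if value else None for key, value in match.groupdict().items()}

for message in ["light blue shirt", "Deep green colored jacket", "yellow shorts"]:
    print(message, "->", describe_clothes(message))
```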
If you want to rule out sentences that ask for a particular style, the regex that matches those sentences would be:
^I (need|want) the .* style$
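As a rough illustration, the Python sketch below uses this pattern to separate style requests from other messages. The function name `is_style_request` and the sample messages are made up for the example.

```python
import re

STYLE_REQUEST = re.compile(r"^I (need|want) the .* style$")

def is_style_request(message):
    """True if the whole message has the form 'I need/want the ... style'."""
    return STYLE_REQUEST.search(message) is not None

messages = [
    "I want the casual style",
    "I need the street wear style",
    "I want the red shirt",
    "Do you have the casual style",
]
# Keep only the messages that are not style requests.
remaining = [m for m in messages if not is_style_request(m)]
print(remaining)
```

Because of the ^ and $ anchors, the pattern applies only when the entire message has this form. The pattern is also case-sensitive as written, so "i want the casual style" would not be ruled out.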
Learning Resources
When designing regular expressions, along with knowing what the symbols mean, you also need to check if the expression you made actually captures what you want. The following websites can assist you:
Regex testing: https://regex101.com/
Regex testing: https://regexr.com