Regular Languages (Part 1)
Main Source:
- Book chapter 1.4
- Neso Academy playlist 8-9
- Neso Academy playlist 45-48
Regular Language
Regular language is a language that can be recognized by some finite state machine. It is constructed by three set operations, union, concatenation, and Kleene star.
With alphabet , the class of regular languages over is defined as:
- is a regular language
- For each is a regular language
- For any natural number if are regular languages, then so is .
- For any natural number , if are regular languages, then so is .
- If is a regular language, then so is .
- Nothing else is a regular language unless its construction follows from rules (1) to (5).
So, a regular language can be constructed using the three mentioned operations, including any symbol, and the empty set is also considered a regular language.
Closure Properties
If an operation is said to have closure property, it means that performing the operation on members of the set should yield a result that is also a member of the set. For example, addition in the set of integers own the closure property, because adding two integers will always result in another integer. This is not the case for division, as it might result in fraction.
In regular language, operation that owns closure property are union, concatenation, intersection, set difference, complementation, and Kleene star.
Regular Expression
Regular expression is the formal notation of describing regular languages. It consists of a sequence of characters that represents a pattern or rule used to match and manipulate strings within a given alphabet.
Character and operators:
- Terminal symbols: Terminal symbol including symbols that are contained within the , including (empty string) and (empty set) are regular expression.
- Union (): Union operation is like a logical OR, if we have two regular expression, , union between them would be represented as . This mean we are making a pattern that match either or .
- Concatenation (): With , concatenation is , and it indicates that the two expressions must occur consecutively in the input string.
- Kleene star (): It acts like an iteration that specifies that the preceding expression can occur zero or more times. It matches any number of occurrences of the preceding expression, including no occurrence at all. For example, the regular expression matches strings like or "", "a", "aa", "aaa", and so on.
All the operations behave recursively, meaning we can apply it not only to individual characters or subexpressions but also to larger expressions or groups of expressions. For example, with , we can make it as .
Common Expression
Assume that
- Zero or more: We can use Kleene star for this, means one or more a's.
- One or more: Use one 'a' before the zero or more expression: ( is also right).
- Zero or one: describe optional 'a'.
- Any string: Union all alphabet with a Kleene star: .
- Any nonempty string: Because Kleene star permits zero or more, then we would need extra any string in front of it, so it will be .
- Any string not containing: We should exclude the string that we don't want it to contain from the any string expression, exclude any 'a'.
- Any string containing exactly one: Just add the desired string before or after the any string not containing expression, .
We can also use to describe zero or more string excluding the empty string.
Regular Expression Identities
Regular expression identities are regular expression laws or properties, or a set of rules or relationships that hold true for regular expressions. These identities are useful for simplifying or transforming regular expressions while preserving their matching behavior.
Source: https://youtu.be/yp4pYgXfYD8?si=rwI7XntRncW2qGoK&t=26
For example, applying Kleene star multiple times (rule 8) doesn't affect anything. This property is similar to adding infinity to infinity, it remains infinity.
In the rule 9, represents one or more string, meaning empty string is not possible to matched. If we add an empty string with that expression, in other words, we include in the set of all possible string matched, then it will be equal back to again.
Arden's Theorem
One application of regular expression identities is the Arden's theorem. Arden's theorem states that:
If and are two regular expressions over , and if does not contain , then the following equation in given by has a unique solution, that is .
Arden's theorem can be proved by using several regular expression identity:
Source: https://youtu.be/Idl_0mPzZjE?si=OyVyX6JPE4VhDaz2&t=430
The proof consist of two part, the first part proves that is equal to , and the second part proves that is the unique solution for the equation , by substituting and expanding the with the . It mostly uses the distributive property of regular expressions.