Skip to main content

TOC Fundamentals

Main Source:

Introduction

Theory of Computation (TOC) is the study of abstract models of computation and the fundamental limits of what can be computed. It seeks to understand the nature of computation, the power, and limitations of different computational models, and the relationships between different classes of problems.

This field is further divided into three:

  • Automata Theory & Formal Language: Automata theory is the study of abstract models of computation or machines used to solve computational problems. Formal language is concerned about formalizing a natural language by specifying with symbols or character, following a set of rules called the formal grammar. Automata are designed to recognize formal language. In other word, the formal language is used as the input and output for the abstract model of computation. Different types of automata recognize different types of languages.
  • Computability Theory: Computability theory focuses on the study of what can be computed and the limits of computation. It deals with the question of which problems can be solved algorithmically and which problems are unsolvable. Problems are analyzed with abstract model of computation along with other formal systems.
  • Computational Complexity Theory: Computational complexity theory focuses on understanding the resources required to solve computational problems. It focuses on classifying problems into different complexity classes based on the amount of time, space, or other resources needed to solve them.

Foundational Concepts

TOC uses much math.

Notation

TOC uses mathematical logics as the tool to formalize and analyze computational concepts. Therefore, there are various mathematical notation used here. These notations will be covered later.

Source: Book notation part

Sets

In mathematical logic, set is a collection of objects. The objects within a set can be anything from numbers, letters, other sets, or even more abstract mathematical objects.

Sets are typically denoted using curly braces {}\{\}. For example, the set of natural numbers less than 5 can be written as {0,1,2,3,4}\{0, 1, 2, 3, 4\}, where each number is an element of the set. If an object xx is an element of a set AA, it is represented as xAx \in A, and if xx is not an element of AA, it is represented as xAx \notin A.

Sets can be finite or infinite, depending on the number of elements they contain. For example, the set of all even numbers is an infinite set, while the set of colors in a rainbow is a finite set.

Set operations and terminologies:

  • Empty set: A set is empty, denoted as \varnothing if it contains no elements.

  • Union: The union of two sets AA and BB, denoted by ABA \cup B, is the set that contains all the elements that are in either set A or set B, or in both. For example, if #A={1,2,3}A = \{1, 2, 3\} and B={3,4,5}B = \{3, 4, 5\}, then AB={1,2,3,4,5}A \cup B = \{1, 2, 3, 4, 5\}.

  • Intersection: The intersection of two sets AA and BB, denoted by ABA \cap B, is the set that contains all the elements that are common to both AA and BB. For example, if A={1,2,3}A = \{1, 2, 3\} and B={3,4,5}B = \{3, 4, 5\}, then AB={3}A \cap B = \{3\}.

  • Set difference: ABA - B denotes the difference between set AA and BB. It is a set that contains everything in AA, but not in BB. For example, if A={1,3,9}A = \{1, 3, 9\} and B={3,5}B = \{3, 5\}, then AB={1,9}A - B = \{1, 9\}.

  • Complement: Denoted by ACA^{C}, is a set that contains everything that is not in AA.

  • Properties: Set has some properties as follows.

    Set properties
    Source: Book page 2-3

  • Subset: A set AA is said to be a subset of a set BB if every element of AA is also an element of BB. If A={1,2}A = \{1, 2\} and B={1,2,3}B = \{1, 2, 3\}, then AA is a subset of BB (ABA \subseteq B). It is possible for a set to be a subset of another set without being equal to it. If AA is a subset of BB and there exists at least one element in BB that is not in AA, then AA is called a proper subset of BB, denoted as ABA \subset B.

  • Superset: Conversely, a set BB is said to be a superset of a set AA if every element of AA is also an element of BB. In other words, if every element xx in AA satisfies the condition xBx \in B, then BB is a superset of AA, denoted as BAB \supseteq A. Similar to subsets, if BB is a superset of AA and there exists at least one element in BB that is not in AA, then BB is a proper superset of AA, denoted as BAB \supset A.

  • Disjoint sets: Disjoint sets are sets that have no elements in common. In other words, two sets AA and BB are disjoint if their intersection is the empty set \varnothing. For example, if A={1,2,3}A = \{1, 2, 3\} and B={4,5,6}B = \{4, 5, 6\}, then AA and BB are disjoint sets because their intersection is \varnothing.

  • Cardinality: The number of element in some set. The cardinality of a set AA is denoted as A|A|.

  • Powerset: The powerset of a set AA, denoted as P(A)P(A), is the set that contains all possible subsets of AA, including the empty set \varnothing and the set AA itself. For example, if A={1,2}A = \{1, 2\}, then P(A)={,{1},{2},{1,2}}P(A) = \{\varnothing, \{1\}, \{2\}, \{1, 2\}\}.

  • Cartesian Product: The Cartesian product of two sets AA and BB, denoted as A×BA \times B, is the set of all possible ordered pairs where the first element comes from AA and the second element comes from BB. In other words, it combines every element of AA with every element of BB. For example, if A={1,2}A = \{1, 2\} and B={a,b}B = \{a, b\}, then A×B={(1,a),(1,b),(2,a),(2,b)}A \times B = \{(1, a), (1, b), (2, a), (2, b)\}.

Relations

Relation is a set of ordered pairs that relates elements from one set, called the domain, to elements of another set, called the range or codomain. More specifically, a relation on sets SS and TT is a set of ordered pairs (s,t)(s, t), where:

  1. sSs \in S
  2. tTt \in T
  3. SS and TT need not be different
  4. The set of all first elements in the “domain” of the relation, and
  5. The set of all second elements is the “range” of the relation

Two sets A={1,2,3}A = \{1, 2, 3\} and B={a,b,c}B = \{a, b, c\}. A relation RR from AA to BB could be {(1,a),(2,b),(3,c)}\{(1, a), (2, b), (3, c)\}. This relation relates each element from set AA with a corresponding element from set BB.

Relations
Source: Book page 8-9

  • Equivalence Relations: An equivalence relation is a relation that satisfies three properties:
    • Reflexivity: Every element is related to itself. For all elements aa in the set, (a,a)(a, a) is in the relation.
    • Symmetry: If one element is related to another, then the other is related to the first. For all elements aa and bb in the set, if (a,b)(a, b) is in the relation, then (b,a)(b, a) is also in the relation.
    • Transitivity: If one element is related to a second, and the second is related to a third, then the first is related to the third. For all elements aa, bb, and cc in the set, if (a,b)(a, b) is in the relation and (b,c)(b, c) is in the relation, then (a,c)(a, c) is also in the relation.
  • Partial Ordering Relation: A relation that satisfies three properties: reflexivity, antisymmetry, and transitivity. A set with a relation being partial order is called partially ordered set.
    • Antisymmetry: The opposite of symmetry, for all elements aa and bb in the set, if (a,b)(a, b) is in the relation, then (b,a)(b, a) must not be in the relation, unless a=ba = b.
  • Partition: A partition of a set is a collection of non-empty, disjoint subsets (called blocks) that cover the entire set. Each element of the set belongs to exactly one block of the partition. A set {1,2,3,4,5}\{1, 2, 3, 4, 5\} can be partitioned into {{1,4},{2,5},{3}}\{\{1, 4\}, \{2, 5\}, \{3\}\}.

Functions

A function is a special type of relation that assigns a unique element from the range to each element in the domain. In other words, it relates each element in the domain to exactly one element in the range. Functions are often represented using mappings or formulas.

For example, the function f:ABf: A \rightarrow B is defined by f(1)=af(1) = a, f(2)=bf(2) = b, and f(3)=cf(3) = c. This function assigns a unique element from set BB to each element in set AA.

Functions
Source: Book page 12

Kind of functions:

  • Injection (one-to-one): A function in which each element in the domain is mapped to a unique element in the range. In other words, no two distinct elements in the domain are mapped to the same element in the range.

  • Surjection (onto): A function in which every element in the range is mapped to by at least one element in the domain. In other words, the range of the function covers the entire range.

  • Bijection (one-to-one and onto): A function that is both injective and surjective. In other words, it is a function where each element in the domain is mapped to a unique element in the codomain, and every element in the codomain is mapped to by exactly one element in the domain.

  • Invertible function: A function f:ABf: A \rightarrow B is invertible if there exist an inverse relation f1f^{-1} that maps from BB to AA.

    Function relations
    Source: Book page 12-13

Graphs

Formally, a graph GG is defined as a triple of (V,E,γ)(V, E, \gamma), where VV is a finite set of objects called vertices, EE is another finite set of objects called edges, and γ\gamma is a function that assigns to each edge in a graph a subset {v,w}\{v, w\}, where vv and ww are vertices of the graph.

Graph example
Source: Book page 15

See graph for terminologies and types of graph. In addition, we can say a graph is a tree if it is connected and has no simple cycles.

Strings & Languages

A string is a finite sequence of symbols chosen from a given alphabet. An alphabet aa, denoted by Σa\Sigma_a is a finite set of symbols. The symbols can be characters, numbers, or any other discrete elements. For example, if we have Σa\Sigma_a consisting of the symbols {0,1}\{0, 1\}, then "0101" and "111" are the example of strings over Σa\Sigma_a.

  • Length: The length of a string is the length of its sequence. The length of string ww is denoted as w|w|.

  • Empty string: A string with zero length, denoted as ϵ\epsilon.

  • Reverse string: The reverse of a string. If w=w1w2...wnw = w_1 w_2 ...w_n, where each wiΣw_i \in \Sigma, then the reverse is wnwn1...w1w_n w_{n-1} ... w_1.

  • Substring: A substring is a contiguous sequence of characters within a string. As an example, "deck" is a substring of "abcdeckabcjkl".

  • Concatenation: An operation of combining them together to form a new string. Suppose string xx with length of mm and string yy with length of nn. The concatenation between them is written as xyxy, which is obtained by appending yy to the end of xx, that is x1x2...xm y1y2...ynx_1 x_2...x_m \space y_1 y_2...y_n.

    In the case of concatenation of a string with itself, it is denoted as xkx^k, where kk is the number of repetition. A concatenation of w=abcw = abc with empty string ϵ\epsilon will be wϵ=abcw\epsilon = abc.

  • Prefix: Substring that appears at the beginning of the string. If w=vyw = vy for some yy, then vv is a suffix of ww.

  • Suffix: Substring that appears at the end of the string. If w=xvw = xv for some xx, then vv is a prefix of ww.

  • Lexicographical ordering: Lexicographical ordering is a way to compare strings based on their alphabetical order. For example, "apple" comes before "banana" and "01" comes before "10" in lexicographical ordering. The lexicographic ordering of all strings over the alphabet {0,1}\{0, 1\} is (ϵ,0,1,00,01,10,11,000,...)(\epsilon, 0, 1, 00, 01, 10, 11, 000, ...).

  • Language: Formally, it is any set of strings over an alphabet. The set of all strings, including the empty string over an alphabet Σ\Sigma is denoted as Σ\Sigma^*. A language can consist of string that follows some property. Examples:

    • L1={w{0,1}:w has an equal number of 0’s and 1’s}L_1 = \{w \in \{0, 1\}^*: w \text{ has an equal number of 0's and 1's}\}. Language L1L_1 consist set of string that has the equal number of zeros and ones. For example, some strings that would belong to L1L_1 are "01", "0011", "1100", "000111", and so on.
    • L2={wΣ:w=wR}L_2 = \{w \in \Sigma^*: w = w^R\}, where wRw^R is the reverse string of ww. This language consist of string where its original is equal to its reverse (i.e., a palindrome, such as racecar, poop.).
  • Language with power: A language with a power Σn\Sigma^n, where Σ={0,1}\Sigma = \{0, 1\} describe the set of all strings of length nn under language Σ\Sigma.

    • Σ0={ϵ}\Sigma^0 = \{\epsilon \} (consist of just empty string).
    • Σ1={0,1}\Sigma^1 = \{0, 1 \}.
    • and so on...

    We can also write Σ\Sigma^*, a set of all strings over an alphabet Σ\Sigma as Σ=Σ0Σ1Σ2...\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 ... (infinite).

  • Language concatenation: If L1L_1 and L2L_2 are languages over alphabet Σ\Sigma, their concatenation is written as L=L1L2L = L_1 \cdot L_2, or simply L=L1L2L = L_1 L_2, where L={wΣ:w=xy for some xL1, and yL2}L = \{w \in \Sigma^*: w = x \cdot y \text{ for some } x \in L_1 \text{, and } y \in L_2\}.

    For example, given Σ={0,1}\Sigma = \{0, 1\},
    L1={wΣ:w has an even number of 0’s}L_1 = \{w \in \Sigma^*: w \text{ has an even number of 0's} \}
    L2={w:w starts with a 0 and the rest of the symbols are 1’s}L_2 = \{w: w \text{ starts with a 0 and the rest of the symbols are 1's} \}
    then L1L2={w:w has an odd number of 0’s}L_1 L_2 = \{w: w \text{ has an odd number of 0's} \}. This is true because L1L_1 itself consist of even number of 0's, concatenating it with L2L_2, which mean adding 1 zero will make any language there have the odd number of zeros.

  • Kleene Star: A language operation denoted as LL^*, which is the set of all strings obtained by concatenating zero or more strings from LL. More specifically, L={wΣ:w=w1...wk for some k0 and some w1,w2,...,wkL}L^* = \{w \in \Sigma^*: w = w_1 \cdot ... \cdot w_k \text{ for some } k \ge 0 \text{ and some } w_1, w_2, ..., w_k \in L \}.

    For example, from a language L={01,1,100}L = \{01, 1, 100\}, we can form "110001110011", since it can be formed by the concatenation of 1100011100111 \cdot 100 \cdot 01 \cdot 1 \cdot 100 \cdot 1 \cdot 1.

Boolean Logic

See boolean logic.

Grammar

A grammar is simply a mechanism to describe a language, or specifically a set of production rules that describe the structure of a formal language. A grammar consists of a quadruple (V,T,S,P)(V, T, S, P), where:

  • VV: Finite set of objects called variables or non-terminal symbols. These symbols represent different syntactic categories or components that can be combined to form valid strings in the language. Examples of non-terminal symbols can include syntactic categories like noun, verb, sentence, or expression.
  • TT or Σ\Sigma: Finite set of objects called terminal symbols. These symbols represent the basic building blocks or atomic units of the language. Terminals are the actual characters or tokens that appear in the strings of the language. Examples of terminal symbols include individual letters, digits, or punctuation marks.
  • PP, in the form of xyx \rightarrow y: Finite set of productions rule. Each production rule specifies how a non-terminal symbol can be replaced by a sequence of terminals and non-terminals. It defines the valid ways to construct strings in the language.
  • SS, where SVS \in V: A special non-terminal symbol that represents the starting point, called start variables. The goal is to generate valid strings in the language by repeatedly applying production rules, starting from the start symbol.

If we have a string ww in the form w=uxvw = uxv, and we apply the production rule xyx \rightarrow y, it means that we replace the substring xx in ww with the string yy to obtain a new string z=uyvz = uyv. We call a sequence of production rule applications that transforms the start symbol SS into a specific string ww a derivation.

tip

The set of all strings obtained by using production rules is the languages generated by the grammar.

Consider a grammar G=(V,T,S,P)G = (V, T, S, P) and L(G)={wT:Sw}L(G) = \{w \in T^*: S \xRightarrow{*} w \}. This expression represents the language generated by the grammar GG, comprising all strings that can be derived from the start symbol SS according to the production rules defined in PP.

  • TT^* represents the set of all possible strings that can be formed using the terminal symbols TT.
  • SwS \xRightarrow{*} w denotes that the start symbol SS can be derived or rewritten to produce the string ww using a sequence of production rule applications. The SwS \xRightarrow{*} w symbol indicates zero or more rewriting steps.

If a string WW belong to the grammar, or WL(G)W \in L(G), then there exists a sequence Sw1w2w3...wnwS \Rightarrow w_1 \Rightarrow w_2 \Rightarrow w_3 ... \Rightarrow w_n \Rightarrow w, which is a sequence of production rule applications (or a derivation) that transforms the start symbol SS into the string ww.

During the derivation process, we call intermediate strings produced a sentential form. Therefore, with the string SS, we call w1,w2,...,wnw_1, w_2, ..., w_n a sentential form of the derivation.

tip

See also Formal grammar.