Skip to main content

Formal Grammar

Main Source:

  • Book chapter 2.1
  • Neso Academy playlist 63-64

Formal Grammar is the formalization of grammar, which is the rules to describe languages.

Chomsky Hierarchy

There are many types of formal grammar, the types of grammar based on Noam Chomsky are regular grammar, context-free grammar, context-sensitive grammar, and unrestricted grammar.

Type of grammar
Source: https://youtu.be/WgEsPTAL55Q?si=4LOZ1qtkL1hunWmf&t=70

Every grammar generates its own language, which is recognized by a specific automaton.

Grammar

As explained in grammar, in general, a grammar is defined as quadruple: G=(V,T,S,P)G = (V, T, S, P).

With grammar G1=({S,A,B},{a,b},S,{SAB,Aa,Bb})G_1 = (\{S, A, B\}, \{a, b\}, S, \{S \rightarrow AB, A \rightarrow a, B \rightarrow b\}), we have three categories of symbol, SS (also known as the start symbol), AA, and BB. The category itself doesn't mean anything in language, the actual character that appears in the language are aa and bb. The production rules describe that the start symbol SS can be replaced with ABAB. The AA and BB itself can be replaced by aa and bb, respectively.

For example, starting from SS
SABS \rightarrow AB (by SABS \rightarrow AB)
aB\rightarrow aB (by AaA \rightarrow a)
ab\rightarrow ab (by BbB \rightarrow b)

The production rule of a general grammar is in the form of AαA \rightarrow \alpha, where AA is a non-terminal symbol, and α\alpha is a string of terminal or non-terminal symbol (formally α={VT}\alpha = \{V \cup T\}). The production rule allows the transformation from non-terminal symbols to another non-terminal or terminal symbols, making it possible to expand the language recursively.

tip

The book also notate the production rule in the form of (VT)+(VT)(V \cup T)^+ \rightarrow (V \cup T)^*. In other word, one or more terminal or non-terminal symbol can be transformed to zero or more (so includes empty string) terminal or non-terminal symbol.

For example, when the production rule generates another start symbol, such as SSABS \rightarrow SAB, then, starting from SS, we can produce SABSAB, then further apply the same production rule to obtain SABABSABAB, and continue this process until we replace all non-terminal symbol to terminal symbol.

Example

The process of generating language from grammar is also known as derivation. The set of all language generated by some grammar is denoted as L(G)L(G). The language generated by the previous grammar example, L(G1)L(G_1), consists of only a single string: {ab}\{ab\}.

Another example from the video:

Grammar example
Source: https://youtu.be/ejXgLRSIxsA?si=K8kIBdPOzhFgqsQW&t=573

This grammar has a recursive production rule. We can replace AA with aAaA, which mean we can potentially generate infinite amount of "a"'s. The language generated by this grammar creates a pattern ambna^m b^n, therefore the L(G)L(G) is a set of string with that patterns.

Regular Grammar

Regular grammar is the grammar that describes regular language, which is the languages recognized by finite automaton like DFAs and NFAs.

Left & Right Grammar

The formal description of regular grammar is same as the general grammar, except the production rule follows either left or right types of grammar. With AA, BVB \in V and xTx \in T, right and left grammar is in the form:

  • Right linear grammar: AxBxA \rightarrow xB|x, example SabSbS \rightarrow abS|b.
  • Left linear grammar: ABxxA \rightarrow Bx|x, example SSbbbS \rightarrow Sbb|b.

In right grammar, the production rule places non-terminal symbols on the right, while in a left grammar, it is the opposite.

Additionally,

info

The "|" in the notation SAaS \rightarrow A|a is a shorthand to describe two SAS \rightarrow A and SaS \rightarrow a.

Context-Free Grammar

See the next topic.