# Parsing / Context-Free Grammar

Context-Free Grammar (CFG) is a grammar where **production (or rewrite) rules** are in the form of:

$$

A \rightarrow \alpha

$$

$A$ is a **single nonterminal symbol**, and $\alpha$ is the **string of terminals or nonterminals**, it can also be empty.

A **terminal** (or **token**) is a symbol that does not appear on the left side of the arrow of any production rule.

In every production rule, a **nonterminal** on the left side of the arrow can always be replaced by everything on the right side of the arrow.

For example, the following grammar defined an integer in BNF^{◹} syntax:

```
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<integer> ::= ['-'] <digit> {<digit>}
```

In this grammar, `<digit>`

and `<integer>`

are nonterminals, and the symbols (-, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9) are the terminals.

Formally, a Context-Free Grammar is defined by a 4-tuple:

$$

G = (V, \Sigma, R, S)

$$

Where:

- $V$ is a set of nonterminal symbols that we have in the grammar.
- $\Sigma$ is a set of terminals, which make up the content of the sentences.
- $R$ is a set of production rules of the grammar, sometimes can be symbolized as $P$.
- $S$ is the start symbol of the grammar.

For example, to parse an algebraic expression with the variables $x$, $y$ and $z$, like this:

$$

(x + y) * x - z * y / (x + x)

$$

We have a grammar:

$$

G = (\{ S \}, \{ x, y, z, +, -, *, /, (, ) \}, R, S)

$$

With the following production rules:

```
S → x | y | z
S → S + S
S → S - S
S → S * S
S → S / S
S → (S)
```

Another example, a grammar $G$ to match all palindromes of the characters $\{ a, b \}$ like $aa$, $aabaa$, $aabbaa$, $bab$,…

$$

G = (\{ S \}, \{ a, b \}, R, S)

$$

```
S → aSa
S → bSb
S → ε
```

The last rule is called ε-production, which means $S$ be rewritten as an empty string.