Compiler Construction

A compiler is a system program that translates source code into machine code. It is one kind of language translator. Every programming language has its own compilers or interpreters. Designing an interpreter is generally less difficult than designing a compiler. Compiler construction is tough, logically demanding work.


Every year a great deal of software is launched in the market, but new compilers are released only rarely. Designing a compiler is a job for computer engineers; it is not like building a small application in Visual Basic or Delphi. It is a time-consuming process. Many compilers are available in the market. Pascal, FORTRAN, COBOL, C, C++, C#, Java and many other high-level programming languages and their compilers are popular among software professionals.

The source program is fed into the compiler. Generally a compiler converts the source code directly into machine code, but some compilers convert the source code into assembly language, which an assembler then converts into machine language.


1. Phases of a compiler


The compilation process of any compiler consists of a set of well-defined procedures. These procedures are carried out in pre-defined phases: when one phase is completed, the next phase starts. When a compiler is designed, each phase is designed very carefully. The compiler is a system in which the source program passes through the different phases and is finally converted into machine code.

These phases are:

  • Lexical Analyzer
  • Syntax Analyzer
  • Semantic Analyzer
  • Intermediate code generator
  • Code Optimizer
  • Code Generator
  • Symbol table manager
  • Error Handler

2. Lexical Analysis


Lexical analysis, or scanning, is the process of reading the source program and producing lexical tokens (or simply tokens). It is performed by a lexer, also called a lexical analyzer. Consider the following statement:


amount := salary + rent;

Tokens are:

(a) amount : Identifier

(b) := : Assignment symbol

(c) salary : Identifier

(d) + : Operator

(e) rent : Identifier

(f) ; : Punctuation symbol

A token is the lowest-level meaningful unit of the source text. It is a substring such as an identifier, a numerical constant, a literal string, an operator symbol, a punctuation symbol, or a keyword or symbol used in constructs such as assignment, conditions and looping.
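The scanning step can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny language with identifiers, numbers, four arithmetic operators, ":=" and ";". The category names (IDENTIFIER, ASSIGN and so on) and the function name tokenize are made up for this example.

```python
import re

# Hypothetical token categories for this sketch; order matters because
# ":=" must be tried before any single-character pattern.
TOKEN_SPEC = [
    ("ASSIGN",     r":="),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/]"),
    ("PUNCT",      r";"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, text) pairs for the given source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":          # whitespace separates tokens but is not a token
            continue
        if kind == "MISMATCH":      # any character no other pattern accepted
            raise SyntaxError(f"unexpected character {text!r} at position {match.start()}")
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("amount := salary + rent;"):
    print(f"{text:8} {kind}")
```

Running it on the statement above lists the same six tokens, one per line, each with its category.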


3. Syntax Analysis


The grouping of tokens into grammatical phrases is called syntax analysis or parsing. A context-free grammar is usually used to describe the structure of a language. BNF (Backus-Naur Form) notation is popular in computer science for representing and defining the grammar of a programming language.

A syntax analyzer takes tokens as input and outputs an error message if the program's syntax is wrong. There are many algorithms for parsing. The most popular types of parsing are top-down parsing and bottom-up parsing.

Parse tree:

  • The syntactic structure of a string according to some formal grammar is called a parse tree.
  • A program that creates a parse tree is called a parser.
  • Parsers can generate parse trees for both natural languages and computer programming languages.
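As an illustration of top-down parsing, here is a small recursive-descent sketch in Python for a single assignment statement. The grammar, the token category names and the tuple layout of the tree are assumptions made for this example. The tree it builds is a simplified one, closer to an abstract syntax tree than to a full parse tree.

```python
def parse_statement(tokens):
    """Parse: statement -> IDENTIFIER ':=' expression ';'

    `tokens` is a list of (category, text) pairs; the categories used here
    (IDENTIFIER, ASSIGN, OPERATOR, PUNCT) are hypothetical names.
    """
    pos = 0

    def expect(kind):
        # Consume the next token if it has the expected category.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found!r}")
        text = tokens[pos][1]
        pos += 1
        return text

    def expression():
        # expression -> IDENTIFIER ( OPERATOR IDENTIFIER )*
        node = ("id", expect("IDENTIFIER"))
        while pos < len(tokens) and tokens[pos][0] == "OPERATOR":
            op = expect("OPERATOR")
            right = ("id", expect("IDENTIFIER"))
            node = ("binop", op, node, right)   # left-associative grouping
        return node

    target = expect("IDENTIFIER")
    expect("ASSIGN")
    value = expression()
    expect("PUNCT")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("assign", target, value)


tokens = [
    ("IDENTIFIER", "amount"), ("ASSIGN", ":="), ("IDENTIFIER", "salary"),
    ("OPERATOR", "+"), ("IDENTIFIER", "rent"), ("PUNCT", ";"),
]
tree = parse_statement(tokens)
print(tree)
```

If a token is missing or out of place, for example a statement without the closing ";", the sketch raises SyntaxError instead of returning a tree, which corresponds to the error message a syntax analyzer outputs.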


4. Semantic Analysis


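Semantic analysis is the process of detecting semantic errors in the source program. Sometimes statements that are grammatically correct are not semantically correct, for example when a variable is used without being declared or when the types of the operands do not match. So, a compiler is equipped with semantic error checking facilities.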
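The following Python sketch shows the flavour of such a check for an assignment statement. It assumes a very strict rule (every operand must have exactly the same declared type as the target), and the type names and the function name check_assignment are invented for the example; real languages have richer rules.

```python
def check_assignment(target, operands, declared_types):
    """Return a list of semantic error messages for `target := op1 + op2 + ...`.

    `declared_types` maps identifier names to type names. The rule applied
    here (all types must be identical) is an assumption of this sketch.
    """
    errors = []
    for name in [target, *operands]:
        if name not in declared_types:
            errors.append(f"'{name}' is used but not declared")
    if errors:
        return errors          # type checks are meaningless without declarations
    for name in operands:
        if declared_types[name] != declared_types[target]:
            errors.append(
                f"type mismatch: '{name}' is {declared_types[name]}, "
                f"but '{target}' is {declared_types[target]}"
            )
    return errors


# Grammatically correct statement, but 'rent' has the wrong type.
declared = {"amount": "integer", "salary": "integer", "rent": "string"}
print(check_assignment("amount", ["salary", "rent"], declared))
```

The statement "amount := salary + rent;" passes syntax analysis in both cases; only the declared types decide whether the semantic check reports an error.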



5. Intermediate code generator


Syntax analysis produces a parse tree. The intermediate code generator transforms the parse tree into an intermediate language that represents the source program. Three-address code is a popular form of intermediate language. Each three-address instruction has at most one operator on its right-hand side:


amount := salary op rent

Here amount, salary and rent are operands and op is a binary operator.
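When an expression contains more than one operator, it is broken into several three-address instructions using temporary names. The Python sketch below shows one way to do this. The nested-tuple tree format, the temporary names t1, t2 and the variable months are assumptions made for the example.

```python
import itertools


def to_three_address(target, expr):
    """Flatten an expression tree into three-address instructions.

    `expr` is either a variable name (str) or a tuple (op, left, right).
    Returns a list of instruction strings ending with the assignment to `target`.
    """
    code = []
    counter = itertools.count(1)

    def emit(node):
        if isinstance(node, str):        # a plain operand needs no instruction
            return node
        op, left, right = node
        left_name = emit(left)           # operands are evaluated first
        right_name = emit(right)
        temp = f"t{next(counter)}"       # fresh temporary for this result
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code


# amount := salary + rent * months   ('months' is a hypothetical variable)
for line in to_three_address("amount", ("+", "salary", ("*", "rent", "months"))):
    print(line)
```

For this input the sketch produces three instructions: t1 := rent * months, then t2 := salary + t1, then amount := t2.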


6. Code Optimizer


The code optimizer is used to improve the output of the intermediate code generator. It optimizes the intermediate code so that faster-running machine code can be produced. There are two common kinds of optimization: local optimization and loop optimization.
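Constant folding is one simple example of a local optimization: an operation whose operands are both known at compile time is computed by the compiler instead of at run time. The Python sketch below applies it to a short list of three-address instructions. The tuple format and the numbers are made up for the example, and the sketch assumes that a folded destination is a temporary that is used only inside this block.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(instructions):
    """Fold constant operations in a straight-line block.

    Each instruction is (dest, left, op, right); operands are variable
    names (str) or integer constants.
    """
    known = {}        # temporaries whose value is known at compile time
    optimized = []
    for dest, left, op, right in instructions:
        left = known.get(left, left)      # substitute known constants
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPS[op](left, right)   # computed now, no code emitted
        else:
            known.pop(dest, None)         # dest no longer holds a constant
            optimized.append((dest, left, op, right))
    return optimized


block = [
    ("t1", 12, "*", 1000),            # both operands are constants
    ("t2", "salary", "+", "t1"),
    ("amount", "t2", "-", "rent"),
]
for instruction in fold_constants(block):
    print(instruction)
```

The multiplication disappears from the output, and the second instruction uses the constant 12000 directly.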


7. Code Generator


It is the final phase of compilation, in which relocatable machine code or assembly code is produced. The statement “amount := salary + rent;” can be converted into assembly language.

The assembly language version of the statement:

LOAD salary

ADD rent

STORE amount

An assembler converts the assembly code into machine code for the CPU, because the CPU understands only machine code, not high-level programming languages.
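The translation shown above can be sketched in Python for a simple single-accumulator machine. Only LOAD, ADD and STORE appear in the example above; the SUB and MUL mnemonics and the function name generate are assumptions made for this sketch.

```python
# Assumed mnemonics for a hypothetical accumulator machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(dest, left, op, right):
    """Translate one instruction `dest := left op right` into assembly lines."""
    return [
        f"LOAD {left}",                # accumulator <- left operand
        f"{MNEMONICS[op]} {right}",    # accumulator <- accumulator op right
        f"STORE {dest}",               # dest <- accumulator
    ]


for line in generate("amount", "salary", "+", "rent"):
    print(line)
```

For the statement "amount := salary + rent;" this prints the same LOAD, ADD, STORE sequence as above.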


8. Symbol-table Management


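The table of identifiers and their attributes is called the symbol table. The symbol-table manager records each identifier when it is first encountered and lets the other phases look up its attributes, such as its type.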
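A symbol table can be sketched in Python as a dictionary keyed by identifier name. The class name, the method names and the attributes stored (type and line) are assumptions chosen for this example; real compilers also track scopes, storage locations and more.

```python
class SymbolTable:
    """Minimal single-scope symbol table: identifier name -> attributes."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, **attributes):
        # A second declaration of the same name is treated as an error here.
        if name in self._entries:
            raise KeyError(f"'{name}' is already declared")
        self._entries[name] = attributes

    def lookup(self, name):
        # Returns the attribute dictionary, or None if the name is unknown.
        return self._entries.get(name)


table = SymbolTable()
table.declare("amount", type="integer", line=1)
table.declare("salary", type="integer", line=2)
print(table.lookup("amount"))
print(table.lookup("rent"))
```

A lookup that returns None, as for rent here, is how a later phase can detect the use of an undeclared identifier.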


9. Error handling


Each phase of compilation can encounter errors. These errors are collected and reported at compile time by the error handler.
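One way to picture this is a small collector object that every phase reports to, so that compilation can continue and all errors can be listed together at the end. The Python sketch below is only an illustration; the class name, the method names and the message format are invented for it.

```python
class ErrorHandler:
    """Collects errors reported by the different compiler phases."""

    def __init__(self):
        self.errors = []

    def report(self, phase, line, message):
        # Each phase records its errors instead of stopping the compilation.
        self.errors.append((phase, line, message))

    def has_errors(self):
        return bool(self.errors)

    def summary(self):
        # List the errors in source-line order, whatever phase found them.
        ordered = sorted(self.errors, key=lambda error: error[1])
        return [f"line {line}: {phase} error: {message}" for phase, line, message in ordered]


handler = ErrorHandler()
handler.report("semantic", 7, "'rent' is used but not declared")
handler.report("lexical", 3, "unexpected character '?'")
for line in handler.summary():
    print(line)
```

If has_errors() returns True after the analysis phases, a compiler would typically print the summary and skip code generation.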
