

### **SNS COLLEGE OF TECHNOLOGY An Autonomous Institution Coimbatore-35**

Accredited by NBA – AICTE and Accredited by NAAC – UGC with 'A+' Grade Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai

# **DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING 19ECB212 – DIGITAL SIGNAL PROCESSING**

#### II YEAR/ IV SEMESTER

### **UNIT 4 – FINITE WORD LENGTH EFFECTS**

### **TOPIC – INTRODUCTION TO DSP PROCESSOR**

INTRODUCTION TO DSP PROCESSOR/19ECB212 – DIGITAL SIGNAL PROCESSING/J.PRABAKARAN/ECE/SNSCT

6-Jun-24





#### INTRODUCTION

Digital Signal Processors (DSPs) are microprocessors with the following characteristics:

a) Real-time digital signal processing capabilities. DSPs typically have to process data in real time, i.e., the correctness of the operation depends heavily on the time when the data processing is completed.

b) High throughput. DSPs can sustain processing of high-speed streaming data, such as audio and multimedia data.

c) Deterministic operation. The execution time of DSP programs can be predicted accurately, thus guaranteeing a repeatable, desired performance.

d) Re-programmability by software. Different system behavior can be obtained by re-coding the algorithm executed by the DSP instead of by modifying the hardware.

- DSPs appeared on the market in the early 1980s. Over the last 15 years they have been the key enabling technology for many electronics products in fields such as communication systems, multimedia, automotive, instrumentation and military applications.
- The DSP implements the audio and encode functions. Additional tasks carried out are file management, user interface control, and post-processing algorithms such as equalization and bass management.





#### SPECIFIC APPLICATIONS OF DSP

| Field         |               | Application                                   |
|---------------|---------------|-----------------------------------------------|
| Communication | Broadband     | Video conferencing / phone                    |
|               |               | Voice / multimedia over IP                    |
|               |               | Digital media gateways (VOD)                  |
|               | Wireless      | Satellite phone                               |
|               |               | Base station                                  |
| Consumer      | Security      | Biometrics                                    |
|               |               | Video surveillance                            |
|               | Entertainment | Digital still / video camera                  |
|               |               | Digital radio                                 |
|               |               | Portable media player / entertainment console |
|               | Toys          | Interactive toys                              |
|               |               | Video game console                            |







### EVOLUTION OF DSP FEATURES FROM THEIR EARLY DAYS UNTIL NOW







### MAIN REQUIREMENTS AND HARDWARE IMPLEMENTATIONS FOR REAL-TIME DSP

| Processing requirements | Hardware implementations satisfying the requirement            |
|-------------------------|----------------------------------------------------------------|
| Fast data access        | • High-bandwidth memory architectures                          |
|                         | • Specialized addressing modes                                 |
|                         | • Direct Memory Access (DMA)                                   |
| Fast computation        | • MAC-centred                                                  |
|                         | • Pipelining                                                   |
|                         | • Parallel architectures (VLIW, SIMD)                          |
| Numerical fidelity      | • Wide accumulator registers, guard bits, etc.                 |
| Fast execution control  | • Hardware-assisted, zero-overhead loops, shadow registers, etc. |



### **VON NEUMANN ARCHITECTURE**










### HARVARD ARCHITECTURE












### SUPER HARVARD ARCHITECTURE










#### VON NEUMANN & HARVARD ARCHITECTURE

- The Von Neumann architecture consists of a single block of memory, containing both data and program instructions, and of a single bus (called the data bus) to transfer data and instructions to and from the CPU.
- In the Harvard architecture, there are separate memories for data and program instructions, and two separate buses connect them to the DSP core. This allows fetching program instructions and data at the same time, thus providing better performance at the price of increased hardware complexity and cost.
- The Harvard architecture can be improved by adding to the DSP core a small bank of fast memory, called the 'instruction cache', and by allowing data to be stored in the program memory. This improved version is the Super Harvard architecture.
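The difference between the three architectures can be illustrated with a toy count of bus cycles per filter tap. This is a minimal sketch under stated assumptions, not a model of any particular device: each tap is assumed to need one instruction fetch and two data fetches (a sample and a coefficient), each bus is assumed to move one word per cycle, and the function name `cycles_per_tap` is hypothetical.

```python
def cycles_per_tap(architecture: str) -> int:
    """Bus cycles needed for one multiply-accumulate tap (toy model)."""
    instruction_fetches = 1  # assumption: one instruction per tap
    data_fetches = 2         # assumption: one sample and one coefficient

    if architecture == "von_neumann":
        # Single shared bus: all three accesses are serialized.
        return instruction_fetches + data_fetches
    if architecture == "harvard":
        # Separate program and data buses: the instruction fetch overlaps
        # with one data fetch, but the second data fetch needs its own cycle.
        return max(instruction_fetches, data_fetches)
    if architecture == "super_harvard":
        # Assumption: the loop instruction is already in the instruction
        # cache, and the coefficient is stored in program memory, so both
        # buses carry data during the same cycle.
        return 1
    raise ValueError(f"unknown architecture: {architecture}")


for name in ("von_neumann", "harvard", "super_harvard"):
    print(name, cycles_per_tap(name))
```

Under these assumptions the count drops from three cycles to two and then to one, which is the motivation for the extra buses and the instruction cache.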





#### CACHE ARCHITECTURE

- The most recently executed program instructions are relocated at run time to the instruction cache. For a loop, the instructions are copied to the instruction cache the first time the DSP executes the loop, so later iterations can fetch them from the cache.
- The TI TMS320C67xx DSP has a cache architecture that includes both program and data cache. There are two levels of cache, called Level 1 (L1) and Level 2 (L2). The L1 cache comprises 8 kbyte of memory divided into 4 kbyte of program cache and 4 kbyte of data cache. The L2 cache comprises 256 kbyte of memory divided into 192 kbyte mapped-SRAM memory and 64 kbyte dual cache memory.
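The loop behavior described above can be sketched with a very small simulation. This is an illustrative model only: the cache is assumed to simply keep the first `cache_size` instruction addresses it sees (no replacement policy), and the function name `run_loop` and its parameters are hypothetical.

```python
def run_loop(loop_addresses, iterations, cache_size):
    """Count instruction-cache hits and misses while a loop is executed."""
    cache = set()
    hits = 0
    misses = 0
    for _ in range(iterations):
        for address in loop_addresses:
            if address in cache:
                hits += 1
            else:
                misses += 1
                # Assumption: fill the cache until it is full, never evict.
                if len(cache) < cache_size:
                    cache.add(address)
    return hits, misses


# A 4-instruction loop executed 10 times with room for 16 instructions:
# only the first pass misses, the other nine passes hit.
print(run_loop(range(4), 10, 16))
```

When the loop fits in the cache, the misses occur only during the first iteration, so their cost is spread over all later iterations.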







### TI DSP TMS320C67XX FAMILY TWO-LEVEL CACHE ARCHITECTURE








#### DSP HIERARCHICAL MEMORY ARCHITECTURE

| Memory level               | Access [ns] | Hardware implementation | Size [Byte] |
|----------------------------|-------------|-------------------------|-------------|
| Registers                  |             |                         |             |
| L1 cache                   | 1           | ~5 transistors/cell     | 16K-32K     |
| L2 cache                   | 5-10        | ~2 transistors/cell     | 512K-4M     |
| L3 cache / external memory | 10-50       |                         |             |










#### DSP HIERARCHICAL MEMORY ARCHITECTURE

- Hierarchical memory allows one to take advantage of both the speed and the capacity of different memory types. Registers are banks of very fast internal memory, typically with single-cycle access time.
- The L1 cache is typically high-speed static RAM made of five or six transistors per cell. The amount of L1 cache available thus depends directly on the available chip space. An L2 cache typically needs fewer transistors per cell, hence it can be present in larger quantities inside the DSP.
- A cache miss happens when the data or the instructions needed by the DSP are not stored in cache memory, so they have to be fetched from a slower memory, with an execution speed penalty.
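The speed penalty of a cache miss can be estimated with the usual average-access-time calculation. The sketch below is an assumption-laden illustration: the default latencies (1 ns, 10 ns, 50 ns) are merely picked from the access-time ranges quoted for this hierarchy, the hit rates are invented for the example, and the function name is hypothetical.

```python
def average_access_time_ns(l1_hit_rate, l2_hit_rate,
                           t_l1=1.0, t_l2=10.0, t_external=50.0):
    """Average access time for a two-level cache backed by external memory.

    Every access pays the L1 time; L1 misses additionally pay the L2 time;
    accesses that also miss in L2 additionally pay the external-memory time.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    return t_l1 + l1_miss_rate * (t_l2 + l2_miss_rate * t_external)


# Hypothetical hit rates: 90 % in L1, 80 % of the remainder in L2.
print(average_access_time_ns(0.9, 0.8))
```

Even a modest miss rate raises the average access time well above the L1 latency, which is why keeping loops and working data in the fastest memory matters for real-time code.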







#### PROGRAM SEQUENCER AND ADDRESS GENERATOR UNITS LOCATION WITHIN A GENERIC DSP CORE ARCHITECTURE








#### DMA CONTROLLER








#### DMA CONTROLLER

- The DMA controller is a second processor working in parallel with the DSP core and dedicated to transferring information between two memory areas or between peripherals and memory.
- A DMA coprocessor can transfer data as well as program instructions. The latter typically corresponds to code overlay, i.e., code stored in an external memory and moved to an internal memory (for instance L1) when needed.
- Multiple and independent DMA channels are also available for greater flexibility.
- Bus arbitration between the DMA and the DSP core is needed to avoid colliding memory accesses when the DMA and the DSP core share the same bus to access peripherals and/or memories.
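The benefit of letting the DMA move data while the core computes can be sketched with a rough timing model. Everything here is an assumption made for illustration: the time units are arbitrary, double buffering is assumed so that transfers overlap with processing, bus arbitration delays are ignored, and the function names and parameters are hypothetical.

```python
def total_time_core_only(blocks, t_read, t_process, t_write):
    """The DSP core reads, processes and writes each block itself."""
    return blocks * (t_read + t_process + t_write)


def total_time_with_dma(blocks, t_read, t_process, t_write, t_setup):
    """The core only sets up the DMA and processes; the DMA moves the data.

    Assumption: while the core works on one block, the DMA writes back the
    previous block and reads in the next one, so each step costs whichever
    of the two activities is longer.
    """
    core_time = t_setup + t_process
    dma_time = t_read + t_write
    step = max(core_time, dma_time)
    # The very first read and the very last write cannot be hidden.
    return t_read + blocks * step + t_write


print(total_time_core_only(100, 10, 30, 10))    # core does everything
print(total_time_with_dma(100, 10, 30, 10, 2))  # transfers overlap processing
```

In this model the gain is largest when the transfer time is comparable to the processing time; if transfers dominate, the core ends up waiting for the DMA instead.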






#### (A) READ-PROCESS-WRITE DATA WHEN ONLY THE DSP CORE IS PRESENT; (B) SAME ACTIVITY WHEN THE DMA TAKES CARE OF DATA TRANSFERS

Figure labels: (a) DSP core: read external memory data, process data, write external memory data. (b) DSP core: set up DMA, do something else, process data; DMA: move external memory data.







#### DMA TRANSFER CONFIGURATIONS. (A): CHAINED DMA TRANSFER; (B): MULTIDIMENSIONAL DATA TRANSFER








#### **DSP PROCESSING ARITHMETIC BLOCKS**

The basic DSP arithmetic processing blocks are: a) many registers; b) one or more multipliers; c) one or more Arithmetic Logic Units (ALUs); d) one or more shifters.

**a) Registers:** these are banks of very fast memory used to store intermediate results of data processing. Very often they are wider than the DSP normal word width, so as to provide a higher resolution during the processing.

**b) Multiplier:** it can carry out single-cycle multiplications and very often it includes very wide accumulator registers to reduce round-off or truncation errors.

**c) ALU:** it carries out arithmetic and logical operations.

**d) Shifter:** it shifts the input value by one or more bits, left or right. A shifter that can shift by several bits in a single operation is called a barrel shifter and is useful in the implementation of floating point add and subtract operations.
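The value of a wide accumulator can be illustrated with a small multiply-accumulate sketch. The choices below are assumptions made for the example, not a description of a specific DSP: 16-bit signed inputs, saturating accumulation, and a comparison between a 32-bit accumulator and a 40-bit one (32 bits plus 8 guard bits). The function names are hypothetical.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a 'bits'-wide register."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def multiply_accumulate(samples, coefficients, accumulator_bits):
    """Sum of products with a saturating accumulator of the given width."""
    accumulator = 0
    for x, h in zip(samples, coefficients):
        product = x * h  # 16-bit x 16-bit product fits in 32 bits
        accumulator = saturate(accumulator + product, accumulator_bits)
    return accumulator


samples = [32767] * 4       # largest positive 16-bit value
coefficients = [32767] * 4

print(multiply_accumulate(samples, coefficients, 40))  # exact sum
print(multiply_accumulate(samples, coefficients, 32))  # clipped by saturation
```

With the guard bits the four large products are summed exactly, whereas the 32-bit accumulator saturates on the third product and the result is no longer correct.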






#### DSP PROCESSING ARITHMETIC BLOCKS










#### PIPELINING STAGES

| Basic pipelining stages | Action                           |
|-------------------------|----------------------------------|
| Fetch                   | Generate program fetch address   |
|                         | Read op-code                     |
| Decode                  | Route op-code to functional unit |
|                         | Decode instruction               |
| Execute                 | Read operands                    |
|                         | Execute instruction              |
|                         | Write results back to registers  |



#### INSTRUCTION EXECUTION AND PROCESSING TIME GAIN OF A PIPELINED CPU WITH RESPECT TO A NON-PIPELINED ONE
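The processing time gain can be estimated with a simple cycle count. This is a minimal sketch assuming an ideal pipeline: every stage takes exactly one cycle, there are no stalls or branches, and the three-stage default simply mirrors the fetch, decode and execute stages. The function names are hypothetical.

```python
def cycles_non_pipelined(n_instructions, n_stages=3):
    """Each instruction passes through all stages before the next starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=3):
    """After the pipeline fills, one instruction completes every cycle."""
    return n_stages + n_instructions - 1


n = 100
slow = cycles_non_pipelined(n)
fast = cycles_pipelined(n)
print(slow, fast, round(slow / fast, 2))
```

For long instruction sequences the speed-up approaches the number of pipeline stages; the fill time of the pipeline only matters for short sequences.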








### VLIW ARCHITECTURE

- VLIW (Very Long Instruction Word) architectures are based upon instruction-level parallelism, i.e., many instructions are issued at the same time and are executed in parallel by multiple execution units. As a consequence, DSPs based on this architecture are also called 'multi-issue' DSPs.
- In a typical VLIW architecture, eight 32-bit instructions are packed together in a 256-bit-wide instruction which is fed to eight separate execution units. Characteristics of VLIW architectures include simple and regular instruction sets. Instruction scheduling is done at compile time and not at run time, so as to guarantee deterministic behavior.
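The packing of eight 32-bit instructions into one 256-bit word can be sketched with plain integers. This is only an illustration: the slot ordering (slot 0 in the least significant bits) is an assumption, the op-code values are arbitrary placeholders, and the function names are hypothetical.

```python
SLOTS = 8        # instructions per very long instruction word
SLOT_BITS = 32   # width of each instruction


def pack_vliw(instructions):
    """Pack eight 32-bit instructions into a single 256-bit integer."""
    if len(instructions) != SLOTS:
        raise ValueError("exactly eight instructions are required")
    word = 0
    for slot, instruction in enumerate(instructions):
        if not 0 <= instruction < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 32 bits")
        # Assumption: slot 0 occupies the least significant 32 bits.
        word |= instruction << (SLOT_BITS * slot)
    return word


def unpack_vliw(word):
    """Split a 256-bit word back into the eight per-unit instructions."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (SLOT_BITS * slot)) & mask for slot in range(SLOTS)]


packet = [0x10000000 + i for i in range(SLOTS)]  # placeholder op-codes
word = pack_vliw(packet)
print(word.bit_length() <= 256, unpack_vliw(word) == packet)
```

Each unpacked slot would be routed to its own execution unit; because the compiler decides the slot contents, no run-time scheduling hardware is needed.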






#### TI TMS320C6XXX FAMILY VLIW ARCHITECTURE






Figure labels: 8 instructions. Legend: L = ALU; S = shifter, ALU; M = multiplier; D = address generator.



#### SIMD ARCHITECTURE

- SIMD (Single Instruction, Multiple Data) architectures are based on data-level parallelism, i.e., only one instruction is issued at a time but the same operation specified by the instruction is performed on multiple data sets.
- In a DSP based upon the SIMD architecture, two 32-bit input registers provide four 16-bit data inputs. They are processed in parallel by two separate execution units that carry out the same operation. The two 16-bit data outputs are packed into a 32-bit register.
- A typical SIMD architecture can support multiple data widths and is most effective on algorithms that require the processing of large data chunks. The SIMD operation mode can be switched ON or OFF, for instance in the ADI SHARC DSP.
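The packed operation described above can be sketched with integer masks. This is an illustrative model, not the behavior of a specific DSP: the operation is taken to be addition, each 16-bit lane is assumed to wrap around on overflow, and the function names are hypothetical.

```python
MASK16 = 0xFFFF


def pack16(high, low):
    """Pack two 16-bit values into one 32-bit register value."""
    return ((high & MASK16) << 16) | (low & MASK16)


def simd_add16(reg_a, reg_b):
    """Add the two 16-bit lanes of reg_a and reg_b independently.

    Each lane models one execution unit performing the same operation.
    Assumption: results wrap around modulo 2**16, with no carry between lanes.
    """
    low = ((reg_a & MASK16) + (reg_b & MASK16)) & MASK16
    high = (((reg_a >> 16) & MASK16) + ((reg_b >> 16) & MASK16)) & MASK16
    return (high << 16) | low


a = pack16(1, 2)     # first input register: two 16-bit inputs
b = pack16(10, 20)   # second input register: two more 16-bit inputs
print(hex(simd_add16(a, b)))  # two 16-bit results in one 32-bit register
```

A single call processes four 16-bit inputs and yields two results, which is the data-level parallelism the SIMD mode provides.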





#### SIMPLIFIED SCHEMATICS FOR ADI SHARC DSP - SIMD ARCHITECTURE








Figure labels: 32-bit input registers (4 data inputs); same operation; 32-bit output register (2 results).



#### ASSESSMENT

1. What are the characteristics of DSP?
2. List the main requirements and hardware implementations of real-time DSP.
3. ------ Architecture consists of a single block of memory, containing both data and program instructions.
4. What is meant by Harvard Architecture?
5.
6. Define VLIW Architecture.
7. List the basic pipelining stages.






# THANK YOU




